fastio commented Feb 11, 2026

Does this PR close an open issue or discussion?

No existing issue — this introduces a new integration crate.

What changes are included in this PR?

Add a new vortex-clickhouse crate that enables ClickHouse to natively read and write Vortex files through FORMAT Vortex. The crate compiles to a C static library (staticlib) and exposes an opaque-handle-based C FFI that ClickHouse links against via its IInputFormat / IOutputFormat framework.

Crate structure:

  • scan.rs — Read path. VortexScanner opens a Vortex file (local or remote via S3/GCS/Azure/HTTP), exposes schema metadata, supports column projection, and yields batches as VortexExporterHandle for zero-copy export into caller-owned buffers.
  • copy.rs — Write path. VortexWriter accepts batches via a bounded channel (32-batch buffer) and streams them to a Vortex file on a background async task, keeping memory usage bounded.
  • convert/ — Bidirectional type mapping between ClickHouse type strings and Vortex DType, plus column-level and scalar data conversion (VortexColumnBuilder).
  • exporter/ — Per-type data exporters (primitive, bool, string/varbinview, decimal, bigint, list, struct) that export Vortex arrays into caller-owned C buffers, avoiding copies where possible.
  • ext_types/ — ClickHouse-specific Vortex extension types: BigInt (Int128/UInt128/Int256/UInt256), UUID, IPv4/IPv6, Geo (Point/Ring/Polygon/...), Enum8/Enum16, DateTime/DateTime64, Date/Date32, FixedString(N), LowCardinality.
  • error.rs — Thread-local last-error pattern for C FFI error reporting.
  • cpp/include/ — C header files (clickhouse_vx.h) generated via cbindgen.
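
The bounded-channel write path described for copy.rs can be illustrated with a minimal sketch. It is not the crate's code: `Batch` and `BoundedWriter` are hypothetical names, and a plain OS thread with `std::sync::mpsc::sync_channel` stands in for the background async task so the example needs only the standard library. The point it shows is that a full queue blocks the producer, which is what keeps memory usage bounded.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for one batch of column data.
pub struct Batch {
    pub rows: usize,
}

/// Hypothetical writer: the producer side of a bounded channel plus the
/// handle of the background worker that drains it.
pub struct BoundedWriter {
    sender: SyncSender<Batch>,
    worker: JoinHandle<usize>,
}

impl BoundedWriter {
    /// `capacity` is the maximum number of batches queued at once
    /// (the PR description mentions a 32-batch buffer).
    pub fn new(capacity: usize) -> Self {
        let (sender, receiver) = sync_channel::<Batch>(capacity);
        let worker = thread::spawn(move || {
            let mut rows_written = 0;
            // A real writer would encode and flush each batch to the file here.
            for batch in receiver {
                rows_written += batch.rows;
            }
            rows_written
        });
        BoundedWriter { sender, worker }
    }

    /// Blocks while the queue is full, so the producer cannot run ahead
    /// of the worker by more than `capacity` batches.
    pub fn write(&self, batch: Batch) -> Result<(), String> {
        self.sender
            .send(batch)
            .map_err(|_| "background worker stopped".to_string())
    }

    /// Closes the channel, waits for the worker to drain it, and returns
    /// the total number of rows the worker saw.
    pub fn finish(self) -> Result<usize, String> {
        let BoundedWriter { sender, worker } = self;
        drop(sender); // closing the sender ends the worker's loop
        worker
            .join()
            .map_err(|_| "background worker panicked".to_string())
    }
}
```

A caller would construct the writer with `BoundedWriter::new(32)`, call `write` once per batch, and call `finish` to flush. Returning the row count from `finish` is only a convenience for this sketch.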

Supported ClickHouse types

Int8–Int256, UInt8–UInt256, Float32/64, Decimal32/64/128/256, String, FixedString(N), Bool, Date, Date32, DateTime, DateTime64, Array(T), Tuple(...), Map(K,V), Nullable(T), LowCardinality(T), Enum8/16, IPv4, IPv6, UUID, and Geo types.

Types without native Vortex equivalents are modeled as Vortex extension types with custom metadata, enabling lossless round-trip through the file format.
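
To make the idea of a lossless round trip concrete, here is a small sketch of bidirectional mapping between ClickHouse type strings and a logical type. Everything in it is an assumption for illustration: `LogicalType`, `parse_type`, and `type_to_string` are hypothetical names, the enum is not the Vortex DType, and only a handful of types are covered.

```rust
/// Hypothetical, simplified logical type; not the Vortex `DType`.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalType {
    Int32,
    String,
    FixedString(usize),
    Nullable(Box<LogicalType>),
    Array(Box<LogicalType>),
}

/// Returns the text between `prefix(` and the final `)`, if `s` has that shape.
fn unwrap_call<'a>(s: &'a str, prefix: &str) -> Option<&'a str> {
    s.strip_prefix(prefix)?.strip_prefix('(')?.strip_suffix(')')
}

/// Parses a ClickHouse type string into the simplified logical type.
pub fn parse_type(s: &str) -> Result<LogicalType, String> {
    let s = s.trim();
    match s {
        "Int32" => return Ok(LogicalType::Int32),
        "String" => return Ok(LogicalType::String),
        _ => {}
    }
    if let Some(inner) = unwrap_call(s, "Nullable") {
        return Ok(LogicalType::Nullable(Box::new(parse_type(inner)?)));
    }
    if let Some(inner) = unwrap_call(s, "Array") {
        return Ok(LogicalType::Array(Box::new(parse_type(inner)?)));
    }
    if let Some(len) = unwrap_call(s, "FixedString") {
        let len = len
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("bad FixedString length: {}", e))?;
        return Ok(LogicalType::FixedString(len));
    }
    Err(format!("unsupported type: {}", s))
}

/// Renders the logical type back to its ClickHouse type string.
pub fn type_to_string(t: &LogicalType) -> String {
    match t {
        LogicalType::Int32 => "Int32".to_string(),
        LogicalType::String => "String".to_string(),
        LogicalType::FixedString(n) => format!("FixedString({})", n),
        LogicalType::Nullable(inner) => format!("Nullable({})", type_to_string(inner)),
        LogicalType::Array(inner) => format!("Array({})", type_to_string(inner)),
    }
}
```

In this sketch, parsing `Array(Nullable(FixedString(16)))` and rendering the result yields the same string again. Keeping the length inside the `FixedString` variant plays the role that extension-type metadata plays in the description above: the parameter survives the trip instead of being collapsed into a generic binary type.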

What is the rationale for this change?

ClickHouse is one of the most widely deployed analytical databases. Adding native Vortex format support allows ClickHouse users to directly query and produce Vortex files, benefiting from Vortex's adaptive encoding and compression without requiring format conversion pipelines.

The C FFI approach was chosen because:

  • ClickHouse's format system requires implementing C++ interfaces (IInputFormat, IOutputFormat), so a thin C++ shim calling into Rust via FFI is the natural integration point.
  • This follows the same pattern as other Rust integrations already in ClickHouse (e.g., BLAKE3, skim).
  • Opaque handles with _new/_free pairs provide a safe, simple ownership model across the language boundary.
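
The opaque-handle ownership model, combined with the thread-local last-error pattern mentioned for error.rs, might look roughly like the sketch below. The names `DemoScanner`, `demo_scanner_new`, `demo_scanner_free`, `demo_scanner_path_len`, and `demo_last_error` are all hypothetical and are not the functions declared in clickhouse_vx.h. The `#[unsafe(no_mangle)]` spelling assumes Rust 1.82 or newer; older toolchains write `#[no_mangle]`.

```rust
use std::cell::RefCell;
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

thread_local! {
    // One error slot per thread, so concurrent callers never see each other's errors.
    static LAST_ERROR: RefCell<Option<CString>> = RefCell::new(None);
}

fn set_last_error(msg: &str) {
    // Interior NUL bytes cannot appear in a C string, so replace them first.
    let c = CString::new(msg.replace('\0', " ")).expect("NUL bytes were replaced");
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(c));
}

/// Opaque to C: callers only ever hold a pointer to it.
pub struct DemoScanner {
    path: String,
}

/// Allocates a scanner and transfers ownership to the caller.
/// Returns null on failure; the reason is available via `demo_last_error`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_scanner_new(path: *const c_char) -> *mut DemoScanner {
    if path.is_null() {
        set_last_error("path is null");
        return ptr::null_mut();
    }
    // Caller contract (assumed): `path` points to a NUL-terminated string.
    match unsafe { CStr::from_ptr(path) }.to_str() {
        Ok(p) => Box::into_raw(Box::new(DemoScanner { path: p.to_owned() })),
        Err(_) => {
            set_last_error("path is not valid UTF-8");
            ptr::null_mut()
        }
    }
}

/// Borrows the handle without taking ownership.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_scanner_path_len(scanner: *const DemoScanner) -> usize {
    match unsafe { scanner.as_ref() } {
        Some(s) => s.path.len(),
        None => 0,
    }
}

/// Takes ownership back and drops the scanner. Passing null is a no-op.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_scanner_free(scanner: *mut DemoScanner) {
    if !scanner.is_null() {
        drop(unsafe { Box::from_raw(scanner) });
    }
}

/// Returns the calling thread's last error, or null if none was recorded.
/// The pointer stays valid until the next error is set on the same thread.
#[unsafe(no_mangle)]
pub extern "C" fn demo_last_error() -> *const c_char {
    LAST_ERROR.with(|slot| slot.borrow().as_ref().map_or(ptr::null(), |c| c.as_ptr()))
}
```

On the C++ side, a shim would typically wrap such a `_new`/`_free` pair in an RAII type, so that each handle is released exactly once and on the thread-safe path the shim controls.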

How is this change tested?

225 unit tests covering:

  • Bidirectional type conversion for all supported ClickHouse types (primitives, strings, decimals, nested, nullable, extension types)
  • Column data construction and round-trip via VortexColumnBuilder
  • Extension type registration and metadata serialization
  • Scanner and writer FFI interface contracts
  • End-to-end file read/write cycle (e2e_test.rs)
All tests pass: cargo test -p vortex-clickhouse → 225 passed, 0 failed.

Are there any user-facing changes?

No breaking changes to existing APIs. This is a new, additive crate (vortex-clickhouse) with publish = false. It adds the crate to the workspace members and [workspace.dependencies] in the root Cargo.toml.

@robert3005
Contributor

Can you please follow the contribution guidelines? In particular, bigger changes should start with a discussion: https://github.com/vortex-data/vortex/blob/develop/CONTRIBUTING.md#contributing-to-vortex

Author

fastio commented Feb 11, 2026

> Can you please follow the contribution guidelines? In particular, bigger changes should start with a discussion: https://github.com/vortex-data/vortex/blob/develop/CONTRIBUTING.md#contributing-to-vortex

Thanks for the pointer! I should have started with a discussion first — my apologies for skipping that step.

I've opened a discussion here: #6425

Happy to wait for community feedback there before proceeding with the PR. I'll convert this PR to draft in the meantime.

fastio marked this pull request as draft February 11, 2026 14:31