BigQuery and ClickHouse: equivalent concepts
The tables below map each BigQuery concept to its ClickHouse equivalent. For the end-to-end migration walkthrough, see Migrating from BigQuery to ClickHouse Cloud.
Resource hierarchy
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Organization | Organization | Root node of the hierarchy in both. |
| Project | Service (region-scoped); warehouse for grouping services with shared storage and independent compute | A service is the data-container analogue; use a warehouse to group services that share storage but scale compute independently. Billing, quotas, and tier are set at the ClickHouse organization level, not per service. |
| Dataset | Database | Same role — logical container for tables and grant scope. |
| IAM permissions | Console roles plus SQL grants | Two-layer access: roles in console.clickhouse.cloud plus SQL grants in the database. Console users can also be granted DB roles for SQL Console use. |
Compute and capacity
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Slot | Replica (whole node); queries can parallelize across replicas | A replica is a static compute node in a ClickHouse service. See the callout below. |
| Slot reservation | Vertical and horizontal autoscaling bounds; warehouses for workload isolation | Where BigQuery reservations guarantee slot capacity, ClickHouse sizes via min/max autoscaling bounds. Setting min = max effectively fixes the size. |
| Quotas | Workload classes plus per-query limits | Workload classes cover memory, CPU, concurrency, and I/O scheduling at runtime. Spend caps have no direct primitive; there's no setting that suspends a service on hitting a cost threshold. |
Unlike a BigQuery slot, a ClickHouse replica is a static compute node that serves many concurrent queries; it isn't allocated to a query from a pool. Replicas are also much coarser-grained: a slot is closer to a unit of CPU than to a whole node.
Billing and pricing model
ClickHouse Cloud bills compute as replica-hours (sized by the autoscaling bounds) rather than per-TB scanned or flat-rate slot commitments, storage as compressed bytes, and backups as a separate line item. There's no equivalent to BigQuery's logical-vs-physical storage choice. Both platforms charge for public internet egress and cross-region data transfer, and both offer committed-spend discounts. Managed ingestion (ClickPipes) is metered separately; other features (materialized view refresh, secondary index maintenance, dictionary loads) run on the service's compute. See ClickHouse Cloud pricing for current rates, tiers, and commitment options.
Storage and tables
In ClickHouse, a table's behavior is set at creation time: the engine (MergeTree family) determines merge and storage semantics, and ORDER BY / PARTITION BY / TTL clauses configure physical layout and retention. Many BigQuery per-feature settings map to a clause in the ClickHouse CREATE TABLE statement. Physical schema design also differs between platforms; see the migration guide for design tradeoffs.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Table | MergeTree-family table | Engine choice determines storage and merge behavior; pick by access pattern (MergeTree for append-mostly facts, ReplacingMergeTree for upserts, AggregatingMergeTree for pre-aggregations). |
Column schema modes (NULLABLE, REQUIRED, REPEATED) | Nullable(T) for optional; omit for required; Array(T) for repeated; Array(Tuple(...)) or Nested for repeated records | In ClickHouse, columns are non-nullable unless wrapped with Nullable(T). Nullability has a small storage and query cost, so use it only when the column needs nulls. |
| Schema evolution (add / drop / modify columns) | ALTER TABLE ... ADD / DROP / MODIFY COLUMN | Same DDL surface as BigQuery. Many column changes are metadata-only. |
| Partitioning | PARTITION BY clause on the table | Where BigQuery limits partitioning to time-unit, integer-range, or ingestion-time columns, ClickHouse takes an arbitrary expression. Use it for retention (drop a partition) and pruning. |
| Clustering | ORDER BY columns in the table definition | Where BigQuery clustering is best-effort and reorganized in the background, ClickHouse's ORDER BY is enforced at insert time and drives the sparse primary index. |
| External tables / BigLake | s3 / gcs / azureBlobStorage table functions for direct file access; Iceberg engine for open catalogs | Object storage and open-table formats are read directly through these functions and engines. ClickHouse does not provide a unified-governance layer over external storage. |
| Object tables (SQL access to unstructured files) | s3 / gcs table functions over binary formats | Unstructured objects (images, PDFs, audio) are queried directly through these table functions, not via a dedicated table type. |
In-SQL ML inference on objects (ML.GENERATE_TEXT, ML.GENERATE_EMBEDDING) | — | No in-SQL LLM-on-objects equivalent; call hosted models from the application layer. |
| Apache Iceberg | Iceberg engine | Reads Iceberg tables stored in S3, Azure, HDFS, or local storage. Writes are not supported. See the engine page for the current list of supported features. |
| Default table / partition / dataset expiration | TTL clause on the table, column, or partition | TTL can be set at table creation or via ALTER TABLE ... MODIFY TTL, and applies at the table, column, or partition level rather than the dataset default. |
| Table snapshot | Service-level backup | Granularity differs significantly; see the callout below. |
| Time travel | Point-in-time backup restore into a new service | No inline historical query; restore is the only way to reach prior state. |
| Authorized views | View with SQL SECURITY DEFINER (runs with the view-owner's privileges) | See CREATE VIEW for the syntax and the INVOKER / DEFINER / NONE modes. |
| Row-level security | Row policy — a WHERE-style expression evaluated per user | Same role; the policy expression is attached per user (or role) rather than as a BigQuery RLS row access policy resource. |
Wildcard tables (_TABLE_SUFFIX) | Merge table engine (persistent grouping) or merge() function (inline) | Same idea, different syntax. Merge is a persistent table-of-tables; merge() is inline without creating one. |
| Table clone | CREATE TABLE ... AS SELECT copy, or backup restore into a new service | ClickHouse has no copy-on-write primitive; every copy reads the source data fully. |
ClickHouse Cloud backups are per-service. Restoring a backup creates a new service; a single table cannot be restored back into the original service.
ClickHouse is append-optimized. There's no SQL MERGE statement
(unrelated to the Merge and MergeTree engines), and
ALTER TABLE … UPDATE /
DELETE run as background mutations
rather than transactional row writes. BigQuery DML patterns (MERGE, UPDATE,
DELETE, dbt incremental updates) typically port to engine choice in ClickHouse:
ReplacingMergeTree
keeps the latest row by sort key, CollapsingMergeTree
marks deletes inline, and AggregatingMergeTree
maintains aggregated state. Engine choice is set at table creation and is
non-trivial to change later.
Query model and performance
Query acceleration in ClickHouse comes from three layers: primary-key ordering (a sparse index over the on-disk sort order), secondary indexes on non-key columns, and materialized views.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Primary key (advisory) | Primary key — drives the on-disk sort order and the sparse primary index | Where BigQuery's PK is advisory only, ClickHouse's PK is load-bearing — it determines physical layout and is used to prune granules, avoid re-sorts, and short-circuit LIMIT. Neither system enforces uniqueness. |
| Foreign key (advisory) | Wide tables or dictionaries for lookups | ClickHouse doesn't accept foreign-key declarations even as advisory hints. |
| Search index | Full-text index | Token index over string columns. Operator surface differs: BigQuery's SEARCH() maps to the hasToken / hasAllTokens / hasAnyTokens / hasPhrase family in ClickHouse; LIKE and match can also use the index when the pattern is tokenizable (see the index doc for restrictions). |
| Vector index | Array(Float32) plus a vector ANN index | ClickHouse has no dedicated vector type. Embeddings are stored as Array(Float32) and accelerated with an ANN index for approximate nearest-neighbor lookups. |
| Materialized view | Incremental MV — updates on each insert into a base table | Cost is paid at insert time. BigQuery's auto-refreshed MVs map to this incremental model, not to refreshable MVs. |
| Scheduled query | Refreshable MV for query-driven scheduled work; external orchestrator (dbt, Airflow) for procedural pipelines | Refreshable MVs run on a cron-style schedule and maintain a result table. |
Streaming inserts (tabledata.insertAll, Storage Write API) | Asynchronous inserts for server-side batching of small row-by-row writes; Buffer engine for in-memory accumulation; ClickPipes for managed streaming pipelines | The raw API surface is INSERT over HTTP or the native protocol, but for the row-by-row semantics of BigQuery streaming inserts, async inserts are the closer fit, batching on the server. For deduplication, ClickHouse exposes block-level dedup on Replicated tables (insert_deduplicate, default-on, recent-blocks window) and user-supplied per-insert dedup via insert_deduplication_token; for row-level dedup, use ReplacingMergeTree. None of these reproduces the Storage Write API's full exactly-once-by-stream-offset model. |
| Continuous queries | Streaming table engine (Kafka, NATS, RabbitMQ) feeding a materialized view that writes to a destination table; ClickPipes for managed streaming from Pub/Sub or Kinesis; an incremental MV on a regular table for inserts already landing in ClickHouse | "Always-on transform" maps to engine + MV rather than a single declarative resource. Pick by source: streaming queue → table engine or ClickPipes; ongoing inserts into an existing table → MV. |
| Dry run | EXPLAIN ESTIMATE — reports rows, parts, and marks the query would read; other EXPLAIN variants (PLAN, PIPELINE, SYNTAX) cover deeper plan inspection | Covers the plan-inspection role of BigQuery's dry run, not the cost-estimation role. ClickHouse billing isn't per-query, so there's no "bytes that will be billed" answer to return. |
| Federated queries (Spanner, Cloud SQL, AlloyDB, Bigtable) | External OLTP attached via database engine (PostgreSQL, MySQL, MongoDB, SQLite) | Distinct from external tables in object storage; these attach a live source so its tables are queryable directly. Bigtable has no ClickHouse database-engine equivalent. |
| Cached results | Query cache | ClickHouse's query cache lives in each replica's memory and is per-user by default; identical queries to different replicas don't share results. Not transactionally consistent. |
| Sessions / multi-statement queries | Per-statement execution; multi-step state managed in the client or an orchestrator | ClickHouse has no per-session variables or shared state. |
Security and governance
Authorized views and row-level security are listed under Storage and tables.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Policy tags / column-level access control | Column-level grants on specific columns of a table | Grants apply at the column level. BigQuery's centralized taxonomy/policy-tag governance has no direct equivalent. |
| Data masking | Views, row policies, or function-based transforms; see data masking patterns | No column-mask primitive; patterns are SQL-level. |
| Customer-managed encryption keys (CMEK) | CMEK on the service | Available on AWS (KMS) and GCP (Cloud KMS), with rotation and revocation. |
| AEAD / SQL-level encryption functions | Encryption functions (encrypt / decrypt) | Covers AES-128/256-CBC/GCM and AEAD modes. |
| Differential privacy | — | Not a managed feature. Applying noise in a UDF doesn't reproduce BigQuery's epsilon/delta-controlled privacy guarantees; for true DP, use an external library. |
| VPC Service Controls | Private connectivity — Private Service Connect (GCP), PrivateLink (AWS, Azure), and IP allowlists for ingress restriction | Restricts ingress; doesn't replicate VPC SC's data-exfiltration boundary. |
Data sharing
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Analytics Hub / data exchanges / listings | Read access to a shared database, or a dedicated service with consumer-specific row policies | ClickHouse has no in-product data marketplace; sharing uses standard access primitives. |
| Data clean rooms | Row policies and authorized views | No managed clean-room product. |
Operations and ecosystem
ClickHouse surfaces operational state through system.* tables (queries, sessions, replication, parts, metrics) and the cloud console; managed ingestion is handled by ClickPipes; ML, BI, and notebook workflows are typically handled in external systems that read from ClickHouse.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| BigQuery ML | External training and serving (notebooks, Spark, Vertex AI, feature stores) reading from ClickHouse; see AI/ML in Cloud for managed-side features | ClickHouse has no in-database ML. Use it as the analytical store and run training elsewhere. |
| BI Engine | Direct querying — no separate acceleration tier to provision | Sub-second BI latency comes from the storage engine itself; there's no in-memory cache layer to size or pay for separately. |
| OMNI / cross-cloud federated query | — | ClickHouse doesn't query in place across clouds. The closest pattern is one service per supported region with data staged into the target service before being queried. |
| Data sources / file formats | File-format and connector library | Managed connectors (ClickPipes) for sources like Kafka, Pub/Sub, MySQL, Postgres, and object storage; SQL table functions for ad-hoc reads of files in object storage. |
| Query jobs (ID, history, cancel) | system.query_log and system.processes for inspection; KILL QUERY to cancel | Same information, exposed through system tables instead of a job API. |
INFORMATION_SCHEMA | Native system.* tables for ClickHouse-specific detail, or the ANSI information_schema views for tool compatibility | Both surfaces available. |
| Data Transfer Service | ClickPipes — scheduled and streaming ingestion from SaaS, storage, and OLTP sources | ClickPipes is ClickHouse Cloud's managed connector platform; coverage spans streaming systems, OLTP sources, and object storage. |
| Audit logs (admin activity, data access, system events) | Cloud audit log for org and service admin events; system.query_log for data-access activity within the service | BigQuery's three audit streams collapse to two on ClickHouse Cloud: admin activity to the cloud audit log, query/data access to system.query_log. System-event telemetry is exposed through other system.* tables rather than a dedicated audit stream. |
| Change data capture ingestion | ClickPipes for Postgres, MySQL, or Kafka | Managed CDC from OLTP sources. |
| BigQuery Studio notebooks / BigQuery DataFrames | Jupyter with clickhouse-connect or another client library | No in-product notebook environment or pandas-compatible in-DB API; notebook-side libraries cover the same workflow. |
| Data Canvas | — | No drag-and-drop NL canvas. SQL Console covers ad-hoc query authoring; visual data prep happens in an external orchestrator. |
| Gemini in BigQuery (SQL generation, code completion) | ClickHouse Agents in the cloud console; Ask-AI in the docs | Agents (currently in beta) are conversational: natural-language queries against your data with tool calls and chat workflows. In-SQL LLM functions analogous to BQ.ML.GENERATE_TEXT are handled outside SQL today; check the Agents page for the current capability surface. |
| Knowledge Catalog / data lineage / data quality | system.* tables for metadata; external tools (dbt, DataHub) for lineage and quality | ClickHouse exposes metadata via system tables rather than a managed catalog product. |
| Cross-region replication / managed disaster recovery | Multi-AZ HA within a region (automatic); cross-region resiliency via external backups exported to a customer-owned bucket; see Data resiliency | ClickHouse Cloud has no automatic cross-region replication, failover, or active-active sync today. Cross-region DR is backup-based. |
Next steps
- For the end-to-end migration walkthrough, see Migrating from BigQuery to ClickHouse Cloud.
- For the conceptual side-by-side, see Comparing ClickHouse Cloud and BigQuery.
- For loading data, see Loading data from BigQuery.