Skip to main content
Skip to main content
Edit this page

BigQuery and ClickHouse: equivalent concepts

The tables below map each BigQuery concept to its ClickHouse equivalent. For the end-to-end migration walkthrough, see Migrating from BigQuery to ClickHouse Cloud.

Resource hierarchy

BigQueryClickHouseNotes
OrganizationOrganizationRoot node of the hierarchy in both.
ProjectService (region-scoped); warehouse for grouping services with shared storage and independent computeA service is the data-container analogue; use a warehouse to group services that share storage but scale compute independently. Billing, quotas, and tier are set at the ClickHouse organization level, not per service.
DatasetDatabaseSame role — logical container for tables and grant scope.
IAM permissionsConsole roles plus SQL grantsTwo-layer access: roles in console.clickhouse.cloud plus SQL grants in the database. Console users can also be granted DB roles for SQL Console use.

Compute and capacity

BigQueryClickHouseNotes
SlotReplica (whole node); queries can parallelize across replicasA replica is a static compute node in a ClickHouse service. See the callout below.
Slot reservationVertical and horizontal autoscaling bounds; warehouses for workload isolationWhere BigQuery reservations guarantee slot capacity, ClickHouse sizes via min/max autoscaling bounds. Setting min = max effectively fixes the size.
QuotasWorkload classes plus per-query limitsWorkload classes cover memory, CPU, concurrency, and I/O scheduling at runtime. Spend caps have no direct primitive; there's no setting that suspends a service on hitting a cost threshold.
Slot vs replica

Unlike a BigQuery slot, a ClickHouse replica is a static compute node that serves many concurrent queries; it isn't allocated to a query from a pool. Replicas are also much coarser-grained: a slot is closer to a unit of CPU than to a whole node.

Billing and pricing model

ClickHouse Cloud bills compute as replica-hours (sized by the autoscaling bounds) rather than per-TB scanned or flat-rate slot commitments, storage as compressed bytes, and backups as a separate line item. There's no equivalent to BigQuery's logical-vs-physical storage choice. Both platforms charge for public internet egress and cross-region data transfer, and both offer committed-spend discounts. Managed ingestion (ClickPipes) is metered separately; other features (materialized view refresh, secondary index maintenance, dictionary loads) run on the service's compute. See ClickHouse Cloud pricing for current rates, tiers, and commitment options.

Storage and tables

In ClickHouse, a table's behavior is set at creation time: the engine (MergeTree family) determines merge and storage semantics, and ORDER BY / PARTITION BY / TTL clauses configure physical layout and retention. Many BigQuery per-feature settings map to a clause in the ClickHouse CREATE TABLE statement. Physical schema design also differs between platforms; see the migration guide for design tradeoffs.

BigQueryClickHouseNotes
TableMergeTree-family tableEngine choice determines storage and merge behavior; pick by access pattern (MergeTree for append-mostly facts, ReplacingMergeTree for upserts, AggregatingMergeTree for pre-aggregations).
Column schema modes (NULLABLE, REQUIRED, REPEATED)Nullable(T) for optional; omit for required; Array(T) for repeated; Array(Tuple(...)) or Nested for repeated recordsIn ClickHouse, columns are non-nullable unless wrapped with Nullable(T). Nullability has a small storage and query cost, so use it only when the column needs nulls.
Schema evolution (add / drop / modify columns)ALTER TABLE ... ADD / DROP / MODIFY COLUMNSame DDL surface as BigQuery. Many column changes are metadata-only.
PartitioningPARTITION BY clause on the tableWhere BigQuery limits partitioning to time-unit, integer-range, or ingestion-time columns, ClickHouse takes an arbitrary expression. Use it for retention (drop a partition) and pruning.
ClusteringORDER BY columns in the table definitionWhere BigQuery clustering is best-effort and reorganized in the background, ClickHouse's ORDER BY is enforced at insert time and drives the sparse primary index.
External tables / BigLakes3 / gcs / azureBlobStorage table functions for direct file access; Iceberg engine for open catalogsObject storage and open-table formats are read directly through these functions and engines. ClickHouse does not provide a unified-governance layer over external storage.
Object tables (SQL access to unstructured files)s3 / gcs table functions over binary formatsUnstructured objects (images, PDFs, audio) are queried directly through these table functions, not via a dedicated table type.
In-SQL ML inference on objects (ML.GENERATE_TEXT, ML.GENERATE_EMBEDDING)No in-SQL LLM-on-objects equivalent; call hosted models from the application layer.
Apache IcebergIceberg engineReads Iceberg tables stored in S3, Azure, HDFS, or local storage. Writes are not supported. See the engine page for the current list of supported features.
Default table / partition / dataset expirationTTL clause on the table, column, or partitionTTL can be set at table creation or via ALTER TABLE ... MODIFY TTL, and applies at the table, column, or partition level rather than the dataset default.
Table snapshotService-level backupGranularity differs significantly; see the callout below.
Time travelPoint-in-time backup restore into a new serviceNo inline historical query; restore is the only way to reach prior state.
Authorized viewsView with SQL SECURITY DEFINER (runs with the view-owner's privileges)See CREATE VIEW for the syntax and the INVOKER / DEFINER / NONE modes.
Row-level securityRow policy — a WHERE-style expression evaluated per userSame role; the policy expression is attached per user (or role) rather than as a BigQuery RLS row access policy resource.
Wildcard tables (_TABLE_SUFFIX)Merge table engine (persistent grouping) or merge() function (inline)Same idea, different syntax. Merge is a persistent table-of-tables; merge() is inline without creating one.
Table cloneCREATE TABLE ... AS SELECT copy, or backup restore into a new serviceClickHouse has no copy-on-write primitive; every copy reads the source data fully.
Backups

ClickHouse Cloud backups are per-service. Restoring a backup creates a new service; a single table cannot be restored back into the original service.

Updates and deletes

ClickHouse is append-optimized. There's no SQL MERGE statement (unrelated to the Merge and MergeTree engines), and ALTER TABLE … UPDATE / DELETE run as background mutations rather than transactional row writes. BigQuery DML patterns (MERGE, UPDATE, DELETE, dbt incremental updates) typically port to engine choice in ClickHouse: ReplacingMergeTree keeps the latest row by sort key, CollapsingMergeTree marks deletes inline, and AggregatingMergeTree maintains aggregated state. Engine choice is set at table creation and is non-trivial to change later.

Query model and performance

Query acceleration in ClickHouse comes from three layers: primary-key ordering (a sparse index over the on-disk sort order), secondary indexes on non-key columns, and materialized views.

BigQueryClickHouseNotes
Primary key (advisory)Primary key — drives the on-disk sort order and the sparse primary indexWhere BigQuery's PK is advisory only, ClickHouse's PK is load-bearing — it determines physical layout and is used to prune granules, avoid re-sorts, and short-circuit LIMIT. Neither system enforces uniqueness.
Foreign key (advisory)Wide tables or dictionaries for lookupsClickHouse doesn't accept foreign-key declarations even as advisory hints.
Search indexFull-text indexToken index over string columns. Operator surface differs: BigQuery's SEARCH() maps to the hasToken / hasAllTokens / hasAnyTokens / hasPhrase family in ClickHouse; LIKE and match can also use the index when the pattern is tokenizable (see the index doc for restrictions).
Vector indexArray(Float32) plus a vector ANN indexClickHouse has no dedicated vector type. Embeddings are stored as Array(Float32) and accelerated with an ANN index for approximate nearest-neighbor lookups.
Materialized viewIncremental MV — updates on each insert into a base tableCost is paid at insert time. BigQuery's auto-refreshed MVs map to this incremental model, not to refreshable MVs.
Scheduled queryRefreshable MV for query-driven scheduled work; external orchestrator (dbt, Airflow) for procedural pipelinesRefreshable MVs run on a cron-style schedule and maintain a result table.
Streaming inserts (tabledata.insertAll, Storage Write API)Asynchronous inserts for server-side batching of small row-by-row writes; Buffer engine for in-memory accumulation; ClickPipes for managed streaming pipelinesThe raw API surface is INSERT over HTTP or the native protocol, but for the row-by-row semantics of BigQuery streaming inserts, async inserts are the closer fit, batching on the server. For deduplication, ClickHouse exposes block-level dedup on Replicated tables (insert_deduplicate, default-on, recent-blocks window) and user-supplied per-insert dedup via insert_deduplication_token; for row-level dedup, use ReplacingMergeTree. None of these reproduces the Storage Write API's full exactly-once-by-stream-offset model.
Continuous queriesStreaming table engine (Kafka, NATS, RabbitMQ) feeding a materialized view that writes to a destination table; ClickPipes for managed streaming from Pub/Sub or Kinesis; an incremental MV on a regular table for inserts already landing in ClickHouse"Always-on transform" maps to engine + MV rather than a single declarative resource. Pick by source: streaming queue → table engine or ClickPipes; ongoing inserts into an existing table → MV.
Dry runEXPLAIN ESTIMATE — reports rows, parts, and marks the query would read; other EXPLAIN variants (PLAN, PIPELINE, SYNTAX) cover deeper plan inspectionCovers the plan-inspection role of BigQuery's dry run, not the cost-estimation role. ClickHouse billing isn't per-query, so there's no "bytes that will be billed" answer to return.
Federated queries (Spanner, Cloud SQL, AlloyDB, Bigtable)External OLTP attached via database engine (PostgreSQL, MySQL, MongoDB, SQLite)Distinct from external tables in object storage; these attach a live source so its tables are queryable directly. Bigtable has no ClickHouse database-engine equivalent.
Cached resultsQuery cacheClickHouse's query cache lives in each replica's memory and is per-user by default; identical queries to different replicas don't share results. Not transactionally consistent.
Sessions / multi-statement queriesPer-statement execution; multi-step state managed in the client or an orchestratorClickHouse has no per-session variables or shared state.

Security and governance

Authorized views and row-level security are listed under Storage and tables.

BigQueryClickHouseNotes
Policy tags / column-level access controlColumn-level grants on specific columns of a tableGrants apply at the column level. BigQuery's centralized taxonomy/policy-tag governance has no direct equivalent.
Data maskingViews, row policies, or function-based transforms; see data masking patternsNo column-mask primitive; patterns are SQL-level.
Customer-managed encryption keys (CMEK)CMEK on the serviceAvailable on AWS (KMS) and GCP (Cloud KMS), with rotation and revocation.
AEAD / SQL-level encryption functionsEncryption functions (encrypt / decrypt)Covers AES-128/256-CBC/GCM and AEAD modes.
Differential privacyNot a managed feature. Applying noise in a UDF doesn't reproduce BigQuery's epsilon/delta-controlled privacy guarantees; for true DP, use an external library.
VPC Service ControlsPrivate connectivity — Private Service Connect (GCP), PrivateLink (AWS, Azure), and IP allowlists for ingress restrictionRestricts ingress; doesn't replicate VPC SC's data-exfiltration boundary.

Data sharing

BigQueryClickHouseNotes
Analytics Hub / data exchanges / listingsRead access to a shared database, or a dedicated service with consumer-specific row policiesClickHouse has no in-product data marketplace; sharing uses standard access primitives.
Data clean roomsRow policies and authorized viewsNo managed clean-room product.

Operations and ecosystem

ClickHouse surfaces operational state through system.* tables (queries, sessions, replication, parts, metrics) and the cloud console; managed ingestion is handled by ClickPipes; ML, BI, and notebook workflows are typically handled in external systems that read from ClickHouse.

BigQueryClickHouseNotes
BigQuery MLExternal training and serving (notebooks, Spark, Vertex AI, feature stores) reading from ClickHouse; see AI/ML in Cloud for managed-side featuresClickHouse has no in-database ML. Use it as the analytical store and run training elsewhere.
BI EngineDirect querying — no separate acceleration tier to provisionSub-second BI latency comes from the storage engine itself; there's no in-memory cache layer to size or pay for separately.
OMNI / cross-cloud federated queryClickHouse doesn't query in place across clouds. The closest pattern is one service per supported region with data staged into the target service before being queried.
Data sources / file formatsFile-format and connector libraryManaged connectors (ClickPipes) for sources like Kafka, Pub/Sub, MySQL, Postgres, and object storage; SQL table functions for ad-hoc reads of files in object storage.
Query jobs (ID, history, cancel)system.query_log and system.processes for inspection; KILL QUERY to cancelSame information, exposed through system tables instead of a job API.
INFORMATION_SCHEMANative system.* tables for ClickHouse-specific detail, or the ANSI information_schema views for tool compatibilityBoth surfaces available.
Data Transfer ServiceClickPipes — scheduled and streaming ingestion from SaaS, storage, and OLTP sourcesClickPipes is ClickHouse Cloud's managed connector platform; coverage spans streaming systems, OLTP sources, and object storage.
Audit logs (admin activity, data access, system events)Cloud audit log for org and service admin events; system.query_log for data-access activity within the serviceBigQuery's three audit streams collapse to two on ClickHouse Cloud: admin activity to the cloud audit log, query/data access to system.query_log. System-event telemetry is exposed through other system.* tables rather than a dedicated audit stream.
Change data capture ingestionClickPipes for Postgres, MySQL, or KafkaManaged CDC from OLTP sources.
BigQuery Studio notebooks / BigQuery DataFramesJupyter with clickhouse-connect or another client libraryNo in-product notebook environment or pandas-compatible in-DB API; notebook-side libraries cover the same workflow.
Data CanvasNo drag-and-drop NL canvas. SQL Console covers ad-hoc query authoring; visual data prep happens in an external orchestrator.
Gemini in BigQuery (SQL generation, code completion)ClickHouse Agents in the cloud console; Ask-AI in the docsAgents (currently in beta) are conversational: natural-language queries against your data with tool calls and chat workflows. In-SQL LLM functions analogous to BQ.ML.GENERATE_TEXT are handled outside SQL today; check the Agents page for the current capability surface.
Knowledge Catalog / data lineage / data qualitysystem.* tables for metadata; external tools (dbt, DataHub) for lineage and qualityClickHouse exposes metadata via system tables rather than a managed catalog product.
Cross-region replication / managed disaster recoveryMulti-AZ HA within a region (automatic); cross-region resiliency via external backups exported to a customer-owned bucket; see Data resiliencyClickHouse Cloud has no automatic cross-region replication, failover, or active-active sync today. Cross-region DR is backup-based.

Next steps