Understanding Platform Views
Learn how the normalization layer of CloudQuery Platform works under the hood.
When a sync runs on CloudQuery Platform, the sync is configured (through a transformer) to prefix all output tables with `raw_`. As the sync progresses, a post-load transformer detects when each table has finished syncing and starts copying the latest data into two tables: `cloud_assets_historical` and `cloud_assets_incremental`.
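The routing step above can be sketched in Python, as an illustration only. `SyncedTable`, `raw_name`, `target_for` and `plan_copies` are hypothetical helpers, the table names are examples, and the `incremental` flags are assumed rather than read from any plugin. The sketch only shows which `raw_` table feeds which unified table once that table has completed syncing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

RAW_PREFIX = "raw_"  # prefix the transformer adds to every output table


@dataclass(frozen=True)
class SyncedTable:
    name: str          # original table name, e.g. "aws_ec2_instances"
    incremental: bool  # hypothetical flag: True if the table is synced incrementally


def raw_name(table: SyncedTable) -> str:
    """Name of the table the sync actually writes to."""
    return RAW_PREFIX + table.name


def target_for(table: SyncedTable) -> str:
    """Unified table that the post-load step copies this table's latest data into."""
    return "cloud_assets_incremental" if table.incremental else "cloud_assets_historical"


def plan_copies(tables: Iterable[SyncedTable], completed: Set[str]) -> List[Tuple[str, str]]:
    """(source, target) pairs for the tables that have finished syncing so far."""
    return [(raw_name(t), target_for(t)) for t in tables if t.name in completed]


tables = [
    SyncedTable("aws_ec2_instances", incremental=False),
    SyncedTable("aws_cloudtrail_events", incremental=True),
]
print(plan_copies(tables, completed={"aws_ec2_instances"}))
# [('raw_aws_ec2_instances', 'cloud_assets_historical')]
```

Tables that are still syncing are simply absent from the plan, which mirrors the post-load transformer waiting for each table to complete before copying it.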
`cloud_assets_historical` stores data for all non-incremental tables. It uses a ClickHouse `ORDER BY` clause that includes `_cq_sync_group_id`, which allows stale records to be removed after a period of time.
`cloud_assets_incremental` stores data for all incremental tables. Because incremental tables only ever add new data, its `ORDER BY` clause does not include `_cq_sync_group_id`.
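The effect of the two sorting keys can be modelled in plain Python. This sketch assumes a ReplacingMergeTree-style engine, where `FINAL` merges rows that share a sorting-key value and keeps the last inserted one. The column names `_cq_table` and `resource_id` are hypothetical, and the text does not say how stale records are actually removed, so the hypothetical `drop_stale` simply discards every row from sync groups that are no longer retained.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]

# Hypothetical sorting keys; "_cq_table" and "resource_id" are illustrative column names.
HISTORICAL_KEY = ("_cq_sync_group_id", "_cq_table", "resource_id")
INCREMENTAL_KEY = ("_cq_table", "resource_id")


def collapse(rows: List[Row], key: Tuple[str, ...]) -> List[Row]:
    """Mimic ReplacingMergeTree-style de-duplication: the last inserted row wins per key value."""
    latest: Dict[Tuple[str, ...], Row] = {}
    for row in rows:
        latest[tuple(row[col] for col in key)] = row
    return list(latest.values())


def drop_stale(rows: List[Row], keep_groups: Set[str]) -> List[Row]:
    """Hypothetical pruning: because the group id is in the key, old snapshots can go as a unit."""
    return [r for r in rows if r["_cq_sync_group_id"] in keep_groups]


rows = [
    {"_cq_sync_group_id": "g1", "_cq_table": "aws_ec2_instances", "resource_id": "i-1", "state": "running"},
    {"_cq_sync_group_id": "g2", "_cq_table": "aws_ec2_instances", "resource_id": "i-1", "state": "stopped"},
]

print(len(collapse(rows, HISTORICAL_KEY)))                    # 2: one row per snapshot survives
print([r["state"] for r in collapse(rows, INCREMENTAL_KEY)])  # ['stopped']: the later insert wins
print(len(drop_stale(rows, {"g2"})))                          # 1: the g1 snapshot is pruned
```

With the group id in the key, each sync keeps its own copy of a record, so older snapshots remain distinct and can be aged out; without it, repeated arrivals of the same record collapse into one row.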
Finally, a view, `cloud_assets`, unifies these two underlying tables into a single cross-cloud asset inventory. The view defines how to identify the latest snapshot and ensures de-duplication by adding a `FINAL` clause to all queries. The view is updated only once a table from a given `_cq_sync_group_id` has finished syncing, guaranteeing consistency on a per-table level.
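A toy model of `cloud_assets` may help show how these pieces fit together. It is a sketch, not the platform's view definition: `CloudAssetsView`, `mark_complete` and `final` are hypothetical, `final` stands in for ClickHouse's `FINAL` with a last-insert-wins rule per sorting-key value, and the per-table `latest_complete` map is an assumed stand-in for however the real view identifies the latest snapshot. Column names other than `_cq_sync_group_id` are illustrative.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def final(rows: List[Row], key: Tuple[str, ...]) -> List[Row]:
    """Stand-in for FINAL: keep only the last inserted row for each sorting-key value."""
    latest: Dict[Tuple[str, ...], Row] = {}
    for row in rows:
        latest[tuple(row[col] for col in key)] = row
    return list(latest.values())


class CloudAssetsView:
    """Toy model of a unified view over the historical and incremental tables."""

    def __init__(self) -> None:
        self.historical: List[Row] = []
        self.incremental: List[Row] = []
        self.latest_complete: Dict[str, str] = {}  # table name -> last completed sync group

    def mark_complete(self, table: str, group: str) -> None:
        # A table's new snapshot becomes visible only once that table has finished syncing.
        self.latest_complete[table] = group

    def query(self) -> List[Row]:
        # Historical side: only rows from each table's latest completed sync group.
        snapshot = [
            r for r in self.historical
            if self.latest_complete.get(r["_cq_table"]) == r["_cq_sync_group_id"]
        ]
        hist = final(snapshot, ("_cq_sync_group_id", "_cq_table", "resource_id"))
        inc = final(self.incremental, ("_cq_table", "resource_id"))
        return hist + inc  # unify both sides into one result


view = CloudAssetsView()
view.historical.append({"_cq_sync_group_id": "g1", "_cq_table": "aws_ec2_instances", "resource_id": "i-1", "detail": "running"})
view.mark_complete("aws_ec2_instances", "g1")
view.historical.append({"_cq_sync_group_id": "g2", "_cq_table": "aws_ec2_instances", "resource_id": "i-1", "detail": "stopped"})
print([r["detail"] for r in view.query()])  # ['running']: sync group g2 is not complete yet
view.mark_complete("aws_ec2_instances", "g2")
print([r["detail"] for r in view.query()])  # ['stopped']

event = {"_cq_table": "aws_cloudtrail_events", "resource_id": "e-1", "detail": "ConsoleLogin"}
view.incremental.extend([event, dict(event)])
print([r["detail"] for r in view.query()])  # ['stopped', 'ConsoleLogin']: the duplicate is collapsed
```

Because completion is tracked per table, one table can switch to its new snapshot while another is still mid-sync, which is the per-table consistency guarantee described above.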
Table views are simpler. As mentioned above, syncs first write to tables prefixed with `raw_`. For every table, a view with the original table name is created. This view always points to the latest complete snapshot of the data, so the data stays consistent during syncs and switches over atomically, while records can still be appended efficiently into ClickHouse.
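The per-table views can be sketched as an append-only `raw_` table plus a pointer to the latest completed sync group. `TableView` and its methods are hypothetical; filtering on `_cq_sync_group_id` is an assumption about how "latest complete snapshot" is expressed, and the lock only stands in for whatever makes the real switch atomic.

```python
import threading
from typing import Dict, List, Optional

Row = Dict[str, str]


class TableView:
    """Toy model: a view named after the original table, reading from its raw_ table."""

    def __init__(self, name: str) -> None:
        self.name = name                      # e.g. "aws_ec2_instances"
        self.raw_name = "raw_" + name         # where the sync appends rows
        self.raw_rows: List[Row] = []         # append-only, like inserts into ClickHouse
        self._current: Optional[str] = None   # sync group the view currently points at
        self._lock = threading.Lock()

    def append(self, row: Row) -> None:
        # Writers only ever append; rows already visible through the view are never modified.
        with self._lock:
            self.raw_rows.append(row)

    def complete(self, group: str) -> None:
        # Repointing is one assignment made under the lock readers also take,
        # so a reader sees either the old snapshot or the new one, never a mix.
        with self._lock:
            self._current = group

    def select(self) -> List[Row]:
        with self._lock:
            group, rows = self._current, list(self.raw_rows)
        return [r for r in rows if r["_cq_sync_group_id"] == group]


t = TableView("aws_ec2_instances")
t.append({"_cq_sync_group_id": "g1", "resource_id": "i-1"})
t.complete("g1")
t.append({"_cq_sync_group_id": "g2", "resource_id": "i-1"})
t.append({"_cq_sync_group_id": "g2", "resource_id": "i-2"})
print(t.raw_name, len(t.select()))  # raw_aws_ec2_instances 1: still reading the g1 snapshot
t.complete("g2")
print(len(t.select()))              # 2
```

Readers of the view keep seeing the previous snapshot while the next sync's rows accumulate in the `raw_` table, and only the pointer changes when that sync completes.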