
Data Platform Documentation

This section covers the data platform components, including the DataHub data catalog, data quality validation, and analytics infrastructure.

Overview

The Fawkes data platform provides centralized data cataloging, lineage tracking, quality monitoring, and analytics capabilities. It enables teams to understand their data assets, trace data flows end-to-end, and enforce quality standards automatically across pipelines.
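Tracing a data flow end to end amounts to walking a lineage graph. The sketch below is a minimal illustration, assuming lineage is available as a mapping from each dataset to the datasets it reads from. The dataset names and the `trace_upstream` helper are hypothetical and do not come from the Fawkes codebase. It performs a breadth-first traversal, so the nearest dependencies are listed first, and it tracks visited nodes so cycles or diamond-shaped graphs do not cause repeats.

```python
from collections import deque

# Hypothetical lineage edges: each dataset maps to the datasets it reads from.
# Names are illustrative only; in practice these edges would come from the catalog.
UPSTREAMS: dict[str, list[str]] = {
    "analytics.revenue_daily": ["warehouse.orders", "warehouse.refunds"],
    "warehouse.orders": ["raw.orders_events"],
    "warehouse.refunds": ["raw.payments_events"],
    "raw.orders_events": [],
    "raw.payments_events": [],
}


def trace_upstream(dataset: str, upstreams: dict[str, list[str]]) -> list[str]:
    """Return every dataset that `dataset` depends on, directly or transitively,
    in breadth-first order (nearest dependencies first)."""
    seen = {dataset}
    order = []
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in upstreams.get(current, []):
            if parent not in seen:  # skip already-visited nodes (cycles, diamonds)
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    print(trace_upstream("analytics.revenue_daily", UPSTREAMS))
    # ['warehouse.orders', 'warehouse.refunds', 'raw.orders_events', 'raw.payments_events']
```

Reversing the edge direction (mapping each dataset to its consumers) gives the downstream impact analysis that is useful before changing a schema.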

The platform is built on open-source components selected for their extensibility and integration with the broader Fawkes toolchain. All data platform components are deployed via ArgoCD, managed through Helm charts in charts/, and configured declaratively in platform/apps/.

Data Catalog

DataHub

DataHub is the primary metadata and data catalog tool. It ingests schema, lineage, and ownership metadata from databases, pipelines, and APIs.
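DataHub identifies catalog entities by URN, for example `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` for datasets and `urn:li:corpuser:<username>` for users. The sketch below builds those identifiers and groups the three kinds of metadata mentioned above (schema, ownership, lineage) into one record. The `CatalogEntry` class, the platform and dataset names, and the owner are hypothetical placeholders for illustration. When emitting metadata to DataHub itself, the helpers in DataHub's Python SDK (such as `make_dataset_urn` in `datahub.emitter.mce_builder`) are the safer choice.

```python
from dataclasses import dataclass, field


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN, e.g. for a Postgres table."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def user_urn(username: str) -> str:
    """Build a DataHub user URN, as referenced by ownership metadata."""
    return f"urn:li:corpuser:{username}"


@dataclass
class CatalogEntry:
    """Hypothetical in-memory view of the metadata DataHub ingests for one dataset."""
    urn: str
    schema: dict[str, str]                              # column name -> type
    owners: list[str] = field(default_factory=list)     # user URNs
    upstreams: list[str] = field(default_factory=list)  # dataset URNs this one reads from


if __name__ == "__main__":
    entry = CatalogEntry(
        urn=dataset_urn("postgres", "shop.public.orders"),
        schema={"order_id": "bigint", "amount": "numeric"},
        owners=[user_urn("jdoe")],
        upstreams=[dataset_urn("kafka", "orders_events")],
    )
    print(entry.urn)  # urn:li:dataset:(urn:li:dataPlatform:postgres,shop.public.orders,PROD)
```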

Data Quality

Data quality is enforced through automated validation rules that run in CI and as scheduled jobs in Kubernetes.
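The same rule set can serve both CI and scheduled Kubernetes jobs if the validation entrypoint reports results through its exit status: a non-zero exit fails a CI step and marks a Kubernetes Job's pod as failed. The sketch below illustrates that pattern under stated assumptions. The `Rule` type, the `validate` and `main` functions, and the two example rules for an orders table are hypothetical and not taken from the Fawkes repository; a real implementation would likely use a dedicated data quality framework.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict[str, object]


@dataclass(frozen=True)
class Rule:
    """A named check applied to every row; `check` returns True when the row passes."""
    name: str
    check: Callable[[Row], bool]


# Hypothetical rules for an orders table.
RULES = [
    Rule("order_id_not_null", lambda r: r.get("order_id") is not None),
    Rule("amount_non_negative",
         lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
]


def validate(rows: Iterable[Row], rules: list[Rule]) -> dict[str, int]:
    """Count failing rows per rule."""
    failures = {rule.name: 0 for rule in rules}
    for row in rows:
        for rule in rules:
            if not rule.check(row):
                failures[rule.name] += 1
    return failures


def main(rows: list[Row]) -> int:
    """Print a per-rule summary and return a process exit code (0 = all rules passed)."""
    failures = validate(rows, RULES)
    for name, count in failures.items():
        status = "FAIL" if count else "ok"
        print(f"{status:4} {name}: {count} failing row(s)")
    return 1 if any(failures.values()) else 0
```

A CI step or a container in a Kubernetes CronJob would load the rows to check and finish with `sys.exit(main(rows))`, so a single failing rule surfaces as a failed pipeline run or failed Job.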

APIs