Data Platform Documentation
This section covers the data platform components: the DataHub data catalog, data quality validation, and the analytics infrastructure.
Overview
The Fawkes data platform provides centralized data cataloging, lineage tracking, quality monitoring, and analytics capabilities. It enables teams to understand their data assets, trace data flows end-to-end, and enforce quality standards automatically across pipelines.
The platform is built on open-source components selected for their extensibility and integration with the broader Fawkes toolchain. All data platform components are deployed via ArgoCD, managed through Helm charts in charts/, and configured declaratively in platform/apps/.
Data Catalog
DataHub
DataHub is the platform's primary metadata store and data catalog. It ingests schema, lineage, and ownership metadata from databases, pipelines, and APIs.
- DataHub Deployment Summary - DataHub setup and configuration
- DataHub Ingestion Summary - Data ingestion pipelines
- Hasura Quick Start - GraphQL API for data access
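To make the three kinds of metadata mentioned above (schema, lineage, ownership) more concrete, here is a minimal plain-Python sketch. It does not use the DataHub SDK; the classes `DatasetRef` and `DatasetMetadata`, the dataset names, and the owner name are all hypothetical and chosen only for illustration. The one detail taken from DataHub itself is the dataset URN format, `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical reference to a dataset (not a DataHub SDK class)."""
    platform: str      # e.g. "postgres"
    name: str          # e.g. "shop.public.orders"
    env: str = "PROD"  # environment label used in the URN

    def urn(self) -> str:
        # Build a URN string in DataHub's dataset URN format.
        return (
            f"urn:li:dataset:(urn:li:dataPlatform:{self.platform},"
            f"{self.name},{self.env})"
        )


@dataclass
class DatasetMetadata:
    """Hypothetical record grouping schema, ownership, and lineage."""
    dataset: DatasetRef
    columns: List[str] = field(default_factory=list)           # schema
    owners: List[str] = field(default_factory=list)            # ownership
    upstreams: List[DatasetRef] = field(default_factory=list)  # lineage

    def lineage_edges(self) -> List[Tuple[str, str]]:
        # One (upstream URN, downstream URN) pair per upstream dataset.
        return [(up.urn(), self.dataset.urn()) for up in self.upstreams]


# Illustrative data only: a source table feeding a derived table.
orders = DatasetRef("postgres", "shop.public.orders")
report = DatasetMetadata(
    dataset=DatasetRef("snowflake", "analytics.daily_revenue"),
    columns=["day", "revenue"],
    owners=["data-team"],
    upstreams=[orders],
)

for upstream_urn, downstream_urn in report.lineage_edges():
    print(f"{upstream_urn} -> {downstream_urn}")
```

In a real deployment this information would be produced by the ingestion pipelines described in the DataHub Ingestion Summary rather than assembled by hand.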
Data Quality
Data quality is enforced through automated validation rules that run in CI and as scheduled jobs in Kubernetes.
- Great Expectations Implementation - Data quality validation
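As a rough illustration of what an automated validation rule looks like when it runs in CI or as a scheduled job, the sketch below checks a batch of rows against named rules and derives a process exit code from the result. It is plain Python and deliberately does not use the Great Expectations API; the function names `validate` and `exit_code`, the rule names, and the sample rows are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Tuple[str, Callable[[Row], bool]]  # (rule name, predicate on a row)


def validate(rows: List[Row], rules: List[Rule]) -> Dict[str, int]:
    """Return the number of failing rows for each rule."""
    failures = {name: 0 for name, _ in rules}
    for row in rows:
        for name, check in rules:
            if not check(row):
                failures[name] += 1
    return failures


def exit_code(failures: Dict[str, int]) -> int:
    """Non-zero if any rule failed, so a CI step or job is marked failed."""
    return 1 if any(count > 0 for count in failures.values()) else 0


# Hypothetical rules for an "orders" table.
RULES: List[Rule] = [
    ("order_id is not null", lambda r: r.get("order_id") is not None),
    (
        "amount is a non-negative number",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

# Illustrative sample batch: the second and third rows each break one rule.
sample_rows: List[Row] = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": -2.0},
]

result = validate(sample_rows, RULES)
code = exit_code(result)
print(result, "exit code:", code)
```

A CI step or Kubernetes job wrapper could pass `code` to `sys.exit` so that a failed rule fails the run. The actual rules and tooling used by the platform are described in the Great Expectations Implementation document.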
APIs
- GraphQL API Implementation - GraphQL API for data queries
Related Documentation
- Architecture Overview - Platform architecture
- Implementation Summaries - Technical details
- How-To Guides - Step-by-step guides