data-quality-frameworks

Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.


Data Quality Frameworks - Data Quality Validation and Testing Framework

Skill Overview


The Data Quality Frameworks skill provides production-grade data quality validation solutions based on Great Expectations, dbt tests, and data contracts, helping you establish a reliable quality assurance system within your data pipelines.

Applicable Scenarios

1. Data Pipeline Quality Checks


Implement data quality validation within ETL/ELT processes to ensure data integrity and accuracy as data moves between stages. The skill supports automated checks and raises timely alerts when data anomalies occur.
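The gate described above can be sketched in plain Python. This is a minimal illustration, not part of any library: `validate_batch`, `QualityError`, and the null-ratio threshold are all hypothetical names chosen for the example.

```python
# Minimal sketch of a quality gate between pipeline stages.
# All names here (validate_batch, QualityError) are illustrative.

class QualityError(Exception):
    """Raised when a batch fails validation and must not advance."""

def validate_batch(records, required_fields, max_null_ratio=0.01):
    """Check that required fields are present and mostly non-null."""
    if not records:
        raise QualityError("empty batch")
    for field in required_fields:
        nulls = sum(1 for r in records if r.get(field) is None)
        ratio = nulls / len(records)
        if ratio > max_null_ratio:
            raise QualityError(
                f"{field}: {ratio:.1%} nulls exceeds {max_null_ratio:.1%}")
    return True

# Between two ETL stages: fail loudly and route the error to alerting.
batch = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
try:
    validate_batch(batch, ["id", "amount"])
except QualityError as e:
    print(f"alert: {e}")  # hook this into your alerting channel
```

In a real pipeline the `except` branch would page or notify rather than print, and the batch would be quarantined instead of loaded downstream.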

2. Building a Data Quality Test Suite


Use dbt to build a comprehensive data test suite, including column-level, table-level, and cross-table validation rules covering quality dimensions such as completeness, uniqueness, data types, and value ranges.
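The four quality dimensions listed above can each be expressed as a small predicate. A stdlib sketch, with function names of our own choosing:

```python
# Illustrative checks for the four quality dimensions; names are ours.

def check_completeness(values):
    """No missing values (completeness)."""
    return all(v is not None for v in values)

def check_uniqueness(values):
    """No duplicate values (uniqueness)."""
    return len(values) == len(set(values))

def check_type(values, expected_type):
    """Every value has the expected type (data type)."""
    return all(isinstance(v, expected_type) for v in values)

def check_range(values, low, high):
    """Every value falls within [low, high] (value range)."""
    return all(low <= v <= high for v in values)

ids = [1, 2, 3]
assert check_completeness(ids) and check_uniqueness(ids)
assert check_type(ids, int) and check_range(ids, 1, 100)
```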

3. Cross-Team Data Contract Management


Define and implement data contracts to clarify the quality responsibilities of data producers and consumers, establish SLA standards for data services, and reduce collaboration friction caused by data quality issues.

Core Features

Great Expectations Validation


Provides a flexible framework for defining and validating expectations about data, supports automatically generated data documentation, a rich library of built-in expectations, and extensible custom expectations, making it suitable for a wide range of data sources and formats.
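The Great Expectations Python API varies across releases, so the underlying pattern is sketched here in plain Python rather than with the library itself: declarative expectation functions (named after two real Great Expectations expectations) that each return a result dict, collected into a validation report.

```python
# Stdlib sketch of the expectation pattern -- not the Great Expectations API.
# Function names mirror two real GE expectations; the result shape is simplified.

def expect_column_values_to_not_be_null(rows, column):
    bad = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"expectation": f"not_null({column})",
            "success": not bad, "failed_rows": bad}

def expect_column_values_to_be_between(rows, column, low, high):
    bad = [i for i, r in enumerate(rows)
           if r.get(column) is not None and not (low <= r[column] <= high)]
    return {"expectation": f"between({column}, {low}, {high})",
            "success": not bad, "failed_rows": bad}

rows = [{"price": 9.5}, {"price": -1.0}, {"price": None}]
report = [
    expect_column_values_to_not_be_null(rows, "price"),
    expect_column_values_to_be_between(rows, "price", 0, 100),
]
overall = all(r["success"] for r in report)
```

In Great Expectations itself, such results feed the auto-generated data documentation; here the report is just a list of dicts you can log or alert on.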

dbt Test Integration


Embeds data quality tests directly into the data transformation workflow, so tests are versioned alongside the transformation code, and supports lifecycle management for unit tests, integration tests, and data quality monitoring.
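dbt's generic tests (such as `not_null` and `unique`) compile to SQL queries that select failing rows; a test passes when its query returns zero rows. A sketch of that mechanism using an in-memory SQLite database, with an illustrative `orders` table:

```python
# Sketch of how dbt generic tests work: each test compiles to a SQL query
# selecting failing rows; the test passes when the query returns no rows.
# The table and columns are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, status TEXT);
    INSERT INTO orders VALUES (1, 'shipped'), (2, NULL), (2, 'pending');
""")

# Equivalent of dbt's not_null test on orders.status
not_null_failures = conn.execute(
    "SELECT order_id FROM orders WHERE status IS NULL").fetchall()

# Equivalent of dbt's unique test on orders.order_id
unique_failures = conn.execute(
    "SELECT order_id FROM orders "
    "GROUP BY order_id HAVING COUNT(*) > 1").fetchall()

passed = not not_null_failures and not unique_failures
```

Because the tests are just SQL generated from declarations in the model's YAML, they live in the same repository as the models and are versioned together with them.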

Data Contract Management


Defines clear data schemas and quality expectations, automatically generates contract documentation, and provides contract validation tools to ensure data services meet predefined quality standards and compatibility requirements.
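A data contract can be represented as a machine-checkable schema. The contract format below is a minimal sketch of our own, not a standard; field names are illustrative.

```python
# Minimal sketch of a data contract as a machine-checkable schema.
# The contract format and field names are illustrative, not a standard.

CONTRACT = {
    "fields": {
        "user_id": {"type": int, "nullable": False},
        "email":   {"type": str, "nullable": False},
        "age":     {"type": int, "nullable": True},
    }
}

def validate_against_contract(record, contract):
    """Return a list of contract violations for one record (empty = compliant)."""
    violations = []
    for name, spec in contract["fields"].items():
        if name not in record:
            violations.append(f"missing field: {name}")
        elif record[name] is None:
            if not spec["nullable"]:
                violations.append(f"null in non-nullable field: {name}")
        elif not isinstance(record[name], spec["type"]):
            violations.append(f"wrong type for {name}")
    return violations

assert validate_against_contract(
    {"user_id": 1, "email": "a@b.c", "age": None}, CONTRACT) == []
```

Running such a check in the producer's CI catches breaking schema changes before consumers see them, which is the main operational benefit of a contract.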

Frequently Asked Questions

What is the difference between Great Expectations and dbt tests?


Great Expectations is an independent data validation framework that supports multiple data sources and a rich set of validation rules, making it suitable for data quality checks at various pipeline stages. dbt tests are built into the dbt transformation process and are better suited for testing the correctness of data models. The two can be used complementarily: use Great Expectations to validate data before it enters the data warehouse, and use dbt tests to validate after data transformations.

Will data quality checks affect pipeline performance?


Data quality checks incur some computational overhead, but you can balance performance and quality assurance with sensible configuration. It is recommended to make checks mandatory for critical datasets and tables, sample non-critical data, and schedule validation tasks during off-peak hours. You can also consider optimizations such as incremental validation, i.e. checking only changed partitions.
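The sampling strategy mentioned above can be sketched as follows; the 10% rate and fixed seed are illustrative choices, not recommendations.

```python
# Sketch of sampling to bound validation cost on non-critical data.
# The sampling rate and seed are illustrative.
import random

def validate_sample(records, check, rate=0.1, seed=42):
    """Run `check` on a random subset of records instead of the full set."""
    rng = random.Random(seed)
    sample = [r for r in records if rng.random() < rate]
    return all(check(r) for r in sample), len(sample)

records = [{"amount": i} for i in range(1000)]
ok, n_checked = validate_sample(records, lambda r: r["amount"] >= 0)
```

Sampling trades certainty for speed: it will catch widespread quality problems cheaply but can miss rare ones, which is why critical tables should still get full checks.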

When should data contracts be established?


You should consider establishing data contracts when multiple teams or services need to share data and data quality directly impacts downstream business. Typical scenarios include: a data platform providing data services to business teams, data exchange between different data teams, and guaranteeing the structure of API output data. Data contracts help reduce production incidents caused by schema changes or quality issues.