Data quality expressions
An expression is required to determine the scope of a data
quality monitoring rule.
All functions can be combined using the boolean operators AND, OR, and NOT,
with parentheses to indicate precedence. Functions based on historical
statistics use values gathered with the SHOW STATS query on the table.
Expressions are case insensitive.
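The effect of parentheses on a combined expression can be sketched in Python. This is an illustration only, not the product's evaluator: the `TableStats` class and the two helper functions are hypothetical stand-ins named after the rule functions, and the inclusive comparisons are an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TableStats:
    """Hypothetical statistics snapshot for one table (illustration only)."""
    row_count: int
    nulls_fraction: dict = field(default_factory=dict)  # column name -> fraction of NULLs


def row_count_min(stats, minimum):
    # Assumes "a minimum of N rows" is an inclusive bound.
    return stats.row_count >= minimum


def nulls_fraction_min(stats, column, minimum):
    # Assumes the minimum fraction is an inclusive bound.
    return stats.nulls_fraction[column] >= minimum


stats = TableStats(row_count=10, nulls_fraction={"temperature": 0.7, "humidity": 0.1})

a = nulls_fraction_min(stats, "temperature", 0.6)  # 0.7 >= 0.6 holds
b = row_count_min(stats, 7)                        # 10 >= 7 holds
c = nulls_fraction_min(stats, "humidity", 0.3)     # 0.1 >= 0.3 fails

grouped_right = a or (b and c)   # A OR (B AND C)
grouped_left = (a or b) and c    # (A OR B) AND C

print(grouped_right, grouped_left)
```

With these sample statistics the first grouping passes and the second fails, so the placement of parentheses changes the outcome of the rule.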
Examples
The following table contains examples of valid data quality rule expressions:
Expression | Description |
---|---|
row_count_min(5000) | There are a minimum of 5000 rows in the table. |
row_count_max(99999) | There are a maximum of 99999 rows in the table. |
row_count_range(5000, 99999) | The number of rows is between 5000 and 99999. |
row_count_delta(1000) | Row count cannot vary by more than 1000 compared to previous row count. |
row_count_delta(0.05) | Row count cannot vary by more than 5% compared to previous row count. |
nulls_fraction_min("age", 0.2) | Column age minimum fraction of NULL values is 0.2. |
nulls_fraction_max("age", 0.3) | Column age maximum fraction of NULL values is 0.3. |
nulls_fraction_range("age", 0.1, 0.9) | Column age fraction of NULL values is between 0.1 and 0.9. |
nulls_fraction_rows_delta("age", 5000) | Column age row count multiplied by fraction of NULL values cannot vary by more than 5000 from the previous such product. |
nulls_fraction_delta("age", 0.2) | Column age fraction of NULL values cannot vary by more than 0.2 from previous fraction of NULL values. |
low_value_min("age", 18) | Column age lowest value must be at least 18. |
low_value_max("age", 34) | Column age lowest value must be less than or equal to 34. |
low_value_range("saturation", 0.5, 0.99) | Column saturation lowest value must be between 0.5 and 0.99. |
low_value_delta("age", 10) | Column age lowest value cannot vary by more than 10 from previous low value. |
high_value_min("age", 18) | Column age highest value must be at least 18. |
high_value_max("age", 34) | Column age highest value must be less than or equal to 34. |
high_value_range("saturation", 0.5, 0.99) | Column saturation highest value must be between 0.5 and 0.99. |
high_value_delta("age", 10) | Column age highest value cannot vary by more than 10 from previous high value. |
distinct_values_count_min("age", 200) | Column age minimum count of distinct values is 200. |
distinct_values_count_max("age", 9999) | Column age maximum count of distinct values is 9999. |
distinct_values_count_range("age", 200, 9999) | Column age count of distinct values is between 200 and 9999. |
distinct_values_count_delta("age", 5000) | Column age count of distinct values cannot vary by more than 5000 from previous distinct values count. |
data_size_min("age", 200) | Column age minimum data size is 200. |
data_size_max("age", 99999) | Column age maximum data size is 99999. |
data_size_range("age", 200, 9999) | Column age data size is between 200 and 9999. |
data_size_delta("age", 5000) | Column age data size cannot vary by more than 5000 from previous data size. |
nulls_fraction_min("temperature", 0.6) OR (row_count_min(7) AND nulls_fraction_min("humidity", 0.3)) | Column temperature minimum fraction of NULL values is 0.6 or there are a minimum of 7 rows and column humidity minimum fraction of NULL values is 0.3. |
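
The delta rows in the table compare a current statistic with its previous value, either as an absolute difference (`row_count_delta(1000)`) or as a fraction of the previous value (`row_count_delta(0.05)`). The arithmetic can be sketched in Python as follows. The helper names are hypothetical, the inclusive comparisons and the choice of the previous value as the base for the percentage are assumptions, and the sketch does not show how the product itself decides between the two interpretations.

```python
def within_absolute_delta(current, previous, limit):
    """True when the value moved by at most `limit` units."""
    return abs(current - previous) <= limit


def within_relative_delta(current, previous, fraction):
    """True when the value moved by at most `fraction` of the previous value."""
    return abs(current - previous) <= fraction * abs(previous)


def null_rows(row_count, nulls_fraction):
    """Estimated number of NULL rows: row count multiplied by fraction of NULLs."""
    return row_count * nulls_fraction


previous_rows, current_rows = 20000, 20800

# row_count_delta(1000): the count moved by 800 rows, within the limit of 1000.
absolute_ok = within_absolute_delta(current_rows, previous_rows, 1000)

# row_count_delta(0.05): 5% of 20000 is 1000 rows, so a move of 1500 rows is too large.
relative_ok = within_relative_delta(21500, previous_rows, 0.05)

# nulls_fraction_rows_delta("age", 5000): about 5000 NULL rows before, about 8320 now.
null_rows_ok = within_absolute_delta(
    null_rows(current_rows, 0.4), null_rows(previous_rows, 0.25), 5000
)

print(absolute_ok, relative_ok, null_rows_ok)
```

In this sample the absolute check passes, the 5% check fails, and the NULL row check passes because the estimated number of NULL rows changed by roughly 3320, which is below 5000.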