In this blog post, we will explore the metric store and evaluation parameters features of Great Expectations. These help us track data about our data collection. I will help you write a test to keep track of a table's row count using Great Expectations. This post is the fifth in our series to help testers implement useful tests with Great Expectations for data validation. Read on to understand how tracking something as simple as the daily row count of a table provides valuable insights for further analysis.

1. This post requires familiarity with Great Expectations. If you are new to this, do check out my previous blog posts. I have covered the basics of using this framework in detail.
2. To show the table row count test, I will be using the same real-world use case described in my first blog post, "Data validation using Great Expectations using real example". Ensure that you are set up with a Great Expectations deployment and have a file-based Datasource that can read the CSV.
3. For the storage backend, I am going to use a PostgreSQL database that I have set up locally. You can apply the same approach to any other SQL database supported by Great Expectations.

Looking at trends in data about the data collected (i.e. metadata) is often an intuitive heuristic for testers to spot problems in data collection. For example, if you collect data from external sources every day as part of your MLOps or DataOps lifecycle, you are bound to have faced problems with incomplete or partially collected data. This can happen for many reasons: the data source API changing, the format of the collected data changing, modified identifiers in the case of scraping, etc. As testers, we know something went wrong because we ended up collecting less (or no) data that day. Tracking metadata about the data collected and then checking that it stays within a range ends up being a super-useful check to have in place. In this post, I will show you how to implement one of these checks using Great Expectations.

If you are planning on implementing such tests …

To implement similar checks on metadata in your project, you should think of the following:
a) Metadata you want to track (e.g.: collection times, row count, etc.)
b) Condition/trend you want to check for (e.g.: within a certain range, exactly equal to yesterday, less on weekends, etc.)
c) Batch of data you want step b) to work upon (e.g.: monthly metadata, daily metadata, etc.)

I have tried to make steps b) and c) trivial in my example so you can play along easily with this post. For the condition, I chose "exactly equal to previous", and for the batch of metadata, I chose daily data.

About the example test
At Qxf2, I have been writing data quality tests using Great Expectations for one of the projects.
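To make the idea of tracking a daily row count and comparing it with the previous day's value concrete, here is a minimal plain-Python sketch. It does not use Great Expectations at all: it only imitates what a metric store and a "compare with a stored value" check do conceptually. The table `row_count_metrics` and the function names are hypothetical, and SQLite stands in for the Postgres backend so the snippet runs without any setup.

```python
import sqlite3
from datetime import date, timedelta


def setup_metric_store(conn: sqlite3.Connection) -> None:
    # Hypothetical metrics table: one stored row count per table per day.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS row_count_metrics ("
        "table_name TEXT, day TEXT, row_count INTEGER, "
        "PRIMARY KEY (table_name, day))"
    )


def record_row_count(conn: sqlite3.Connection, table_name: str, day: date) -> int:
    # a) Metadata to track: the table's row count on a given day.
    # table_name is interpolated into the SQL, so it must come from trusted code.
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO row_count_metrics VALUES (?, ?, ?)",
        (table_name, day.isoformat(), count),
    )
    return count


def row_count_matches_previous_day(conn: sqlite3.Connection, table_name: str, day: date):
    # b) Condition: exactly equal to the previous value.
    # c) Batch: daily metadata, so "previous" means the day before.
    def lookup(d: date):
        row = conn.execute(
            "SELECT row_count FROM row_count_metrics "
            "WHERE table_name = ? AND day = ?",
            (table_name, d.isoformat()),
        ).fetchone()
        return None if row is None else row[0]

    today = lookup(day)
    yesterday = lookup(day - timedelta(days=1))
    if today is None or yesterday is None:
        return None  # not enough history to decide either way
    return today == yesterday
```

Returning `None` when there is no stored value for the previous day keeps "no history yet" distinct from "the count changed", which matters on the first day a check like this runs.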