Reltio Integration Hub

 View Only

Beyond the Missing Pieces: Tracking Data Quality Completeness for Actionable Results

By Guy Vorster posted 08-21-2023 10:49

  

When thinking about the overall quality of data within your Reltio environment, one of the numerous aspects you need to consider is completeness.  This is a measure of whether the attributes that need to be populated within an entity for your particular downstream use case(s) are in fact populated.  This is certainly not the only factor to take into account when it comes to the overall quality of your data but it’s an important one.  If you’re not paying attention to the completeness of your data over time in your MDM platform, you could simply be passing a potential problem down the chain to a consuming system or process that could in turn negatively impact an employee experience or a customer experience or some other important stakeholder group.  Do that too often and those same people, groups and/or processes will start finding an alternate solution to their data needs, which never bodes well for the longevity of an MDM program.

This blog post will introduce you to a simple mechanism to calculate the completeness of data over time within any entity and then write the ongoing score to a downstream consumer such as Snowflake or Google sheets to track over time.  Alternatively, it could be used to simply send an email with the latest completeness metrics to your stakeholders.

What’s Involved:

The completeness reporting solution is developed using Reltio Integration Hub, the low-code / no-code integration utility that is part of your Reltio platform subscription. The solution is available in the Community Library by searching for Gather Reltio Completeness.   It involves three very simple recipes (shown below).  The primary recipe (Gather Reltio Completeness Metrics) can be set to gather the completeness statistics at any frequency you choose (for example daily, weekly, monthly etc).  Each iteration of the recipe in turn leverages two callable recipes namely GetCompletenessPercentage and SaveCompletenessPercentage.

The GetCompletenessPercentage recipe performs three tasks;

  1. It executes a query to get the total number of records for the entity in question.

  2. It then executes another query using the supplied filter condition to get the total number of records based on the filter condition.  For example consider the Location entity in Reltio, if my rule for a complete Location is that it must have Address Line 1, City, State and Country populated, then the filter condition for the Location entity would be exists(attributes.AddressLine1) and exists(attributes.City) and exists(attributes.StateProvince) and exists(attributes.Country)”

  3. It then divides the number of complete records by the total number of records to determine the completeness percentage which is passed back to the calling recipe.

The SaveCompletenessPercentage recipe has been designed as an abstraction mechanism to send the current completeness score for an entity type to a specified target.  Currently it can write to a Google sheet or to Snowflake or just send an email but you can enhance this to send the output to anything you want in order to be able to report on it over time.  The examples below show the output to Snowflake and Google Sheets. 

Note the Google Sheets connector is using the Entity type as the name of the sheet to write to.

Completeness Report Metadata CSV (Edit Per Your Needs):

The Gather Reltio Completeness Metrics recipe uses some metadata defined in the CSV Parser on step 4 of the recipe that you must edit per your needs.  Each entry must have an EntityType (the Reltio entity you want to calculate completeness on) and a FilterCondition (the filter condition that equates to your completeness check for that entity). Shown below is a screenshot of the current contents of the CSV Parser on Step 4.  Modify this per your needs.

Note you can use any filter condition supported by the Reltio API.  If you’re unsure about the filter condition for your particular needs, simply enable developer mode on your browser from the Reltio UI and perform your query via the advanced search section of the Reltio UI.  In the split screen, choose Network and then select Fetch/XHR.  You will see the various queries being executed.  Find the one showing _total and then look in the Payload tab to find the overall filter condition being sent into the API.  Note when you see the filter condition in the split browser, it will include the entity type.  You don’t need to include that portion in the filter condition you plug into the CSV Parser on step 4 because it will be added by default in the recipe. 

Below is an example of the output being written to a Google Sheet with a simple chart being used to visualize the completeness metrics over time for the specific entity.  The SaveCompletenessPercentage recipe will write a new row into the sheet each time it’s called with a target of GOOGLE SHEET.

Recipe Setup Instructions:

Please note this is not a product offering from Reltio and as such is an unsupported solution.  In order to leverage it, please follow the steps below and modify per your needs:

  1. Goto the community library and search for the recipe (Gather Reltio Completeness Metrics recipe) and import it. 

  2. Update the connection information for your Reltio tenant in RIH.

  3. Update the CSV Parser on Step 4 based on the entities you want to calculate completeness for and the specific Reltio query to perform the calculation.  Suggest just adding a single entity at first for testing purposes. 

  4. If you want to use Google Sheets as the target for the completeness metrics

    1. Copy the google sheet “Completeness_Report_Template” and modify the tab names per your needs. Link to the google sheet is here.

    2. Read the instructions in the Readme tab of the spreadsheet.

    3. Set the tab names in your Google sheet to match the entity names you want to track completeness on.  For example if you have an entity called Employee that you are tracking completeness for via the google sheet, then you need a tab called Employee.

    4. Ensure your Google sheet has one header row (Date and Completeness) and one data row (could be 2023-01-01 and 0 if you want but there must be 1 data row or the write to the sheet won’t work).

    5. Update the connection information for Google sheets in RIH (if you want to use google sheets as the output).

  5. If you want to use Snowflake as the target for the completeness metrics

    1. Create a table in Snowflake (create table YOUR_DB.YOUR_SCHEMA.completeness_metrics (EntityType string, CalculationDate date, CompletenessScore number);)

    2. Update the connection information for your Snowflake instance in RIH.

    3. Ensure the step that inserts a new row into Snowflake (step 9 in SaveCompletenessMetrics) is pointing at the right table and column names.

  6. If you want to use Email as the target for the completeness metrics

    1. In step 17 (the call to SaveCompletenessPercentage in Gather Completeness Metrics), set the email address you want to send the completeness score to in the parameter called “AdditionalInformation”.

  7. If you want the Gather Reltio Completeness Metrics recipe to send an email after successful completion or an email if something fails, then update the variables EmailOnSuccess and EmailOnFail in step 3 to the email addresses you want to use.

  8. Start the two callable recipes (correct any errors if they prevent a clean start).

  9. Test your recipe.  Suggest testing it at first with just a single entity type in your lookup table and then expand from there.

Conclusion:

The Gather Reltio Completeness Metrics recipe (and associated callable recipes) not only offers a simple but powerful way to gather the ongoing completeness score for any particular entity in your Reltio tenant but it also allows you to write the score to a target system in support of time series reporting on this important data quality metric.  If you would like to see the Data Quality Dashboard in Reltio enhanced to be able to perform this type of completeness analysis over time, then please vote for this idea at https://reltio.aha.io/ideas/ideas/MDM-I-1582.

I hope you find this recipe useful.  Comments welcome.


#dataquality
#Blog
0 comments
971 views

Permalink