Match tuning is the iterative process of developing match rules while minimizing bad matches, maximizing automation, and preserving system performance. It follows a three-step process that I call the match tuning life cycle. In my last blog we covered data profiling and analysis to lay the groundwork for our match rules. In this blog we will cover the next steps in the cycle: best practices for design and implementation. In other words, it is time to start building match rules within Reltio Master Data Management.
Design and Implementation
Working from the foundation of your data profiling, start by designing a rule around each of your identifying attributes, such as SSN, IDs, Email, Phone, Names, and Addresses. In the console match rule builder, create a potential match (aka suspect) rule with an exact match on each of these to get a feel for how they perform on their own. Make sure to set the scope to ‘internal’ and turn relevance-based matching off for now.
Now navigate to a few example records where your rule has been triggered. Go to the same as view and compare your record to its potential match. Take particular note of records that should not have matched and of the attributes you used to decide they are not matches. You will use these to improve your match rules. In our example, two contacts named ‘Steve Roberts’ matched on name but have different addresses, phone numbers, and emails. We could add these attributes to our name rule to improve it.
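To make that review step concrete, here is a minimal sketch of the comparison you are doing by eye: given two records that a name rule paired up, list the attributes that disagree. The record layout, the sample values, and the helper names are all made up for illustration; this is plain Python, not a Reltio API.

```python
def _normalize(value):
    """Lowercase and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(str(value).lower().split())


def differing_attributes(record_a, record_b):
    """Return the attributes present in both records whose values differ."""
    shared = record_a.keys() & record_b.keys()
    return sorted(
        name for name in shared
        if _normalize(record_a[name]) != _normalize(record_b[name])
    )


# Hypothetical sample data modeled on the 'Steve Roberts' case.
steve_1 = {
    "Name": "Steve Roberts",
    "Address": "12 Oak St",
    "Phone": "555-0101",
    "Email": "steve.roberts@example.com",
}
steve_2 = {
    "Name": "steve  roberts",
    "Address": "98 Pine Ave",
    "Phone": "555-0199",
    "Email": "sroberts@example.org",
}

conflicts = differing_attributes(steve_1, steve_2)
print(conflicts)  # ['Address', 'Email', 'Phone']
```

The attributes that keep showing up in this list across your false positives are the ones worth adding to the rule.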
Best Practices for Rule Design
Always start with Suspect Rules
The two common rule types you will use in Reltio MDM are:
Automatic rules, as the name implies, automatically merge the matches they find. While you are still in the design phase, you will likely get false positives that you do not want merged. Once records are merged, unmerging is a manual process.
The suspect type, also known as potential match, allows you to test your rules without any merging. In fact, your suspect pool will change every time you modify the rule and rebuild match tables. This allows for quick cycles of testing and prototyping.
Never use automatic merging until you are confident that your matches are accurate. Some rules should be left on suspect permanently, allowing your data stewards to process matches where they see fit.
Set useOvOnly = true By Default
Out of the box, your rules will only consider the operational value (OV). This is the ideal setting because it limits the number of comparisons made, keeping performance as high as possible. Unless you have a particular need to consider the non-winning values of some crosswalks, useOvOnly should always be set to true.
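A rough way to see why this matters is to count the value pairs a single attribute could contribute to one record-to-record comparison. The sketch below is a simplified model with hypothetical field names ("ov" for the winning value, "all" for the values from every crosswalk); it is not how Reltio evaluates rules internally.

```python
from itertools import product


def comparison_pairs(values_a, values_b, use_ov_only=True):
    """Return the value pairs that would be compared for one attribute."""
    if use_ov_only:
        # Only the winning value on each side is considered.
        return [(values_a["ov"], values_b["ov"])]
    # Every crosswalk value on one side is paired with every value on the other.
    return list(product(values_a["all"], values_b["all"]))


# Hypothetical phone values for two records.
phone_a = {"ov": "555-0101", "all": ["555-0101", "555-0102", "555-0103"]}
phone_b = {"ov": "555-0199", "all": ["555-0199", "555-0101"]}

ov_pairs = comparison_pairs(phone_a, phone_b, use_ov_only=True)
all_pairs = comparison_pairs(phone_a, phone_b, use_ov_only=False)
print(len(ov_pairs), len(all_pairs))  # 1 6
```

In this toy case the non-winning values do contain a shared phone number, which is the kind of particular need that could justify turning the setting off for a specific rule.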
Put all matching attributes in the label
If your match rule considers FirstName, LastName, Address, and SSN, make sure to place all four of these attributes in your label. This helps your data stewards understand why a particular rule was triggered when reviewing the history or activity log of a record.
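If you keep your rule definitions somewhere you can script against, a small check like the following can catch labels that have drifted from the rule. The dictionary shape here is an assumption for illustration only and is not Reltio's configuration schema; the function name is hypothetical too.

```python
def missing_from_label(rule):
    """Return the rule's attributes that are not mentioned in its label."""
    label = rule["label"].lower()
    return [name for name in rule["attributes"] if name.lower() not in label]


# Hypothetical rule description; the keys are illustrative only.
rule = {
    "label": "Suspect: exact FirstName, LastName, Address",
    "type": "suspect",
    "useOvOnly": True,
    "attributes": ["FirstName", "LastName", "Address", "SSN"],
}

missing = missing_from_label(rule)
print(missing)  # ['SSN']
```

The check is a plain substring test, so an attribute called Name would count as present in a label that only mentions FirstName. Treat it as a first pass, not a guarantee.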
Avoid negative rules when possible
Negative rules are rules that use the “not” operator. The “not” operator is costly for performance and can often be avoided. For example, if you have a rule comparing large organizations that applies a “not” operator to the Small Business attribute, you could replace it with an equals operator.
From
Not(Organization.SmallBussinessIndicator)
To
equals(Organization.SmallBussinessIndicator, false)
Tokens
Tokenization is a shortcut for comparing one data set to another. It reduces the set to likely matches, so that far fewer records have to be compared in full. It does this by stringing together key attributes into a hash. When two records generate the same token, Reltio evaluates them against the match rule to confirm the match.
For example, consider a data set with 10,000 Smiths in the LastName attribute, with LastName used as the token. Reltio would compare every Smith against every other Smith. We could improve this by including FirstName in the token so that only records with matching first and last names are considered.
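To put numbers on that, the sketch below groups records by token and counts how many record pairs share one, since only those pairs go on to the full rule. It is a simplified model: the token is a plain joined string and not a real hash, the function name is hypothetical, and the data is a scaled-down 100 Smiths with four made-up first names.

```python
from collections import defaultdict


def candidate_pair_count(records, token_attributes):
    """Count record pairs that share a token built from the given attributes."""
    block_sizes = defaultdict(int)
    for record in records:
        token = "|".join(record[name].strip().lower() for name in token_attributes)
        block_sizes[token] += 1
    # A block of n records yields n * (n - 1) / 2 distinct pairs.
    return sum(n * (n - 1) // 2 for n in block_sizes.values())


# Hypothetical data: 100 Smiths spread evenly over four first names.
first_names = ["Ann", "Bob", "Cara", "Dev"]
records = [
    {"FirstName": first_names[i % 4], "LastName": "Smith"}
    for i in range(100)
]

by_last = candidate_pair_count(records, ["LastName"])
by_full = candidate_pair_count(records, ["FirstName", "LastName"])
print(by_last, by_full)  # 4950 1200
```

By the same formula, 10,000 Smiths sharing a single LastName token would produce 49,995,000 candidate pairs, which is why a more selective token pays off so quickly.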
ignoreInToken
There are some basic rules to keep in mind when it comes to ignoreInToken. Always use ignoreInToken when:
- The match rule uses the “notEquals” operator
- You are using the “ExactOrNull” or “ExactAndAllNull” operators
- You are using thresholdChars with the DistinctWordsComparator
- Your fuzzy comparator is generating too many tokens. We will dive deeper into this in the next blog
- You have similar match rules generating similar tokens
By following these guidelines, you will have a more effective match and merge practice.
Now that you know data profiling and analysis best practices, and have practiced designing and implementing your match rules, it’s time to tune them. Watch for another blog post covering the final slides of the webinar, Reltio MDM: Matching and Merging.