In the Reltio webinar Rule-Based Matching: A Deep Dive, we discuss in detail how matching works and how it is built from internal components.
This post covers matching at a high level, along with match rule anatomy: the different components of a match rule and how users should write them.
In later posts, we will cover the webinar’s discussion of the matching process, matching behind the scenes, tokenization, comparison, and examples of proper configuration.
Building the Background: Match Rule Best Practices
Before diving into the anatomy of a good match, it’s worth summarizing a previous webinar on match tuning best practices. In that webinar, we discussed the match rule life cycle and its three stages: data analysis and profiling, design and implementation, and testing and tuning.
In data analysis, match attributes are selected based on the results of profiling the data. The uniqueness, completeness, and cardinality of the attributes should drive the configuration for the best possible match outcome. For design and implementation, we discussed tokenization, optimal token intersection, and token counts. We also touched on IgnoreInToken and how it affects match outcomes and performance. In the last section, testing and tuning, we covered the use of verified API for troubleshooting, as well as match rule analysis for verifying the correctness and completeness of the configuration.
Now we are digging deeper into these concepts, specifically the APIs you can use to see exactly what is going on in the background of Reltio. We also aim to answer the following questions:
- Is there any performance implication of having a redundant match rule?
- Do we rebuild only when the match rules are changed?
- What are the match-related jobs we need to be aware of?
- How are undermatching or overmatching related to tokenization?
- Do any limitations exist on the number of match rules we can configure for an entity in Reltio?
Matching, of course, is the process of identifying records that are identical or related to one another. Typically, when considering matching data in a master data management system, we think about finding data that is completely identical. When such data is found, it is unnecessary to keep both sets of data. One of the features of Reltio MDM is that you can find these identical records easily, and also find records that are related.
Reltio supports two methods of matching: machine learning-based matching and rule-based matching.
Rule-based matching is instruction-based: the configuration provides the instructions, and the platform executes the matching based on those instructions. With machine learning-based matching, customers build their own Match IQ model and use it to match records.
Once match pairs are identified, the platform needs to know which action to perform on those pairs, either automatically or manually. These actions are supported on Reltio’s platform:
- Automatic merge. This system-triggered action lets users configure the platform to merge matched pairs without any further effort on the part of the user.
- Mark as “not a match”. This user-triggered action is available to the data steward as they resolve potential matches.
- Publish matched pairs. This is another system-triggered action.
- Present a potential duplicate record for review, so that it can be merged.
- Present a potential related record for review and/or merging.
We will cover these in more detail as we continue through this high-level matching overview.
Match Rule Anatomy
Again, rule-based matching lets users write the instructions that drive the merge action. But what do these instructions look like? It’s not as simple as looking at the first name, middle name, and last name of an individual record in isolation. Instead, all three criteria must be considered together, along with any other data collected about a customer. If all of the highly identifying attributes match, the pair should be matched and merged.
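To make the idea of considering criteria together concrete, here is a minimal sketch in Python. It is not how Reltio implements comparison; the helper names, the 0.8 similarity threshold, and the sample attribute names are all assumptions chosen for illustration, and the fuzzy check simply uses the standard library's difflib.

```python
from difflib import SequenceMatcher


def exact(a: str, b: str) -> bool:
    """Exact comparison after trimming whitespace and ignoring case."""
    return a.strip().lower() == b.strip().lower()


def fuzzy(a: str, b: str, threshold: float = 0.8) -> bool:
    """Approximate comparison: difflib similarity ratio at or above a threshold.

    The threshold is a hypothetical value, not a Reltio default.
    """
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold


def is_match(rec_a: dict, rec_b: dict) -> bool:
    """All criteria are evaluated together: every comparison must succeed."""
    return (
        fuzzy(rec_a["FirstName"], rec_b["FirstName"])
        and exact(rec_a["MiddleName"], rec_b["MiddleName"])
        and exact(rec_a["LastName"], rec_b["LastName"])
    )


# Hypothetical sample records, not taken from any real tenant.
record_1 = {"FirstName": "Jonathan", "MiddleName": "Lee", "LastName": "Smith"}
record_2 = {"FirstName": "Jonathon", "MiddleName": "lee", "LastName": "SMITH "}
record_3 = {"FirstName": "Jonathan", "MiddleName": "Lee", "LastName": "Smythe"}

if __name__ == "__main__":
    print("record_1 vs record_2:", is_match(record_1, record_2))
    print("record_1 vs record_3:", is_match(record_1, record_3))
```

The point of the sketch is the `and` in `is_match`: a close first name alone is not enough, and a single failing attribute prevents the pair from being treated as a match.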
However, Reltio also lets you configure other settings that provide the flexibility to address different use cases. These are the fields, or properties, of the match rule. Examples are listed below.
- uri: the unique identifier for every match rule. Whether a user is dealing with ten rules or twenty, each match rule must have its own unique identifier.
Example: uri : configuration/entityTypes/IndividualC02/matchGroups/Rule1
- label: the rule label, or display name, is the name the data steward sees when presented with a potential match produced by this match rule.
Example: label : Rule 1 - Fuzzy First and Exact Last Name
- type: the action to execute on the match pairs.
Example: type : suspect
- scope: the scope of matching, which lets users define whether to use this match rule internally within the tenant, externally, or both.
Example: scope : ALL
- useOvOnly: indicates whether to use only the surviving value of the profiles or all values, regardless of whether they are surviving or non-surviving.
Example: useOvOnly : false
- Rule section: where users write their match instructions. Components of the rule section include IgnoreInToken and MatchToken, which relate to the tokenization scheme.
- ComparatorClasses: also included in the rule section, where users may define a specific comparison function. Comparisons are combined with logical operators to build complex match rules. This is where exact, fuzzy, and cleanse come into play.
- scoreStandalone and scoreIncremental: can help find potential matches more quickly.
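Putting the fields above together, a match rule might be assembled roughly as in the following Python sketch. The top-level values come from the examples in this list. The layout inside the rule section, the key spellings there, the attribute names, and the score values are assumptions for illustration only, so check them against your own tenant configuration before relying on them.

```python
import json

# Entity type taken from the uri example above.
ENTITY_TYPE_URI = "configuration/entityTypes/IndividualC02"

match_rule = {
    "uri": f"{ENTITY_TYPE_URI}/matchGroups/Rule1",
    "label": "Rule 1 - Fuzzy First and Exact Last Name",
    "type": "suspect",
    "scope": "ALL",
    "useOvOnly": False,
    # Assumed layout: the nesting and key names inside "rule" are illustrative.
    "rule": {
        # Hypothetical attribute excluded from token generation.
        "ignoreInToken": [f"{ENTITY_TYPE_URI}/attributes/MiddleName"],
        # Logical operator combining the comparison operands.
        "and": {
            "fuzzy": [f"{ENTITY_TYPE_URI}/attributes/FirstName"],
            "exact": [f"{ENTITY_TYPE_URI}/attributes/LastName"],
        },
    },
    # Placeholder values, not recommendations.
    "scoreStandalone": 0,
    "scoreIncremental": 0,
}

REQUIRED_FIELDS = ["uri", "label", "type", "scope", "useOvOnly", "rule"]


def missing_fields(rule: dict) -> list:
    """Return the required top-level fields that are absent from a rule."""
    return [name for name in REQUIRED_FIELDS if name not in rule]


if __name__ == "__main__":
    print(json.dumps(match_rule, indent=2))
    print("Missing fields:", missing_fields(match_rule))
```

A small completeness check like `missing_fields` is only a convenience for catching obviously incomplete rules before they are submitted; it does not validate the rule logic itself.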
Match Rule Configuration Breakdown
Next, we want to break down the components of a match rule configuration. A match rule is divided into two parts: Matching Logic and Match Type. Matching Logic is the set of instructions used to identify matches, such as cleanse rules, tokenization rules, and comparison rules, among others.
Match Type drives what action is taken on the matched pair and itself comes in three categories: binary outcome-based, relevance-based, and custom.
Binary outcomes have no grey area: the pair in question either comes out as a match or not a match.
Relevance-based actions provide that grey area between true and false by giving users a score for the records they are evaluating. Users may then decide whether that score is good enough for an automatic merge or whether to keep the pair as a suspect.
Custom Type allows you to define your own action and decide for yourself whether to merge automatically or whether to show the pair to the data steward at all.
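The difference between binary and relevance-based outcomes can be sketched as follows. This is only an illustration of the decision logic described above: the function names, the action labels, the 0 to 1 score range, and both thresholds are hypothetical and are not Reltio settings.

```python
def binary_action(rule_matched: bool) -> str:
    """Binary outcome: the pair is either a match or not, with no grey area."""
    return "match" if rule_matched else "not_a_match"


def relevance_action(
    score: float,
    auto_merge_at: float = 0.9,  # hypothetical threshold
    suspect_at: float = 0.6,     # hypothetical threshold
) -> str:
    """Relevance-based outcome: map a score to an action using thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie between 0 and 1")
    if score >= auto_merge_at:
        return "automatic_merge"
    if score >= suspect_at:
        return "suspect"
    return "no_action"


if __name__ == "__main__":
    for sample_score in (0.95, 0.7, 0.3):
        print(sample_score, relevance_action(sample_score))
```

Choosing where to place the two thresholds is the tuning decision: raising the automatic merge threshold sends more pairs to the data steward for review, while lowering it merges more pairs without review.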
Match Troubleshooting and Tuning APIs
Several APIs can help you troubleshoot, tune matches, and see what is happening between the records in a particular pair. Some are generic tools that are not tied to your Reltio tenant data.
Others return the explanation behind a particular match pair.
To see many of these APIs in action in a Reltio tenant, watch the webinar.
And catch the next part of this series, The Rule-Based Matching Process.
For other relevant content, make sure to check out the below:
Matching, Merge and Match Rules
Anatomy of a Match Rule
Reltio Master Data Management Match Rules - FAQ's Part 1
Reltio Master Data Management Match Rules - FAQ's Part 2
Match Tuning Best Practices: Data Profiling and Analysis
Effective Match Tuning: Design and Implementation