Then the next record, E2, arrives. E2 generates a similar token, and the new record is associated with the token that already exists, landing in the same bucket as E1.
Without tokens, the only way to match an entire data set would be to compare every record with every other record, which takes a substantial amount of time because the number of comparisons grows quadratically with the number of records. Tokens, on the other hand, create smaller universes inside the data set: only the records inside a bucket are compared with one another, and nothing else.
So, if two records are not matching, first check whether they are in the same bucket. If they are not, that is why they are not matching; it is impossible for records to match while they sit in separate universes. This is why tokenization is so important.
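To make the bucket idea concrete, here is a minimal Python sketch of token-based bucketing. It is illustrative only, not Reltio's internal implementation, and the make_tokens() scheme (last name plus ZIP) is a hypothetical stand-in for whatever the match rules actually define.

```python
from collections import defaultdict

# token -> set of entity IDs that produced it (each token is a "bucket")
buckets = defaultdict(set)

def make_tokens(record):
    # Hypothetical scheme: one token built from last name + ZIP. A real
    # scheme is derived from the match rule configuration.
    return {f"{record['last_name'].lower()}|{record['zip']}"}

def index_record(entity_id, record):
    # Each token either creates a new bucket or joins an existing one,
    # which is how E2 lands next to E1.
    for token in make_tokens(record):
        buckets[token].add(entity_id)

index_record("E1", {"last_name": "Branson", "zip": "94105"})
index_record("E2", {"last_name": "Branson", "zip": "94105"})
print(dict(buckets))  # {'branson|94105': {'E1', 'E2'}} -> match candidates
```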
The Data Comparison Process
Once the tokens are created, they are picked up by the match rules. If there are four tokens for the entity E1, Reltio picks up the four corresponding buckets and constructs a list of possible candidates across all of them, because each of those token buckets includes E1. In the example depicted in the picture below, E1 may match with E2, E3, E4, E7, E8, and E9.
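Continuing the sketch above, the candidate list for an entity is simply the union of every bucket that contains it. The helper below is hypothetical and only illustrates this step:

```python
def candidates_for(entity_id, buckets):
    """Collect match candidates: every other entity that shares at
    least one token bucket with entity_id."""
    result = set()
    for members in buckets.values():
        if entity_id in members:
            result |= members - {entity_id}
    return result

# With buckets built from E1's four tokens, this returns
# {"E2", "E3", "E4", "E7", "E8", "E9"}.
print(candidates_for("E1", {"t1": {"E1", "E2", "E3"},
                            "t2": {"E1", "E4"},
                            "t3": {"E1", "E7", "E8"},
                            "t4": {"E1", "E9"}}))
```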
Once a candidate list for comparison is created, there may be multiple match rules to consider, and there is no hierarchy among them: matching with the different rules is triggered at once. So E1 and E2 will be picked first, and the matching for all rules will execute at the same time.
Every rule has an action associated with it: if the rule evaluates a pair to be true, whatever was configured for that rule is triggered. So, if E1 and E3 are matched by a rule marked as automatic, the pair will be merged automatically. On the other hand, if rule N is marked as suspect and evaluates E1 and E2 to be true, the pair will be submitted for data stewardship review. This is how the end-to-end matching process works.
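A sketch of that dispatch logic might look like the following. The rule structure, the predicates, and the merge/queue_for_review helpers are all hypothetical stand-ins for what the platform does internally:

```python
def merge(e1, e2):
    print(f"auto-merge {e1} + {e2}")

def queue_for_review(e1, e2):
    print(f"send {e1} + {e2} to data stewardship review")

def evaluate_pair(e1, e2, pair_attrs, rules):
    # No hierarchy: every rule is evaluated for the candidate pair,
    # and each rule that evaluates true triggers its configured action.
    for rule in rules:
        if rule["predicate"](pair_attrs):
            if rule["type"] == "automatic":
                merge(e1, e2)
            elif rule["type"] == "suspect":
                queue_for_review(e1, e2)

rules = [
    {"type": "automatic", "predicate": lambda a: a["exact_name"]},
    {"type": "suspect",   "predicate": lambda a: a["fuzzy_name"]},
]
evaluate_pair("E1", "E2", {"exact_name": False, "fuzzy_name": True}, rules)
```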
Tokenization Scenarios
Having multiple redundant match rules hurts your matching: the more tokens you create, the more evaluations Reltio must perform, whether each candidate pair turns out to be true or not. Here are some examples:
- Suppose you have six records, each with a different version of Michael Branson. One of them has a different state in its physical address, which indicates that it is probably a different person.
- Your sixth record, Rachel Branson, is most definitely a different person.
- If you do not define a tokenization scheme, each of these records will go into a separate bucket and never match.
- So, you ignore the first name while creating the tokenization, which creates a single Branson bucket. Now the Branson records can match, but if address line one is slightly different, they still do not land in the same bucket.
- So, follow the same approach and leave address line one out of the tokenization.
- Continue this process until all of your records are in the same bucket.
However, the Michael Branson with a different address, and Rachel Branson, are now in that bucket too. This is one example of a tokenization issue: loosening the scheme until every record buckets together also pulls in records that should not match.
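A small sketch of that progression, with illustrative attribute names: each attribute you ignore widens the bucket, until it also contains records that are clearly different people.

```python
def make_token(record, ignore=()):
    # Build one token from every attribute not in the ignore list.
    return "|".join(v for k, v in sorted(record.items()) if k not in ignore)

michael_other_state = {"first": "Michael", "last": "Branson",
                       "addr1": "9 Oak Ave", "state": "TX"}
rachel = {"first": "Rachel", "last": "Branson",
          "addr1": "1 Main St", "state": "CA"}

# Ignoring first name and address line one leaves last name + state...
print(make_token(rachel, ignore=("first", "addr1")))           # Branson|CA
# ...and ignoring state as well puts every Branson in one bucket,
# including Rachel and the out-of-state Michael.
print(make_token(michael_other_state, ignore=("first", "addr1", "state")))
print(make_token(rachel, ignore=("first", "addr1", "state")))  # identical token
```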
Tokenization Issues
Another tokenization issue is an excessive number of tokens for an entity. One cause is fuzzy matching on more than one multi-valued attribute; another is fuzzy matching on an attribute in combination with synonym matching.
Another example of a tokenization issue is an excessive number of entities sharing a token phrase. You may experience this when too many entities share the same set of values across different attributes, or when there are multiple duplicate copies of profiles from the source systems.
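The arithmetic behind the token explosion is easy to see in a sketch: fuzzy variants multiply across the values of a multi-valued attribute, so one entity generates their cross product. The names and counts below are illustrative:

```python
from itertools import product

# Fuzzy matching generates several token variants per name, and a
# multi-valued attribute contributes one token per value; the entity
# ends up with the cross product of the two.
name_variants = ["michael", "micheal", "mikael", "miguel"]  # fuzzy/synonym variants
phone_values = ["555-0100", "555-0101", "555-0102"]         # multi-valued attribute
tokens = {f"{n}|{p}" for n, p in product(name_variants, phone_values)}
print(len(tokens))  # 4 * 3 = 12 tokens for a single entity
```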
Configuration Examples
Let’s talk through another configuration example. In the webinar, you can watch our presenter, Suchen Chodankar, load all of the records in question in real time.
- Micahel Branson is listed with the same address.
- Miguel and Michael are listed, with different first name spellings.
- Michael is listed twice, exactly identical.
- We refresh and look at Michael Branson for potential matches.
- We see that Michael matched with Michael, so the match worked. It worked because the configuration was put in correctly.
- Next, Suchen removes rules two and three, leaving only the first rule.
- In that rule, he places fuzzy on the first name. The rule is otherwise unchanged, so the supporting token configuration is gone.
- This triggers the rebuilding and rematching.
- Now, Michael behaves very differently: it only matches the exactly identical Michael entity.
- So, he adds a double metaphone token so that Michael should match. He reviews the matches and finds that the pair did match by document, but not in the UI, as the final match outcome is false.
- This tells you that your tokenization is either wrong or insufficient.
- So, we must go back and find a common tokenization. Suchen chooses to try ignore, leaving the problem attribute out of the token (see the configuration sketch after this list).
- Now there is an intersecting token that matches, and the record returns in the UI.
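For reference, here is a hedged sketch of the kind of match group the walkthrough is editing, written as a Python dict that mirrors Reltio-style JSON configuration. The exact field and class names are assumptions and may differ in your tenant's configuration:

```python
# Assumed structure of a Reltio-style match group (field and class
# names are approximations, not authoritative).
match_group = {
    "uri": "configuration/entityTypes/Individual/matchGroups/Rule1",
    "type": "suspect",  # "automatic" would merge without review
    "rule": {
        "fuzzy": ["attributes/FirstName"],  # compare first names fuzzily
        "exact": ["attributes/LastName"],
        "matchTokenClasses": [{
            # double metaphone, so Michael/Micahel tokenize together
            "attribute": "attributes/FirstName",
            "class": "com.reltio.match.token.DoubleMetaphoneMatchToken",
        }],
        # leave a noisy attribute out of token generation ("ignore")
        "ignoreInToken": ["attributes/AddressLine1"],
    },
}
```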
All told, it is of the utmost importance to configure your tokenization carefully, so that you get optimal performance and the desired output when forming match pairs.
Rule-Based Matching
Rule-based matching is a complex process. If you have any further questions about this or any other Reltio process, simply post to our community page, and look ahead to further webinars on this subject and many others coming up at Reltio.
Relevant content:
- Rule-Based Matching: Matching Anatomy
- The Rule-Based Matching Process
- Watch the Rule-Based Matching: A Deep Dive webinar to understand more.