Ashish, I'm not entirely sure what you mean by "... for analysis purposes". As an FYI, on the implementation I'm involved with now, we have been using Telm.ai extensively to profile the data during the analysis phase. We look at the source files and data across the board and obtain the classic measures such as uniqueness, validity, and accuracy, and it has also been extremely helpful in showing all the various patterns that exist within the data, such as the ones in your example. We use the results to remediate data quality problems before they get posted into Reltio. Telm.ai can then monitor all your daily feeds to look for drift or additional anomalies that weren't in the data you initially profiled. I highly recommend a free trial of Telm.ai to see how it works for you.
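To illustrate what "seeing the patterns" means in practice, here is a minimal Python sketch of pattern profiling. It is not how Telm.ai works internally; the function names `shape` and `profile_patterns` are hypothetical and exist only for this illustration. Each value is reduced to a shape (digits become 9, letters become A, whitespace becomes _) and the shapes are counted, so unexpected formats stand out.

```python
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to its character-class pattern (hypothetical helper)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")   # any digit
        elif ch.isalpha():
            out.append("A")   # any letter
        elif ch.isspace():
            out.append("_")   # make whitespace visible
        else:
            out.append(ch)    # keep punctuation such as hyphens as-is
    return "".join(out)


def profile_patterns(values):
    """Count how often each shape occurs across the given values."""
    return Counter(shape(v) for v in values)


# Sample values taken from the SSN variations in the original question.
samples = ["537-23-6440", "0537-23-6440", "537236440", "53 7-23-6 440"]
for pattern, count in profile_patterns(samples).most_common():
    print(pattern, count)
```

Any shape other than the one you expect (for example 999-99-9999) is a candidate for remediation before the data is posted.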
------------------------------
Curt Pearlman
PwC
Agoura Hills CA
------------------------------
Original Message:
Sent: 06-14-2022 10:15
From: Ashish Rawat
Subject: How to match SSN with data quality issues
Agreed on improving data quality in the ingestion process, and that's where we are right now. However, I'm wondering whether this can also be addressed via match rules in the initial phase of the implementation, for analysis purposes.
------------------------------
Ashish Rawat
Fresh Gravity
Bangalore
Original Message:
Sent: 06-14-2022 09:53
From: Nagesh Lakinepally
Subject: How to match SSN with data quality issues
Hi @Ashish Rawat, I second the approach described by Curt. Implement a custom cleanser and assign the cleansed string to another attribute that participates in the match process. This simplifies the match rule configuration.
------------------------------
Nagesh Lakinepally
Original Message:
Sent: 06-14-2022 09:35
From: Curt Pearlman
Subject: How to match SSN with data quality issues
Ashish, presumably part of the ingestion process is responsible for cleaning up all of these variations and producing a standardized format to post into the Reltio record. If so, then the tokenization and match rule scheme becomes easier, no?
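As a rough sketch of that kind of standardization step, assuming it runs in a Python ingestion script before the data reaches Reltio (the function name `normalize_ssn` is hypothetical, and this is not a Reltio cleanser API):

```python
import re

SSN_LENGTH = 9  # a US SSN has nine digits


def normalize_ssn(raw: str) -> str:
    """Return the SSN as a bare digit string (hypothetical helper)."""
    # Remove whitespace and hyphens wherever they occur.
    digits = re.sub(r"[\s-]", "", raw)
    # Drop surplus leading zeros only while the value is longer than
    # nine digits, so an SSN that genuinely starts with 0 is preserved.
    while len(digits) > SSN_LENGTH and digits.startswith("0"):
        digits = digits[1:]
    return digits


for value in ["537-23-6440", "0537-23-6440", "537236440", "53 7-23-6 440"]:
    print(repr(value), "->", normalize_ssn(value))
```

The length check is a deliberate design choice: removing every leading zero unconditionally would also alter valid SSNs that begin with 0. With all variations reduced to the same nine-digit string, an exact match on the standardized value should be enough.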
------------------------------
Curt Pearlman
PwC
Agoura Hills CA
Original Message:
Sent: 06-13-2022 10:03
From: Ashish Rawat
Subject: How to match SSN with data quality issues
Hi Team,
We have matching records where the SSN identifier has the following data quality issues:
1. Leading zeros, e.g., 537-23-6440 and 0537-23-6440
2. Matching pairs with and without hyphens, e.g., 537236440 and 537-23-6440
3. White space within the value, e.g., 537-23-6440 and 53 7-23-6 440
I have configured the following CustomMatchToken to resolve the above issues, but with no luck:
{
  "attribute": "configuration/entityTypes/Person/attributes/Identifiers/attributes/ID",
  "parameters": [
    {
      "parameter": "groups",
      "values": [
        {
          "pattern": "^0|[\\s-]",
          "className": "com.reltio.match.token.ExactMatchToken"
        }
      ]
    }
  ],
  "class": "com.reltio.match.token.CustomMatchToken"
}
Please let me know if you have any suggestions.
------------------------------
Ashish Rawat
Fresh Gravity
Bangalore
------------------------------