Reltio Connect

  • 1.  How to match SSN with data quality issues

    Founding Member
    Posted 06-13-2022 10:03

    Hi Team,

    We have records that should match, but the SSN identifier has the following data quality issues:

    1. Leading zeros, e.g., 537-23-6440 and 0537-23-6440
    2. Values with and without hyphens, e.g., 537236440 and 537-23-6440
    3. Embedded whitespace, e.g., 537-23-6440 and 53  7-23-6 440

    I have configured the following CustomMatchToken to resolve these issues, but with no luck:

    {
      "attribute": "configuration/entityTypes/Person/attributes/Identifiers/attributes/ID",
      "parameters": [
        {
          "parameter": "groups",
          "values": [
            {
              "pattern": "^0|[\\s-]",
              "className": "com.reltio.match.token.ExactMatchToken"
            }
          ]
        }
      ],
      "class": "com.reltio.match.token.CustomMatchToken"
    }
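
As a quick sanity check of the regular expression on its own, the sketch below applies the same pattern as a plain removal pattern in Python. This is only an assumption about the intended effect (strip a leading zero, whitespace, and hyphens); it says nothing about how CustomMatchToken interprets the `pattern` parameter inside Reltio. The helper name `strip_noise` is hypothetical.

```python
import re

# Same pattern as in the match token configuration above
# (the JSON string "\\s" is the regex \s).
PATTERN = re.compile(r"^0|[\s-]")

def strip_noise(raw: str) -> str:
    """Remove a zero at the very start, plus all whitespace and hyphens."""
    return PATTERN.sub("", raw)

# The variants listed in the post
samples = ["537-23-6440", "0537-23-6440", "537236440", "53  7-23-6 440"]
normalized = {strip_noise(s) for s in samples}
print(normalized)  # {'537236440'}
```

Used this way, the pattern collapses all four variants to one value. Note that `^0` only removes a zero that is the very first character, so a value with leading whitespace before the zero would keep it.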

    Please let me know if you have any suggestions.



    ------------------------------
    Ashish Rawat
    Fresh Gravity
    Bangalore
    ------------------------------


  • 2.  RE: How to match SSN with data quality issues

    Reltio Partner
    Posted 06-14-2022 09:36
    Ashish, presumably part of the ingestion process is responsible for cleaning up all of these variations and producing a standardized format to post into the Reltio record. If so, the tokenization and match rule scheme becomes easier, no?

    ------------------------------
    Curt Pearlman
    PwC
    Agoura Hills CA
    ------------------------------



  • 3.  RE: How to match SSN with data quality issues

    Reltio Employee
    Posted 06-14-2022 09:53
    Hi @Ashish Rawat, I second the approach Curt described. Implement a custom cleanser and assign the cleansed string to another attribute that participates in the match process. This simplifies the match rule configuration.
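
A minimal sketch of that cleansing step, written in Python purely for illustration. A real implementation would use whatever cleanse mechanism the tenant provides, which is not shown here. The function names and the `CleansedID` attribute are hypothetical, and the rule that a 10-digit value starting with 0 carries a stray zero is an assumption based on the example in the original post.

```python
import re
from typing import Optional

def cleanse_ssn(raw: str) -> Optional[str]:
    """Return a 9-digit SSN string, or None if the value cannot be standardized."""
    digits = re.sub(r"\D", "", raw)  # drop hyphens, whitespace, any other non-digit
    # Assumption: 10 digits with a leading zero means one stray zero was prepended.
    if len(digits) == 10 and digits.startswith("0"):
        digits = digits[1:]
    return digits if len(digits) == 9 else None

def add_cleansed_id(record: dict) -> dict:
    """Copy the record and store the cleansed value in a separate attribute.

    The source value in "ID" is left untouched; only the hypothetical
    "CleansedID" attribute would take part in matching.
    """
    cleansed = dict(record)
    cleansed["CleansedID"] = cleanse_ssn(record.get("ID", ""))
    return cleansed

print(add_cleansed_id({"ID": "53  7-23-6 440"}))
```

Keeping the original value alongside the cleansed one means the match rule can use a simple exact comparison on the cleansed attribute while the source data stays available for audit.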

    ------------------------------
    Nagesh Lakinepally
    ------------------------------



  • 4.  RE: How to match SSN with data quality issues

    Founding Member
    Posted 06-14-2022 10:16
    Agreed on improving data quality in the ingestion process, and that's where we are right now. However, I am wondering whether this can be addressed via match rules in the initial phase of the implementation, for analysis purposes.

    ------------------------------
    Ashish Rawat
    Fresh Gravity
    Bangalore
    ------------------------------



  • 5.  RE: How to match SSN with data quality issues

    Reltio Partner
    Posted 06-14-2022 10:51
    Ashish, I'm not entirely sure what you mean by "... for analysis purposes". As an FYI, on the implementation I'm involved with now, we have been using Telm.ai extensively to profile the data during the analysis phase. We look at the source files and data across the board and obtain the classic measures such as uniqueness, validity, and accuracy, but it is also extremely helpful for seeing the various patterns that exist within the data, such as those in your example. We use the results to remediate data quality problems before they get posted into Reltio. Telm.ai can then monitor all your daily feeds to look for drift or additional anomalies that weren't in the data you initially profiled. I highly recommend a free trial of Telm.ai to see how it works for you.

    ------------------------------
    Curt Pearlman
    PwC
    Agoura Hills CA
    ------------------------------



  • 6.  RE: How to match SSN with data quality issues

    Founding Member
    Posted 06-15-2022 12:30
    Thanks for your suggestion, Curt. I will definitely look into the tool. These are basic data quality issues that can certainly be addressed in the integration. By "analysis", however, I meant coming up with effective match rules so that, if such issues show up in the data in the future, we don't have to keep changing the integration and can instead make a small tweak to the match rule regex.

    ------------------------------
    Ashish Rawat
    Fresh Gravity
    Bangalore
    ------------------------------