Match IQ - Machine Learning Based Matching - Show

By Chris Detzel posted 07-26-2022 15:17

Translating matching requirements to the match configuration can be time consuming, complex and error prone. In this webinar, we will walk you through the machine learning powered Match IQ feature for matching your data efficiently without having to code or configure. We will demonstrate how easily any user with the knowledge of only matching requirements can build, train and publish the Match IQ models for matching.

Find the PPT here.

Transcript:

Chris Detzel (00:07):

Welcome everyone to another community show, Machine Learning Based Matching. Today, we have senior product manager, Suchen Chodankar. And of course me, I'm the director of customer, community and engagement. This is a highly anticipated community show. A lot of folks are very interested in this. So the way this works is today... Of course keep yourself on mute. Most of these questions and I assume we're going to get a lot of them, will be asked at the end of the session. So we'll leave plenty of time free to ask. But the way that Suchen Chodankar mentioned, he's just said, "Hey, look, let's just get through it a little bit and then start asking the questions." But please feel free to push those questions in the chat as soon as you have it so we don't miss it. Definitely still want to ask those and we'll leave 15 to 20 ish minutes in that section at the end.

So we do have a few shows coming up. Today's Machine Learning Based Matching, really excited about that. As a matter of fact, in two days we have another show coming up called Reltio Integration Hub with [inaudible 00:01:17] at Avara, really excited about that one as well and how they're using Reltio Integration Hub. And then we have Using Reltio Integration Hub, and that's going to be in August, to automate Reltio workflows. I know we've had a couple of sessions just on Reltio workflows, but this one goes a little bit deeper in how to automate those things within RIH. And then on September 29th... And by the way, we will fill up September with shows. I just haven't got all of those pushed in yet, but we do have a show on Blazing Trails and Master Data Stories from Women Succeeding in MDM, so that should be really good as well. I'm going to stop sharing, Suchen, and let you go ahead and share. So when I first started, there was like 10 people on and now there's 45. So I told you. Welcome.

Suchen Chodankar (02:17):

Thank you, Chris. Welcome everyone. Let me just share my screen. Chris, can you confirm you can see my screen.

Chris Detzel (02:26):

Confirmed. Yes.

Suchen Chodankar (02:28):

Okay. All right. Yeah, like Chris said, what I'll try to do is basically go to the presentation. So today we're going to talk about the Reltio Match IQ feature, and that there are three sections that I'm thinking about basically. The first one focus on the problem statement related to the matching. The second one is like how we are trying to solve this using Match IQ or the ML based matching. And the third section basically we can probably have more of interactive kind of session where you can ask questions and all. But if there are any questions that you feel that needs clarification, feel free to interrupt and we can basically discuss those things. So with that being said, let's jump into it. So this will be a good session if you understand the challenges that we are facing in the data matching process.

And like I said, we are going to talk about how Match IQ, which is an ML-based matching engine, tries to kind of solve or address some of these challenges. And we'll spend a little bit of time going into Match IQ features and capabilities. I'll do a short demo and I'll show you how the product works, and what you can do with that. So before we jump into the data matching problem, let's understand what data matching is. So matching is the task of finding records in a data set that may refer to the same entity. And this is one of the common use cases that many of you might be familiar with. You have data in multiple source systems that you might be basically using for different functions like HR, finance, marketing, products and things like that. And you have basically multiple copies of those entities. And the entities could be basically customers, your products, employees, individuals, contacts, anything.

So the purpose is to find those duplicates and consolidate and create a 360 degree view of that. There are also other use cases which are related to matching. Now, what you do with a matched record is your action. Once you identify two records as a match pair, you can basically decide to merge them because that's what you're trying to basically do, consolidate and create a 360 degree view. But there is another use case that you can also use matching for, which is to find a related entity. One of the common use cases of that is basically householding, for example. You want to find all your customers who belong to the same household. So what you do is basically use the matching engine to find those records which are related, and use relate as the action, which will basically kind of tie them together instead of merging.

So basically these are the two common use cases that we have seen and there could be more. So let's talk about the data matching challenges, and I'll talk about three different aspects of this one. The first one is consolidating this data sources or aligning these data sources, it is cumbersome and time consuming. And I'm sure you'll be able to relate that once you identify the source system that you want to basically bring together and create a 360 degree view, you have to go through a process of basically understanding what's the data? Where the data is coming from? What attribution from those different sources you can use? And basically how to structure them, or what is your business requirement? So there's a lot of processes that goes before you can get to even configuring your business rules that will basically give you the matching results that you're looking for.

And we've seen that this match and merge process using traditional approaches usually takes weeks, and even months in some cases based on your use cases. And you have to configure using whatever MDM tool that you use, understand the syntax, understand the technology behind it, understand how the rule has to be written, all of that. And you have to tune it to make sure that you don't have overmatching or undermatching, and you are getting relevant matches and accurate matches. So your results could be like really bad if your matching rules and your configuration are not correct. So this could lead to unreliable data.

Let's look at the second aspect of this one, basically time to value. I touched upon this little bit in the previous slide when I said it takes on average like eight to 12 weeks. And it depends on your use cases. So if you are de-duplicating or you're creating a relationship between different entities with which has complex kind of business requirement, this timeframe could be even much larger. So depending on the number of entity types that you have to basically de-duplicate, this could go in months. And the typical implementation process looks like this or different phases of the project. So you have the requirement phase where you can see the business and technical team working together. Here the business team basically looks at the result of the data profiling and decide what attribute they would like to basically consider for matching.

And typically the business team will sit together with a technical team and write that detailed list of requirements, how the matching should happen based on the data profiling. Once the requirement is finalized, you basically hand it over to the technical team which then translates that requirement into the underlying configuration. Whatever tool you use, rule-based or whatever it might be, you need to understand or you need to have a team who understands how to translate those business requirements into rules, or into a matching process. This typically takes quite a bit of time because it is the first time you are putting everything together. In the first iteration you sort of try to hit close to 70, 80% of your requirements, and then just kind of iterate to get through all your requirements in iteration two and sometimes three.

So as you can see, there is iteration one, iteration two, and the final iteration is where you basically hand it over to the business user and basically ask them to verify. And if anything is not as per the expectation of the business team, then you have to repeat this process, add more iterations to it. So like I said, depending on your business requirement, this could be even in weeks or in months, so time to value is very important here. By the time you start implementing and get to production, your requirement could change and now you have to basically go back and retune everything. So it is very important to get the matching process set up as quickly as possible. And this has been one of the challenges that many customers face. Let's look at the third case or third aspect of this one.

This is again one of the common scenarios that you will come across. So you start with a business requirement where you say records should be matched if they match on the first name, last name and address. This is just one of the use cases and you could have similar other use cases. So you hand this requirement to your technical team. They will basically sit down and kind of create a rule out of it, or set up the matching process based on this business requirement, go through the testing, take it all the way to production, load data and it's doing what it is supposed to do. And then you realize that you have to onboard another source system, which is providing a suffix which you did not have before. Now, obviously this is very important information. Looking at the example on the right here, you can see that they're clearly two different people based on their suffix.

So what you need to do is basically go back to your requirement process, the chart that I showed earlier, and make sure that you retune your matching process so that this new requirement is accounted for. And also you have to make sure that the change that you're going to introduce in your matching process does not disrupt anything that you might have done with other rules. So there's a lot of rework that you have to do in these cases. And again, you are back to the same cycle, which is make sure the requirement is documented properly, go through the tuning, iterate on that and then do the regression testing, all of that stuff. So these are the three challenges that we wanted to kind of focus on in this session.

There are more obviously, but let's start with this and then we can discuss a few other challenges in follow up sessions. So these are the three things that we are thinking about. And so what we are set out to do is basically build a best in class matching engine that addresses those three challenges that I talked about. Which is enable our customers to set up the matching easily for finding accurate and relevant matching, make sure that you are not overmatching or undermatching. So find the accurate and relevant match record, which also adapts to the changing business requirement. Which is the third part I was talking about where your business requirement has changed, now you don't want to go back and start all that process all over again and take another eight to 10 weeks to set up that thing.

So basically how we are solving that, so we are introducing Match IQ, which is an ML-based matching engine. So there are three things that I want to focus on in this session. So basically the first part is it leverages machine learning for data matching, and I'll explain how that helps with some of those challenges that I talked about. Using machine learning or using Match IQ you can now reduce the time needed to set up the matching process and also increase the matching accuracy. So basically, what we are trying to do here is cut down the time that is needed to set up all the matching processes. And reduce it so that you can iterate more and spend more time on the data, rather than basically understanding the configuration, understanding the technology, and basically trying to figure out how to write the tokenizer or how to write the comparison and all that technical stuff.

So focus more on the data and kind of reduce the time needed to set up. And the third thing is... Again, if you go back to that first chart, there's a business team and there's a technical team. A lot of the time basically, you have the business team who does not understand how to configure those rules or how to translate that requirement into the underlying rules or matching process. So you have to basically rely on a technical team to do that, and then wait for them to kind of configure all of that stuff, run the sample pairs and wait for the results so that you can verify whether the configuration was done correctly as per the business requirement. We wanted to basically provide that capability to business users so that they can basically drive this whole match setup process by themselves, and look at the results right away instead of waiting for weeks.

So those are the three highlights of this and how we are solving some of those challenges. Let me quickly go to the Match IQ overview. So just to reiterate, we are doing data matching by leveraging the power of machine learning. And I'll talk about how machine learning is helping with those three challenges that I talked about. So we empower business users to build these machine learning models, which is historically a technical task. And you need a lot of technical expertise and teams to kind of put all of this together. So we are very excited to get this tool out and let basically business users use it, because it is very easy to create and deploy a model for matching. It does not require any coding or configuration to do this, and the Match IQ application will guide you throughout the process to create these models and deploy them.

So what is happening there is the Match IQ has tools that basically derive the matching requirement based on the model training. And I'll get into that, what model training looks like and how Match IQ understands what the matching requirements are if there is no coding and configuration involved. And the last one like I talked about in the first slide, there are multiple actions that you can do once the match pairs are identified. So you identify John Smith and Jonathan Smith as a matched record. What do you want to do with that? Do you want to merge it? Or do you have enough confidence to basically automatically merge? Or do you want to create a suspect record for data steward to review it further and take a manual decision? Or do you want to relate? Or do you simply want to just publish the record? You want to say, "Don't do anything, just tell me what you found out. And I will do something on my end." So all those three actions are supported in Match IQ. I want to just quickly switch over here.

All right. Let me go here. So how does this work? And I'm going to jump into the demo right after this, but just wanted to give you a high level process flow. What it means to create a machine learning model that can do the data matching using the Reltio Match IQ application. So there are four simple steps. The fifth one is really where it gets into the actual matching process and resolving the matches to either merge or relate and all that stuff. The first step is basically as simple as going to the Reltio Match IQ application, selecting the entity type that you want to set up the match process for, and then selecting the attributes. Like you can say, I want to train a model for matching my... Say, individual or customer, and use first name, middle name, last name, address line one and city/state, whatever.

You just pick the list of attributes, that's all it takes for step number one. And you just kick it off. And then what is happening here is Match IQ will take that information, and then go grab all the data that is needed from your Reltio tenant automatically. And do all those background processes, which is more of a technical work. In today's rule-based matching world, you have to basically design the tokenization scheme, which is basically creating smaller universes of data that you can match within. Hey, Chris, I'm still audible, right? Because I just want to make sure that I've not lost anything.

Chris Detzel (19:13):

You're doing good, man. Yes.

Suchen Chodankar (19:18):

Okay. All right. The second step is basically, Match IQ is going to notify you saying that you told me you want to match on customer and these other 10 attributes that you want to consider for a matching. I've done all the backend work that I needed to do, and I'm ready to kind of understand your matching requirements, so that's where the training process comes in. So the way training works is... So this is how the training screen looks like. So as you can see, these are the list of attributes that you have selected and it'll... Based on your data that you've loaded in Reltio tenant, it'll provide you some sample pair. This is very similar to what a data steward would do in resolving a potential match. So this is a pair. Does this look okay? Yeah. The first name looks exact.

The last name looks exact. Address line one, you can notice that there is a Northwest here and here there is no Northwest. Now, based on your requirement you can say this is acceptable to me, and then you can hit match here. Or you can say no, address line one has to match exactly, so in that case you say no, not a match. You would also notice that the state is spelled differently here, there is GA and Georgia. So based on your requirement you can go with whatever option you want. So based on your action, Match IQ understands what your matching requirements are. So if I were to hit not a match here, what it infers is that I know that first name is same, I know the last name is same, I know city and zip code is same. So those cannot be the issue here if you hit not a match.

So it has to be either state or address line one or both of them, so that's where it will go and try to find another pair to basically see whether you agree with that pair. And it basically brings pair after pair. And that's how it basically derives the matching requirement from this training exercise. So you don't have to worry about what my business requirement should be, what different combinations I should basically document, all of that. All of that is basically derived through this training exercise. And I'll get into that in the live demo. So once the training is done, you have an option to say, okay, I've trained enough. Show me what you have learned. So that's when you say end training and show me the result. So this is where the review step comes in.
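The narrowing-down described above, where a "not a match" answer points at the attributes that differ, can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how Match IQ is implemented: the function name, the attribute names and the street address are all made up for the example, and the real product learns from many labeled pairs rather than a single comparison.

```python
def differing_attributes(a: dict, b: dict, attributes: list[str]) -> list[str]:
    """Return the attributes whose values differ after trimming and lowercasing."""
    def norm(value) -> str:
        return str(value or "").strip().lower()

    return [attr for attr in attributes if norm(a.get(attr)) != norm(b.get(attr))]


# Illustrative pair, loosely modeled on the training screen described in the talk.
ATTRIBUTES = ["first_name", "last_name", "address_line_1", "city", "state", "zip"]
left = {"first_name": "John", "last_name": "Smith",
        "address_line_1": "100 Main St NW", "city": "Atlanta",
        "state": "GA", "zip": "30301"}
right = {"first_name": "John", "last_name": "Smith",
         "address_line_1": "100 Main St", "city": "Atlanta",
         "state": "Georgia", "zip": "30301"}

# If the trainer answers "not a match", only these attributes can explain it.
suspects = differing_attributes(left, right, ATTRIBUTES)
print(suspects)
```

Here the only candidates are address line one and state, which is why the next pairs shown to the trainer would vary those two attributes.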

Review is a step where Match IQ will generate sample pairs, and it'll provide a spreadsheet for you with a list of pairs. And you can go through that and you can see if Match IQ understood whatever you are trying to teach Match IQ on the models. So you will see the pairs. And if you say that, oh yeah, I did 80% of the matching correctly, but I still see 20% of the record which is not matching correctly. So the only reason it would not match correctly is because those scenarios were unknown, or Match IQ did not know how to match them. Those scenario did not come up during training, or you ended the training too early. So in that case you simply basically go back to training and resume from the point you ended the training from. And you just continue to train a little more and then kind of this is your iteration process.

So now instead of basically going and tuning and waiting for the results, you can do all of that in the Match IQ application. Once you sort of agree with the result, you can approve the model, and approve the model means the model is ready for deployment. Now, deployment is also something that we were kind of thinking how we can simplify this further. And great news here is, a business user can basically deploy all this by themselves and you don't need any kind of technical expertise for that as well. And I will show you that in a minute. You can basically say, I've looked at the results and based on the results, I've seen that anything more than 80% confidence in the match pair was pretty good match. I'm very certain that every record that I've seen in this review process, which matched more than 80% was a good match.

So let me go ahead and ask Reltio to basically automatically merge them. So you can configure that right from the Match IQ application. And anything less than 80% confidence, create a suspect pair for my data steward to review. And anything less than say 60, do not even bother kind of showing them on the UI or whatever, just ignore them. You can do all of that stuff. You hit publish and it's live. It's starting to basically do match and merge. So let's look at how that process looks like. I'll take a pause here and see if there are any questions so far.
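The threshold logic described here can be written down as a small Python sketch. The 80% and 60% cut-offs come from the example in the talk; the function name, the action labels and the handling of scores that fall exactly on a boundary are assumptions made for illustration, not Reltio's API or documented behavior.

```python
def route_pair(confidence: float,
               merge_at: float = 0.80,
               suspect_at: float = 0.60) -> str:
    """Map a match confidence in [0, 1] to one of three hypothetical actions."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= merge_at:
        return "auto_merge"   # confident enough to merge without review
    if confidence >= suspect_at:
        return "suspect"      # create a potential match for a data steward
    return "ignore"           # too weak to show anyone


for score in (0.91, 0.70, 0.40):
    print(score, route_pair(score))
```

Keeping the cut-offs as parameters mirrors the idea that a business user can adjust them at publish time without retraining the model.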

Chris Detzel (24:22):

So there are a lot of questions. Do you want me to start asking them or do you want to wait?

Suchen Chodankar (24:32):

Maybe we can just get through the demo very quickly and we can come back to the demo, if needed.

Chris Detzel (24:38):

Yeah.

Suchen Chodankar (24:39):

All right. So these are the four steps that I talked about. So let's try to create a new model. So like I said, creating a model is my first ML model. Oh, before that, so these are the list of entity types that are configured in Reltio by you. And you can basically pick any entity type that you want to create model for. You can create for location, organizational. So I'm going to create it for contact, which is my individual. And I'm going to basically say, I want to create a new model, my first ML model. Then this is a basically list of attributes. As you can see here, there are some recommended attributes for you. This is just commonly used attributes for that entity type so it will list all those things for you, and then you can choose a recommended attribute.

Along with that, I'm just going to go and create another set of attributes and just click outside of this box. As you can see, this is a list of attribute that you have selected for model creation. And then go ahead and click create. Now, once you hit create, basically this is all it takes to start a model. Now, like I said, there's a lot of background processes that has to happen. It has to basically create those smaller universes of data to match within, because you don't want to match John with Jackie for example. You want to go and create those smaller universes of data set, which has any remote possibility of matching or basically generating any decent match. So this is what is happening here in the background right now, it is trying to figure out which is the best set of attribute to use to create those smaller universes.
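The "smaller universes" idea is commonly called blocking. A minimal Python sketch follows, assuming a hand-picked blocking key (first letter of the last name plus zip code). The key, the field names and the sample records are hypothetical; Match IQ chooses its own attributes for this step, and this only shows why blocking keeps John from ever being compared with Jackie.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Hypothetical key: first letter of the last name plus the zip code."""
    last = record.get("last_name", "").strip().lower()
    zip_code = record.get("zip", "").strip()
    return f"{last[:1]}|{zip_code}"


def candidate_pairs(records: list[dict]) -> list[tuple[str, str]]:
    """Group records by blocking key and pair them up only within a group."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


records = [
    {"id": "r1", "first_name": "John", "last_name": "Smith", "zip": "30301"},
    {"id": "r2", "first_name": "Jonathan", "last_name": "Smith", "zip": "30301"},
    {"id": "r3", "first_name": "Jackie", "last_name": "Jones", "zip": "73301"},
]
print(candidate_pairs(records))
```

Only r1 and r2 share a block, so one comparison is made instead of three. The trade-off is that a key that is too strict can hide true matches, which is why the choice of attributes matters.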

It is also basically getting your training data ready so that it can start asking those questions. So I have one here which I already have created. As you can see, it automatically moves here in the training once the model is ready for training. And it'll basically have this status which says, start to train. So basically you have to click on this train and it'll launch the training session for you. I have launched one here before this session, so I'm going to go and click on this train. Now, when you click on that, the screenshot that you saw earlier...

Yeah. So this probably has a few more attributes than I had selected in the other one. But you can see that it is asking me this question, how does this match look to me. And as the data steward or match expert I can say, the last name is good, the first name looks correct, here the address seems to be different, 6-516. So I'm going to say, no, it's not a match. So what is happening here is it is basically going back and checking, okay, so the address component is something that the user probably did not like. It'll try to basically find all the different combinations here. So as you can see it has a Church Street, which is common. So it is trying to figure out whether fuzzy matching is what you're looking for. And I'm going to say, no, this is still not a good match for me.

And you can go on like that, right now, as you can see it got the number right this time but still 16th street and 17th street. And it still has the first name and the last name same, so I'm going to say not a match. So this is basically a training process. As you can see this is so data driven that you don't even have to worry about what kind of different combinations of attributions you should put together to find a match. You are looking at the data and you basically are saying, look at this, this is exact first name, last name, the address line one is pretty close now, 63 state and this is East, Texas Austin. Okay. So let me go ahead and say match to this one. The moment you do that now Match IQ has got its first branch or first set of requirement.

So when the first name is exact, last name is exact, city, state, zip code is exact and address line one is slightly different, it is a good match. And you might see similar match pairs coming up over and over again here. That happens because we want to make sure that it was not an accidental click on match or not a match. We want to make sure that you really want that match as a requirement for Match IQ to consider. So you will see it multiple times, and we are trying to basically see if that is really what you want. So here now it looks like it's moved on to a new use case. So maybe it is just trying to see if first name is good enough or whatever. And I'm going to say not a match. Okay. So now you can see I'm on question number 32. So let's look at, I have another one here which I just kicked off just before this session, it does not have any answers. Okay. So let me... One second.

And these training sessions are basically built on the fly, meaning a session basically does not remain open all the time. To optimize for the best performance and cost we basically start them on the fly as you need them. That's why you see these two different statuses, this one says ready to train and this one says, start to train. Meaning when you click on train, it'll take two to three minutes to set up that session for you and you will start seeing the match pairs. So let me try to do this one more time, and then we'll go here and talk about... Okay, so it looks like the session expired while I was covering the slides. So now it is trying to set up a new session and you will see that in a couple of minutes.

So before that let's go back here and talk about a couple of options here. So obviously you cannot train the model in one hour or two hours or whatever. So you probably want to take a break, or you want to train it for a couple of days or whatever it might be, so you have this option to say save and close. What that means is, I'm not done with my training. I'm just taking a break. I want to come back to this one. So when you do save and close, it'll take you back to the dashboard and... One second, let me pull up that one more time.

Okay. It'll take you back to that dashboard. You can resume whenever you are ready to train again. So it'll resume from the last question that you answered, 27, so you will see the same pair and you can resume training for as long as you want. And once you're done, you can basically click on end training. So when you do end training, it basically will move the model automatically from the training swimlane to the review swimlane. Now, this is where you basically say, okay, show me what you've done so far or what you've learned. So I can download a sample set of records or... Let's look at this, okay. While the spreadsheet is opening, let me talk about this other option.

This is basically where you can say, do not generate your own or sample record from the data that you probably have seen or something that I've loaded in Reltio. Let me bring up my own set of records that Match IQ has never seen and let me see how you do. So you have that option as well. And you can basically say create review model, and now you can basically bring your own records in a spreadsheet. I can say... Yeah. And this is a small file, actually just a simple file which I'm not going to run here, but I just want to explain the concepts. So you basically use this, I say these are my records, go run the match. When you click provide a name here and say, run match. So what you have done here is you have uploaded your own record set, and you are asking Match IQ to use the model that you just created and run the matching on this one.

And once the job is complete, it will give you those results in here. You will be able to download it from that job here, but let's look at what we have downloaded from the sample. So this is how the sample results look like. As you can see, these are basically the pairs and you would see that it does a pretty good job, even with the few number of answers or questions. Like the first name is same, the last name is same, the address line is missing here. And it will basically tell you that based on just the first name and last name I was able to generate this match pair. And I'm giving the relevant score which is the confidence is 100%. Now you would ask why this is 100% when it has just matched on the first name and last name, because that is what I trained the model to do. I said, if it's a first name and the last name, I don't care, just match.

So it basically creates the model based on your requirement. So you can look at this, look at the relevance here and this is where I can see this has generated slightly lower confidence. You can look at that and say, no, this is still okay. Now comes the approve part, basically you say, okay, looks good, I'm going to approve. Once you approve, these are all approved models which you can put in production in your tenant for matching. So the publish is as simple as just dragging the slider around. You can say, I've looked at the data, the sample results, and anything more than 83% is a pretty good match so I'm going to automatically merge. Anything less than 82, more than 60, I'm going to create a potential match. I'm going to label them as high confidence matches probably, or you can say merge candidates.

So now what this does is basically it gives that additional data point to data stewards basically to say, okay, let me work on this one first before I spend my time on some of the other ones which probably are not a match. And you can basically label and create as many buckets as you want. And there's another category I wanted to show you here is published match profile. There's a third category I was talking about the third action. You can say don't do anything, I just want those match pairs in my downstream system, publish them in my queue or something like that. So it can do that too. And once you save, it automatically deploys this model on your tenant and start matching and perform all those actions based on the score right away. So that's probably a good point to kind of stop and see if there are any questions.

Chris Detzel (36:48):

There is.

Suchen Chodankar (36:49):

We're doing good on time, I guess.

Chris Detzel (36:53):

Yeah, we are. But I think this could take a while and I think it's good. Couple of questions just to get right off the bat. One is, is Match IQ available now or is there additional cost?

Suchen Chodankar (37:08):

Match IQ is available now. There is additional subscription. But we are currently working on kind of a new pricing which will include the Match IQ in that new pricing. So there are two kind of pricings available right now. You can go without Match IQ, and you can say Match IQ included and a bunch of other stuff. You can buy it that way.

Chris Detzel (37:37):

Okay. I'm sure there'll be additional questions on that. So what I tried to do is kind of pile up these questions by person. So let's start with the first several. Does Loqate have a code for an address to understand that two addresses written differently are actually the same, without using rules like fuzzy or contain or street? Does that make sense?

Suchen Chodankar (38:04):

Yeah, so basically the question is if one is written as 101 Main Street and the other one is 101 Main St, what Loqate does is it standardizes first. So both addresses will become St, so you don't need additional code. If Loqate returns the end result as one single address, that's an indication that it is the same physical location. If you see slight variation, that means Loqate identifies those two entries as two different locations. So it does not have a code, but it has the AVC code which you can use to say, the locality is very important for me in addition to address line one, city, state and all of that stuff.

Also, the unit number is very important to me, so do not treat the one with a unit and the one without a unit as the same. You can basically differentiate. So Loqate basically tells you that I've verified this, and the AVC code, which is sort of a code, tells you to what level they have verified that: to the street level, to the building level, to the unit level. You can then decide whether you want to treat two addresses as the same based on your business requirements. So yeah, there is a code in a way you can use.
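As a rough sketch of the decision described in this answer: once two addresses have been cleansed, you can compare the standardized fields and require a minimum verification level before calling them the same location. Everything here is hypothetical and for illustration only: the `CleansedAddress` fields, the three level names and the `same_location` function are assumptions, and the code does not parse a real AVC string.

```python
from dataclasses import dataclass
from typing import Optional

# Verification levels from coarse to fine; the names are illustrative only.
LEVELS = {"street": 1, "building": 2, "unit": 3}


@dataclass(frozen=True)
class CleansedAddress:
    line1: str              # already standardized, e.g. "101 Main St"
    city: str
    state: str
    unit: Optional[str]     # None when the address has no unit
    verified_level: str     # "street", "building" or "unit"


def same_location(a: CleansedAddress, b: CleansedAddress,
                  min_level: str = "building",
                  compare_unit: bool = True) -> bool:
    """Treat two cleansed addresses as the same only if both were verified
    to at least min_level and their standardized fields agree."""
    need = LEVELS[min_level]
    if LEVELS[a.verified_level] < need or LEVELS[b.verified_level] < need:
        return False
    if (a.line1, a.city, a.state) != (b.line1, b.city, b.state):
        return False
    # Business rule from the answer above: a unit and "no unit" are different.
    if compare_unit and a.unit != b.unit:
        return False
    return True
```

Whether `compare_unit` should be on, and which `min_level` is enough, is exactly the business decision the speaker says is left to you.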

Chris Detzel (39:35):

Great. Does Reltio support multiple ML models per entity, so for example for different countries?

Suchen Chodankar (39:44):

At this point of time there is one model per entity type, but we are looking to open that up so you can create multiple models in the near future. Also, you would need multiple models basically if you're trying to do de-duplication on entity and relationship creation. So the question here is, can I do two models for de-duplication by countries? So today you cannot do that, but if you add country as an attribute in your model, you don't have to have two different models. What you have to do is every time that question appears where the countries are different, you keep hitting not a match, and Match IQ will understand that the user is trying to say that country is the bare minimum match or similarity that should exist before I can call anything a match. That is one option.

But if you have to create multiple models per country, and it could be for any other criteria, regions, countries, product lines, whatever it might be, we are considering basically supporting multiple models per entity type for the same use case like merging, or de-duplication, or for relationship. So short answer, today you cannot create more than one model per entity type, but in the near future you will be able to.

Chris Detzel (41:09):

Great, thank you. Is it possible to create custom dictionaries to understand that? So for example, in Polish like Jerzy and Jurek, so representing the same first name.

Suchen Chodankar (41:25):

So what we have also as part of this Match IQ, there's an inbuilt translation happening. So if there are different... And probably a name is not a right kind of example there, but let's say you have a different data set in different languages. And it is supposed to be the same name or same value when you kind of standardize them in one common language. That is something Match IQ handles natively. You don't have to basically do any additional configuration for that. So what it is doing right now is when you look at this record, it will basically transliterate everything before it does this kind of training. And I'll show you an example of that. If you look at this record here, this is in Chinese but it tells me that this record here is a match with zao. And you can see that it was able to do that because it transliterated this data, and then it was able to find this match. I hope that answers the question. If not, we can come back to that if time permits.
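The idea of bringing values into one common script before comparing them can be illustrated with a toy example. This is a minimal sketch, not Match IQ's method: the four-letter `CYRILLIC_TO_LATIN` table and both function names are hypothetical, and a real transliteration step would cover whole scripts rather than a handful of characters.

```python
# Toy transliteration table (illustrative only).
CYRILLIC_TO_LATIN = {"и": "i", "в": "v", "а": "a", "н": "n"}


def to_common_script(value: str) -> str:
    """Lowercase the value and map known characters to Latin letters;
    characters outside the table pass through unchanged."""
    lowered = value.strip().lower()
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in lowered)


def same_after_transliteration(a: str, b: str) -> bool:
    """Compare two values only after both are in the common script."""
    return to_common_script(a) == to_common_script(b)
```

With this table, "Иван" and "Ivan" compare as equal, which is the kind of cross-language match shown in the demo record.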

Chris Detzel (42:51):

Okay. Yeah. We'll probably come back to that potentially. So another question, does the system verify the address by consulting any third party?

Suchen Chodankar (43:02):

Today we do the cleansing using our inbuilt address cleansing tool. So every record that you saw here was already cleansed before it was brought into this Match IQ. So we are pulling records from the Reltio tenant today. So when you load data in the Reltio tenant, by default it is cleansed first and then goes through any other processes like matching with or without Match IQ, [inaudible 00:43:38] roles, export, all of that stuff. So the cleansing happens before it gets here.

Chris Detzel (43:45):

Okay, great. How does this system handle old addresses when the customer has moved to a new address? So even if they used third party address verification, we have this challenge.

Suchen Chodankar (44:05):

So that's one of the things about address standardization. Regardless of what tool you are using, right, it is supposed to standardize, because a location in the US or in any other country is going to be the same. There might be some variation in the way these address cleansing tools return the data. So those kinds of small changes, Reltio Match IQ can handle those things using fuzzy matching on the address line one, or address line two, or even city, state and any other. But if it's a completely different address and the cleansing tool itself is resolving or not resolving those addresses correctly, then you will have a problem here. So Match IQ does not do any cleansing of addresses in addition to what we get from the standard address cleansing tool that we have built in. I hope that answers the question. So again, just to kind of provide a short answer, Reltio Match IQ does not do any additional cleansing of addresses here. It works on the addresses that were already cleansed and stored in Reltio.

Chris Detzel (45:24):

Okay. I think so. Is there limitation on the volume of data that it can handle?

Suchen Chodankar (45:31):

No, there is no limitation. Actually, probably there might be a question on this first step when I hit on this create model and I give seven, eight attributes. Let's say I have a billion records in the Reltio tenant, how does it understand basically what record set to pull? So this application here, it is working on a sample data set. So even if you have a billion or 100 billion records, it is not pulling everything from there. It is randomly picking the data set. And there's a limit on the number of records that we use for training. And that is configurable, based on your subscription you can basically say, I have 100 billion records, I want to basically train on 100 million records or whatever. We can do that, but right now the set that we are pulling is close to like a million records or something like that. So yeah, once the model is created and deployed on that 100 billion or 100 million record set, there is no limitation. You can use the same model to run on 10 records, 100 records, a million records, one billion records.
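Picking a bounded random sample from a record set that is too large to pull in full is a standard problem. One common technique, shown here as a sketch, is reservoir sampling. The transcript does not say which sampling method Match IQ uses, so this is an assumption for illustration; the function name `sample_for_training` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_for_training(records: Iterable[T], limit: int,
                        seed: Optional[int] = None) -> List[T]:
    """Reservoir sampling (Algorithm R): keep a uniform random sample of at
    most `limit` records while reading the stream once, without ever
    holding the full data set in memory."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < limit:
            # Fill the reservoir with the first `limit` records.
            reservoir.append(record)
        else:
            # Replace an existing entry with probability limit / (i + 1).
            j = rng.randint(0, i)
            if j < limit:
                reservoir[j] = record
    return reservoir
```

The `limit` argument plays the role of the configurable training-set size mentioned above: the sample size stays fixed no matter how many records the tenant holds.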

Chris Detzel (46:43):

Got it. How do we measure the impact of these changes to make sure that we are not significantly impacting the financials if the businesses are doing these changes? So do they do it in like a QA environment first?

Suchen Chodankar (46:59):

Yeah. So the typical pattern that we have seen is like I said, this model can be created or can be initiated in any environment. You can basically start with the development environment, run your test on there, take it to the production test environment. And this is the cool part. You don't have to basically start creating or build a model from scratch on every single environment, you can port this model. Once you have fully verified it and let's say you have multiple implementations, not even different environments. Let's talk about multiple implementations in Reltio, one for say Europe, one for US, you can basically say, this is the model that works pretty good for me for matching individuals. I want to use this as a starter model in my UK implementation or Europe implementation. You can simply pull this. You can simply push this model to that tenant and say, use this model, and you can train it further there.

So same concept goes here. You can start a model in dev on a smaller data set, test it a little bit, push the model to the test environment, test it a little bit more on higher volume data. And if you find something that you have missed, you can train it further in the test environment. Once you're done with that you can then simply push it to production. So you don't have to create it in each individual environment. And that's how the pattern... What we have seen is customers like to create this in a smaller or lower environment, try it, see what the impact is. And one cool thing about this one is in the managed published setting, you don't have to basically pick the auto merges if you're not confident. You can simply say, just generate the match pairs for me, I want to have a look at them before I decide. Once you look at all of this data in production and wherever you see 83% and higher and you are still convinced that it's the right percentage of confidence for you to do a merge, simply come back here, change this to automatically merge, save it.

And now it'll go back to those match pairs and start merging for you, so that is another thing that we have seen. And I'll show you an instance of that. As you can see here, this one is... Actually, let me go back and pull up this record here. So this is a pretty good match here, Bobby Smith and Bob Smith and all that. You can see there are some common... Yeah, this one right here. There's a common address there. This one says Bob, Bob Smith, this is a pretty good match for me to automatically merge. But what I've decided here is I want to have a look at all those records first. And once I go through this, verify like 1000, 10,000, however many records you want to verify, then you can simply come back to Match IQ, change the action, save it and done. It'll come back and merge those things for you. All right. We have just 10 minutes, so I want to make sure I can answer as many as possible. Can we go to the next one?

Chris Detzel (50:16):

Yeah. So is there any confidence score threshold for matching? And how do we decide the attributes if we are defining it for the first time?

Suchen Chodankar (50:29):

Yeah, so that is one of the things that we will add in the future, that you don't have to basically pick the attribute. We will tell you which is a good match attribute to pick. At this point of time you select say 10, 15 attributes, you look at the score during the review process. And basically you can decide whether this is a good match or not. And if you feel that this is 100%, but this is a crappy match because it still does not consider this other attribute that I thought would be important, you can basically simply add an attribute and try with that. In future what we are trying to do is consider all your attributes, run the data profiling, run all that calculation we do in the background today, and suggest attributes based more on the data and not just based on the entity type like first name, middle name, last name, and things like that.

Chris Detzel (51:26):

Great. Lots more. Will the training ML understand on what basis data stewards approved the match or not approved the match?

Suchen Chodankar (51:37):

Yes, it does that today. So basically what is happening here is Match IQ has already made the prediction here. Match IQ knows the answer to this question. It is just trying to see if that answer aligns with the response from the user. So it knows that there's a match on the first name, there's a match on the last name, there's a match on this address line, but there is no match on city. So there's an answer behind this question which Match IQ already holds, which might be like it's not a match. And let me see if the user clicks not a match. If the user does, then what it means is okay, I'm on the right path, let's keep going. If the answer is no, which is a conflicting answer, that means Match IQ has to basically go and find some different combination and come back with a better answer or better pair that the user will agree with. So the purpose of this training is to get to that point where the answers of Match IQ and the user sort of align.

Chris Detzel (52:46):

Okay, what's the end goal of training? Will it auto match and merge in the future?

Suchen Chodankar (52:51):

No. Sorry. End training, is that the question?

Chris Detzel (52:57):

What is the end goal of training? So will it auto match and merge in the future? Okay.

Suchen Chodankar (53:05):

The end goal of training is what I was just talking about. It starts... If you look at this record, this is a brand new model. As you can see this is the first question. It has no idea what your matching requirements are going to be. It just starts with a basic, simple matching based on the first name, last name. And it makes a prediction that this is a pretty good match. So if I say, yes, it is a match, then now it has that starting point and keeps making predictions until it can generate a series of predictions that the user agrees with. So that should be like 10, 15, 20, 30 continuous acceptances by the user, either it's a match or not a match. As long as the answers align consistently, it basically will keep asking those questions. And so the end goal there is to come up with that requirement that the user wants to put in, so that's what it is trying to do there.
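The training loop described in the last two answers (the model predicts, the user answers, and training continues until the two agree consistently) can be sketched as follows. This is a simplified illustration under stated assumptions, not Match IQ's actual algorithm: the function `train_until_aligned`, its callback parameters and the `required_streak` default are all hypothetical.

```python
from typing import Any, Callable, Iterable


def train_until_aligned(pairs: Iterable[Any],
                        predict: Callable[[Any], str],
                        ask_user: Callable[[Any], str],
                        learn: Callable[[Any, str], None],
                        required_streak: int = 20) -> int:
    """Show pairs one at a time until the model's prediction has agreed with
    the user's answer `required_streak` times in a row (or the pairs run out).
    Returns the number of questions asked."""
    streak = 0
    asked = 0
    for pair in pairs:
        predicted = predict(pair)   # the model already holds an answer
        answer = ask_user(pair)     # e.g. "match", "not_match", "not_sure"
        asked += 1
        learn(pair, answer)         # every answer is a training signal
        if answer == predicted:
            streak += 1             # on the right path, keep going
            if streak >= required_streak:
                break               # model and user are aligned
        else:
            streak = 0              # conflicting answer: start counting again
    return asked
```

The stopping rule is the point of the sketch: what matters is not the total number of questions but an unbroken run of agreement between the prediction and the user.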

Chris Detzel (54:06):

Great. So can we train this ML machine learning to not merge email addresses which are used for business?

Suchen Chodankar (54:29):

To merge-

Chris Detzel (54:29):

Sorry, let me repeat that.

Suchen Chodankar (54:29):

Or not merge email addresses?

Chris Detzel (54:29):

Can we train the ML to not merge email addresses which are used for business?

Suchen Chodankar (54:32):

Okay. So if I understand the question, if there is something that you should merge when something is not the same or something like that. If there are two different email addresses, then you only go and merge. Today it does not handle that particular scenario and that is something that is already on the roadmap where we will basically... I want to clarify something, at this point of time, whatever you're seeing here is happening outside of your primary tenant. So whatever you do here does not affect your data in the Reltio tenant until you go and publish. So this is just about creating that model. So it can support all those match or same, similar, fuzziness, all that kind of comparators. It does not basically look for something which is not the same at this point of time, but that is something that we are planning to add in future.

Chris Detzel (55:34):

So how many iterations should be enough for the training to have a small percentage of success? So how much do they need to train before you start [inaudible 00:55:45]?

Suchen Chodankar (55:44):

Yeah. Based on the implementation that we have seen, go through 40 to 60 questions on individual data set let's say, and you will start seeing a good 40, 50, 60% accuracy. So obviously that is just to get a sense of how the Match IQ has learned, but obviously you have to train it a little more. Like 100, 150 questions is a good kind of data set for Match IQ to work on. But I've seen even with 40, 60 questions it generates pretty good matches.

Chris Detzel (56:22):

Great. Was Match IQ developed by Reltio or is it a third party?

Suchen Chodankar (56:27):

Match IQ was developed completely in-house by Reltio.

Chris Detzel (56:32):

Okay. Is the base training specific to an industry or no?

Suchen Chodankar (56:44):

No, it is not. So basically right now we have certified the individual model for matching, which means you can use this for any industry as long as it's an individual entity. There are other entity types that you can still train. And we are in the process of certifying those, organization and things like that. But individual in different industries, you can use Match IQ for any industry as long as it is individual at this point of time. And you are free to basically train for non-individual as well, it is just that we are in the process of certifying those other entity types.

Chris Detzel (57:28):

Great. Can we have multiple models, algorithms to be developed to cater to two different business requirements? I think you answered this but I'm not sure.

Suchen Chodankar (57:39):

Yeah. I answered that question. Okay. So you can create, or you will be able to create, models in future for different use cases. But at this point of time, with one model, try to use as many attributes as possible to cater to those multiple use cases.

Chris Detzel (57:54):

Got it. So when training the model, shouldn't you have the option to say potential match or threshold versus auto match to help that definition when you're asked to compare pairs, identify?

Suchen Chodankar (58:10):

Right. So basically what is happening is you will notice there is a third option, not sure. When you do that Match IQ is basically deriving all of that stuff. So we don't want you to decide here whether it's a potential match or whether it's an automatic match. We want you to just tell me how this pair looks. If you were to look at two people and have all this information about those two people, what would you do? Will you basically automatically match, that's when you hit yes, it's a match. And whenever you are not sure whether it's a match or not a match, you basically go with not sure. So we want the matching process to be separate from the action. So the matching is just about, tell me how close they match and let me decide later what category of score falls in a good auto match bucket versus the other bucket, so that's the approach that we took.

Chris Detzel (59:04):

Great. So unfortunately we are out of time. We have literally 15 more questions that we did not get to. So what my promise is to you is that I will push these questions out on community.reltio.com by the end of this week. And I will have Suchen answer those, if you're cool with that Suchen, because there's still a lot. I can send you some of these questions and maybe just answer them, and then I post on their behalf.

#Matching
#communitywebinar
#MachineLearning
#MatchIQ
