Discover Reltio's AI/ML Powered FERN for Data Unification

By Chris Detzel posted 15 days ago

  

Find the PPT for  Entity Resolution powered by FERN

Join Reltio's Principal Product Manager, Suchen Chodankar, for an exciting deep dive into Reltio's innovative AI/ML-powered Flexible Entity Resolution Network (FERN). In this Community Show, Suchen unveils how FERN leverages Large Language Models (LLMs) to revolutionize data unification and enhance the entity resolution process. 

Discover how FERN addresses the challenges of traditional rule-based entity resolution by automating high-accuracy, real-time matching with pre-trained models. Learn about FERN's ability to understand subtle differences in match pairs, reduce false positives, and minimize the need for manual data stewardship. Suchen demonstrates FERN's user-friendly interface, showcasing how it seamlessly integrates with Reltio's platform, allowing users to enable and configure the model effortlessly. 

He also discusses the future roadmap for FERN, including explainability features, support for additional entity types, and the continuous expansion of scenarios covered by the model. Don't miss this opportunity to explore the cutting-edge AI/ML technology that is set to transform data management and unlock the true potential of your data. Learn how FERN can help you achieve faster time-to-value, improve data stewardship productivity, and drive better business outcomes.

Find the Transcript here: 

Discover Reltio's AI/ML-Powered FERN for Data Unification

Chris: Thank you, everyone, for coming to another community show. This one's a little bit different and really exciting: discovering Reltio's AI/ML-powered FERN for data unification. Suchen Chodankar is our Principal Product Manager here at Reltio.

Chris: He's been doing this for a long time. He's been on several shows for matching and things like that, and I'm certainly excited to have him on again. So again, the rules of the show: keep yourself on mute. All questions should be asked in chat, or feel free to take yourself off mute. There's a lot of content to cover today. We are recording this and we'll post it to the community.

Chris: My goal is to try to have it out by today. If not, I'll be traveling a little bit and it will not be out until Tuesday. My goal is [00:08:00] today, but we'll see what we can do. Coming up, we have some shows. Today's show is on discovering Reltio's AI/ML-powered FERN for data unification, more of a business view.

Chris: And then on the 7th, which is next week, we're talking about unlocking entity resolution, which goes perfectly with this show. It's a little bit more technical in nature, but I think you'll enjoy that. And we'll have Venky coming on for Reltio's Customer 360 Data Product: Powering AI-Driven Data Unification for Enhanced CX.

Chris: That is really exciting, a new product that we're excited to share. And then on the 13th and 20th, we have some Integration Hub type shows that I just got in today. So I'm really excited about that, because you guys have been asking for shows like that. And there you go, in June. And lastly, next week on May 9th, we have a life sciences industry event.

Chris: So if you're in New Jersey, it's a half day event. We have lots of really good speakers. So I'll put the [00:09:00] registration in the chat here shortly. I'm going to stop here and Suchen let you share.

Suchen: Sure. Thanks, Chris.

Suchen: All right. Okay. Thanks, everyone, for joining. Welcome. And yeah, lots of good, exciting webinars coming up, and that event, so don't miss that. Today we are going to talk about entity resolution powered by FERN. I'll introduce what FERN is. We'll talk about this new patent-pending technology that we are going to introduce and are currently working on.

Suchen: So just a safe harbor here. Some of this content that you'll see here, the way it is working, might change based on the feedback and other improvements that might come up during the early evaluation, which is what is going on right now for FERN. So just keep that in mind. So before we dive [00:10:00] into entity resolution, I just wanted to give a bigger picture of how we are thinking about AI at Reltio, right?

Suchen: So as you can see here, there are multiple areas where we are leveraging AI: we are either currently using AI or in the process of integrating AI components into these different modules. So, augmented entity resolution, which is what we're going to talk about here. We will show and talk about how FERN-based matching can improve data stewardship productivity by 10 times.

Suchen: Ingestion and data modeling. There are also some of the things that we talked about in the past, which is using a leveraging our velocity pack, which is a starter pack that helps you get started with the implementation of various domains that you might have. Governance and anomaly detection.

Suchen: This is another area where we are leveraging AI to understand the patterns in the data, if any field or set of fields is deviating from the standard patterns [00:11:00] of data completeness or uniqueness, those kinds of things, right? So it's a great way to have a peek into how your data is doing overall and quickly act on things that might not be looking good from the data point of view.

Suchen: And we want to provide model-ready data. What that means is basically having some of these prebuilt models ready to go, so that they just require activation and you can start leveraging them to generate results from a clean, data-driven model ecosystem.

Suchen: A lot of good stuff happening here, a lot of exciting stuff happening here in the AI space at Reltio. And today we are going to talk about augmented entity resolution. Let me go to the next. Okay. So what are the challenges with the traditional way of entity resolution? I just wanted to talk at a very high level.

Suchen: I'm sure there are a lot of other challenges that we face with the traditional way, but these are really at the very high level, right? We often go with a rule-based [00:12:00] configuration, and we tune the rules in such a way that we do not generate a lot of false positives, meaning sets of records that are not supposed to be the same coming together and getting merged as the same record.

Suchen: So we want to reduce that number of false positives. So what ends up happening is that many organizations go with very strict rules. That generates a lot of potential matches, which requires a lot of data stewards to resolve them quickly so that your potential match queue shrinks quickly.

Suchen: And that's one of the challenges: scaling up the data stewardship team. You cannot just keep adding team members just to reduce the potential match queue. Instead, what we want to do is merge as many of these records as possible automatically, leaving very few records for data stewards, the ones that are really difficult to resolve programmatically using a match algorithm.

Suchen: Right. The [00:13:00] second challenge we face, and we often hear this: hey, this seems like a very obvious match, so why are we not able to match it? And I'll talk about some of this in my later slides, where to humans it might seem obvious, but when it comes to coding or configuring those into rule-based matching, it becomes very difficult to spot those differences.

Suchen: A subtle difference in the match pair could be very significant if humans were to compare the records. So that is another challenge that we are trying to address here. And the last one I want to talk about here is, we all know this, right? Setting up match rules requires a lot of time.

Suchen: You need to understand the business requirements. You need to translate those business requirements into the underlying MDM platform that you're using. You need to understand the syntax. You need to understand how it is configured. And then you have to find the best way to configure it using the matching capabilities or the algorithms that are provided.

Suchen: One of the things that I've [00:14:00] highlighted here is that it's only a fixed set of matching algorithms. If you have 10 or 15 different comparison functions, then that is all you can use, right? You cannot go beyond that. If you have a use case which requires you to match a certain pair of records

Suchen: which cannot be matched using any of the existing set of match algorithms, then you're out of luck. You have to basically go invent a new matching algorithm or matching function and add that to the platform. Only then can you leverage it, and you have to keep adding those functions, right?

Suchen: That gets very tedious and there's a lot of waiting time on that. So we looked at these three broad challenges and we wanted to see how we can use AI/ML to automate some of these things, or most of these things. So FERN, which is Reltio's patent-pending, LLM-powered entity resolution, is what we believe is a solution to some of these challenges that I talked about.

Suchen: And as I go through the [00:15:00] highlights or features of FERN, I'll try to relate back to the challenges that I talked about. So we'll talk about how FERN automates high-accuracy, real-time matching with pre-trained models. So, why did we start on this journey to begin with, right?

Suchen: Based on various surveys, it is clear that 90% of data management will be affected by ML. What this means is that there have been a lot of AI/ML improvements and advancements in the last year or so, and that 90 percent of data management will be affected by AI/ML is what the surveys say.

Suchen: Models will become commodities; enterprise AI/ML success will depend on a solid data foundation. So we wanted to be part of this journey and leverage as much AI as will make a positive impact on the use cases that we are solving for.

Suchen: So what is the action plan? We have heard about AI/ML models [00:16:00] that, sometimes, when you ask the same question, tend to give you a slightly different response every single time. So we wanted to build these four pillars right into the foundation.

Suchen: So: reproducible, transparent, compliant, and performant. What that means is we wanted to build AI/ML technology into our Reltio platform so that it is predictable. Every time we ask those questions, every time we provide a response to a set of questions, it is predictable and reproducible, right?

Suchen: Transparency is basically explaining how I got to that result, right? Not just saying, you know, it's a match, go figure it out. So we wanted to build that explainability into the model, and that is what we are working on right now. Compliant: this is a big part of our AI/ML strategy.

Suchen: We know that data is very important to our customers, and just sharing the data outside is simply not acceptable. So we do not want to use customer data for [00:17:00] training or for advancing the ML model at all. So anything that we want to do will be done within the boundaries of Reltio, and the data does not leave the tenant that the customer has loaded data into.

Suchen: And of course, performance: if you get the result six hours, seven hours, or a couple of days later, that's not acceptable. So we wanted to make this technology performant so that it meets the cost and other performance criteria. Yeah. All right. Let's talk about FERN. So what is FERN?

Suchen: So FERN stands for Flexible Entity Resolution Network. It's a brand new functionality that we have been working on for quite some time now. Since the introduction of LLMs, we looked at them and we thought, okay, there is a lot of interesting stuff happening in the large language model space, and every one of you might have had an interaction with at least one of those LLMs where you ask questions and it answers [00:18:00]

Suchen: whatever question you ask, right? We thought we could use this for matching purposes, and that's what we have done here. So if you ask an LLM, and I've listed a couple of examples here, let's start from the bottom there. The one on the left here, that's Alexander Ivanov written in Russian.

Suchen: And what is on the right is basically the possible outcome, the possibility of each string being the same record. So as you can see on the right here, the chance of this name being Alexandra Jackson is very low, which is what the LLM tells us, right?

Suchen: Which is why it is indicated here in red. As you move through this spectrum of the various records, you can see it is getting closer to Alexander Ivanov, and that's where it says that's what it thinks it is. So all of this knowledge is prebuilt, or in-built, into the LLM. And that is what we wanted to take advantage of.

Suchen: Now, I said I'll talk about the challenges and how they relate to this one. So now if I [00:19:00] go back to those functions that I was talking about, the fixed set of algorithms and everything: if I were to do this, I would have to come up with some sort of transliteration to match this data set.

Suchen: Okay. And I will be okay with that, but the development of this function and then for customer to understand that new function and integrate or leverage that function in their match rules could take a lot of time and basically, that could be a problem, right? It does not solve the problem. It solves a half the problem, which is basically giving you a function that will do this.

Suchen: But you have to basically take that, understand that integrate with your match rule and test it out and all those kinds of things. So leveraging LLM gives us the ability to basically just expand those functions without having to basically write it, create these functions for every single scenario.

Suchen: Moving up a little bit here. This is Yukio Yamamoto written in Japanese, but in a slightly different way. It is still able to figure out that it's [00:20:00] Yukio Yamamoto, right? So that kind of in-built comparison, which it has learned from a large training data set, is something that we can leverage.

Suchen: And the one on the top is a pretty hard one, actually. If you want to do this using any fixed set of comparators or fixed set of functions, it will be very difficult, right? You'll have to come up with a lot of pre-processing of this data to figure out that five by eight inch by three inch is similar to five by eight by, and that steel, bolt, and screws are seen as similar in the real world.

Suchen: So these are some of the advantages of LLMs that we get to tap into. So let's talk about FERN. What does FERN do here? It uses all these LLM models, which are very good at comparing different attributes (name, identifier, date of birth, addresses), and FERN basically sits on top of them and [00:21:00] asks these questions: are these two records the same or similar? And based on the scores that we get from these LLM models, FERN on top builds an understanding of the overall scenario, what it means from the bigger picture, looking at the complete profile: the name, the address, email, and all the other identifiers that are part of the model.

Suchen: And it comes up with the similarity score based on all those things. So it is leveraging these underlying fixed LLM models to do the core comparison of various strings, based on the attribute. And it has semantic segmentation built in. What that means is, if I say there are two strings, claire@abc.com and claire@reltio.com,

Suchen: it understands that those are email addresses and should be compared differently, whereas Claire at ABC and Claire at Reltio as a name [00:22:00] could mean something else. So all of those are built into the LLM, and FERN takes advantage of that. But FERN goes one step further and understands the overall picture, which is the complete profile, not just the output of those individual models.

Suchen: And I want to call out here that there are no rules required, because this is a prebuilt model that understands all the scenarios: what it means when the first name, middle name, last name, and address are the same and phone number, email, and everything else is different, or what it means if the first name, middle name, and last name are the same and phone and email are missing. That could generate a different kind of scoring, which is what you want, right?
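To make the idea of scenarios concrete, here is a minimal Python sketch of how a profile-level comparison might distinguish an attribute that is missing from one that conflicts. This is not FERN's implementation: the function names, the three states, and the weights are all assumptions made for illustration, and FERN's per-attribute comparisons come from LLMs rather than exact string equality.

```python
from typing import Dict, List, Optional

def attribute_state(a: Optional[str], b: Optional[str]) -> str:
    """Classify one attribute pair as 'same', 'conflict', or 'missing'."""
    if not a or not b:
        return "missing"  # at least one side has no value, so nothing conflicts
    return "same" if a.strip().lower() == b.strip().lower() else "conflict"

def profile_scenario(rec_a: Dict, rec_b: Dict, attributes: List[str]) -> Dict[str, str]:
    """Describe the whole match pair as one state per attribute."""
    return {name: attribute_state(rec_a.get(name), rec_b.get(name)) for name in attributes}

# Hypothetical weights, chosen only to show that 'missing' and 'conflict'
# can be treated differently. They are not FERN's scores.
STATE_WEIGHT = {"same": 1.0, "missing": 0.5, "conflict": 0.0}

def toy_score(scenario: Dict[str, str]) -> float:
    """Average the per-attribute weights into a single 0..1 score."""
    return sum(STATE_WEIGHT[state] for state in scenario.values()) / len(scenario)

ATTRS = ["first_name", "last_name", "phone", "email"]
base = {"first_name": "Michael", "last_name": "Lieberman", "phone": "5135550142", "email": None}
other_missing = dict(base)                             # email absent on both sides
other_conflict = dict(base, email="someone@example.com")
base_with_email = dict(base, email="ml@example.com")   # email present but different
```

Here `toy_score(profile_scenario(base, other_missing, ATTRS))` is higher than `toy_score(profile_scenario(base_with_email, other_conflict, ATTRS))`, which mirrors the point that a missing email should not be penalized the same way as a conflicting one.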

Suchen: So there are no rules required to be configured here. ER can finally be rule-free. Zero-shot learning: this is what I was talking about. In the earlier approach that we had taken, which some of you have come across, we had to train the model using a data set to [00:23:00] make the model understand what is the same and what is similar for any customer, right?

Suchen: So we don't have to do that anymore, because all these LLM models come with all this training in-built. So there is no training required. We don't have to use any customer data for training this model, and making a call to FERN does not share this data with any of the LLM models.

Suchen: These LLM models that we are using are downloaded and installed within Reltio boundaries. They don't get enhanced based on the requests that we are making. They don't get enhanced based on the scenarios that you see. So everything is fixed here and locked in, right? So nothing gets out of the Reltio space.

Suchen: And of course, data security. This is what I was talking about, which is making sure that we are just making calls and asking questions, but not enhancing the underlying model based on the data that you send. And explainability is built in. This is something where, [00:24:00] you know, if I ask FERN, hey, is this pair of records the same, it tells you

Suchen: that it is the same or it is similar, and that it has the same first name but a slightly different address, and all of that. So that's the explainability that we have built in. And this is going to get much better over time, where we want to explain in more natural language what it means to be the same or a similar record.

Suchen: It is available now in early access for the individual model, which means if you are interested in trying it, please reach out to us and we can enable it on your tenant and try it out on your data set. Given all these improvements with the new functionality of FERN, we believe that you can go live in days rather than months.

Suchen: And the way we can achieve that is because the prebuilt model already understands most of the scenarios, or at least most of the common scenarios that we have seen across customers. So most of your data, I would say 70 to [00:25:00] 80 percent, can already be handled using some of these scenarios.

Suchen: Yes, there will be a few very specific scenarios where you have, say, an identifier which has a certain meaning, and FERN won't be able to understand that just yet. But the idea here is to get you started as quickly as possible without a lot of rule configuration, without setting up the match processes for weeks and months and all.

Suchen: So this will also help data stewards focus on more critical business functions, if you reduce the number of false positives and take care of those obvious matches automatically. And improving the consolidation rate means, basically, getting value

Suchen: out of the platform, right? Because that's the whole purpose of doing this. If I load 100 billion records and leave, you know, 20 million records for you to resolve, that's not good, right? So we want to deduplicate as many records as we can confidently consolidate. How does it look?

Suchen: So as you can see here, [00:26:00] there's a snapshot of it, and I have a live demo, which I'll get into in a minute. But what you're looking at here is that I've set FERN to generate potential matches only. I didn't want it to auto-merge anything because I wanted to demonstrate what kind of match pairs it is handling today.

Suchen: And we have a live demo. I'll get into that and I'll explain this screen a little better in more detail. So what this is showing is Michael Lieberman record at that address, having all this 14 or 11 potential matches. And you can see here the score for various matches based on how well the data is populated.

Suchen: And I'll talk about this one in a minute. I'll cover this part and then probably see if I can answer any questions, if there are any. Yeah. So why LLM, right? I was mentioning earlier the fixed set of algorithms that we have. If you have Double Metaphone or fuzzy matching or synonym matching, [00:27:00] it works based only on a fixed set of rules or instructions that are coded into those functions.

Suchen: Now, this is something that I asked ChatGPT, which is also a large language model. Look at the first one. I asked, is Bob similar to Robert? And you can see there, it says that yes, Bob is a common nickname. So it understands by default that Bob is a common nickname for Robert, used in different data sets, but it doesn't stop there, right?

Suchen: It's not only about one synonym match pair. You can look at William, Richard, ****, and then there are James and others. So all of those are built into it. So we don't have to rely on a fixed set of values that we store and then do the comparison based on that, right?
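For contrast, a fixed synonym comparator of the kind described above can be sketched in a few lines of Python. The table contents and function names below are illustrative assumptions, not Reltio's comparator; the point is only that a lookup table answers correctly for the pairs it stores and has nothing to say about anything else.

```python
# A deliberately small, hand-maintained nickname table (illustrative only).
NICKNAMES = {
    "robert": {"bob", "rob", "bobby"},
    "william": {"bill", "will"},
    "james": {"jim", "jimmy"},
}

# Reverse index: nickname -> canonical given name.
CANONICAL = {nick: full for full, nicks in NICKNAMES.items() for nick in nicks}

def canonical(name: str) -> str:
    """Map a name to its canonical form, or return it unchanged if unknown."""
    key = name.strip().lower()
    return CANONICAL.get(key, key)

def same_given_name(a: str, b: str) -> bool:
    """True only when both names resolve to the same stored canonical name."""
    return canonical(a) == canonical(b)
```

With this table, `same_given_name("Bob", "Robert")` and `same_given_name("Rob", "Bob")` are both true, while `same_given_name("Peggy", "Margaret")` is false simply because nobody added that pair. Every gap like this requires a table update, which is the maintenance burden the LLM-based comparison is meant to avoid.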

Suchen: So that's one of the advantages. I want to move to the next attribute. Here I asked about claire@gmail.com and claire@reltio.com, right here. It [00:28:00] recognizes that they might belong to the same person because the username is the same. At the same time, it does realize that this is a very common case, where people use different names, different usernames, and different domains, and have multiple emails.

Suchen: Which is why it is not 100 percent sure it's the same person, right? But we get to understand that these emails could be similar, and then it returns a score based on that. These are the responses from ChatGPT. The model that we are using is more tuned to return scores based on the scenarios.

Suchen: But I just wanted to make a point here about why LLM, right? Look at this phone number. The phone numbers are almost exactly the same. If you look at these phone numbers, you might think that it's a transposition error, 513 and 315. The rest of the digits are the same, and it recognizes that the numbers are the same except for the area code, right?

Suchen: So you can see that the area code is different, which is what it is [00:29:00] indicating. That could be very important information for us to take into consideration while matching and merging. We can look at the two records and say, Rob and Bob having this kind of phone number

Suchen: could be completely different people because of the area code, and then I want to look at the address and see if they're really in different places, right? So that kind of calculation is what FERN does on top of the LLM. It can look at the name, it can look at the outcome of the phone number comparison, and it can look at the address and say, you know what, it's not a transposition error because the addresses are different.
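The area-code reasoning can be sketched as follows. This is a hedged Python illustration, not FERN's logic: the function names and outcome labels are hypothetical, the numbers are assumed to be 10-digit US numbers, and the `same_address` flag stands in for whatever address comparison the real system performs.

```python
import re
from typing import Tuple

def split_us_phone(raw: str) -> Tuple[str, str]:
    """Return (area_code, local_number) for a 10-digit US phone number."""
    digits = re.sub(r"\D", "", raw)          # drop spaces, dashes, parentheses
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                   # drop a leading country code
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit number, got {raw!r}")
    return digits[:3], digits[3:]

def is_two_digit_swap(a: str, b: str) -> bool:
    """True if b is a with exactly two digits exchanged (e.g. 513 vs 315)."""
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return len(diff) == 2 and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]]

def classify_phone_pair(phone_a: str, phone_b: str, same_address: bool) -> str:
    """Combine the phone comparison with address evidence (labels are hypothetical)."""
    area_a, local_a = split_us_phone(phone_a)
    area_b, local_b = split_us_phone(phone_b)
    if (area_a, local_a) == (area_b, local_b):
        return "same"
    if local_a == local_b and is_two_digit_swap(area_a, area_b):
        # Same local number, swapped area-code digits: the address decides
        # whether this looks like a typo or a genuinely different location.
        return "likely_typo" if same_address else "different_area_code"
    return "different"
```

For example, `classify_phone_pair("(513) 555-0142", "315-555-0142", same_address=True)` yields `"likely_typo"`, while the same pair with `same_address=False` yields `"different_area_code"`.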

Suchen: Okay. Or it could say it's a transposition error because the addresses are the same. So those scenarios are built into FERN, right? And it understands those things. These are very interesting names, Narayan and Narayani, right? It's one character off. If I do this using any fixed comparator, you might think it's a very close name.

Suchen: But [00:30:00] here it tells you that the two names typically represent different genders. That's what it says. So this is very important information for me to understand before I do match and merge. If everything is the same (the last name, the address, the phone number, the email) and the first name is slightly off,

Suchen: you might think it's the same person. But this could tell us that these are typically names of different genders, which we want to take into account to penalize that whole match pair, or downgrade the score, so that it stays in the potential matches but is not automatically merged.
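A short Python sketch shows why a fixed string comparator finds this pair so close, and how an extra semantic signal could keep the pair out of auto-merge. The edit-distance function is the standard Levenshtein algorithm; the thresholds, the `route` function, and the gender flag are assumptions for illustration only (in FERN the signal comes from the LLM, and the actual scoring is not described in the talk).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalize edit distance into a 0..1 similarity."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

# Hypothetical thresholds, for illustration only.
AUTO_MERGE = 0.95
POTENTIAL_MATCH = 0.70

def route(score: float, names_suggest_different_gender: bool) -> str:
    """Cap the score below auto-merge when the names signal different genders."""
    if names_suggest_different_gender:
        score = min(score, AUTO_MERGE - 0.01)
    if score >= AUTO_MERGE:
        return "auto_merge"
    if score >= POTENTIAL_MATCH:
        return "potential_match"
    return "no_match"
```

`levenshtein("narayan", "narayani")` is 1, so `similarity` rates the pair at 0.875, which a fixed comparator would read as a near match. With the flag set, `route(0.99, True)` returns `"potential_match"` instead of `"auto_merge"`, leaving the decision to a data steward.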

Suchen: The last one is also interesting. You can see here, this is an SSN that I asked about, 6, 3, 3, 5, 6, 2, 7, 4, 9, which is exactly the same as the other one, except the first digit is off. And it immediately recognizes that, because it's an SSN, a single digit being off could mean a very [00:31:00] different SSN.

Suchen: So I wanted to just highlight what is happening underneath, why it is so important to take advantage of LLM kind of technology which can be very useful, beneficial for whole matching process. And this is just the first version that we have started.

Suchen: We are very excited to continue to work on it and the prospect and the opportunity that we have to provide an accurate and much better outcome when it comes to matching. Of course, for the simpler matching, exact matching, we want to leverage rule based matching as much as possible because that's the best way to do that, right?

Suchen: You don't want to overcomplicate things or come up with some logic that might reduce your consolidation rate for the obvious match pairs using, say, your identifier or your deterministic rule. So these are some of the examples that can really help data stewards narrow down

Suchen: the potential matches that have a very good chance of [00:32:00] getting resolved very quickly, either as a match to be merged or as not a match. And this is something that we believe will help data stewards improve their productivity by resolving them as fast as possible.

Suchen: All right, I'll take a pause here before the demo and see if there are any questions, Chris.

Chris: There always are. Yes. Will FERN replace the ML-based matching we had previously, or do you see them coexisting, with both providing some value?

Suchen: Yeah, that's a good question. And that's one of the things that I wanted to clarify in this call.

Suchen: You see here it says Match IQ. So the Match IQ ML-based matching that we have provided in the past is a customer-trained model. We provided the technology, or a framework, that allowed you to train this model using your own data set. And that was a supervised training style, or active learning style, of model [00:33:00] creation.

Suchen: Again, the customer owns that model. We don't share that model with any other customer. It stays within your customer tenant. But it has a limitation, right? If you create a model based on your data set, it can operate only based on what you've trained it on, which is why we believe an LLM is a much better fit for that.

Suchen: And so for Match IQ, we are leaning towards using LLMs even for customer-trained models. So I would not say we are replacing it at this point in time. We are yet to make that decision. But I would say LLMs are looking like a better and better fit for matching use cases. Match IQ is for customer-created models.

Suchen: And FERN is basically what we provide using LLMs. It's a pre-trained model that Reltio has created. And in the future, we want to allow customers to create their own FERN models, but we are not there yet. At this [00:34:00] point in time, it's FERN provided by Reltio only.

Chris: What is the process by which the model learns on an ongoing basis, if it does not use the underlying data sets and scenarios?

Suchen: Yeah. So right now it's a centralized model, which means all the scenarios that you see, you can see right here. Maybe it's a good time to just jump in a demo or screen and we can talk about that. Okay. So this is a screenshot that I took and just put that on the presentation there. But here we can interact and probably see a little more information.

Suchen: So you can see here it says 100 percent match for Michael Lieberman. That's pretty good. Because it has a first name, last name, and middle name exactly same, address is exactly same, phone number exists and the email address does not exist, meaning it does not have any conflicting email address, also it does not have a conflicting name suffix.

Suchen: So all these scenarios are built [00:35:00] in. So yeah, it does not learn from your scenarios just yet. Right now, all the scenarios are what we have trained the FERN model on. So it will do only what we have trained it to do using that centralized approach at this point in time.

Suchen: We are not using your data set. So it does not learn that, for you, email is, say, the most important attribute, and that if it is missing the score should be downgraded to, say, 50 percent or whatever. It won't do that. Because it's a fixed model, we are not enhancing it on the fly. We are not enhancing it dynamically.

Suchen: But what we are planning to do here is based on the feedback, if we see there is a pattern where almost all customers think that phone or email should be very important component and if it is missing that should penalize the score, then we basically take that input and feed into the centralized model that we have, create another version for you to try.

Suchen: And that's the approach that we want to take. Like I said, [00:36:00] a customer-specific FERN model is something that we want to get to eventually, where it will learn, but only within the boundaries of that customer's tenant. So that's not what is happening here.

Suchen: This is a centralized model trained on hundreds of scenarios, different permutations and combinations: address missing; phone missing; address and phone missing; address, email, and phone missing; all those scenarios. So it has learned based on those scenarios, and we can tune them.

Suchen: So if you see some score here and you say, no, this should not be 87, this should be 67, that's fine, because that's the kind of feedback that we are looking for to tune this model and then release it. Right now it is in early access. When we get to GA, it will have a fine-tuned model that most of you would agree with.

Chris: Why don't we go ahead and do the demo? Because there's tons of questions. So let's do that, and then we'll get to the questions at the end. Okay.

Suchen: All right. So the way it works is you go to the [00:37:00] data modeler,

Suchen: And like I said, individual is what we have a model for right now. So you go to the match tab on your individual entity type, and this is early access, so if you're trying this right now on your tenant, you may not see this part here, which is labeled Match IQ. It's really the FERN model. Like I said, we are transitioning from Match IQ to FERN.

Suchen: So these are FERN models. You may not see this part in your tenant if you're not part of the early access program. This is available only to the early adopters at this point in time. So this is what you will see if there is a model for the entity type that you're managing, and the label doesn't matter.

Suchen: You can call it contact. You can call it broker, agent, or whatever. As long as it is individual, it will understand that there is a model for it, and you can click on it and it shows you the list of all the [00:38:00] attributes that we have considered in this model, meaning it is trained to handle all these attributes in different settings or different scenarios, where suffix is missing.

Suchen: Suffix exists but is different, email exists, or both are missing, or one of them is populated and the other one is missing, and things like that. That's what the list of attributes means, and it cannot match based on any other attribute that is not listed here. So that's what it shows you here.

Suchen: The list of all the attributes that we have considered in this model. Now, when you go to enable this one, initially it will be set to like disabled because you get to decide when you want to put this in production or for testing or whatever making it available is just the first step.

Suchen: Basically, just letting you know that there is a model available, but it's your choice to activate it. So initially it will be in this state, disabled. You can come and say, I want to enable this one, and you get to [00:39:00] choose what you want to enable it for: matching tenant records, which is matching the records that you've loaded in the tenant with each other.

Suchen: Or do you want to enable it for matching with external records as well? So that's what this internal and external is. You can enable it for both or one of them based on what you want to do. There's another setting that you get to apply here. This is a very tenant-specific configuration, which is why you have these controls.

Suchen: The way FERN works is going to be the same regardless of this one. The only difference is that this flag, match by OV value, controls whether you are sending the one survived name or multiple names that exist, which may not be the OV value. That's what this controls. And this can play a very important part in the match outcome.

Suchen: If you're just sending John and Jack and asking, hey, is it a match? It's going to say, no, it's not a match. But let's say Jack has a non-surviving value, John, and you uncheck this one, which says, [00:40:00] send all values. So now John is compared with Jack, but John is also compared with the other John, which is a non-surviving value.
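
A minimal sketch of the difference between sending only the OV (surviving) name and sending all values, using the John and Jack example above. The record layout and the `name_pairs` helper are hypothetical illustrations, not Reltio's API or FERN's internals:

```python
from itertools import product

def name_pairs(rec_a, rec_b, ov_only):
    """Return the name pairs that would be compared for two records.

    Each record is a hypothetical dict: {"ov": str, "others": [str, ...]},
    where "ov" is the surviving name and "others" are non-surviving names.
    """
    names_a = [rec_a["ov"]] if ov_only else [rec_a["ov"], *rec_a["others"]]
    names_b = [rec_b["ov"]] if ov_only else [rec_b["ov"], *rec_b["others"]]
    return list(product(names_a, names_b))

john = {"ov": "John", "others": []}
jack = {"ov": "Jack", "others": ["John"]}  # John is a non-surviving value

ov_pairs = name_pairs(john, jack, ov_only=True)    # only John vs Jack
all_pairs = name_pairs(john, jack, ov_only=False)  # also John vs John
```

With OV only, the single comparison is John against Jack. With all values sent, John is also compared against the non-surviving John, which is why the setting can change the match outcome.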

Suchen: So that's what it does. And as you can see here in this tenant, I've set everything to potential match because I wanted to just demonstrate. Before they actually go merge, I wanted to generate potential matches, even for the perfect score. So I said, generate potential matches for a range between 80 percent and a hundred percent, and 70 to 80 percent is also potential matches.

Suchen: You have an option to automatically merge this, but of course you want to test it out in your lower environment. Make sure that every pair that is generated above 80 percent is something that you agree with for automatic merge. Only then would you go and change this one, save it, and then you will see that FERN will automatically merge all those match pairs which are 80 and higher.

Suchen: You can move this around. There's a default setting that [00:41:00] we ship the model with, but you always have a choice. You can say for my data set. 80 and higher does not work. There are some potential matches that I didn't like, but 90 and above is something, which was perfect. Everything I agreed with.
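
The slider behavior described above can be sketched as a lookup from score to action. The threshold lists, action names, and `action_for_score` function are assumptions made for illustration and do not reflect how Reltio stores this configuration:

```python
# Each entry is (lower_bound_percent, action), sorted from high to low.
# DEMO mirrors the demo tenant: everything from 70 up is a potential match.
DEMO = [(80, "potential_match"), (70, "potential_match")]
# STRICT mirrors a tenant that only trusts 90 and above for automatic merge.
STRICT = [(90, "auto_merge"), (70, "potential_match")]

def action_for_score(score, thresholds):
    """Return the action of the first range whose lower bound the score meets."""
    for lower_bound, action in thresholds:
        if score >= lower_bound:
            return action
    return "no_action"  # below every configured range
```

For example, a pair scored 87 stays a potential match under `STRICT`, while a pair scored 92 would be merged automatically.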

Suchen: So in that case, you can just move the slider to whatever you want those actions to be, and you can save it. And that is all it takes. You can see there are no match rules here, no customer-provided match rules. It is just FERN working by itself. And of course it can work in parallel with match rules as well.

Suchen: Yep. It works completely independent of rule-based matching, which means if you load two records and there's one potential comparison pair that you can do between those two records, that pair is sent to both the rule-based engine and the FERN-based engine at the same time. And they both send the outcome from their respective engine, and both are stored in the centralized location, which is the potential match.[00:42:00]

Suchen: So they don't talk to each other. They basically just work independently at this point of time. Yeah. So this will allow you to bring your own rules while at the same time FERN can do its own thing. And you can say, let me take care of the match and merge, and FERN can provide some guidance on the pairs that I'm generating.

Suchen: That's a very common or popular setting, at least initially. When customers want to see what FERN can do, they just set it up to potential match and look at the FERN score for the already generated potential matches. Yeah, so as you can see, we have tried to simplify the activation process as much as possible.
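
As a rough sketch of two engines evaluating the same pair independently and storing both outcomes side by side: both engine functions below are toy stand-ins invented for illustration (an exact-match rule and a naive attribute-overlap score), and neither represents Reltio's rule engine or the FERN model:

```python
def rule_engine(pair):
    """Stand-in rule: exact match on last name and email."""
    a, b = pair
    matched = a["last"] == b["last"] and a["email"] == b["email"]
    return {"source": "rules", "matched": matched}

def ml_engine(pair):
    """Stand-in scorer: percentage of attributes with identical values."""
    a, b = pair
    keys = sorted(set(a) | set(b))
    same = sum(1 for k in keys if a.get(k) == b.get(k))
    return {"source": "ml", "score": round(100 * same / len(keys))}

def evaluate(pair):
    """Send the same pair to each engine; neither sees the other's result."""
    return [engine(pair) for engine in (rule_engine, ml_engine)]

# Hypothetical records; the email is a placeholder value.
rec_a = {"first": "Mike", "last": "Lieberman", "email": "m@example.com"}
rec_b = {"first": "Michael", "last": "Lieberman", "email": "m@example.com"}
outcomes = evaluate((rec_a, rec_b))
```

The point of the design is that each outcome is recorded under its own source, so a rule result and a model score can be reviewed together on the same potential match.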

Suchen: Now, there's one other thing that we heard from customers about the list of attributes that we saw earlier, like this one, for example. What if the model says prefix should be a sub-attribute or something like that, right? What if it is spelled differently? Should I not use FERN? For this, you have this mapping screen, which allows you to [00:43:00] say prefix is name prefix, or name slash prefix, which is how the model is designed.

Suchen: You can say that's prefix in my world. And if it is spelled differently, you can just tell the model what that attribute is in your local model. So it allows you to map those attributes from your tenant to how the model is configured, right? So this is another good example in the FERN model.

Suchen: We have modeled zip4 and zip5 as sub-attributes of postal code. But in my local tenant, they are sub-attributes of zip. Now this allows us to map it and tell the model that postal slash zip4 and zip slash zip4 are the same thing. That's what it does here. So even if your model is slightly different or it's spelled differently, you can still use this by doing that mapping.
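
The mapping idea can be pictured as a simple lookup from the model's attribute paths to the tenant's paths. The exact path strings and the `to_model_record` helper are hypothetical; the real mapping is done on the mapping screen, not in code:

```python
# Hypothetical attribute paths: key = how the model names the attribute,
# value = how the local tenant names the same attribute.
MODEL_TO_TENANT = {
    "Prefix": "NamePrefix",
    "PostalCode/Zip5": "Zip/Zip5",
    "PostalCode/Zip4": "Zip/Zip4",
}

def to_model_record(tenant_record, mapping):
    """Rename tenant attributes to the names the model expects.

    Attributes missing from the tenant record are simply left out.
    """
    return {
        model_attr: tenant_record[tenant_attr]
        for model_attr, tenant_attr in mapping.items()
        if tenant_attr in tenant_record
    }

tenant_record = {"NamePrefix": "Dr", "Zip/Zip5": "18202", "Zip/Zip4": "1234"}
model_record = to_model_record(tenant_record, MODEL_TO_TENANT)
```

The values in `tenant_record` are made-up placeholders; only the renaming step is the point.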

Suchen: Like I said, the activation part was very important. We didn't want to put any burden on our customers to go through the very [00:44:00] complex process of setting up FERN-based matching or ML-based matching. So everything happens in the background and you just decide whether you want to try it out, enable it, and manage the score that you like.

Suchen: All right, so let's look at one example here. So I have a lot of records in here, which I've run the FERN-based matching on. And what you can do now is say, show me all the match pairs that are generated by FERN. So you can see there are like 4,000 of them, records that have at least one potential match which is detected by FERN.

Suchen: So these are out of 39,279 records. So a lot of these records are also singletons, meaning they're not supposed to have any matches. It tells me that this is what it is. Now you can further put another filter, say relevance score, and you can say, show me all the matches generated by FERN which are

Suchen: Say [00:45:00] 92 or higher, whatever, right? And it will tell you this. Now I can open this record or a potential match and you can see it's a,

Suchen: yeah, this is spelled slightly differently. The address, you can see, Hazel Township. This one does not have township in it, but it was predicting that this is a 98 percent match, maybe because phone number is missing here. Email is missing here. It's still not a conflicting email or phone number.

Suchen: And the address is slightly different. So it just penalized it a little bit, like 98, which is still close to perfect there. So that's how you find the FERN matches, through this Match IQ filter at this point of time, which says FERN model for individual, and you can see the count is updated here, because out of the 4,000 that we saw initially, 3,000 of them have a score of 92.4 and higher. [00:46:00] Now let's look at one example that I was playing with, and actually different scenarios. I think there's a crosswalk that I created for it so that I can spot it.

Suchen: Let me show you the set of records here. I can add any number of scenarios here, just upload them and see what FERN does. That's what I've been doing, actually, as part of the evaluation, just playing around with it. You can see Michael Christopher Lieberman with this address, and then what happens if I

Suchen: keep it the same, but just drop the suffix name. Keep it the same, but then drop the email. Keep it the same, but drop the phone number, and so on. So just trying multiple combinations and permutations to see what it does. And finally, I have this one record where I want the score to be very low, because here I'm clearly saying this is junior and this is senior.
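
One way to generate this kind of test set is to start from a base record and drop optional attributes in every combination. This is only a sketch of the evaluation approach described above; the field names and the email and phone values are made-up placeholders, and `drop_variants` is a hypothetical helper:

```python
from itertools import combinations

def drop_variants(base, optional_fields, max_dropped=2):
    """Build copies of `base` with up to `max_dropped` optional fields removed.

    Returns a list of (dropped_fields, variant_record) tuples.
    """
    variants = []
    for n in range(1, max_dropped + 1):
        for dropped in combinations(optional_fields, n):
            variant = {k: v for k, v in base.items() if k not in dropped}
            variants.append((dropped, variant))
    return variants

base = {
    "name": "Michael Christopher Lieberman",
    "suffix": "Jr",                  # placeholder value
    "email": "michael@example.com",  # placeholder value
    "phone": "555-0100",             # placeholder value
}
variants = drop_variants(base, ["suffix", "email", "phone"])
```

With three optional fields and at most two dropped, this yields six variants, each of which can be loaded next to the base record to see how the score responds.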

Suchen: So that's what I was playing with. And let's look at the result here. And if time [00:47:00] permits, we can just modify this and try to add more scenarios. So as you can see here, this is a potential match screen. I can sort it by relevance score, and I can say, show me from the best to the lowest score here.

Suchen: I'm going to pick all of them so that we can see them side by side. Yeah. All right. So this is a screenshot that I showed earlier, but let's dig a little deeper into these scenarios now. So for the first four or five records, it generated a hundred percent match. And the variation you can see here, there is no conflicting email.

Suchen: There is no conflicting phone number, but they're missing. And in this case, both are missing still not conflicting phone numbers. So it has a hundred percent score, which is basically automatic merge category. And you can see the variation in the name as well. This one says Mike. This one is Michael here.

Suchen: There is variation in the phone number. So it still generated [00:48:00] 100%. Now, if you look at this one here, it dropped the score slightly because the name, it says Michael, is spelled differently. And if I keep going down, yeah, here is another example, which is in a different language.

Suchen: This is in Chinese, which is Mike, actually not Michael. And you can see the score dropped to 87 percent because it is still different. It's not a different name, but it's in a different language, and it's not Michael, it's Mike. So there is a penalty that they put here, even though everything else is the same.

Suchen: Here it dropped further when I just said Cal instead of Michael. If I keep going down, you will see the score dropped here to 83 percent because it's a senior, and it dropped further for Mitch, which is again, like I said, these are the scenarios that we are currently [00:49:00] testing and tuning.

Suchen: And I want to show one other thing.

Suchen: So you can see here this is basically Michaela and Michael, right? So when you look at this potential matches, you would wonder why I'm seeing Michaela with Michael here. That's because I'm sending Michael also as one of the values to compare. So it is basically considering Michaela and Michael.

Suchen: in my match comparison. If you do not want that, and if you want to use only Michaela as a surviving name, that's the setting I showed earlier that you want to tune. You can say send only the OV value. Okay, I'll take a pause here. We have just 10 minutes. See if there are any questions.

Chris: Yeah, there's some questions. Thanks to Jordan hashtag Rob for answering some of these questions in the chat. But do you still have other stuff to cover? Because we can actually get these questions answered on the community [00:50:00] and things like that before we. Sure.

Suchen: Yeah, I was just I was thinking like, we'll keep it interactive and see if there is any scenario with this data set that you want to try that I can just.

Suchen: I just had one another slide that I wanted to cover and we can get to that part maybe. So let me get rid of this.

Chris: Yeah.

Suchen: Yeah. All right. Let me. Remove this.

Suchen: Okay, that's both Michael. So that's fine. Okay. So I just had another thing, which is I wanted to talk about what's next for FERN. So as I mentioned, currently the FERN individual model is in early access. So we are working with a few customers to test it out, verify, tune the scenarios, the scoring around it, and all that stuff.

Suchen: The next thing for FERN is to make that model available for [00:51:00] everyone, for general availability. And then right now the explainability, like I said at the beginning of the call, is built into the model. It is just not visible on the screen yet. So that is what we are going to do.

Suchen: The explainability will be provided on the screen. It is available on APIs at this point of time. So you get to see the score of each individual attribute, how FERN returned the individual outcome of the attributes and at the pair level. But explainability on the screen, so that users can see what it did, how it did it, and why it thinks it's a match,

Suchen: all of that will be added in the near future. Additional features such as identifiers, date of birth, gender, some of these are already in development. They are very close to being part of FERN, and they will be added as part of the next version as well. I showed you an example of SSN and phone number and how it can understand all those things, right?

Suchen: But there are some identifiers which are very specific to certain domains like [00:52:00] life sciences. If you are from life sciences, NPI, DEA number, AOA number, and other numbers play a very important part in the matching and merging process. And the LLM is not as well aware of those identifiers as it is aware of SSN, for example, or phone or email.

Suchen: These are not very common numbers that it is trained on. So that is what we are going to add, which is domain-specific identifiers, to the LLM. So I've listed NPI and DEA here for life sciences. Then there are SSN, ITIN, EIN for other domains that we support or that we have velocity packs for. The next one for FERN here is we are also going to work on

Suchen: We are actually working on the organization data set. This is also in development right now. As soon as we wrap up individual, we'll take on organization and release it for early access for you to try. Location is the next one after organization, because we saw a lot of customers [00:53:00] looking to master location, site information, and that kind of stuff, so we are taking on the location and site kind of data set using FERN.

Suchen: And the second-last bullet point here, this is really the interesting part: expanding the scenarios covered by FERN. What I mean by that is, if I look at this data set here, we can move data around, drop something, change something.

Suchen: Here you can see there is one address. I think this one right here, it is 186 as compared to 185. Maybe I'll make it a little bigger here. Okay. This one is 186 as compared to 185. So all of this we have built, but we want to see what other scenarios we can add so that it can generate a better score, a score that you agree with, or the category of the outcome that it will generate, like automatic merge versus potential match or something else, right?

Suchen: So that's what this second-last bullet point [00:54:00] was about, and this will be an ongoing effort. So right now, if you enable FERN and you agree with, say, 50 percent of the outcome that is generated by FERN, and for the other 50 percent the score is completely off, that doesn't mean that's the end of it, right?

Suchen: Because we get to feed the scenarios into FERN and say, not this score for this scenario, it should be this score. So we get to train it based on these different scenarios. And that is what we are doing right now with our early adopters, to see what kind of variation, what kind of deviation is acceptable when it comes to address matching or name matching or phone number or a combination of those things.

Suchen: That's what I wanted to talk about. I hope you all are as excited to know about FERN as I am. It is a really cool technology. I get to play with it every single day. This is what I do every single day. [00:55:00] Just put in all the scenarios. If something doesn't work or something doesn't look right,

Suchen: Kishore, Rob, Jordan, all these awesome guys basically say, yeah, not a problem, and do their magic. And voila, suddenly I have a very good score that aligns with my expectation and at the same time does not disrupt something that was already working for me. So we have the framework built, and now all we are doing is tuning it to align with the outcome that we have seen across our multiple customers. And like I said at the beginning of the call, this is really exciting, the prospect and the opportunity we have here to do something really awesome. Just look, some of these examples just blow my mind. I could never do that without disrupting the other valid matches, which might be one digit, one character off, but it's the same record, right?

Suchen: Transpositional and all of that stuff. So those kinds of thing is what I'm excited about. And I'm really looking forward [00:56:00] to extending this one with my team here and we'll see a better version with every passing day.

Chris: This is really great. There's literally. 25 questions.

Chris: Maybe half of them have been answered. What I recommend is that unfortunately we're not gonna be able to get to all these questions. So Rob Sylvester, he's on the call today, but he's doing a show next week, going a little bit deeper in this and we'll have some answers to some of these questions.

Chris: But additionally, my thinking is we might get the two of you guys on a call just to do an ask me anything. I'll put all the questions together and we'll do it live like we usually do, because there's just tons and tons of really great questions that folks have.

Chris: Is that fair?

Suchen: Yeah, absolutely. And as you can see, I'm not stopping. I'm going to keep creating these scenarios for you guys, and for any help that you can provide here, reach out to us. If you have some scenarios that you want us to try, or you just want [00:57:00] to try the FERN model, reach out to us and we'll be happy to work with you.

Suchen: And this is getting better every single day. Like I said, one month ago when I started playing with it, it was, say, 120 scenarios. So now it's 700 scenarios or whatever it is. So that's the fun part.

Chris: And a quick question that I think is going to be important here before everybody starts leaving: is this available today?

Suchen: Yes, it is available for early access for individual model. Which means, all these attributes that I showed here, this is something that is covered and a little more, but I was just playing with this one. There's a gender, there's a date of birth, identifier is something that we are adding, but this is a data set and we can extend that, right?

Suchen: That's the power of FERN. We can extend those very quickly because we have those underlying models that we can work with, right? And it's just really exciting stuff too. So yeah, it is available. Reach out to us if you want to try it. We can enable it on your tenant and just run it to see what it does.

Suchen: And if you see some scenarios that are [00:58:00] not handled correctly, we want to know about it, and we can talk to you about that and see what the expected outcome is versus what you're seeing. We can tune it based on that.

Chris: Great. All right, this is being recorded and everything else. Please review us at the end of this.

Chris: I accidentally put the wrong title on the review, so when you leave it's going to say some other thing. I couldn't change that, but I apologize. But your feedback is certainly a gift and very appreciated. I use all of this stuff to plan our shows. So this one's a really exciting one. Stay tuned for next week. If you haven't signed up for that one, please RSVP for that show.

Chris: Thank you everyone for coming. Suchen, wow, great stuff. Really innovative, exciting, and I know we are, and I think our customers are, super pumped. So really exciting stuff. Thank you.

Suchen: Chris, for arranging this. And thanks, everyone, for supporting answering the questions.

Chris: All right. And you will [00:59:00] get the slides, Francis.

Chris: Yeah. Thank you. Wow. I couldn't even get to half the questions.

Suchen: Yeah, let's capture all the questions, Chris, and we'll try to answer them and post it on the community so that everybody can see the questions and the responses to them and have a discussion. It may not be

Suchen: the right outcome that they expect. We want to have that discussion, and the community is the best place for us to have that dialogue, I think.

Chris: Completely agree. All right. Thanks everyone. Take care until next time. Bye bye. Thank you

Suchen: guys. Thank

Chris: you all. Bye bye.

Suchen: Hey Marcy.


#CommunityWebinar
#Featured
