Deep Dive into Survivorship Strategies and Introduction to Fallback Strategies - Webinar

By Chris Detzel posted 02-01-2022 08:28

  



Welcome to the Reltio Community Show! Join Dmitry Blinov, Senior Product Manager, Tatyana Selezneva, QA Engineering, Yakov Goyhman, Engineering Manager and Chris Detzel, Director of Customer Community and Engagement for another show on the Reltio Community.

Find the PPT here: Deep Dive into Survivorship Strategies and Introduction to Fallback Strategy PPT

In this series, we will go into a detailed review of crosswalk-based and value-based survivorship strategies, sources for operational values and comparison attribute URI with examples and best practice. Going one level deeper - fallback strategies. For all your Reltio MDM questions, check out the Reltio Community:
https://community.reltio.com/home

Transcript: 

Chris Detzel (00:00:05):

My name is Chris Detzel. I am the Director of Customer Community and Engagement. Welcome to another Reltio Community Show, a deep dive into survivorship strategies and introduction to fallback strategy, which I'm very interested in because I don't know a lot about fallback strategies. Dmitry Blinov, he's a senior product manager, will be our speaker today, and Tatyana and Yakov have been a big piece of really helping out today. I think Yakov is going to do some of the really cool demo stuff. So we have some demos for you today, and I love demos. But you guys know how demos can go sometimes, so be patient with us. So keep yourself on mute and, as usual, ask questions in the chat. I'll make sure as usual that they get answered. We will record the community show and I can share a link to all the recorded community shows that we've had in the past.

 

Chris Detzel (00:01:07):

So here's our upcoming schedule of episodes. We do have one on survivorship, that's today. So welcome today. And then we have a deep dive into advanced survivorship fallback strategies coming next week. And the week after, we will do one on making workflows work for you. And then we have a couple more over the next several weeks. So we're excited about those. And then, one of the thoughts the other day, a recommendation, was something on cleansers. I got an email today, or a Slack message, that we're going to go ahead and do one on cleansers as well, I just haven't had a chance to schedule that yet. Kim Toomee will be helping us out with that so I'm excited about that. Super cool.

 

Chris Detzel (00:01:55):

Also something super cool is our swag. So if you haven't been on the community and you haven't posted a question or answered a question, please do because Friday will be the last day. So please go in, post a question, answer a question. You can take a look at that post and see the rules, but I think 19 people have already participated in that. And so there'll be some swag and gift cards and things like that, given out to some folks participating on the community. So without further ado, I'm going to stop sharing my screen and I'm going to hand it over to Dmitry.

 

Dmitry Blinov (00:02:38):

Thank you, Chris. Good morning, everyone. [inaudible 00:02:42] Today we will be doing this webinar together with Yakov. Let me introduce him, because he couldn't participate in the first webinar. Yakov is a manager for our core engineering team, which develops the core of the MDM platform. So he holds the most expertise on this subject. So, deep dive into survivorship strategies and introduction to fallback strategy. Let's go to the next slide. This is the second in the series of webinars where we are going through the very basics and doing a deep dive into the topic of entity resolution and things like survivorship rules, survivorship strategies, operational values for attribute values and so on and so forth, the overall mechanism of entity resolution, how it works. So the goal of today's session is to do a technical deep dive and a demo for each of the survivorship strategies. We touched on them at a high level in the previous session. Today will be a deep dive and we'll try to distribute the timeline to focus more on the demos.

 

Dmitry Blinov (00:04:05):

We'll introduce the concept of fallback strategies as well and we'll do a deep dive into them in the next session, but it'll be a good live demo of survivorship rules and fallback strategies from a real life example. And we'll do a demo, as I mentioned, for business use cases. All the knowledge we gathered from the two webinars, we need all of that to understand this demo and how this works. It will be really cool. Let's go to the next slide. I'm going to hold my microphone to make sure I'm audible. That's the agenda for today. As I already mentioned, we'll talk about the sources for OV parameter because it is important for understanding the subject. We'll do an introduction into fallback strategies. We'll deep dive into all these survivorship strategies; some of them are data based or value based and some of them are data source based or crosswalk based. And we'll do a demo of a complex use case at the end of the session. Let's go to the next slide. Real quick, a refresher on why we are doing this and why this is such an important topic.

 

Dmitry Blinov (00:05:25):

As you can see, entity resolution is at the very core of the multi-domain MDM. Next slide. This is a slide from the previous session. I'll just repeat all this. Entity resolution happens on the attribute level, where each attribute value loses or wins, or survives in other words. And the surviving value is called the operational value. This is the one that you see in the profile view, for example, and this is the one that is used for all the operational value processes and operations, like OV-only search as an example of such an operation. How it happens is you have two crosswalks, A and B. You have values associated with each of the crosswalks for the first name attribute or last name attribute, and after entity resolution you get a resolved object: one value for the first name attribute will survive from crosswalk B and the value for the last name attribute will survive from crosswalk A, and you'll get your final profile this way with [inaudible 00:06:27] here. Next slide.

 

Ganesh (00:06:31):

Hey, this is Ganesh from IQV here. Is it okay to ask questions right away in between or is it just, you're going to have a Q and A session later? How do you guys want to do?

 

Chris Detzel (00:06:40):

Hey Ganesh. So you can ask your question, but generally we just push them in the chat and I'll facilitate that conversation, but feel free to ask.

 

Ganesh (00:06:50):

Yeah, absolutely. No problem. The previous slide, exactly. This is source based, right? So you are saying that hey, I have a preferred source for each attribute and the winner will come from that source, correct? That's [crosstalk 00:07:04]

 

Dmitry Blinov (00:07:03):

Yeah, it's a good question, Ganesh. It's a high level example. So there is a survivorship strategy, and that's exactly the topic of today's session. There is a survivorship strategy behind the... There was a blue arrow in the middle saying resolution. So it got resolved somehow. It got resolved based on survivorship strategies. And operational values were calculated by the operational value calculator, which we reviewed in detail in the first webinar.

 

Ganesh (00:07:38):

Yeah. I'm very familiar with Reltio so don't... I got it.

 

Dmitry Blinov (00:07:41):

No yeah, I know. So that example is a generic example.

 

Ganesh (00:07:45):

Okay. Got it. How-

 

Dmitry Blinov (00:07:47):

[crosstalk 00:07:47] resolution may happen, we'll do a deep dive right now, yes. Thank you.

 

Ganesh (00:07:50):

Yeah. Thank you.

 

Dmitry Blinov (00:07:53):

Review of the sources for OV parameter. It's just an important parameter. Let's go back. Thank you. It is used as part of the survivorship strategy configuration so it is important to understand. It's an optional field and it contains a list of sources that can take part in the OV calculation. So in addition to priority lists and everything, you can define the sources for OV, as you can see in the example on the right side. If you do that, only the sources listed in this block will participate in this specific survivorship strategy. Survivorship rule calculation, I should say. Not survivorship strategy, survivorship rule calculation. So you can narrow down the sources that will participate in the survivorship rule calculation with this parameter. Well, [inaudible 00:08:49] can be specified on different levels of your configuration. Again, it is optional, and just two things. First of all, you should know about it; second, you should know how it is used, because it will be part of the examples we'll present today.
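The narrowing-down effect described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Reltio's implementation; the dictionary shape and the names `eligible_values` and `sources_for_ov` are assumptions made for the example.

```python
def eligible_values(values, sources_for_ov=None):
    """Return the values allowed to take part in the survivorship rule calculation.

    values: list of dicts with the hypothetical keys "value" and "source".
    sources_for_ov: optional list of source names; None means "no restriction",
    mirroring the fact that the parameter is optional.
    """
    if sources_for_ov is None:
        return list(values)
    allowed = set(sources_for_ov)
    return [v for v in values if v["source"] in allowed]


values = [
    {"value": "Michael", "source": "HMS"},
    {"value": "Mike", "source": "Facebook"},
]

# With the restriction, only HMS values reach the strategy.
restricted = eligible_values(values, sources_for_ov=["HMS"])
# Without it, every value participates.
unrestricted = eligible_values(values)
```

Whatever survivorship strategy is configured would then run only on the filtered list.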

 

Dmitry Blinov (00:09:04):

Let's go to the next slide. Introduction into fallback strategies. A fallback strategy is another part of the survivorship configuration and the overall data model configuration. It describes a mapping that will be used over, or in other words on top of, the results of a main or initial strategy, in case the main strategy returns a number of [inaudible 00:09:29] that does not match the specified condition. For example, if you have the source system strategy for the attribute but it returns too many winners, you can narrow them down by putting in a fallback strategy with last update date logic, which will pick the winner among the source system strategy's winners based on last update date and define the final winners. So a fallback strategy is like a next level of survivor definition for the rule. This condition is determined by the fallback using criteria field. There are two types of fallback criteria. "More than one" is the default criteria; it works if the main strategy returns two or more winner values. So it will be skipped if only one value was returned. It's already a winner, we don't need to run the fallback strategy.

 

Dmitry Blinov (00:10:24):

The specified fallback strategy will only use the winner values and only the winner crosswalks from the previous step. So on the second step, it will ignore everything else but the two winners that came from the previous level of calculation. "Zero or more than one": if the main strategy returns nothing or more than one winner value, then the fallback strategy will kick in and will be used to determine the final winner. So again, it's a second layer which adjusts the result of the survivorship rule calculation. Let's go to the next slide. Well, it's just a simple fallback strategy. I'm not going to read through this. Well yeah, I'll skip, but we always attach the slides at the end of the webinar, so feel free to open and go through this example. Here is just the same thing I explained, in the configuration itself. So now the next step is, let's do a technical deep dive into survivorship strategies. Again, we reviewed them in the last webinar at a high level. Today we'll go through a demo of each of them.
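The two fallback criteria described above can be sketched as a small Python function. This is a simplified model, not Reltio's code: the constant spellings, the function names, and the toy strategies are assumptions for illustration, and the handling of the zero-winner case (running the fallback over all values) is also an assumption, since the talk does not spell it out.

```python
MORE_THAN_ONE = "MORE_THAN_ONE"                  # default criteria per the talk
ZERO_OR_MORE_THAN_ONE = "ZERO_OR_MORE_THAN_ONE"


def needs_fallback(winners, criteria=MORE_THAN_ONE):
    """Decide whether the fallback strategy should run on the main result."""
    if criteria == MORE_THAN_ONE:
        return len(winners) > 1
    if criteria == ZERO_OR_MORE_THAN_ONE:
        return len(winners) != 1
    raise ValueError(f"unknown criteria: {criteria}")


def resolve(values, main_strategy, fallback_strategy, criteria=MORE_THAN_ONE):
    """Run the main strategy, then the fallback only if the criteria is met."""
    winners = main_strategy(values)
    if not needs_fallback(winners, criteria):
        return winners
    # The fallback only sees the winners of the previous step.
    # Assumption: with zero winners it is given all values instead.
    candidates = winners if winners else values
    return fallback_strategy(candidates)


def from_hms(values):
    """Toy main strategy: every value from the HMS source wins."""
    return [v for v in values if v["source"] == "HMS"]


def latest_update(values):
    """Toy fallback: keep the values with the most recent update."""
    newest = max(v["updated"] for v in values)
    return [v for v in values if v["updated"] == newest]


values = [
    {"value": "Michael", "source": "HMS", "updated": 10},
    {"value": "Mike", "source": "HMS", "updated": 20},
    {"value": "Mikey", "source": "Facebook", "updated": 30},
]
final = resolve(values, from_hms, latest_update)
```

In this toy data the Facebook value has the newest update, but it never reaches the fallback because it did not win the main strategy, so the final winner is the newer of the two HMS values.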

 

Dmitry Blinov (00:11:36):

There are two types of survivorship strategies, those that are based on data source or crosswalk and those that are based on data or attribute value. Yeah, let's go to the next slide. Let's review those that are based on data source or crosswalk. Strategy number one, last update date, is the most popular and it's also the default. So this strategy is used if no strategy was defined for an attribute. This one makes all values with the most recent update date winners for this attribute. The update date is taken from the crosswalk, which is why this is a crosswalk or data source based strategy. It calculates the maximum value of update date, single attribute update dates and [inaudible 00:12:28] load date for each crosswalk.

 

Dmitry Blinov (00:12:29):

So actually, if you look deeper into this, there is more than one field that's taken into consideration by this strategy. There are three different fields and they have a specific priority, the way it's listed here. Then it will find the maximum value among all the values from this second step. The corresponding crosswalk becomes the winner and the value from this crosswalk becomes the operational value. Let's switch to the demo.
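The two steps described above (take the latest date per crosswalk, then pick the latest overall) can be sketched as follows. This is a rough Python model under stated assumptions: the key names are invented for the example, the timestamps are illustrative, and only the crosswalk update date and the single attribute update date are modeled; the third date field mentioned in the talk is left out.

```python
from datetime import datetime


def effective_date(value):
    """Step 1: the latest of the date fields known for this value's crosswalk."""
    dates = [value["crosswalk_update_date"]]
    if value.get("single_attribute_update_date") is not None:
        dates.append(value["single_attribute_update_date"])
    return max(dates)


def lud_winners(values):
    """Step 2: every value whose effective date equals the overall maximum wins."""
    newest = max(effective_date(v) for v in values)
    return [v["value"] for v in values if effective_date(v) == newest]


same_time = datetime(2022, 1, 1, 12, 0, 35)  # illustrative timestamp
values = [
    {
        "value": "Michael",
        "crosswalk_update_date": same_time,
        # This attribute value was touched again after the crosswalk update.
        "single_attribute_update_date": datetime(2022, 1, 1, 12, 0, 42),
    },
    {
        "value": "Mike",
        "crosswalk_update_date": same_time,
        "single_attribute_update_date": None,
    },
]
winners = lud_winners(values)
```

Both crosswalks carry the same update date here, so the single attribute update date is what breaks the tie.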

 

Yakov Goyhman (00:13:00):

Okay. So regarding the demo. Here I'm going to take a specific entity which has already been posted with the appropriate attributes, and with strategies already configured for this entity and attribute. So let's take the entity and open it in the browser. And this is my toolbar. Okay. I paste an entity ID here.

 

Dmitry Blinov (00:13:52):

Taking serial [inaudible 00:13:54] UI here, for those of you... If someone is not familiar. Yeah.

 

Yakov Goyhman (00:13:53):

Okay. Here's a profile view, and we should go to another view, its name is sources view, to understand why some value survived. So here's a very simple entity. It has an attribute with a name. I'm sorry. I think I copied the wrong ID. Okay. So again, an entity with the attribute name first name LUD. And as we can see, this entity has only one attribute with two values. One is Michael and the second is Mike. One of them came from one crosswalk and the other also came from a Facebook crosswalk, from another profile. I'm going to take the JSON for this profile and show you how it looks under the hood. So there's Postman, which allows me to retrieve the entity. Retrieving the entity, copying it and pasting it into a tool which allows me to see it properly. So whenever I display a demo there will always be an entity on the right side, and on the left side there is the survivorship config for the HCP entity type. HCP is the entity type which we are working with, and this entity type has many attributes.

 

Yakov Goyhman (00:16:19):

And on the left panel, we can see all the attribute mappings to their strategies. So let's check which strategy is configured for first name LUD. We can see this is pretty simple. There's the first name LUD attribute, which is mapped to the LUD survivorship strategy. By the way, it's the default strategy and we don't need to specify it. It's the only strategy we don't need to specify. So any attribute which is not specified in the survivorship mapping will be calculated with the LUD strategy. So let's go back to our entity, and we can see first name LUD where one value has OV true and the other value has OV false, Michael and Mike. Let's go back to the user interface and we can see that the winner is Michael here. And the attributes appear here as well. Okay. Let's check why it wins. Copying the attribute ID, searching for it, and there are three appearances of this attribute in this file. As we can see, the second appearance is inside the crosswalk, this attribute is assigned to this crosswalk, and the update date of this crosswalk is 38, 35. Let's compare with the other.

 

Yakov Goyhman (00:18:12):

There are only two crosswalks, and I know that the other value, Mike, which is OV false, came from this crosswalk. The update date here is exactly the same. So why does this one win and not this one? Okay. There's a third appearance of this ID. The third appearance of this ID is inside the single attribute update dates section, and we can see that this attribute was updated more recently than it was initially created or updated. And Mike, so we see the time is 35 and here it's 38. So it's more recent therefore it-

 

Dmitry Blinov (00:19:03):

It's seven minutes later actually, 42.

 

Yakov Goyhman (00:19:06):

Yeah. Not two minutes, even seconds. Yeah, 42 compared to 35. So that's it with this example. Let's move back to...

 

Dmitry Blinov (00:19:26):

Yeah. To the slide deck, thank you Yakov. So again, as you can see in step two, three different fields were taken into consideration, and since we had a complete match on update date, single attribute update dates were taken into consideration by this strategy and the winner was defined by a different field, not by update date. Okay, let's go to the next slide. The next strategy is source system. The main thing you need to know about it is that it is based on the sources URI order field and it makes values belonging to the source with the highest priority winners. And the priority is defined exactly by the sources URI order. As you can see on the right side under the config title, you can see sources URI order: IRAP, HCOS, HMS and DA. These are just simple crosswalks, but oftentimes we can see these data sources in different data sets, and basically the priority will go as you see listed here. So IRAP will win if it is present; if it's not present, the next winner will be HCOS, and so on and so forth. Let's go to the demo.

 

Yakov Goyhman (00:20:49):

Switching back and taking another entity. The other entity is for SRC_SYS, this one. [inaudible 00:20:57] the entity. Okay. And I need to show you in the user interface because for this strategy it would be much easier to check in the user interface, again, why some value wins. Okay. So here we can see that there are three different values for the attribute first name SRC_SYS. Let's go back to the configuration and check for... Okay, I would prefer to copy the whole entity, copy the attribute name and search for this attribute on the left panel. Okay. So here is our strategy. As we saw on the slide, the sources URI order list has four items. And the highest priority in this list goes to IRAP. Next are HCOS, HMS and DA.

 

Yakov Goyhman (00:22:30):

Let's go back to the user interface and we can see that one of them is HMS, which is in the priority list, and another is Facebook, which does not appear in the priority list at all. So we can see that HCOS and HMS have more priority than DA, therefore HMS wins. And we can see the surviving value from HMS. That's it with this example, going back to slides.

 

Dmitry Blinov (00:23:13):

I think we can show one more thing real quick, if you can switch back to UI.

 

Yakov Goyhman (00:23:15):

Okay.

 

Dmitry Blinov (00:23:16):

If you just remove HMS data source and see how another value will survive.

 

Yakov Goyhman (00:23:23):

I'm sorry Dmitry, [crosstalk 00:23:26] I'm not familiar with the interface interactions. And I don't know how properly to remove the value from it.

 

Dmitry Blinov (00:23:34):

Okay, I'm sorry. Hover over the source.

 

Yakov Goyhman (00:23:36):

HMS. I can ignore value?

 

Dmitry Blinov (00:23:41):

No, not the value but source. You can see the trash bin on the side. Very right side. HMS, trash bin.

 

Yakov Goyhman (00:23:49):

Okay, I see.

 

Dmitry Blinov (00:23:49):

Click on it, remove it and Mike DA became winner because it's the next source defined in the list of priorities. So the previous winner disappeared, next winner defined from the source of priorities immediately as you can see in the list of yours. Just wanted to show that. Thank you, Yakov. Let's switch back to slides.
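The priority walk shown in this demo can be sketched in Python. This is a simplified illustration, not Reltio's implementation: the function name and data shape are assumptions, the attribute values are made up, and what happens when none of the listed sources is present is deliberately left as a placeholder.

```python
def source_system_winners(values, sources_uri_order):
    """Return the values from the highest-priority source that is present.

    values: list of dicts with the hypothetical keys "value" and "source".
    sources_uri_order: source names, highest priority first.
    """
    present = {v["source"] for v in values}
    for source in sources_uri_order:
        if source in present:
            return [v["value"] for v in values if v["source"] == source]
    # Placeholder only: the no-listed-source case is outside this sketch.
    return []


order = ["IRAP", "HCOS", "HMS", "DA"]  # order shown on the slide
values = [
    {"value": "Michael", "source": "HMS"},    # illustrative values
    {"value": "Mike", "source": "DA"},
    {"value": "Mikey", "source": "Facebook"},
]

before = source_system_winners(values, order)

# Removing the HMS crosswalk promotes the next source in the list.
without_hms = [v for v in values if v["source"] != "HMS"]
after = source_system_winners(without_hms, order)
```

IRAP and HCOS are skipped because no value comes from them, so HMS wins first and DA takes over once HMS is gone.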

 

Yakov Goyhman (00:24:11):

Okay.

 

Ganesh (00:24:12):

Hey, one more question here. I know the answer already that I cannot have two sources defined at the same priority level, meaning I will have to have a source one by one, right? The first one is a winner, then next source so on so forth. We don't have an ability to have two sources defined at the same level, right? And then go for a fallback strategy based on that?

 

Yakov Goyhman (00:24:39):

This configuration is correct, but there is actually a small possibility to have the same priority levels. But those priorities are described in another place, in the L3 section of sources.

 

Ganesh (00:24:55):

So I can have two sources having the same priority saying that if I get the same record from both the sources then, because we have two OV's from each of that, it'll go back to fall back.

 

Dmitry Blinov (00:25:12):

Yes.

 

Yakov Goyhman (00:25:13):

Yeah. Correct.

 

Ganesh (00:25:15):

Okay. I'll probably need a session for that because that's something that we are lacking in [Roche 00:25:23] today. And we have been told by Reltio that's not possible. There's only one source that has the highest priority and then everything else follows. So we cannot have two sources with the same highest priority. That's how we are... But maybe a separate session. But please go ahead.

 

Dmitry Blinov (00:25:42):

Thank you Ganesh. All right, back to slides. Let's go to the next one. Oldest value. Take the crosswalk bound to each value of the appropriate attribute and calculate the minimum value of create date field. Now that's different because create date will not change if you update the profile, it is set once you created the profile and created the attribute and created the crosswalk and it doesn't change. That's the difference from last update date. Let's go to the demo. Should be simple.

 

Yakov Goyhman (00:26:28):

Let's open it in the interface. Okay. For this, we have to open [inaudible 00:26:42]. Okay. First let's check the strategy on the left. We can see that there is the oldest value strategy applied to this attribute. And here again we see two names, Mike and Michael. Mike is OV true. Let's check why. Let's again use the attribute ID and search for it. So again, we can see here two different crosswalks and now we have to check the create date. And let's check the values. One of them is 4153 and one of them is 4135. So 35 was created earlier compared with this one. So therefore the Mike value wins. Dmitry.
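The comparison Yakov walks through can be sketched in Python as the mirror image of last update date: take the minimum create date instead of the maximum update date. The key names and the calendar date are assumptions for illustration; only the minute and second values echo the demo.

```python
from datetime import datetime


def oldest_value_winners(values):
    """Values whose crosswalk has the earliest create date win."""
    oldest = min(v["create_date"] for v in values)
    return [v["value"] for v in values if v["create_date"] == oldest]


values = [
    # The day and hour are made up; 41:53 and 41:35 follow the demo.
    {"value": "Michael", "create_date": datetime(2022, 1, 1, 10, 41, 53)},
    {"value": "Mike", "create_date": datetime(2022, 1, 1, 10, 41, 35)},
]
winners = oldest_value_winners(values)
```

Because the create date never changes after the crosswalk is created, later updates to the profile would not change this result.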

 

Dmitry Blinov (00:28:29):

Sorry, got distracted. Okay, let's go to the next one. Other attribute winner crosswalk makes attributes from the source that wins in the other attribute winners for this attribute. Again, we had a simplified example of how it works in the previous webinar. It's a simple strategy, but it takes one more step to understand how it works. Basically you need two different attributes, and one attribute that wins by a simpler strategy, say last update date. And the second attribute you'll point to this one and say, well, whatever crosswalk this one wins from, take my value from the same crosswalk as well. Something like that. It's possible to have primary attributes which service [inaudible 00:29:24] also other attribute in a crosswalk. The only requirement is not to create cyclic dependencies. So you cannot point attributes to each other: this one takes the winner from that one and that one takes the winner from this one. So you cannot define other attribute winner crosswalk on two attributes and point them to each other. There'll be a validation error in this case and it will not work. Let's go to the demo and see how it works.

 

Yakov Goyhman (00:29:53):

Okay. This demo would be much more complex than previous one. Okay. Going to postman, get your entity and going to UI again. And engaging same entity. Again, not copied properly. No, it's copied correctly. So I think we have some... Okay. Let's check for slide. I think on slide we can have maybe... Again, we have the same.

 

Dmitry Blinov (00:31:01):

So other [inaudible 00:31:03] crosswalk is the same entity?

 

Yakov Goyhman (00:31:06):

No, it shouldn't be because there should be many different attributes as we can see here on explanation.

 

Dmitry Blinov (00:31:16):

Let's try and do a search and just find it by search. And you can actually... You know what, take it from the URI. In the URI you can see entities. It says A72. No, it is [crosstalk 00:31:29]

 

Yakov Goyhman (00:31:30):

Okay, I already did it. So I think the URI was improperly placed. [inaudible 00:31:44] Okay, that's fine. Got it. So again, let's check the attributes which appear here in the list. Going back and forth between the sources view in the UI and a JSON editor. So we can see here that we have last name other attribute winner. That is the name of the attribute. Then middle name other attribute winner, and first name. Let's go back to the JSON and check the configuration for... Okay. Let's take one of them and search here. So we can see that the attribute middle name other attribute winner has, as its primary attribute, last name other attribute winner. And last name in this case has first name as its primary attribute. So there is a chain. Middle name points to last name, last name points to first name. Two levels of other attribute winner crosswalk. So first we check first name. First name. There should be a first name without any postfixes.

 

Yakov Goyhman (00:33:42):

Okay, here we have it. We should read such a complex other attribute winner crosswalk strategy from the end, and the end is first name. For first name, the mapping is the SRC_SYS strategy with two different sources in the URI order, IRAP and HCOS. So let's go back here and see. For first name we have one value from IRAP and the second is from Facebook. So IRAP definitely wins because Facebook is not listed in the priority order. Therefore, the middle name wins because it came from the same crosswalk; it takes its value from the same crosswalk as first name. And last name, again, in order, gets its value from the same crosswalk as the winner crosswalk of middle name. Such a complex thing, but that's it.

 

Dmitry Blinov (00:35:04):

We can do the same trick. We can remove the crosswalk. I wish we had three crosswalks. It would be more [crosstalk 00:35:11]

 

Yakov Goyhman (00:35:10):

There's only two crosswalks.

 

Dmitry Blinov (00:35:12):

Yeah. Yeah. Because if you remove one, obviously all the values filled in from remaining crosswalk. But basically if you remove the IRAP crosswalk, first name will win from Facebook crosswalk now because there is nothing else to win from and middle name and last name will be taken from the same crosswalk, not because there is nothing else to win from, but because they will win from whatever crosswalk first name wins. So basically all three attributes in this case have dependency one on each other, the [inaudible 00:35:43]. And not on the first name, but even chain. So last name will win from what middle name wins from, middle name will win from what first name wins from. So you can do chains like this.
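The chaining and the no-cycles rule can be sketched in Python. This is an illustrative model only: `primary_of` and `own_winner` are invented names, and the chain direction follows the configuration as Yakov read it out (middle name points to last name, last name points to first name).

```python
def winner_crosswalk(attribute, primary_of, own_winner, _seen=None):
    """Follow primary-attribute pointers to the crosswalk this attribute wins from.

    primary_of: maps an attribute to the attribute it points to (hypothetical).
    own_winner: winner crosswalk of attributes resolved by their own strategy.
    """
    seen = set() if _seen is None else _seen
    if attribute in seen:
        # Mirrors the validation error mentioned for cyclic dependencies.
        raise ValueError(f"cyclic dependency at {attribute}")
    seen.add(attribute)
    primary = primary_of.get(attribute)
    if primary is None:
        return own_winner[attribute]
    return winner_crosswalk(primary, primary_of, own_winner, seen)


primary_of = {"MiddleName": "LastName", "LastName": "FirstName"}

# First name is resolved by its own strategy; IRAP wins it in the demo.
with_irap = {a: winner_crosswalk(a, primary_of, {"FirstName": "IRAP"})
             for a in ("FirstName", "MiddleName", "LastName")}

# If the IRAP crosswalk is removed, first name wins from Facebook
# and the dependent attributes follow it.
without_irap = {a: winner_crosswalk(a, primary_of, {"FirstName": "Facebook"})
                for a in ("FirstName", "MiddleName", "LastName")}
```

The dependent attributes never run a strategy of their own; they only inherit whichever crosswalk the end of the chain won from.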

 

Dmitry Blinov (00:35:55):

All right, let's go to the next slide. The last of the data source based strategies: winner entity crosswalk. It makes [inaudible 00:36:07] values of the attribute from the winner entity winners. In the example we provided last time, you have two entities, entity one and entity two. Entity one wins, and all the values from entity one become winners for the attributes where this strategy is used. Things to know about the strategy: before a merge operation, all values of the attribute have operational value true. So they are all operational. If the winner is set explicitly in a merge operation, then its values will become operational values. If the winner is not set in the merge operation, the oldest entity will be chosen as the winner and its values will become operational values. Let's see the demo.
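The winner selection rules just listed can be sketched in Python. It is a simplified model under assumptions: the entity shape, the `created` field, and the function names are invented for the example and do not reflect the actual merge API.

```python
def pick_winner_entity(entities, explicit_winner_id=None):
    """Use the explicitly chosen winner if given, otherwise the oldest entity."""
    if explicit_winner_id is not None:
        for entity in entities:
            if entity["id"] == explicit_winner_id:
                return entity
        raise ValueError(f"unknown entity: {explicit_winner_id}")
    return min(entities, key=lambda e: e["created"])


def winner_entity_values(entities, attribute, explicit_winner_id=None):
    """Operational values of the attribute come from the winner entity only."""
    winner = pick_winner_entity(entities, explicit_winner_id)
    return winner["attributes"].get(attribute, [])


entities = [
    {"id": "entity1", "created": 1, "attributes": {"FirstName": ["Mike"]}},
    {"id": "entity2", "created": 2, "attributes": {"FirstName": ["Michael"]}},
]

# No explicit winner in the merge: the oldest entity is chosen.
default_ov = winner_entity_values(entities, "FirstName")
# Winner named explicitly in the merge operation.
explicit_ov = winner_entity_values(entities, "FirstName", "entity2")
```

The same pair of entities yields different operational values depending only on whether the merge names a winner.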

 

Yakov Goyhman (00:37:19):

Okay, let's get the entity from our API. We have to renew our token. Pasting it here and checking the attribute and the strategy on the left. So we can see here that the first name winner [inaudible 00:38:05] crosswalk attribute has the winner [inaudible 00:38:08] crosswalk strategy mapping. Again, we have two values. One value, Michael, from entity two and Mike from entity one. And in order to understand which attribute comes from which entity, we have to get the crosswalk tree. That's another API call, here it is, and I just executed it for this entity. So we can see here...

 

Dmitry Blinov (00:38:46):

The history of merges, right? So we see history of merges for this entity's values.

 

Yakov Goyhman (00:38:56):

So, history of merges. For the latest winner, we have one loser. And in this loser, crosswalks are specified, and there is a crosswalk. And if we go back, we can see that this... So a crosswalk with value VEA. Okay, going back to the entity. This is the crosswalk and we can see here an attribute with this ID. Let's check what the ID is. And we can see that the attribute which belongs to a crosswalk from the loser entity loses, and its operational value is false. And vice versa, the value from the winner entity wins, therefore it is OV true. Dmitry.

 

Dmitry Blinov (00:40:18):

Yeah. So yeah, to understand the winner in this case, you'll need to go to your merge history, which is also built into the MDM UI. But basically, whatever loses continues losing on the attribute [inaudible 00:40:34]. Okay, let's go to the next set of strategies real quick. Data based; they're simpler than crosswalk based if you ask me, and there are just fewer of them, and then we'll switch. Let's go through the quick demos for this type of strategy and then we'll do a complex demo which includes all of the strategies. So now we are looking into specific individual strategies, but yeah, we'll have a real life example, which includes everything and fallback strategies as well. So frequency is the first value based one. Very simple: it makes the values that came from the largest number of different sources winners. If I have three crosswalks, two of them have the value Mike and one of them has the value Michael, Mike will win. Very simple.

 

Dmitry Blinov (00:41:26):

More than one value can win. It can happen if I have four crosswalks and two of them have this value and the two others have another value, an equal number of... The equally frequent values will win. Both of them. The last update date strategy will be applied by default to define the final winner. If the strategy is used for a nested or reference attribute, because nested and reference attributes are a folded structure, or a container type of structure, right? We need to set one sub-attribute by which we can define our nested or reference attribute. The URI of such a sub-attribute has to be placed in the comparison attribute URI property of the frequency strategy. And the frequency strategy will use this property to define the winner in such complex folded cases. Comparison attribute URI is mandatory for nested and reference attributes. So if you are going to apply the frequency strategy to a nested or reference attribute, you have to have this field, otherwise you'll have a validation error. Let's go to the demo.

 

Yakov Goyhman (00:43:05):

Okay, let's open it in the UI. So we have an attribute first name frequency. And again, we have two different values, Mike and Mike in caps. So we can see here three crosswalks. And as we can see, Mike wins, and let's check which crosswalk Mike came from. Mike came from two different crosswalks, HCP Facebook and NPI, and Mike in caps from NPI only, and that explains why Mike wins. It is the number of occurrences of the value in different crosswalks, how many crosswalks it belongs to: if a value belongs to more crosswalks, it is more frequent and therefore it wins.
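Counting distinct crosswalks per value, as in this demo, can be sketched in Python. This is an illustration rather than Reltio's code; the crosswalk names in the sample data are placeholders, and the tie case simply returns every equally frequent value without applying any further step.

```python
from collections import defaultdict


def frequency_winners(values):
    """Values backed by the largest number of distinct crosswalks win."""
    crosswalks_per_value = defaultdict(set)
    for v in values:
        crosswalks_per_value[v["value"]].add(v["crosswalk"])
    top = max(len(c) for c in crosswalks_per_value.values())
    return sorted(val for val, c in crosswalks_per_value.items() if len(c) == top)


values = [
    {"value": "Mike", "crosswalk": "crosswalk-1"},   # placeholder names
    {"value": "Mike", "crosswalk": "crosswalk-2"},
    {"value": "MIKE", "crosswalk": "crosswalk-3"},
]
winners = frequency_winners(values)

# Dropping one of Mike's crosswalks leaves the two values equally frequent.
tied = frequency_winners(values[1:])
```

The comparison is case sensitive here, which is why "Mike" and "MIKE" count as different values.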

 

Dmitry Blinov (00:44:41):

Yeah. So this is exactly an example of three crosswalks, two values. But again, let's remove the purple one, the last crosswalk, and in this case the frequency will become the same but the winner will still be defined. Click on OK in the... Yeah, so it disappeared. And actually last update date did not define it. There is no fallback strategy in this case, and you can see that last update date did not define the winner. So we ended up with two winners right now. So here, I think we will have to apply fallback strategies to define the final surviving value.

 

Yakov Goyhman (00:45:35):

Yeah. Last update will be the same for both values.

 

Dmitry Blinov (00:45:37):

It's because it's the same. Got it. So if the last update date is the same, then we'll have two winners, and then again we'll have to define a fallback strategy to define the final, final winner. Okay. But yeah, I hope this demonstrates it. Let's go to the next strategy real quick and then... Aggregation. The most simple: make everyone a winner. The most simple strategy we have. Let's just show it in the UI. I don't think we need to go through Postman and the structures here. Basically with aggregation, if you have three crosswalks that contain values for the attribute, the attribute will have three operational values. So if you have five crosswalks, you'll have five operational values with the aggregation strategy if it is applied to the... Yeah, exactly, as you can see here. In the previous one, by the way, after last update date didn't define a winner, aggregation took place. So in this case, you can see the same. You can see two operational values that came from three crosswalks in this case, because two crosswalks contain the same value, so they got collapsed into a single value, which is Michael.
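The "everyone wins, identical values collapse" behavior can be sketched in one short Python function. The data shape and crosswalk names are assumptions for illustration.

```python
def aggregation_winners(values):
    """Every value wins; identical values collapse into a single entry."""
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(v["value"] for v in values))


values = [
    {"value": "Michael", "crosswalk": "crosswalk-1"},  # placeholder names
    {"value": "Michael", "crosswalk": "crosswalk-2"},
    {"value": "Mike", "crosswalk": "crosswalk-3"},
]
winners = aggregation_winners(values)
```

Three crosswalks produce two operational values here because two of them carry the same value.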

 

Dmitry Blinov (00:47:08):

So if the value is the same, it will be collapsed in this case. Okay, let's go to the next strategy. Minimum value and maximum value. Again, very simple: the minimum or maximum value becomes the winner. For numbers, the minimum value is the smaller numeric value. For dates, the minimum value is the minimum date value. [inaudible 00:47:40] true is more than false, so for max value true will win, and for minimum value false will win. Strings. The minimum value is based on the lexicographical, or alphabetical sort I should say actually, order of the strings. If you need to sort values in a way that is different from the default sort, you can use the sort as field and define the sorting priority there. Same as for the frequency strategy, for minimum value you need to define the comparison attribute URI in the case of nested and reference attributes. Otherwise you'll get a validation error because, again, a nested or reference attribute is a folded structure. So you have to define which sub-attribute defines the winner in the end. Let's take one demo. Let's not go into both demos and let's switch to the main demo, which will be the last step of this.
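The ordering rules above map closely onto ordinary comparisons, so they can be illustrated with a short Python sketch. The strategy labels `"MinValue"` and `"MaxValue"`, the function name, and the `AddressRank` key are assumptions made for the illustration; the `key` argument stands in for the comparison attribute that a nested or reference attribute needs.

```python
from datetime import date

def min_max_winners(values, strategy="MaxValue", key=None):
    """Pick the value(s) with the smallest or largest comparison key."""
    if not values:
        return []
    key = key or (lambda v: v)
    pick = max if strategy == "MaxValue" else min
    best = pick(key(v) for v in values)
    # Several values can share the best key; all of them are kept.
    return [v for v in values if key(v) == best]

print(min_max_winners([3, 10, 7]))                    # [10]
print(min_max_winners([True, False], "MinValue"))     # [False]
print(min_max_winners(["Mike", "Anna"], "MinValue"))  # ['Anna']
print(min_max_winners([date(2021, 1, 1), date(2022, 1, 1)], "MinValue"))

# A nested value has no natural order, so a sub-attribute must be named.
addresses = [{"AddressRank": 1}, {"AddressRank": 10}, {"AddressRank": 10}]
print(len(min_max_winners(addresses, key=lambda a: a["AddressRank"])))  # 2
```

The last call shows why a max-value rule on a nested attribute can still leave more than one winner.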

 

Yakov Goyhman (00:48:44):

Dim, did you mention the latest demo?

 

Dmitry Blinov (00:48:45):

Yeah. Let's switch to that one.

 

Yakov Goyhman (00:48:48):

Okay. So we have different demos for min value and for max value, but it's the same, it's a very simple strategy. Okay, we have run out of time, so we are skipping those two demos and going to the last one. There is [inaudible 00:49:05] some customer issue. A customer asked us to explain why some value is the winner, because it was absolutely not obvious. And we have the entity here. Here we definitely need JSON.

 

Dmitry Blinov (00:49:29):

By the way, this demo contains minimum value, maximum value examples in it. So we'll demonstrate it this way.

 

Yakov Goyhman (00:49:45):

And again, going back to Postman and getting this entity.

 

Dmitry Blinov (00:49:54):

So there is a profile Simon which is a set of attributes.

 

Yakov Goyhman (00:49:59):

Copying it and pasting it here again on the right, and let's check. So the most interesting thing here is that we have address. Address is a nested attribute. So again, as Dima said, it has a folded structure. For an address, we can specify many sub-attributes, and the only sub-attributes specified here are address type, address rank, address line one and city. So we can see that for address we have one winner and the rest are losers, five attribute values overall. So collapsing them here, we can see the label of each attribute on the right. And only the attribute value with this crosswalk, Facebook, the value with address line 1 548 Pier Street, Trenton, has won. Let's remember this value. And go back to JSON. Okay. How it looks in JSON. The JSON is pretty big. It's not comfortable to go through all of it, so let's take an attribute and search for the strategy applied here.

 

Yakov Goyhman (00:51:42):

Again, I'm going to increase the size. And we can see how huge it is. It starts at line 107 and continues until 135. It has three levels of fallback strategies. So let's go from... Here we can go from the first mapping. The first mapping says that this address has to be calculated as max value. Max value is a value-based strategy. This address attribute is a nested attribute, therefore we have to specify the comparison attribute URI. In this configuration, the comparison attribute URI selected is address rank. Okay. Let's go and check the address rank for each. Oh, this was... I have to reformat. Check for rank.

 

Chris Detzel (00:53:07):

Yeah. Dmitry, Yakov, quick question. On the Reltio UI, the column rule type is survivorship for each attribute, is that right? And where can we see the fallback strategy configured on the UI?

 

Dmitry Blinov (00:53:20):

I think we show it in UI.

 

Yakov Goyhman (00:53:23):

So here we can see that only the first level of strategies appears in the UI.

 

Dmitry Blinov (00:53:31):

Yeah. So for fallback strategies, the full structure will not be shown here. You'll have to extract the data model configuration and see it there today. But it's a good question, yeah. So we definitely will discuss it with the product team.

 

Chris Detzel (00:53:49):

Thanks guys.

 

Yakov Goyhman (00:53:51):

Okay. So checking for rank. So we have rank 10, rank one, seven, again 10, again 10 and yeah, it's pretty complex to check them. There are five addresses and three of them have rank 10, which is the maximum.

 

Dmitry Blinov (00:54:22):

So this is the first level of calculation, right? These three will be defined as first level winners because they have the maximum value of the attribute address rank. So you saw that the strategy applied is max value, and the nested attribute that was defined as the comparison attribute for this max strategy was address rank. So you can rank addresses and you can make them survive this way. We have three [crosstalk 00:54:55]

 

Yakov Goyhman (00:54:54):

So we have three winners, and for each winner we also have winner crosswalks. Winner crosswalks are a very important thing. They never appear in any place, but they are always calculated together with the value under the hood. For each winner value, again, there should be a winner crosswalk. For value-based strategies, the crosswalks of the values which win become the winner crosswalks. So for the fallback strategy, we have sources URI order and the SRC_SYS strategy. By the way, as you can see, there is a fallback using criteria, and if this criteria is not specified on the first level, it means that the fallback applies only for more than one winner, not for zero. For zero or more than one, you have to specify it explicitly.

 

Dmitry Blinov (00:56:07):

So this strategy, let me just streamline this. So on the first level of calculation, we defined three potential winning attributes, right? And now we are going into the fallback strategy, which will consider only these three winners. So we narrowed down the results to three winning attributes, all three with address rank equal to 10. Now for the fallback strategy, we say that it should be applied when zero or more than one results come into the fallback strategy. In our case, we have three results, so the fallback strategy will be applied. And let's look at the first fallback strategy that will be applied to these three results now. As Yakov mentioned, all three values have crosswalks associated with them, right? So these crosswalks will be considered in the fallback strategy. Yeah, please go on.
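The rule for when a fallback kicks in can be written down as a small check. This is a sketch only: the criteria labels `"MORE_THAN_ONE"` and `"ZERO_OR_MORE_THAN_ONE"` and the function name are assumptions chosen to mirror the wording in the discussion, so verify the exact names against your own configuration.

```python
def should_apply_fallback(winner_count, criteria="MORE_THAN_ONE"):
    """Decide whether the next fallback strategy runs for this many winners."""
    if criteria == "MORE_THAN_ONE":
        # Default described by the speakers: only a tie triggers the fallback.
        return winner_count > 1
    if criteria == "ZERO_OR_MORE_THAN_ONE":
        # Must be set explicitly: an empty result also triggers the fallback.
        return winner_count == 0 or winner_count > 1
    raise ValueError(f"unknown criteria: {criteria}")

print(should_apply_fallback(3))                           # True
print(should_apply_fallback(0))                           # False
print(should_apply_fallback(0, "ZERO_OR_MORE_THAN_ONE"))  # True
print(should_apply_fallback(1, "ZERO_OR_MORE_THAN_ONE"))  # False
```

With three first-level winners, as in the address example, the fallback runs under either setting; the two settings differ only when the previous level returns nothing.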

 

Yakov Goyhman (00:57:08):

But there's another trick here. There are sources for OV specified on the very highest level. These sources for OV restrict the crosswalks. So only winners from crosswalks which belong to those sources are allowed, I think, to be calculated.

 

Dmitry Blinov (00:57:32):

Yeah. In our case, actually both AHA and DA are listed in the sources for OV for the fallback strategy. So both of them will be used because they are listed in the sources for OV. Yeah, okay.

 

Yakov Goyhman (00:57:46):

Okay. And it will also be much easier to check it in the user interface. So when I hover over each attribute, it shows a sign on the right of the crosswalk it belongs to. Source and crosswalk.

 

Dmitry Blinov (00:58:32):

So from these three values, now if we look at the fallback strategy, we will see that the one from the source AHA will win, right?

 

Yakov Goyhman (00:58:44):

No. Actually no one will win because some of them will be filtered out. And there's the last fallback strategy, which is LUD, and because the criteria allows zero values, the last fallback strategy is applied to the set of values. And if we check each attribute's update date, we can see that this one will win. And yes, again, it's not easy work. You have to go through each value, you have to get its ID and check the crosswalks and the update dates, and check the single attribute update dates to understand exactly why it wins.

 

Dmitry Blinov (00:59:50):

Yeah. So basically you need to streamline... First of all, you need to know how each strategy works. Second of all, you need to know the order in which they will be applied. Fallback strategies are actually the same survivorship strategies with additional conditions which define when a fallback strategy will or will not be used, right? And in which order the fallback strategies are applied. But it's a folded structure of the same survivorship rule. So you can just have more than one strategy applied if you need to narrow down to a specific value. In this case, as I understand, the first result was defined by the max value strategy, and it was based on the address rank rather than the address itself. And actually, in the rule type, you can see that maximum value by address rank is applied. You can see the first level. Next, because we had three results, the fallback strategy kicks in and it will consider everything. It will consider the sources for OV first of all.

 

Dmitry Blinov (01:01:02):

If you apply a filter by sources for OV, it will filter by the list of data sources for the specific fallback strategy. If none of your three winners has a source from this list, they will not go through. But that doesn't mean that you will not define the surviving value, because you can apply another fallback strategy, and if you had zero results from your previous cascading logic, you can apply a zero result fallback strategy, which will define the winner by last update date. And that's, I think, exactly what happened in this case, right?

 

Yakov Goyhman (01:01:45):

Correct.
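The whole cascade from this example can be traced with a small Python sketch. It is an illustration under stated assumptions, not Reltio's engine: all function names, dictionary keys, the sources `SourceX` and `SourceY`, the placeholder addresses and every date are hypothetical; only the ranks (10, 1, 7, 10, 10), the Facebook winner and the AHA/DA source list come from the discussion. The sketch also assumes that when a level returns zero results, the next fallback is evaluated on the values that entered that level.

```python
from datetime import date

def applies(count, criteria):
    # Assumed labels: the default triggers on ties only.
    return count != 1 if criteria == "ZERO_OR_MORE_THAN_ONE" else count > 1

def max_value(cands, attr):
    best = max(c[attr] for c in cands)
    return [c for c in cands if c[attr] == best]

def source_order(cands, order):
    # Values from sources outside the list are filtered out first.
    listed = [c for c in cands if c["source"] in order]
    if not listed:
        return []
    best = min(order.index(c["source"]) for c in listed)
    return [c for c in listed if order.index(c["source"]) == best]

def last_update(cands):
    newest = max(c["updated"] for c in cands)
    return [c for c in cands if c["updated"] == newest]

def resolve(cands, first, fallbacks):
    if not cands:
        return []
    pool, winners = cands, first(cands)
    for strategy, criteria in fallbacks:
        if not applies(len(winners), criteria):
            break
        if winners:          # narrow the pool only when something survived
            pool = winners
        winners = strategy(pool)
    return winners

addresses = [
    {"line1": "548 Pier Street", "rank": 10, "source": "Facebook", "updated": date(2022, 1, 15)},
    {"line1": "Address B", "rank": 1, "source": "AHA", "updated": date(2021, 6, 1)},
    {"line1": "Address C", "rank": 7, "source": "DA", "updated": date(2021, 7, 1)},
    {"line1": "Address D", "rank": 10, "source": "SourceX", "updated": date(2021, 3, 1)},
    {"line1": "Address E", "rank": 10, "source": "SourceY", "updated": date(2021, 9, 1)},
]
winners = resolve(
    addresses,
    lambda c: max_value(c, "rank"),
    [(lambda c: source_order(c, ["AHA", "DA"]), "ZERO_OR_MORE_THAN_ONE"),
     (last_update, "ZERO_OR_MORE_THAN_ONE")],
)
print([w["line1"] for w in winners])  # ['548 Pier Street']
```

Max value by rank leaves three candidates, the source filter leaves none, and the last-update fallback then picks the most recently updated of the three.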

Dmitry Blinov (01:01:46):

Yeah. Okay. It's a complex example, but it is defined in the... We will also explain it in the slides. I already put the scenario in the slides so you can try and go through it as well. Thank you, Yakov. So yeah, not a lot of questions today, but it was a demo, so we've been through a lot of technical details. It is hard to switch between... I just know that it's hard to switch between data structures and back to the UI and then again back to data structures. But this is the way to understand entity resolution, going back to where we started, the way to understand entity resolution at Reltio. And I think you now see two things from these two webinars we did: the complexity that can be modeled, but at the same time, the flexibility that can be modeled.

 

Dmitry Blinov (01:02:41):

Reltio allows you to model very, very flexible data models and entity resolution logic applied to them, with existing strategies, fallback strategies and so on and so forth. So you can even have cascading chains where you're trying to define the operational value. You fail to define the operational value and then you say, well, if I didn't define anything, then just make the last update the winner in my case. That is it for today. We will be going through more complex examples like this in future webinars as well. We will think about how to make them easier to consume. But it's a technical... Well, there is some technical complexity to this type of [inaudible 01:03:25] anyways. That's all, Chris, back to you.



#communitywebinar #Survivorship #Fallbackstrategies
​​​​