Reltio Connect


Survivorship, What Comes After the Match? Webinar

By Chris Detzel posted 08-30-2021 08:15

  



Matching and merging data is only half the battle. To get all of the value possible out of your data, you need a comprehensive survivorship strategy. Join senior technical consultant, Joel Snipes, as he deep dives into the dos and don'ts of survivorship in Reltio.

For more information, head to the Reltio Community and get answers to the questions that matter to you:
https://community.reltio.com/home

Take a look at the PPT for Survivorship what Comes After the Match

Transcript: 


Joel Snipes (00:05):

All right. Good morning, good afternoon, good evening depending on where you're joining us from. I appreciate you taking the time out of your day. Our last webinar, we spoke about some matching and merging best practices and today I want to follow that up with survivorship, which is really what comes after you do your matching and merging in Reltio. And to unlock the full value of your matches and merges, you're going to have to have an effective survivorship strategy.

 

Joel Snipes (00:39):

What is survivorship?
Survivorship is the process of determining your operational value, or golden record, depending on how you want to describe it, out of the multiple sources you brought together in match and merge. Within Reltio in particular, here is how it might differ from some other products. Our survivorship is rules driven, and a cool thing about it is that it's calculated on the fly. There is no big, heavy backend process you have to run if you make a change to a rule; the change immediately affects all of your records because the operational value is calculated on the fly. The vast majority of survivorship configuration can be done directly in the UI. I'm going to touch on a few more advanced features where we'll jump into Postman and play with the JSON a little bit, but most of this can be done in the UI, which is nice.

 

Joel Snipes (01:42):

This bottom image, which we're going to look at live in a demo in a second, is where you configure survivorship. In the right-hand pane, all your sources build up and each source system gets a different color. On the left-hand side, you get the operational value. In the middle of those two are your survivorship rules, which act as a filter, processing everything on the right and giving you your golden record or operational value on the left. I'm going to talk a little bit about what options are available and then we're going to jump into the UI and look at them live.

 

Joel Snipes (02:23):

There are nine survivorship settings available out of the box and you can mix and match and build some custom ones as well. I'm going to start with recency. Recency basically looks at the updated date and gives you the most recently updated version of the record. This is the default survivorship. If you create a new attribute in Reltio, it's going to be assigned recency as its survivorship rule right out of the box, unless you decide to make a change. Another very popular one is source system. You provide a list of source systems in order from your most trusted system to your least trusted system, and it will take the most trusted system available and use that as your golden value. Frequency will look across all of the values provided across your crosswalks, and the most frequently used value will be the winner. Aggregation, rather than choosing a winner, keeps all potential values as winners. You see this one a lot with things like addresses or any nested attribute really. Why choose a winning address when you could keep a billing and a mailing and a home address?
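As a rough illustration of the four rules just described, here is a minimal Python sketch. It is a toy model, not Reltio's implementation: the tuple layout, the source names, and the sample values are all invented for the example.

```python
from collections import Counter

# Hypothetical crosswalk values for one attribute: (source, value, updated).
# Sources, values, and timestamps are invented for illustration.
VALUES = [
    ("ERP", "Acme Corp", 1),
    ("CRM", "Acme Corp", 2),
    ("Web", "ACME Corporation", 3),
]

def recency(values):
    """Keep the value(s) with the latest update timestamp."""
    if not values:
        return []
    latest = max(updated for _, _, updated in values)
    return [value for _, value, updated in values if updated == latest]

def source_system(values, order):
    """Walk the trusted-source list and take the first source that has a value."""
    for source in order:
        hits = [value for src, value, _ in values if src == source]
        if hits:
            return hits
    return []  # no listed source supplied a value

def frequency(values):
    """Keep the value(s) that occur most often across crosswalks."""
    if not values:
        return []
    counts = Counter(value for _, value, _ in values)
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]

def aggregation(values):
    """Keep every distinct value as a winner, in first-seen order."""
    return list(dict.fromkeys(value for _, value, _ in values))

print(recency(VALUES))                               # newest value wins
print(source_system(VALUES, ["ERP", "CRM", "Web"]))  # most trusted source wins
print(frequency(VALUES))                             # most common value wins
print(aggregation(VALUES))                           # all distinct values survive
```

Note that every function returns a list: several of these rules can legitimately produce more than one winner, or none at all.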

 

Joel Snipes (03:45):

Other attribute winner: this one pairs with address very often. You use it when you have a set of attributes where it doesn't make sense to split the survivorship across source systems. Address is a good example of this. There isn't really a situation where you want to take your address line one from Dun and Bradstreet, your address line two from your ERP, and your city and state from LinkedIn. That address line one might not exist in that city and state if it's a different address. If you want to keep a group of attributes together, other attribute crosswalk winner is your tool of choice. Oldest value is straightforward: whatever has the create date that's furthest back in time will win. Max and min are kind of self-explanatory.
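To make the address point concrete, the following Python sketch contrasts surviving each address field independently with tying the dependent fields to whichever crosswalk won the primary field. The data layout, field names, and helper functions are hypothetical and only model the idea.

```python
# Hypothetical address crosswalks; sources, fields, and values are invented.
CROSSWALKS = [
    {"source": "DnB", "updated": 1,
     "AddressLine1": "100 Main St", "City": "Dallas", "State": "TX"},
    {"source": "LinkedIn", "updated": 2,
     "AddressLine1": "55 Oak Ave", "City": "Austin", "State": "TX"},
]

def winner_crosswalk(crosswalks, attr, source_order):
    """Return the crosswalk from the most trusted source that populates attr."""
    for source in source_order:
        for cw in crosswalks:
            if cw["source"] == source and cw.get(attr):
                return cw
    return None

def newest(crosswalks, attr):
    """Recency on a single field, ignoring which crosswalk other fields used."""
    populated = [cw for cw in crosswalks if cw.get(attr)]
    return max(populated, key=lambda cw: cw["updated"])[attr] if populated else None

def linked_address(crosswalks, primary_attr, source_order, dependent_attrs):
    """Dependent fields follow the crosswalk that won the primary field."""
    winner = winner_crosswalk(crosswalks, primary_attr, source_order)
    if winner is None:
        return {}
    return {attr: winner.get(attr) for attr in [primary_attr, *dependent_attrs]}

order = ["DnB", "LinkedIn"]

# Independent rules: line one by source system, city by recency.
mixed = {
    "AddressLine1": winner_crosswalk(CROSSWALKS, "AddressLine1", order)["AddressLine1"],
    "City": newest(CROSSWALKS, "City"),
}
print(mixed)  # street from one crosswalk, city from another

# Linked rules: city and state come from the crosswalk that won line one.
print(linked_address(CROSSWALKS, "AddressLine1", order, ["City", "State"]))
```

The first result pairs a Dallas street with Austin, which is exactly the mismatch described above; the second keeps the whole address from one crosswalk.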

 

Joel Snipes (04:39):

And the last one is Reltio Cleanser or nothing. This one is also pretty straightforward. If there's a Reltio Cleanser applied, and we have out-of-the-box ones on phone, address, and email, it will only display a value if it's been cleansed by Reltio. If Reltio is unable to cleanse an address, that address won't be considered at all. This can be a double-edged sword: if you have really dirty addresses, you may want to keep them even when they can't be cleansed. But if you have a very high standard and you want to make sure every address you have in your system is mailable or something along that line, it can also provide a lot of value. There are two big ways we configure address survivorship and I'm going to talk about both of those.

 

Joel Snipes (05:29):

Enough slides, I'm going to pop over to a demo record I set up for this particular example. We are going to be working with this data live and we're going to see how we can update these on the fly. For first and last name, I configured Lauren to have source system survivorship. And if I click the sprocket here, I can see a list of source systems I have configured and I can drag and drop and rearrange these. Currently, I have a Dun and Bradstreet crosswalk here on the right and a Facebook crosswalk. Facebook's yellow, Dun and Bradstreet is green. These are currently in agreement but what's actually happening is the Dun and Bradstreet is the winner here and that is because Dun and Bradstreet is higher up than Facebook on this list.

 

Joel Snipes (06:28):

I want to jump over to match merge and I'm going to add another crosswalk where the name is not the same, and we're going to see whether it continues to win. We have Lauren and Lori Vargas, and this record came from Twitter, whereas these other two are a Dun and Bradstreet and a Facebook crosswalk brought together. I'm going to merge them and see how this affects our survivorship. They're brought together. We can now see three crosswalks on the right side, three colors, and Lori here is in green. In darker green we have Lauren, and in yellow we have Lauren. And so we have Lori now as our first name rather than Lauren, and that is because Twitter has a higher priority than the other two systems.

 

Joel Snipes (07:53):

For address and pretty much any nested attribute, aggregation is going to be the strategy of choice. We'd like to collect multiple phone numbers and multiple addresses in most scenarios. What you can't see in the UI, and this is one of the features that you have to use Postman to get into, is that each of these sub-attributes also has its own survivorship, and these have to be configured separately. Let me show you how you do that.

 

Chris Detzel (08:25):

Hey, Joel.

 

Joel Snipes (08:25):

Sure.

 

Chris Detzel (08:26):

Before you get to that, a question. When changing the source order, is that updating the config for this attribute or is it changing the order for the profile only?

 

Joel Snipes (08:40):

This is system wide. For contacts, if we update the source system order, it's going to affect every single contact's first name, not just this particular record.

 

Chris Detzel (08:51):

Okay. That's good to know. Thanks, man.

 

Joel Snipes (08:53):

You're welcome.

 

Unnamed (08:55):

I have a question. Twitter was a priority, where do we set that, that Twitter takes precedence? You mentioned that.

 

Joel Snipes (09:06):

If you click here, you can pick which rule you want to apply and if you click the sprocket, you can adjust the settings.

 

Unnamed (09:15):

Okay. Whoever is the top here, for example here, the Reltio cleanser is the top?

 

Joel Snipes (09:22):

Yes.

 

Unnamed (09:23):

Okay, thank you.

 

Chris Detzel (09:27):

Thanks for the question.

 

Joel Snipes (09:38):

Jumping back to how you take care of aggregates, the sub-attributes have their own survivorship separate from the parent nested attribute. Jumping over to Postman, the first step will be to get a token. We can see our token has lots of time left on it. Then I'm going to get the configuration with no inheritance. This no inheritance means we're just getting the L3. The L1 and L2 are uneditable, but they're useful if you want to understand how, for example, contact is inheriting from individual, and some of these attributes may not be listed in the L3. Most of them will be. But anytime you're making any kind of configuration change, you're always going to want to grab the no inheritance version, or the L3.

 

Joel Snipes (10:36):

What I'm going to do is grab the response here and bring this into JSON Editor Online and paste. What I like to do is copy back and forth, which kind of validates that everything is formatted well. And then I like to hit this button here that reformats everything nicely. We have a well-formed JSON and we're going to go find our contact entity type and our survivorship groups. This is how you manually edit the survivorship rules in the JSON. As we can see, here are our addresses and, just like the UI showed us, they're managed by aggregation. But below that, address line one is controlled by the rule source system. It has its own source system list, just like our first and last name did. And there are only two that can win: Reltio Cleanser, which is the cleansed value that comes back from our Reltio Cleanser, or Reltio, if someone manually puts the information into the UI.

 

Joel Snipes (11:46):

And importantly, like I was discussing earlier, every single other address attribute has the same strategy, which is other attribute crosswalk winner. And it's pointing, using the primary attribute field, at address line one. Whatever address line one chooses as the winner, which is going to be based on source system, all the other address attributes will choose the same winner because of this other attribute crosswalk winner. Phone has a similar setup here because it's a nested attribute, as well as email. The next one I really want to talk about is age and what we can see here is.
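The address setup described above might look roughly like the fragment below, written here as a Python structure that serializes to JSON. Treat every key name and URI as an assumption: they follow the terms used in this talk (aggregation, source system, other attribute crosswalk winner, primary attribute), but the exact spelling and the attribute names under Address should be checked against the configuration returned by your own tenant.

```python
import json

# Assumed URIs; a real tenant's entity type and attribute names may differ.
ENTITY = "configuration/entityTypes/Contact"
ADDRESS = f"{ENTITY}/attributes/Address"
LINE1 = f"{ADDRESS}/attributes/AddressLine1"

address_mapping = [
    # Top level: keep every address rather than picking one.
    {"attribute": ADDRESS, "survivorshipStrategy": "Aggregation"},
    # Primary sub-attribute: ordered source list, cleansed value first.
    {
        "attribute": LINE1,
        "survivorshipStrategy": "SRC_SYS",
        "sourcesUriOrder": [
            "configuration/sources/ReltioCleanser",
            "configuration/sources/Reltio",
        ],
    },
]

# Every other sub-attribute follows whichever crosswalk won address line one.
for name in ("AddressLine2", "City", "StateProvince"):  # illustrative names
    address_mapping.append({
        "attribute": f"{ADDRESS}/attributes/{name}",
        "survivorshipStrategy": "OtherAttributeWinnerCrosswalk",
        "primaryAttributeUri": LINE1,
    })

print(json.dumps(address_mapping, indent=2))
```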

 

Chris Detzel (12:34):

Joel, before you get to age, a couple questions, if we change survivorship rules, do we need to run a rebuild matches job for them to take effect on the data already in the tenant?

 

Joel Snipes (12:47):

No, you don't. That is a good question though because if you make a change to a match rule, you often want to do that but survivorship, you don't have to. You will never have to run a re-index to get the most up to date or current version.

 

Chris Detzel (13:02):

That's handy to know. Also, if users change the survivorship setting on the UI, is the config updated automatically with the changes the user made?

 

Joel Snipes (13:12):

That is a good question, and it is. That's an important consideration. Say you do a get with no inheritance, so you have a version of the config, and then you decide that rather than updating the JSON you'll make a quick change to a rule here in the UI. If you then make another change using the copy you grabbed earlier, you can overwrite your UI change. Every time you change a rule here in the UI, you're going to want to do a fresh get so that the change is in your configuration rather than left behind. Otherwise, if you work with an old version, you'll end up reverting your change by accident.
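The overwrite risk described here is a classic lost update. The sketch below simulates it with an in-memory dictionary standing in for the tenant configuration; `get_config` and `put_config` are hypothetical stand-ins for the get and put calls, not real API functions.

```python
import copy

# In-memory stand-in for the tenant's survivorship configuration.
tenant_config = {"FirstName": "LUD", "Age": "MaxValue"}

def get_config():
    """Hypothetical stand-in for the 'get configuration' call."""
    return copy.deepcopy(tenant_config)

def put_config(new_config):
    """Hypothetical stand-in for the 'put configuration' call: full replace."""
    tenant_config.clear()
    tenant_config.update(copy.deepcopy(new_config))

# 1. Grab a copy of the config to edit offline.
stale_copy = get_config()

# 2. Meanwhile, someone changes a rule through the UI.
tenant_config["Age"] = "MinValue"

# 3. Editing and putting the stale copy silently reverts the UI change.
stale_copy["FirstName"] = "SRC_SYS"
put_config(stale_copy)
print(tenant_config)  # Age is back to MaxValue

# Safer: do a fresh get immediately before editing.
tenant_config["Age"] = "MinValue"   # the UI change again
fresh_copy = get_config()
fresh_copy["FirstName"] = "SRC_SYS"
put_config(fresh_copy)
print(tenant_config)  # both changes are kept
```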

 

Chris Detzel (13:54):

Thanks, Joel. That's handy too, to know. I think that's all the questions for now.

 

Joel Snipes (14:00):

Great. With age, people don't get younger, unfortunately so I've chosen maximum value. Likely 38 is more reliable than 35 for a field like age. Right now we see Twitter providing 38, Reltio's providing 35 but under our operational value column, 38 is prevailing here. But if for some reason we believe that the younger value is always right, maybe Reltio or whatever system or Twitter is an unreliable system, you could change to minimum value or source system, that sort of thing. And we can see that the winner changed over here on the left side in real time.

 

Chris Detzel (14:52):

Hey Joel, what's the security for being able to change the rule? Is a specific role needed or is there one needed there?

 

Joel Snipes (15:02):

Yeah, so ROLE_API gives you the ability to edit this. And that's pretty dangerous when you're in a production system because ROLE_API often gets used as kind of a baseline access. What I normally recommend to my customers is that you clone the out-of-the-box ROLE_API and remove the specific access to editing this, so that your everyday users and data stewards can't change survivorship. This is something you probably want only admins to be able to control.

 

Chris Detzel (15:39):

Makes sense. Thank you.

 

Joel Snipes (15:41):

Yeah, that's a very good question.

 

Joel Snipes (15:47):

I have failed to define a survivorship rule for website. If you've looked in the JSON you would see that I didn't list it and because of that, recency is winning here. Luckily all three of our source systems are in agreement that Aetna is the source system. That is going to work out.

 

Joel Snipes (16:10):

Identifiers are a pretty common case. We have this universal ID called UUID and we have 1, 2, 3, 4, 5, 6 and we have two systems. A lot of times I'll see that each crosswalk has its own UUID assigned. Maybe it's handled by ETL or we've used a Reltio ID generator, but because we don't want a record's ID to change over time, minimum value is a pretty good strategy if you have an incremental ID, so that the oldest ID, or the lowest number ID, will always win. As new crosswalks get added, they'll always have a higher number and this ID will stay the same. And if you click the sprocket for minimum value when you're dealing with a nested attribute, there's a newer feature which is really nice: you can choose which field you want to apply the minimum value rule to. Type is the type of identifier, and that wouldn't make much sense, so we're doing ID, but we could change this.

 

Joel Snipes (17:19):

And we see when I change the status because they don't have it populated, both of these winners are coming across because there's a tie. And that's an important thing that I'm going to touch on later. How do you handle ties or when there's no winner? We're going to come back to that. In the meantime I'll set this back.

 

Chris Detzel (17:38):

When you're ready for a few more questions, let me know.

 

Joel Snipes (17:40):

Sure. Now is great.

 

Chris Detzel (17:41):

Great. What is a best practice for compound or nested survivorship rules? Example, ordering of the rules.

 

Joel Snipes (17:53):

On the top level, when we're looking at addresses, we're not inside the nest yet, we're at the top level, almost every nested attribute you're going to want to have set to aggregation. There are examples, I've seen very specific examples where you don't, but as a rule of thumb, aggregation is the way you want to go about it. The lower level attributes are kind of data dependent. For example, with address line one, we're using the source system with the Reltio Cleanser. In this example, our customer wants the cleanser data as a top priority, but if the cleanser was unable to find a cleansed version of the address, the uncleansed version will still come through. We could have made this Reltio Cleanser or nothing, and then only cleansed values would come through. But that differs from the example I was just showing where for identifiers, we want the minimum ID, and I can even think of another example here.

 

Joel Snipes (19:11):

Say the tool we're using to generate our identifier goes from being an incremental number to a hash. Now the minimum value of a hash is meaningless and this rule no longer holds up. The future strategy might move to the oldest value, and this will pick the UUID based on create date. You still only get one winner, but it'll always be from the oldest crosswalk, ensuring that the ID never changes. The only hard and fast rule with nested attributes is that you're probably going to want aggregation at the top level. Inside the nested attribute it's going to be data dependent. Have another question?
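The difference between minimum value and oldest value for identifiers can be sketched in a few lines of Python. The crosswalk layout and the ID values are made up for the example.

```python
def min_value(crosswalks):
    """Pick the smallest ID across crosswalks."""
    return min(cw["uuid"] for cw in crosswalks)

def oldest_value(crosswalks):
    """Pick the ID from the crosswalk with the earliest create date."""
    return min(crosswalks, key=lambda cw: cw["created"])["uuid"]

# Incremental IDs: later crosswalks always get a higher number,
# so the minimum stays stable as crosswalks are merged in.
incremental = [{"uuid": 123456, "created": 1}]
before = min_value(incremental)
incremental.append({"uuid": 123789, "created": 2})
print(before, min_value(incremental))  # same ID before and after the merge

# Hash IDs: ordering is arbitrary, so a newer crosswalk can sort lower
# and the minimum changes when it is merged in.
hashed = [{"uuid": "f3a9c1", "created": 1}]
before_hash = min_value(hashed)
hashed.append({"uuid": "07bc42", "created": 2})
print(before_hash, min_value(hashed))  # the ID changed after the merge
print(oldest_value(hashed))            # oldest value keeps the original ID
```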

 

Chris Detzel (20:00):

There is and actually Sunchen meant to ask, how do you define a compound rule type? Can you talk a little bit about that?

 

Joel Snipes (20:11):

Compound. I think you're asking about a fallback. That ties to my example earlier, where we had two values that win and we only want one winner, and that is something I can go ahead and jump to. It's conveniently my next slide. Say that for first name, you had a duplicate Twitter record. Twitter is our highest priority source system here, with source system as our survivorship strategy. There are two records representing the same individual, and they both provide a first name. If there are two Twitter records, how does source system choose which first name to use? They're both Twitter, so there's a tie.

 

Joel Snipes (21:00):

That's where fallback strategies come in. It's an array within the survivorship rule, and this is only configurable from the JSON with Postman. Unfortunately you can't do this with the UI, but what we see here is that we've provided two fallback strategies. If there's a tie on first name, we're going to take the minimum value, which seems arbitrary to me, but that's what this example chose. And if there's still a tie, so potentially there's the same source system and the same name provided, but we don't want to see Joel twice, we just want to see Joel once, we would switch to last updated date. One of these Twitter records is likely more recently updated than the other and that one will win. This would be a good example of a compound rule, or what we call in Reltio a fallback strategy.

 

Joel Snipes (22:00):

How you fall back is also important. If you have more than one winner, that's the default trigger: it's going to jump to a fallback strategy. If you have zero winners, it will just show zero winners. If you have a source system list but the only record that provides first name is not in that list, you'll have zero winners and no first name will get through. If you want to make sure that's populated, you might consider turning fallback using criteria on and setting it to zero or more than one. Then if there are no winners, we'll fall back anyway, whereas the default behavior is that we only use fallback if there's more than one. And I can show you that in our address example, this is set up. If no address is in this list of source systems, Reltio Cleanser and Reltio, we'll go to the last updated date, and we have fallback using criteria set to zero or more than one. If more than one address wins or if no address wins, we're going to go to the most recent address rather than the source system list. Hopefully that answers your question.
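The fallback behavior described above can be modeled as a chain of rules. The following Python sketch is a simplified model under stated assumptions: the record layout, the function names, and the `use_fallback_on_zero` flag (standing in for the "zero or more than one" criteria) are all hypothetical.

```python
def by_source(order):
    """Build a source-system rule from an ordered list of trusted sources."""
    def strategy(records):
        for source in order:
            hits = [r for r in records if r["source"] == source]
            if hits:
                return hits
        return []
    return strategy

def min_value(records):
    """Keep the record(s) holding the smallest value."""
    if not records:
        return []
    low = min(r["value"] for r in records)
    return [r for r in records if r["value"] == low]

def last_updated(records):
    """Keep the record(s) with the latest update timestamp."""
    if not records:
        return []
    newest = max(r["updated"] for r in records)
    return [r for r in records if r["updated"] == newest]

def resolve(records, primary, fallbacks, use_fallback_on_zero=False):
    """Apply the primary rule, then fallbacks while more than one winner remains."""
    winners = primary(records)
    if not winners:
        if not use_fallback_on_zero:
            return []          # default: zero winners stays zero winners
        winners = records      # fall back across all records instead
    for strategy in fallbacks:
        if len(winners) <= 1:
            break
        winners = strategy(winners)
    return winners

# Two duplicate Twitter records tie on source system and on minimum value;
# last updated date breaks the tie.
tie = [
    {"source": "Twitter", "value": "Joel", "updated": 10},
    {"source": "Twitter", "value": "Joel", "updated": 20},
]
rule = by_source(["Twitter", "DnB"])
print(resolve(tie, rule, [min_value, last_updated]))

# The only record comes from a source that is not in the list.
unlisted = [{"source": "LinkedIn", "value": "Joel", "updated": 5}]
print(resolve(unlisted, rule, [last_updated]))                             # no winner
print(resolve(unlisted, rule, [last_updated], use_fallback_on_zero=True))  # populated
```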

 

Chris Detzel (23:21):

There's a lot of other questions. Did you want to just continue to go in for a bit and then I can start asking them once you get a little bit more into? Or do you want to answer those now?

 

Joel Snipes (23:31):

I think now is probably good.

 

Chris Detzel (23:36):

All right. Great. Can you talk a little bit about survivorship rules for nested attributes such as address and the match fields URLs?

 

Joel Snipes (23:42):

I'm thinking that question has to do with match field URI and while this isn't directly a survivorship.

 

Chris Detzel (23:50):

That's what I meant, URIs.

 

Joel Snipes (23:51):

Yeah, topic, I had this page ready because it goes hand in hand with.

 

Chris Detzel (23:58):

He knew it was coming. I like it.

 

Joel Snipes (24:00):

Yep. Yep. And this is great. When you think about a nested attribute, we have address. If you think of our contact as a row in a table. We have Lori Vargas here and she's a row in our table. Every nested attribute is like another table inside that table or a foreign key. If you think about this that way and we have multiple addresses or multiple phone numbers here, if this is the table, what is our primary key? And that is your key attribute URI. Let's look at our example here. I think phone number might be better because we have two values. We have a home phone number and a business phone number. And the home phone number has two source systems providing values, the business has three. And why do we not see the business three times?

 

Joel Snipes (25:10):

The default match field URI of this field, or let's go look at it. We'll go look at phone. That'll be better than me just talking about it. The match field URI is number. What's happening here in phone, the reason we don't have three phone numbers with business and just one, is that we have this match field URI set to number. It's collapsing the three records into one because the match field URI is basically acting like the primary key, and we're consolidating all of these because they share the same phone number. The same thing is happening with home.

 

Joel Snipes (26:06):

Now, sometimes there are examples where the field you want to collapse on is different from the field you want to consider as a key. For that scenario, you could have a key attribute URI along with your match field URI. In most examples, these are the same and you can just define the match field URI if you like. The way I see most customers configure this is to put the match field URI on type rather than number. What that does is limit you to one phone number per type. You can have one home phone number, one business phone number, one fax phone number, and you don't end up with multiples. But number is also a valid case, because maybe sometimes your cell and your primary are the same, and then we can collapse on number but have two different types.
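The collapsing behavior can be pictured as a group-by on the match field. The sketch below is a plain Python model of that idea; the phone records and the `collapse` helper are invented for illustration and say nothing about how Reltio stores nested values internally.

```python
from collections import defaultdict

def collapse(values, match_field):
    """Group nested values by the match field; each group shows as one value."""
    groups = defaultdict(list)
    for v in values:
        groups[v[match_field]].append(v["source"])
    return dict(groups)

# Hypothetical phone values: two sources agree on home, three on business.
PHONES = [
    {"source": "DnB", "type": "Home", "number": "555-0100"},
    {"source": "Facebook", "type": "Home", "number": "555-0100"},
    {"source": "DnB", "type": "Business", "number": "555-0199"},
    {"source": "Facebook", "type": "Business", "number": "555-0199"},
    {"source": "Twitter", "type": "Business", "number": "555-0199"},
]
print(collapse(PHONES, "number"))  # two phone values, not five

# Cell and primary share one number: the key you pick changes the result.
SHARED = [
    {"source": "CRM", "type": "Cell", "number": "555-0123"},
    {"source": "ERP", "type": "Primary", "number": "555-0123"},
]
print(len(collapse(SHARED, "number")))  # collapses to a single phone
print(len(collapse(SHARED, "type")))    # stays as one phone per type
```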

 

Chris Detzel (26:57):

I hope that answers the question, but then she asks another question. Specifically, if the match field URI groups addresses by address type, and the source sends two different addresses for the same address type, Reltio will aggregate the two addresses rather than picking one, is that correct?

 

Joel Snipes (27:24):

Now we're going back to survivorship. With our current configuration here, we're prioritizing addresses by aggregation. In that scenario, we're doing it by source system. If two Reltio or two Reltio Cleanser records came through, we would have two winners. It is possible but it's also preventable. If you set the match field URI by type and we did Reltio Cleanser wins or nothing, the only record that'll make it through will be the Reltio Cleanser one and you would remove that duplicate. That is something that could happen, and you can plan for it by changing your survivorship strategy. You're going to want to use this Reltio Cleanser or nothing rather than source system in that scenario.

 

Chris Detzel (28:26):

Paula, I hope that answers that question. It's funny, I think these are more like AMAs, a lot of great questions. This guy goes, "I probably missed the email ID discussion, but that email didn't look cleansed." I don't know if you know which one he's talking about, but I should have asked that question earlier. My apologies.

 

Joel Snipes (28:55):

No worries. I think this email did not get cleansed for a good reason and it is because...

 

Chris Detzel (29:01):

Nice catch.

 

Joel Snipes (29:03):

The email is invalid. There is no at symbol, so let's go ahead and let's see if we can fix that live. What I'm going to do is I'm going to update it through the UI and what that will do is update the Reltio crosswalk. Your source system data will never change when you edit through the UI, the Twitter, the Facebook and the Dun and Bradstreet.

 

Chris Detzel (29:31):

Harsh, nice catch. Wow. He is really paying attention. That's good. Love it. All right, you ready for the next question or do you want to kind of get this?

 

Joel Snipes (29:44):

Let's see if we get this cleanser to fire. I've updated it and still doesn't seem like it got cleansed. I might not have email cleansing configured on this tenant. This is my kind of personal demo tenant and I bet I turned it off for something else and that's why we aren't getting this cleansed. But let's jump to the next question.

 

Chris Detzel (30:11):

Sure. Can the custom role mentioned for preventing changes to the survivorship from UI be shared post this meeting as an example? The permission scheme for it, is that possible?

 

Joel Snipes (30:25):

Yes. And I would be happy to do that. Let's throw that up as a community question and answer and I'll get to that after the meeting.

 

Chris Detzel (30:35):

If you want to post that on the community, please do and then we'll make sure to get that answered, community.reltio.com if you have not been there and sign up. Another question, what is the difference between UID and UUID?

 

Joel Snipes (30:54):

I looked it up. U is universal user identifier.

 

Chris Detzel (31:00):

Looks like Gino answered it.

 

Joel Snipes (31:02):

Thank you, Gino.

 

Chris Detzel (31:03):

Sorry about that. I'm scrolling through these questions without. Can you discuss the survivorship advanced behavior? We have a field that uses src_sys and it's not working. It shows correct Reltio, however, if you look at a search using this field, it's not really working. For support, we need to change it to our survivorship and she put in docs and I can put this question in the chat but because the old survivorship, as it was unpredictable and new advanced has predictable behavior. We have been cautioned change into this may impact a lot of the OV survivorship and must do full testing, hence have not made a change yet. Can you talk about that?

 

Joel Snipes (31:51):

Yeah. To put the question in my own words, you're using src_sys, which is source system survivorship. That's just how it's described in the code rather than in the UI. And you are not getting the results you expect. You expect one to win and another is winning. There are a few things that could be causing that. The first thing I would check is your source systems. Just do a quick double check that all your source systems are in the correct order here. I'm sure they probably are. And then RDM is a big one. If there's a lookup on that field, it's possible that Twitter's providing a value and the RDM lookup's being applied, and the RDM value that's coming through, the one that should be the winner, is being deprioritized or something along those lines. I would look at how RDM may be affecting it.

 

Joel Snipes (33:07):

Another thing I'd look at is whether one of your crosswalks is a Reltio crosswalk, meaning a user made an edit in the UI, which I just did earlier. What you'll see is that my record in the UI gets prioritized outside of the survivorship rules. That was something I was going to come to a little later, but we'll see this ignore flag has been set on both of these other emails but not on the email I put in here. If you're seeing unreliable behavior in source system, I would check to see if there is an ignore flag on any of your fields. That could definitely be doing it. And the same goes for a pin. The pin kind of chooses the winner outside of source system, the opposite of ignore. It could be that pins and ignores are overriding your survivorship and that's what's causing that strange behavior. If you check all that out and you're still having issues, I would post that in the community as well and we can help you further troubleshoot.

 

Chris Detzel (34:10):

And she's good at posting on the community so she knows exactly where to go. There's a lot more questions, Joel but I'm going to let you go a little bit longer. We have 23 minutes or so let's keep going but then I will get to the other questions once we kind of get through some of this, if that's fair.

 

Joel Snipes (34:28):

Perfect. The only one I haven't talked about that I have modeled here is frequency. And so for Lori here, there are two records, let's see, a pink one, which one is the pink source system? Might have to expand. Here we are. Reltio generated record. This is a reference attribute. We have a Reltio field, another Reltio field and a third Reltio field, and all three of these are coming from the Reltio source system. Two of them say she's the chief marketing technologist at Aetna. One says she is the chief marketing technologist at Cigna. And because frequency chooses the winner, the Aetna record wins. And what you'll notice here is that we have two winners. This would be a great example of where we would want to use more than one in a fallback strategy. If you have more than one winner, which we do, we might want to choose the most recently updated. Let's do that.

 

Joel Snipes (35:44):

I have been playing around quite a bit so the first thing I'm going to do is grab the configuration fresh. Going to paste it in my JSON editor and format it. And let's find the survivorship strategy for employer. Frequency doesn't show up too many times. Here we go. I'm just going to borrow this code from the docs and I'm going to add a fallback strategy, a poorly formatted one. Let's see, I think we have one too many of these blocks.

 

Chris Detzel (36:51):

I really do love on the fly demos. No, I do. I think these are great.

 

Joel Snipes (37:17):

Yeah. Debugging live in front of people always makes commas and semicolons just disappear too but I'm going to.

 

Chris Detzel (37:28):

It's probably a little stressful.

 

Joel Snipes (37:29):

Unexpected, let's just drop that. Oh, here we are. I didn't see this here. There we are. We'll format it now. Here's what we just accomplished: currently we're choosing the name of the employer by frequency, and I would like to have a backup plan for when there are two values. Min value probably isn't the best choice. Let's get the most recently updated employer if there's a tie. That would be LUD. In the UI it shows as recency. What we're going to do is copy our well formatted document. We're going to go to put configuration, which is almost the same call, just a put instead of a get. I'm going to paste it in here. I like to hit beautify just to make sure it's all nicely formatted. Let's check we're pointing at the right tenant and hit send. 202 Accepted. We did well. Let's come back and give our page a refresh.
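A rule like the one edited in this step might look roughly like the fragment below, shown as a Python structure that serializes to JSON. The strategy codes LUD and the "more than one" criteria come from the talk; the exact key names, the value spelling, and the Employer attribute URI are assumptions to verify against the configuration your own tenant returns.

```python
import json

# Assumed attribute URI; the demo's employer attribute may be named differently.
EMPLOYER = "configuration/entityTypes/Contact/attributes/Employer"

employer_rule = {
    "attribute": EMPLOYER,
    "survivorshipStrategy": "Frequency",        # primary rule: most common value
    "fallbackStrategies": [
        # Tie-breaker: last updated date, shown as recency in the UI.
        {"attribute": EMPLOYER, "survivorshipStrategy": "LUD"},
    ],
    "fallbackUsingCriteria": "MORE_THAN_ONE",   # assumed spelling of the default
}

# Round-tripping through json catches missing commas and brackets
# before the document is pasted into a put request.
payload = json.dumps(employer_rule, indent=2)
print(payload)
```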

 

Joel Snipes (38:50):

All right, now Lori no longer has to work two jobs. She is only at Aetna one time. That is an example of how you would implement a fallback strategy and kind of on the fly example of how you could end up with two winners and you might need a fallback strategy. I was touching on the pins and ignores before but I would like to go a little more into them. Your data steward is reviewing Lori's record and they know for a fact that maybe they just looked at her LinkedIn page, that she is currently working at Cigna. She no longer works at Aetna. However, your survivorship rule is bringing Aetna to the front. We don't want to change every record in your tenant survivorship strategy just to get Lori right because that'll likely cause other problems. This is where the pin and ignore come in handy.

 

Joel Snipes (39:47):

Because there's two Aetnas, I can either ignore both the Aetnas or more conveniently use the pin on the correct one. I'm going to click the pin on the wrong one apparently. Let me click here. There we go. And now Cigna is the winner. I'm able to correct Lori's record without affecting all my other records and having to change my survivorship rules. This is really handy too where a data steward could see this as messed up and their gut instinct might be to go play with the survivorship rules. If you have this disabled, they won't make that mistake but you can leave them the ability to pin and ignore to make corrections.
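Pins and ignores can be thought of as per-value overrides applied before the survivorship rule runs. The Python sketch below models that with the employer example; the flag names and the assumption that a pin takes precedence over everything else are simplifications for illustration.

```python
from collections import Counter

def frequency(values):
    """Keep the value(s) that occur most often."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]

def operational_value(entries, strategy):
    """Pinned values win outright; ignored values are dropped before the rule runs."""
    pinned = [e["value"] for e in entries if e.get("pin")]
    if pinned:
        return pinned
    candidates = [e["value"] for e in entries if not e.get("ignore")]
    return strategy(candidates)

employers = [{"value": "Aetna"}, {"value": "Aetna"}, {"value": "Cigna"}]
print(operational_value(employers, frequency))  # the rule alone picks Aetna

# Option 1: pin the correct value on this one record.
pinned = [{"value": "Aetna"}, {"value": "Aetna"}, {"value": "Cigna", "pin": True}]
print(operational_value(pinned, frequency))

# Option 2: ignore both outdated values on this one record.
ignored = [{"value": "Aetna", "ignore": True},
           {"value": "Aetna", "ignore": True},
           {"value": "Cigna"}]
print(operational_value(ignored, frequency))
```

Either override corrects this one record while the frequency rule itself stays untouched for every other record.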

 

Joel Snipes (40:47):

All right. Do we want to take another question or two? Are there any out there, Chris?

 

Chris Detzel (40:50):

There's a lot of them. Going back to the phone number example, that's a little bit back, is there a way to configure survivorship for different phone type? Aggregate all business numbers but only allow one home number.

 

Joel Snipes (41:05):

That is a great question, and it is possible with filtering. You're going to want to take a look at this section of the docs, but the short of it is you put in an equals, which is similar to a WHERE clause in SQL. It would be something along the lines of this, though this probably won't be perfect syntax. We'll have equals, and you'll have to grab the right one, but let's see. Filter. If we were dealing with phone, let me move this to phone, where this makes sense.

 

Chris Detzel (42:17):

There's a lot of phones.

 

Joel Snipes (42:38):

Yeah. Better do a better search. All right, now we're down to nine. That's where we want to go. Cleanser, cleanser. Where's our survivorship rule on phone? Let's go at it this way, into our survivorship strategies. Here we go. We have phone. What we do is add a filter on, let's see, we want to do it on type and home. Let me actually drop this. The number will survive on last updated date for home phones, and then we would define another one. The number will just, let's see. I don't have max value when we have business. What you would do is just keep creating survivorship objects and changing the filter. For home you'd have one rule and for business you could have another rule, and that's how you would do that.
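
The "one rule per filter" idea can be sketched in Python as follows. This is an illustrative model only, not Reltio configuration syntax: the rule layout, the `type` and `updated` field names, and the choice of aggregation for business phones (taken from the question) are assumptions.

```python
from datetime import date

def latest(phones):
    """Keep only the most recently updated phone (an LUD-style rule)."""
    return [max(phones, key=lambda p: p["updated"])] if phones else []

def aggregate(phones):
    """Keep every phone (an aggregation-style rule)."""
    return list(phones)

# One rule per filter. Each filter is a plain equality test, like a WHERE clause.
RULES = [
    {"field": "type", "equals": "Home", "strategy": latest},
    {"field": "type", "equals": "Business", "strategy": aggregate},
]

def surviving_phones(phones, rules=RULES):
    winners = []
    for rule in rules:
        matching = [p for p in phones if p.get(rule["field"]) == rule["equals"]]
        winners.extend(rule["strategy"](matching))
    return [p["number"] for p in winners]

# Invented sample data.
phones = [
    {"number": "111", "type": "Home", "updated": date(2020, 1, 1)},
    {"number": "222", "type": "Home", "updated": date(2021, 1, 1)},
    {"number": "333", "type": "Business", "updated": date(2019, 1, 1)},
    {"number": "444", "type": "Business", "updated": date(2021, 5, 1)},
]
print(surviving_phones(phones))
```

Here one home number survives (the most recent) while both business numbers are kept.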

 

Chris Detzel (44:40):

Great. That was intense. I like it. If we change the survivorship rule in the middle of the load, would that affect the output in the SQS queue?

 

Joel Snipes (44:55):

Yes. The SQS queue is going to publish your operational value and you're changing how the operational value is calculated. I would not advise changing that in the middle of a production load. You're probably going to want to wait. Do it after or think about it before.

 

Chris Detzel (45:15):

Great. Can we stop sourcesForOv from filtering out sources as it goes through a fallback strategy?

 

Joel Snipes (45:33):

Let me read that one.

 

Chris Detzel (45:34):

Can you see it?

 

Joel Snipes (45:44):

Stop sources for OV if it goes through.

 

Chris Detzel (45:50):

Gene, if you have some thoughts there.

 

Gene (45:54):

Yeah, I can clarify this. My thinking is, let's say you have a survivorship rule that uses sources for OV and I have a fallback strategy on it. What I've been experiencing is that certain sources will not go through the fallback strategy. Let's say I ignore two sources; those two sources are still being ignored in the fallback strategy. What I'm trying to ask is, is it possible to stop that filtering as it goes into the fallback strategy?

 

Joel Snipes (46:43):

I see. You want to ignore the sources for the initial strategy, but if it were to go to fallback, you want it to consider the sources once more. That is not possible right now. What I would say is, rather than ignoring those sources, don't put them in your source list, which would have almost the same outcome. When you look at this list, if you have a source system you want to keep out consistently, then without using the ignore flag, just keep it out of this list and define a fallback, and it'll be considered. But if you have the ignore flag on it, it's always going to be ignored all the way through the chain, not just at the top.

 

Gene (47:41):

The sources URI order, that means, let's say Reltio constraints is at the top and my next source is Twitter. If I don't have that first one, it will automatically go to the second one, right?

 

Joel Snipes (47:56):

Yep. That's right.

 

Gene (47:57):

But what if, is there something else with a higher cardinality that we want? Because there is one instance that this doesn't work out for us, I guess. That's why I was wondering if there's a way to shut off sources for OV.

 

Joel Snipes (48:27):

I think I need to get a little closer to your example of when it doesn't work. But say LinkedIn is always the source you're ignoring, you could remove it from this list. If LinkedIn is the only one to provide a value, if it's configured like this, no winner will come across. It just, you won't have a winner but if you add a fallback and you don't have LinkedIn listed here, LinkedIn can still win. If that makes sense, I think that might be an approach. But if you want to give kind of more specific example with maybe some data and more details, that might be a good problem for the community as well. I'd be happy to take a look.
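
The difference between leaving a source out of the ordered list and flagging it as ignored can be modeled with a small Python sketch. This is a toy model of the behavior described in this exchange, not Reltio's implementation. The function names, record layout, and sample data are invented.

```python
from datetime import date

def latest_value(candidates):
    """Fallback rule: the most recently updated value, if any."""
    if not candidates:
        return []
    return [max(candidates, key=lambda c: c["updated"])["value"]]

def source_system_winner(values, source_order, ignored=(), fallback=None):
    """Walk the ordered source list; the first source with a value wins.

    Ignored sources are dropped before anything else, so they stay out of
    the fallback too. Sources that are merely absent from source_order
    cannot win the primary rule but are still visible to the fallback.
    """
    candidates = [v for v in values if v["source"] not in ignored]
    for source in source_order:
        hits = [v["value"] for v in candidates if v["source"] == source]
        if hits:
            return hits
    return fallback(candidates) if fallback else []

# Invented data: LinkedIn is the only source providing a value.
values = [{"source": "LinkedIn", "value": "Cigna", "updated": date(2021, 8, 1)}]
order = ["Reltio", "Twitter"]  # LinkedIn deliberately left out of the list
```

In this model, leaving LinkedIn out of `order` with no fallback yields no winner, adding `fallback=latest_value` lets LinkedIn win, and passing `ignored=("LinkedIn",)` keeps it out of the whole chain.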

 

Gene (49:13):

Okay. I'll try to come up with an example.

 

Chris Detzel (49:21):

Great. Thanks, Gene. Next question, when we add filter criteria and survivorship, it considers both OV and non OV value. Is there a way that we can control it? The way we do it on an API filter or in match rules?

 

Joel Snipes (49:40):

The filter is pretty limited for survivorship. It can only do an equals, whereas with API filters and match rules there are a million options. But what you're asking here is, can we consider the OV in survivorship when survivorship is calculating the OV? There is no OV at the time of calculation; the OV is being determined. That's definitely not possible. We can't use the OV to determine the OV, but I agree that the filter can be quite limited. A workaround I like is to use flags. In your ETL you can, in a more advanced way, choose a winner and set a Boolean flag, true or false: a primary flag or preference indicator, something like that. And then use other attribute crosswalk winner to follow it. That is one workaround for how you might be able to do a more advanced filter.
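
The flag workaround can be sketched as below: the ETL decides the winner up front and marks it, and survivorship simply follows the crosswalk that carries the flag. This is a simplified model under stated assumptions, not how Reltio evaluates the strategy internally. The flag name `is_primary`, the `email` attribute, and the record layout are hypothetical.

```python
def follow_primary_flag(crosswalks, flag="is_primary", attribute="email"):
    """Return the attribute value from the first crosswalk whose flag is True.

    The flag is assumed to be set upstream (for example in ETL), where
    arbitrarily complex logic can choose the preferred record.
    """
    for crosswalk in crosswalks:
        if crosswalk.get(flag) is True and attribute in crosswalk:
            return crosswalk[attribute]
    return None  # no flagged crosswalk, so this rule produces no winner

# Invented sample data: the ETL flagged the ERP record as primary.
crosswalks = [
    {"source": "CRM", "email": "lori@old.example", "is_primary": False},
    {"source": "ERP", "email": "lori@new.example", "is_primary": True},
]
print(follow_primary_flag(crosswalks))
```

The design choice is to move the complex selection logic out of the survivorship filter, which only supports equality, and into the load process.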

 

Chris Detzel (50:56):

Cool. And it looks like the last question for pin and ignore, do they get overwritten at any time?

 

Joel Snipes (51:05):

That is another good question. They don't get overwritten unless you directly manipulate them. But an interesting scenario is, before we brought these two Loris together, for example, what if they both had a pinned answer and those answers were different? How do we decide the winner? That could lead to two results on the left side, but there is also a way to prioritize within the pin and ignore, and that's a little bit advanced. I am not finding it... here we go. This is the page you would want. If you're interested in this topic, you're going to want to look for this pinned or ignored attribute merges. There's a matrix here, and there are three levels of priority within a pin.

 

Joel Snipes (52:06):

There's ignored, pinned, and neutral. If you have a pinned pin, I know that's a silly string, and a neutral pin, the pinned pin would win. And there are some examples here detailing it. I'm going to put this in chat. Definitely an advanced topic. It can be a bit tricky, but the only scenario where you really need to think about your pins being overridden is merged records. Your survivorship should be catching 98 or 99% of the correct winners. Your data steward should only be manipulating a handful. If they're doing more than that, you want to reconsider your rules. Two pinned records being merged shouldn't be happening a whole lot, but there is a feature that handles that. If you need to dig into it, check out that link.

 

Chris Detzel (52:59):

Thanks, Joel. If you want to go on to the next section, there's no other questions at the minute. I would say that if you enjoyed the webinar or if you have other ideas of other webinars that we can do, please let me know. Certainly want to hear your feedback. Post that in the chat and I always kind of give this up to the executives and others.

 

Joel Snipes (53:24):

I think I've worked through most of what I had. I do want to, if you look at the slides after this call, I put a link out to Postman and JSON Editor Online, the two tools I used, if you're unfamiliar with them and a link out to the survivorship docs, if you want to read more on the topic.

 

Chris Detzel (53:42):

Something I like that you did, Joel, was that you actually used the doc site to pull in some of your code and everything else. I think that was really cool. And by the way, this webinar will be shared with the PowerPoint. This doc will be shared as well. I put that link directly into the chat. Any other questions for anyone? Any other thoughts, Joel, that you want to kind of?

 

Joel Snipes (54:17):

Well, you gave me one idea where you said copying from the docs, which is something I do a lot.

 

Chris Detzel (54:22):

Yeah, I like it.

 

Joel Snipes (54:22):

One other thing that I will add to this resources page is our out-of-the-box data models. When you're looking at how something's been implemented or could be implemented in terms of anything, not just survivorship, but survivorship too, coming here and looking at how our out-of-the-box models work is an absolutely great place to steal code from. Let me put a link out to that.

 

Chris Detzel (54:52):

Joel's just an on-the-fly kind of guy. Like it. Just put it in there so you don't forget. All right, well, thanks everyone for coming. Really do appreciate it. And Sunchen, really appreciate the feedback, and Joel, he did a great job and he put a potential webinar that we could do in the future. We do have lots of webinars coming up, I think one a week, all the way through October. And I have October starting to get booked up as well. Keep your ideas coming so that I can push us to do those so that you get value out of the community.

 

Chris Detzel (55:30):

If you have questions that didn't get answered and want them answered, please go to community.reltio.com, post your questions there, and we'll make sure that we get you the answers that you need. Joel will happily help, specifically around these topics, match and merge and survivorship. Again, thank you all for coming. Really appreciate it, and we'll see you on the community. Thanks, Joel.



#communitywebinar
#Survivorship

Comments

10-27-2021 12:04

Great and informative webinar, highly recommended.