Enhancing Data Integration: Address Cleansing Techniques with Reltio Integration Hub

By Chris Detzel posted 06-13-2024 16:31

  
Find the PPT Here: Enhancing Data Integration: Address Cleansing Techniques with Reltio Integration Hub

Welcome to another insightful episode from the Reltio Community! In this session, we dive deep into the world of data integration with a focus on address cleansing techniques using the Reltio Integration Hub. Join Chris Detzel and Dan Gage, Principal Solutions Consultant at Reltio, as they guide you through the best practices and advanced methods for cleansing and standardizing address data.

Agenda:

  • Introduction and Overview
  • Importance of Address Cleansing in Data Integration
  • Techniques for Address Cleansing
    • Default Configuration
    • Using APIs for Address Cleansing
    • Configuring Multiple Cleanse Configurations
  • Detailed Walkthrough of the Reltio Integration Hub
  • Leveraging Locate Services for Enhanced Accuracy
  • Real-World Examples and Use Cases
  • Q&A Session

Key Highlights:

  1. Default Configuration: Learn how Reltio automatically cleanses and standardizes address data using default configurations.
  2. APIs for Address Cleansing: Discover how to leverage Reltio's API services to cleanse and standardize data before it is loaded into Reltio.
  3. Multiple Cleanse Configurations: Understand the process of configuring multiple cleanse configurations to handle different data sources and requirements.
  4. Loqate Services: Explore how the Reltio address cleansing service, powered by Loqate, ensures data accuracy and consistency.
  5. Advanced Techniques: Dive into advanced parameters and tuning methods to improve cleansing results, especially for regions like India and China.
  6. Practical Demonstrations: Watch step-by-step demonstrations of configuring and testing address cleansing using Reltio Integration Hub, Postman, and scripting tools like Bash or Python.

Full Transcript: 

Chris Detzel: All right, why don't we go ahead and get started? So thank you, everyone, for coming to another Reltio community show. This one's called Enhancing Data Integration: Address Cleansing Techniques with Reltio Integration Hub.

Chris Detzel: And so Dan Gage, he's a principal solutions consultant here at Reltio. So Dan, welcome back. Thanks Chris. You're welcome. So the rules of the show, keep yourself on mute. All questions should be asked in the chat or you can take yourself off mute and ask there. As usual, these are recorded and posted to the Reltio community.

Chris Detzel: And I will send the follow-up to everyone who attended and registered. Next slide. As usual, we have a jam-packed summer full of community shows, and one or two that I still haven't even pushed out to the community yet. [00:01:00] Today's show is enhancing data integration with Reltio Integration Hub, and we have two more shows specifically around Reltio Integration Hub, one next week and one the week following. Then we have a show that life sciences companies will be very interested in: patient centricity, a pharma industry trend defining new ways of patient data management.

Chris Detzel: And then on July 9th, we are very excited to show a new product called Reltio Business Critical Edition. It offers enhanced security and resilience; more to come on that. And then on the 11th, we're really excited to show a new integration, Reltio Data Pipeline for Databricks.

Chris Detzel: That's really exciting. And then there's one on improving data discovery with Reltio Integration for Collibra. So that should be really exciting. If you haven't signed up for those, please do. Next slide. I'll post this in the chat, but we have a [00:02:00] conference coming up called Data Driven 2024. Feel free to scan that, and you'll get these slides as well.

Chris Detzel: And I can get you 200 off since you're part of the community; just put in the community code there. I'll put that information in the chat. These are all the customers and companies that are presenting as of now. Really excited.

Chris Detzel: And that's all I have, Dan.

Dan Gage: Hey, thanks, Chris. So what we're going to be focusing on today is primarily around address cleansing. So the idea is when you have data that is brought into Reltio, Reltio is going to automatically cleanse and standardize that data using the default configuration within the Reltio cleanser.

Dan Gage: But your existing license also entitles you for the opportunity to take that external data. And if you choose to not load that data into Reltio, but instead to leverage that address cleansing [00:03:00] service, you can just leverage those APIs. And we're going to go through a couple of different techniques on that today.

Dan Gage: To bring that data into the Reltio service, cleanse and standardize it, and then bring it back to your source via those APIs. One thing we'll talk about is Loqate. The Reltio address cleansing service is powered by Loqate behind the scenes. For any of you who weren't aware of that, it's completely transparent to you.

Dan Gage: Reltio maintains and operates that instance of Loqate. Again, it's typically included as part of your license, whether you're using U.S.-based records or international records, depending on your entitlements. You can certainly work with your customer success team if you have any questions about what regions you're entitled to.

Dan Gage: But all U.S.-based data would be included by default for all customers. As part of the testing process that we're going to talk about today, there is also a public website, support.loqate.com, that will allow you to go through and test the Loqate service outside of Reltio.

Dan Gage: So if you're getting results that [00:04:00] are inconsistent with what you're seeing in Reltio, you've got some techniques there that you can test the default configurations. And we'll look at that. So what we're going to focus on today is configuring the address cleanser. And what you're going to do is you have the ability to have multiple different, what we call cleanse configurations within your Reltio tenant that gives you the ability to have different thresholds that are leveraged.

Dan Gage: So again, there's a default configuration that's typically used when data is brought into Reltio, but when you're leveraging the APIs, you explicitly have the opportunity to tell Reltio, I want to cleanse this record via an alternate configuration. Which uses different parameters, different thresholds, different mappings.

Dan Gage: And again, we'll go through some examples of that here shortly. The testing can be done directly through the Reltio console. So as you configure the Reltio cleanser to leverage your specific parameters, the Loqate service is still being used behind the scenes, but those parameters will be passed into that service.

Dan Gage: And again, you can do that [00:05:00] directly through the Reltio UI. That service, of course, is going to be available via an API, so you can also test that with tools like Postman. We're going to go through a process today where we use the Reltio Integration Hub to take an external file. In our scenario, we're going to grab that file from an SFTP site.

Dan Gage: We're going to pull it in and cleanse it, and for demonstration purposes we're just going to show you the results inside Reltio Integration Hub. But you can certainly append that data to the original source file or write it out to a new file based upon your business needs. You also have the ability to leverage those APIs through scripting tools like Bash or Python.

Dan Gage: And we'll show you an example of that today. As part of our community deliverables, we will provide you the Reltio Integration Hub recipe, as well as the Bash recipe that we're going to use here today. But again, I would emphasize that the Reltio Integration Hub is a tool that Reltio provides, while scripting tools like Bash or Python are something that would be supported [00:06:00] by your organization.

Dan Gage: So we would provide the API services and ensure that they're working correctly. But it would be up to your organization to decide how you do that scripting. So I will provide the example here today, but that's purely for guidance and direction. So looking at that cleanse service. So first thing that we're going to point out is configuration of the Reltio address cleanser is done through the JSON here today.

Dan Gage: While a wide variety of configuration options can be set directly through the Reltio console, such as adding attributes or defining match rules, the Reltio address cleanser still requires you to do your base configuration in the JSON file itself. You would access that by going into the data modeler in your Reltio console, where you'll see two options on the left: one to download the configuration, which lets you bring that JSON out of Reltio so you can tweak and change any of the configuration options, and one to import [00:07:00] that configuration back into the instance.

Dan Gage: And here at the bottom of the screen, we've got a link that highlights the base cleansing services and the general parameters that are available. We'll go into some specific examples here around defining input mappings and optional parameters, but again, all of that is going to be fully documented

Dan Gage: through that link on the Reltio documentation site. The first thing we'll talk about is that the Loqate service has default parameters for both the data you're inputting to cleanse and the data that's returned. The default things that you would certainly expect are the cleansed and standardized versions of the address, city, state, and ZIP.

Dan Gage: But there are also additional parameters that can be brought back to enrich that record: things like latitude, longitude, and geocoding. Reltio also supports the ability to return a time zone, and to return census data such as a congressional district or [00:08:00] demographic data for the regional territory based upon that census evaluation.

Dan Gage: And then, if you're leveraging services like CASS, which is the U.S. Postal Service certification for advanced address cleansing, you also have additional fields like Residential Delivery Indicator that can optionally be provided if you have the CASS subscription. What we're focusing on is the mapping of data from your existing tenant.

Dan Gage: You can see here the attribute URI that indicates where data is coming from inside the Reltio tenant when a record is created, and where the output that comes back from the Loqate service is going to go. There are some Loqate parameters like delivery address 1, delivery address 2, and time zone name.

Dan Gage: Those are the pieces of information that are going to come back from Loqate, and this output mapping allows you to define a default set of mappings between the Loqate data and where you want to put that data in Reltio. If you're leveraging the Reltio velocity [00:09:00] packs, again, these cleansers are going to be predefined.

Dan Gage: But in the case where you may choose to use an alternate address structure, or if you create a new entity type that you wish to do cleansing on, then the ability to use these output mapping files is very important. So one of the things I want to focus on here is the idea that within the cleanse configuration, you can define a global mapping.

Dan Gage: So in this case, I've called it mapping one, and it's an output mapping. And when I go to the next screen here, what you're going to see is you have the ability when you're creating a specific cleanse configuration. In this case, this one's called default. And you see the name that's highlighted in red here becomes very important because when you're leveraging those APIs, you're going to want to explicitly tell Reltio which cleanse configuration to use when cleansing the data that you're providing.

Dan Gage: And you see here at the bottom, in the mapping, the input mapping explicitly defines that data is going to come in via a field called address input and be passed to [00:10:00] the Loqate parameter address. A value is going to come from the country field and be passed to the Loqate field country.

Dan Gage: And you'll notice the mandatory parameter for the address input is set to true, which means that this cleanse config will only be executed if an address input is provided. If no address input is provided, then this cleanse configuration will effectively be skipped, because it is a mandatory parameter.

Dan Gage: However, country is not a mandatory parameter. If it's not provided, Loqate will make its best effort to estimate the country based upon a couple of different things. You're going to see on the next screen that you can set a default country, but Loqate will also attempt to parse the country from the address input.

Dan Gage: One last thing to point out here is the idea that you're chaining together these cleanse configurations. In your Reltio cleanse config, your configurations are in a sequence: a chain of the different cleanse configurations that can be linked together. And then [00:11:00] you'll see these parameters here.

Dan Gage: It says proceed on success or proceed on failure. So where you define multiple cleanse configurations. You have the ability to indicate whether or not it's gonna leverage the 1st configuration. And if that 1st configuration does not have the required parameters, then it can automatically fall into the next configuration within that chain, which would still be within your default configuration here.

Dan Gage: So you have the ability to create multiple different configurations. And then within any configuration, you can chain together different mappings that allow you to look for data in different incoming fields.
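The fall-through behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not Reltio's implementation: the key names (`name`, `mandatory`), the second configuration called `parsed`, and the attribute names are all hypothetical and do not reproduce the real cleanse configuration schema.

```python
from typing import Optional

# Hypothetical, simplified stand-in for a chain of cleanse configurations.
# Key names and attribute names are illustrative assumptions only.
CHAIN = [
    {"name": "default", "mandatory": ["AddressInput"]},
    {"name": "parsed", "mandatory": ["AddressLine1", "City"]},
]


def pick_configuration(record: dict, chain: list = CHAIN) -> Optional[str]:
    """Return the first configuration whose mandatory inputs are all populated."""
    for config in chain:
        if all(record.get(field) for field in config["mandatory"]):
            return config["name"]
    return None  # nothing in the chain applies, so the record is not cleansed


if __name__ == "__main__":
    print(pick_configuration({"AddressInput": "6601 Dorchester #12"}))
    print(pick_configuration({"AddressLine1": "6601 Dorchester Rd", "City": "Anytown"}))
    print(pick_configuration({"Country": "US"}))
```

A record with a populated single address field is handled by the first configuration; a record with only broken-out fields falls through to the second; a record with neither is skipped.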

Dan Gage: So when you look at the parameters that allow you to optimize how that data is going to be accessed, the first one that I want to focus on is return data by status. When you're tuning your address cleansing, by default it's only going to enrich the records with additional data if the verification [00:12:00] status is returned as verified.

Dan Gage: And for testing purposes, there may be additional reasons why you may want to enrich partially verified, unverified or ambiguous addresses to make sure that the verification status is being brought back. And in many cases, an ambiguous or partially verified address may be able to extract country or even city or state information from an address and do a partial enrichment.

Dan Gage: So that is something you'll want to consider as part of your cleansing strategy. And again, that's one of the reasons why. You can sometimes chain together multiple different configuration settings. To allow you to account for the different verification statuses that will be possibly suggested.
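The effect of widening the set of statuses that trigger enrichment can be sketched as follows. The status labels come from the session; the function, the field names, and the two sets are assumptions made for illustration, not the cleanser's actual parameter format.

```python
# Which statuses trigger enrichment is a configuration choice; both sets
# below are illustrative assumptions.
VERIFIED_ONLY = {"Verified"}
RELAXED = {"Verified", "Partially Verified", "Ambiguous"}


def enrich(record: dict, cleansed: dict, status: str, allowed: set) -> dict:
    """Always record the status; merge cleansed fields only for allowed statuses."""
    result = dict(record, VerificationStatus=status)
    if status in allowed:
        result.update(cleansed)
    return result


if __name__ == "__main__":
    raw = {"AddressInput": "12 Unknown Lane, Springfield"}
    partial = {"City": "Springfield", "Country": "US"}
    print(enrich(raw, partial, "Partially Verified", VERIFIED_ONLY))
    print(enrich(raw, partial, "Partially Verified", RELAXED))
```

With the stricter set, a partially verified address keeps only its status; with the relaxed set, the city and country that could be extracted are merged in as a partial enrichment.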

Dan Gage: You'll see that you have some optional parameters in here around setting your default country and your licensed countries. It is your responsibility to make sure that you're leveraging only the records that you are entitled to access. United States data is going to be supported for all customers, but some customers are also leveraging data [00:13:00] outside of that region.

Dan Gage: If your entitlement allows you to support that, you can go in here and add additional countries to your licensed countries configuration. An address that is not in a licensed country is skipped. For example, if I was bringing in data from Brazil, Brazil is not in my licensed country list, and I have an address that explicitly says that it's from Brazil,

Dan Gage: Reltio will not attempt to cleanse that address because it's not in our list here. So be aware of that as part of your configuration, and you can again, of course, have different configurations to support different countries. And that's again where that chaining together may be very relevant. Yes, please.

Chris Detzel: Can I ask two questions? Sure, please. All right. How can we cleanse addresses for the India and China regions? We are using the Loqate feature but are not satisfied with the results for countries like India and China. Can you help us with any of that? Is there another option?

Dan Gage: So a couple of things that I would emphasize is using some of these additional [00:14:00] parameters.

Dan Gage: You can see in the process here, there's preprocessing, there's enrichment, there's CASS certification, there's verification and geocoding, using these different parameters, and then using primary aliases and duplicate handling masks. So again, there are definitely some advanced parameters that you see.

Dan Gage: There's a link down here at the bottom; the first link says address cleansing parameters. There are additional details out there where you may be able to tune the cleansing engine for certain regions to get better results. In some cases, when you get into rural areas, the postal data is just not good enough, so you may always get only partially verified addresses or addresses verified only to a locality.

Dan Gage: And we'll see some definitions on the next slide of what level of verification is being returned. But in some cases, you can enhance the results that you get for different countries. And again, this is where chaining together can be beneficial.

Dan Gage: One [00:15:00] configuration could be US, Canada, and France, and you could have a different configuration that looks for country code in India, as well as China, and then being able to use different parameters, different verification status mappings for those different configurations again is definitely something that can be done to try and enhance your enrichment in those countries.
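Routing by country, combined with the licensed-country check discussed earlier, might look like the sketch below. The configuration names (`western`, `apac`) and the use of two-letter country codes are assumptions for illustration; in a real tenant this logic lives in the cleanse configuration JSON rather than in code.

```python
from typing import Optional

# Hypothetical routing table: configuration name -> countries it handles.
CONFIG_BY_REGION = {
    "western": {"US", "CA", "FR"},
    "apac": {"IN", "CN"},
}


def route(country: str, licensed: set) -> Optional[str]:
    """Pick a cleanse configuration for a country, or None if it should be skipped."""
    if country not in licensed:
        return None  # unlicensed countries are never sent for cleansing
    for name, countries in CONFIG_BY_REGION.items():
        if country in countries:
            return name
    return None  # licensed, but no configuration covers this country


if __name__ == "__main__":
    licensed = {"US", "CA", "FR", "IN", "CN"}
    for code in ("US", "IN", "BR"):
        print(code, route(code, licensed))
```

Each regional configuration can then carry its own parameters and verification status mapping, which is the tuning lever suggested here for India and China.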

Chris Detzel: Thank you. Can you quickly provide like a real world example where chained cleanse configurations are used?

Dan Gage: Yeah, and I think the example we just gave there is where you may want to use different parameters for different regions, and that's going to get into your verification status mapping. You'll see here that the verification status that is assigned to a record is based upon this verification status mapping.

Dan Gage: If this field is omitted from your cleanse configuration, it will use the Loqate default mapping. But based upon the address verification codes that Loqate returns, you can set up how that information is going to be scored. [00:16:00] And you'll see here that it's a four-part result in that address verification code.

Dan Gage: The first portion tells you the level of verification that Loqate was able to do. The second tells you the parsing technique that was used and whether it was able to preprocess that data to produce the results that were expected. The third tells you whether it was able to find postal code data. And very importantly, this last number sequence represents the match score: the level of confidence that Loqate had in the cleansing that was done.

Dan Gage: And then for the first two sections, for the verification and the parsing, there are going to be three characters. The first one indicates the verification status: verified, partially verified, ambiguous, and so forth. The second tells you the level of verification.

Dan Gage: 5 is to a delivery point, which is specifically a sub-unit within a building. If you're in an apartment complex, it could be the apartment number. If you're in an office building, it could be the suite or unit [00:17:00] number. If Loqate is able to determine that specific delivery point, you will get a V5 or even potentially a P5.

Dan Gage: If it's able to discern that there is a sub-building designator, even if it's uncertain about other factors, you could potentially get a level 5 verification, indicating that it's identified the address to a delivery point. 4 is to the premise level, which is basically your rooftop. So it knows that it's going to get delivered to the right building.

Dan Gage: But within that building, it may or may not have. Additional details necessary to parse and identify a sub delivery point. Now, in some cases, that could be because it's just a residential address and it's a single unit dwelling. But in cases where you have an apartment complex, but if an apartment unit is not specifically designated, it will simply indicate that it's a premise level.
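To make the four-part code easier to picture, here is a small Python parser. It assumes a layout like `V54-I44-P6-100` (three characters, three characters, two characters, then a score); that sample string and the exact widths of the third and fourth sections are assumptions for illustration, and only the two levels named in the session are labelled.

```python
import re
from typing import NamedTuple

# Levels named in the session; other digits are left unlabelled rather than guessed.
LEVELS = {5: "delivery point", 4: "premise"}

# Assumed layout: status letter + two digits, parsing section, postal section, score.
PATTERN = re.compile(r"^([A-Z])(\d)(\d)-([A-Z]\d\d)-([A-Z]\d)-(\d{1,3})$")


class VerificationCode(NamedTuple):
    status: str       # first character of section 1, e.g. "V" or "P"
    level: int        # second character of section 1
    parsing: str      # section 2, kept as-is
    postal: str       # section 3, kept as-is
    match_score: int  # section 4


def parse_code(code: str) -> VerificationCode:
    """Split an address verification code into its four sections."""
    match = PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized verification code: {code!r}")
    status, level, _third, parsing, postal, score = match.groups()
    return VerificationCode(status, int(level), parsing, postal, int(score))


def describe_level(level: int) -> str:
    """Return a label for the verification level, if one was given in the session."""
    return LEVELS.get(level, f"level {level}")


if __name__ == "__main__":
    parsed = parse_code("V54-I44-P6-100")
    print(parsed.status, describe_level(parsed.level), parsed.match_score)
```

A custom verification status mapping is essentially a set of rules over these parsed pieces, for example treating a P5 with a high match score differently from a P4.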

Dan Gage: Now, if you're using CASS certification, you do have parameters that tell you whether this is a multi-unit dwelling, [00:18:00] and if it is a multi-unit dwelling but you only have a premise-level verification, that allows you to identify where the verification is potentially not detailed enough for assured delivery.

Dan Gage: So you can use things like the Reltio Data Validation Framework to identify records that have only been verified to a premise level where the delivery point verification system indicates a multi-unit building. You can leverage those parameters that are coming back from Loqate to identify

Dan Gage: when a subunit designation should have been possible but is not currently present; you can identify that with the data verification codes. And again, if anybody is looking for more specific details, I'd encourage you to post a message into the Reltio community and we'll respond back to you with additional details.
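The validation rule described here can be expressed as a single predicate. This sketch is plain Python rather than a Data Validation Framework definition, and the argument names (`level`, `is_multi_unit`, `sub_building`) are hypothetical stand-ins for whatever attributes a tenant maps the returned values into.

```python
def needs_subunit(level: int, is_multi_unit: bool, sub_building: str = "") -> bool:
    """Flag addresses verified only to the premise in a multi-unit building."""
    return level == 4 and is_multi_unit and not sub_building.strip()


if __name__ == "__main__":
    records = [
        {"id": "a", "level": 4, "is_multi_unit": True, "sub_building": ""},
        {"id": "b", "level": 5, "is_multi_unit": True, "sub_building": "Apt 12"},
        {"id": "c", "level": 4, "is_multi_unit": False, "sub_building": ""},
    ]
    flagged = [
        r["id"]
        for r in records
        if needs_subunit(r["level"], r["is_multi_unit"], r["sub_building"])
    ]
    print(flagged)
```

Only the first record is flagged: it is in a multi-unit building, was verified to the premise, and carries no sub-building value.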

Chris Detzel: So there are a bunch of questions, but we'll get to those after some of this.

Dan Gage: That's fair. Yeah. If there are questions around the configuration process, I think we can go ahead and take those now. And [00:19:00] then we'll get into some testing and examples.

Chris Detzel: Okay. So does Loqate consider change of address in the US and return those address parameters after standardization?

Dan Gage: Yeah, so the national change of address registry is not currently leveraged by Loqate using the Reltio default services. Reltio is exploring the ability to provide those change of address services as part of our platform, but it's not yet available. Today, it's only going to cleanse and standardize that address independent of who actually occupies that building, whether it's an individual or an organization.

Dan Gage: Loqate is just going to standardize based upon the building that exists, independent of the occupant. So the answer there would be no, it's not going to take into consideration change of address data. It's only going to take into consideration the building unit itself.

Dan Gage: Yeah, keep those questions coming in and we'll keep addressing them where appropriate. And then certainly we'll wrap it up at the end and come back and address them all. One thing to point out: I mentioned the fact that when you're [00:20:00] defining those cleanse configurations, Reltio's velocity packs are going to deliver two configurations by default.

Dan Gage: One, labeled default, expects all of the information to be concatenated together into a single field called address input. The other common approach, which we call other, is where the information is already pre-parsed, so that your source data can map it in as address line 1, address line 2, city, state, ZIP or postal code, whatever information is specifically broken out.

Dan Gage: If you already have that level of designation, sometimes that's a more accurate way of allowing Loqate to find hints for parsing and identifying that address correctly. In situations where you have the address information in a single field, you would leverage that default configuration and just pass all that data into a single parameter.

Dan Gage: And where you have the ability to break it out, you'll be able to pass that in [00:21:00] using an alternate configuration. And again, by default, Reltio is going to chain these two options together. It's going to check first to see if you have all that information concatenated together into the address input field.

Dan Gage: And again, that would be a mandatory field in that configuration. So if that field is not populated, it would then fail over to that other configuration, where it expects the data to be in the individual fields. Again, just showing you here that in the Reltio data modeler, you have the ability to just copy and paste an address in there and run that test, where you can see, based upon your configuration, whether it's able to break out that data.

Dan Gage: And again, you'll see an example here where it says 6601 Dorchester #12. We were able to parse that out and say, yes, address line 1 should actually be 6601 Dorchester Road, apartment number 12. So that #12 is indicating an apartment unit, and you'll see it's also going to take that data and put it into a sub-building field.

Dan Gage: That was apartment 12; if this was an [00:22:00] office suite, it may call it suite 12 or unit 12, depending upon the building designation. So again, just emphasizing that there's a wealth of additional information that can be brought back: things like census indicators, census codes, the sub-building indicators, the latitude and longitude, the ISO codes for your country, and ZIP+4 information.

Dan Gage: Again, all this would be controlled through that mapping process of leveraging those parameters that are available from Loqate. Many of these fields are going to be pre-mapped as part of your velocity pack. But if you're using an older configuration where they may not have been there, you can certainly go in and extend your existing environment

Dan Gage: and add those fields.

Dan Gage: The next option would be leveraging that API directly. And here you can see in the Postman tool, we're just going in and leveraging the environment that you're accessing. Whether it's your dev, test, or prod, whether you're in a European tenant or U.S.-based tenant, [00:23:00] that would control the environment, and then, of course, your specific tenant ID.

Dan Gage: And then the URL that we're leveraging here is the batch cleanse. You'll see that there's a link here at the bottom to the Reltio developer page that gives you the Swagger definition of all the parameters that are available to be passed into that API, as well as the expected schema. In this case, we're passing that information into an address input field.

Dan Gage: And on the right-hand side, you can see that it's parsing and enriching it. In this case, we're asking only that the label be returned, but you could certainly allow it to return each of the individual fields (city, state, ZIP, ZIP+4) as individual return parameters. And again, that's all done via the input here.

Dan Gage: It's actually a little bit off the screen on this URL, but we'll see some examples of that in the demonstration.
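For readers who prefer code to Postman, the request described above can be assembled in Python. Everything specific here is an assumption made for illustration: the base URL pattern, the `Location` entity type, and the `AddressInput` attribute name. The endpoint path is deliberately left as an argument (`cleanse_path`) because the exact path and body schema should be taken from the Swagger definition, not from this sketch.

```python
import json


def build_cleanse_request(environment: str, tenant_id: str, cleanse_path: str,
                          addresses: list) -> tuple:
    """Return (url, json_body) for a batch of single-field addresses.

    The URL pattern, entity type, and attribute name are illustrative
    assumptions; cleanse_path must come from the API documentation.
    """
    url = (
        f"https://{environment}.reltio.com/reltio/api/"
        f"{tenant_id}/{cleanse_path.lstrip('/')}"
    )
    body = [
        {
            "type": "configuration/entityTypes/Location",
            "attributes": {"AddressInput": [{"value": address}]},
        }
        for address in addresses
    ]
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_cleanse_request(
        "dev", "MY_TENANT_ID", "PATH_FROM_SWAGGER", ["6601 Dorchester #12"]
    )
    print(url)
    print(body)
```

Nothing is sent over the network; the function only shows how the environment, tenant ID, and address input fit together into one request.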

Dan Gage: The Reltio Integration Hub gives you a low-code, no-code environment where you're orchestrating [00:24:00] this process through the Reltio Integration Hub interfaces. The first thing to point out is that the Reltio connector within the Reltio Integration Hub that connects to your tenant has predefined actions for many of the tasks that you would perform within the Reltio hub.

Dan Gage: So it would be things like create entity, create relationship, looking for additional data, leveraging certain services. But the address cleansing service is not defined as a prebuilt action. So in order to leverage this service, you would use what is called a custom action within the Reltio Integration Hub.

Dan Gage: So you're using the standard Reltio connector, but you're choosing that custom action option. Then you would use a POST method and bring in the URL. In this case, I'm using an environment called G U S sales, and my tenant ID begins with PLX. So you would put that path directly into that custom action path.

Dan Gage: And then you would also, because you're using a custom action, you [00:25:00] would need to define the input and output parameters. So in this case, I would just get a sample schema that I'm going to use for passing that data in. And within the Reltio Integration Hub, I would use JSON to just provide that sample schema.

Dan Gage: And Reltio will generate the array attributes and values so that you can use that drag-and-drop, low-code, no-code methodology within the Reltio Integration Hub to source the data from your incoming file into that structure. Likewise, you would take the resulting structure that Reltio returns that data in and import that into your response body so that you can map that data back out

Dan Gage: to the appropriate place where you want to return that data. We're also going to see an example here where you can use scripting tools. In this case, I'm going to use just a command-line Bash shell script, and you can use the curl command to pass data into that same API, the batch cleanse API.

Dan Gage: You would provide a header parameter [00:26:00] with the Reltio token. In the script example that we'll provide, I will include the sample code that requests a token specific to the Reltio environment and then concatenates that token into the header when passing the data from the address input into that tenant.
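The same idea in Python, using only the standard library, might look like this. It is a sketch under stated assumptions: the token is presumed to have been obtained already, the `Bearer` scheme is an assumption about how the token is presented, and the URL in the usage example is a placeholder. The request object is built but not sent.

```python
import urllib.request


def authorized_request(url: str, token: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the access token."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Concatenate the token into the header value, as a script would
            # do with a shell variable. The Bearer scheme is an assumption.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = authorized_request(
        "https://example.invalid/cleanse", "TOKEN_PLACEHOLDER", b"[]"
    )
    print(request.get_method(), request.full_url)
    print(request.get_header("Authorization"))
```

Passing the returned object to `urllib.request.urlopen` would perform the call; that step is omitted so the example stays runnable without credentials.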

Dan Gage: So again, we'll see an example of that here in a few moments. And the last thing to focus on is just our demonstration. So before we jump into that demonstration, Chris, do we want to address any questions about the process?

Chris Detzel: Yeah, let's do it. So for the default and licensed country parameters, I see that you have US, and this was going back, United States, and CAN and Canada. Is that because the list is trying to mimic all the variations that may appear in my data source?

Dan Gage: Yes. Yeah, that's correct. So when you're passing data into that country field, if the data that you've mapped in matches one of those existing values,

Dan Gage: then it's going to attempt to use that country in passing that data over to [00:27:00] Loqate. And again, where that becomes important is if you have multiple cleanse configurations chained together and you want to use that country code to guide the Reltio cleanser to a different configuration

Dan Gage: for different countries; you have the ability to define those there. That parameter can also leverage the Reltio reference data management solution to standardize the incoming value. So if you have country codes coming in as ISO numeric, such as 840, which represents the United States, mapping that via RDM to US is supported.
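The normalization step can be pictured as a lookup from source variants to one canonical code. The dictionary below is only a stand-in for an RDM lookup, with a handful of entries chosen for illustration; a real tenant would resolve these through reference data management rather than hard-coded values.

```python
# Tiny stand-in for an RDM lookup table (illustrative entries only).
COUNTRY_LOOKUP = {
    "840": "US", "USA": "US", "UNITED STATES": "US", "US": "US",
    "124": "CA", "CAN": "CA", "CANADA": "CA", "CA": "CA",
}


def normalize_country(raw: str) -> str:
    """Map a source country value to a canonical code; pass unknowns through."""
    key = raw.strip().upper()
    return COUNTRY_LOOKUP.get(key, key)


if __name__ == "__main__":
    for value in ("840", "United States", "can", "BR"):
        print(value, "->", normalize_country(value))
```

Unknown values are returned upper-cased and unchanged, so a later licensed-country check can still decide whether to skip them.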

Chris Detzel: Can we somehow ask in the config to get back the Loqate address ID of the Loqate address reference data against which our address was verified?

Dan Gage: I don't think that's supported to my knowledge. That's definitely something where if you post that into the community, I can work with our product management team.

Dan Gage: And find out if that detail is available. [00:28:00] But to my knowledge, it's not the metadata that locate is using is effectively behind the scenes and that it presides, presents the results.

Chris Detzel: Yeah, I just got a community. Reltio. com and post there. Will this data be persisted in the tenant as I noticed API returns entity URI?

Dan Gage: Yeah, that's a great question. So it is going to return a URI, but it is not going to persist the data. When you're leveraging that batch cleanse API, you're effectively doing something similar to an external match. You're allowing that data to come in, and Reltio is going to assign a URI to it just for the purposes of logging and tracking within the Reltio platform.

Dan Gage: But that URI would not be available in your tenant. If you attempt to access that record via the URI that got returned via that batch cleanse, you would not find that record stored within the Reltio platform. So this is just leveraging the API service to take the data outside of Reltio for your needs.

Chris Detzel: And last question for now: can Loqate provide the [00:29:00] output in English after cleansing the address if the input comes in German or non-English characters?

Dan Gage: Yes, so you do have that ability. There are some optional parameters that you can put in there. I don't have that in any of our examples here today, but Loqate definitely does support it. If you're bringing in data in simplified Chinese or traditional Chinese characters, it will attempt to cleanse and standardize that using the same language or same character set that's coming in. But there are some optional parameters where, if the metadata is available, it will basically transcode that data into a Latin-based character set, if the metadata is supported.

Chris Detzel: It'd be interesting, because we had Loqate's CTO on about a year ago to talk about their roadmap. So maybe I'll try to push to get them on again, to see what that roadmap looks like.

Dan Gage: Sure.

Chris Detzel: A lot of interest, I think. Thanks, Dan.

Dan Gage: All right, great. So let's take a look at this. So a couple of things: some of those websites that I referred to [00:30:00] along the way. The Reltio address cleanser you see here, again, that link was in the slides that will be provided to you. And you can see that being able to configure the tenant, the address cleansing properties via CAS SERP, understanding those address verification codes, all that is here. It's very well documented. This is a very commonly used service within the Reltio platform, so it's well documented and we provide you some direction to that.

Dan Gage: I mentioned that Loqate service directly. In any case where you may be cleansing an address and you're not getting the results that you expect from Reltio, you can always go directly to Loqate. Again, this website is in those slides that we'll be providing, and you can just put in that data directly and have it passed to Loqate, and it will attempt to parse and process that record using the default Loqate parameters.

Dan Gage: And again, in some cases you'll get an address that just doesn't cleanse, and sometimes you wonder: is it because something is not [00:31:00] configured properly, or is it because it's just not a valid address? Going directly to Loqate allows you to determine whether it is a configuration issue or whether it's just a bad address.

Dan Gage: You can also sometimes just take that address and send it over to Google. So if you bring in an address here, I sometimes right-click and just do a Google search for it and look on Google Maps to see whether or not that address is coming back. And that lets you know whether it's a configuration issue or whether that address just doesn't exist.

Dan Gage: So I did mention, again, being able to take that address and put it directly into your Reltio console, run that, and see those results coming back. And again, you'll see that there's a wealth of additional data available that you may or may not be taking advantage of. So things like time zones, transaction codes, MSA codes: all that data is available as part of that configuration.

Dan Gage: And we can also include this default configuration that we're using in the [00:32:00] deliverables here for the community show that you can download. So here you can see, if I zoom in on that a little bit, this sample where you're defining a default output mapping, where the parameters from Loqate are being mapped directly into your Reltio data model.

Dan Gage: And then when you get into the information here, the infos are the individual cleansers that are being defined. So here's my default cleanser, and here's my other cleanser. And again, you saw in the default cleanser in those previous examples, the data was all coming in via address line, address input. And if that data from your original source is coming in via address line 1, address line 2, country, administrative area, state/province, or city, you can see where the parameters within Loqate might be administrative area or locality, but in your Reltio tenant they may be called city, state, province, et cetera.

Dan Gage: So being able to map those based upon the data that is available to you as your input matters: [00:33:00] the more data that you can pass into Loqate, the more likely it is to cleanse and standardize that. And again, in this case, we're using the input mapping specifically for this cleanser, but we're using the output mapping reference that's pointing back to that mapping that was defined in the top section here, in the mappings themselves.

Dan Gage: So this is just helping you understand that you can define that mapping one time in the mapping section and then leverage it multiple times as your output mapping reference. As for the parameters that we talked about, the verification status and that address verification code coming back from Loqate: the verification status mapping is using a regular expression.
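
The idea of defining an output mapping once and referencing it from several cleansers can be sketched as follows. The key names (`outputMappingRef`, `mapping1`) and the field names on both sides are illustrative assumptions, not the actual configuration schema shown in the session.

```python
# One shared output mapping: cleanser result field -> attribute in the
# data model. All names here are hypothetical.
OUTPUT_MAPPINGS = {
    "mapping1": {
        "DeliveryAddress": "AddressLine1",
        "Locality": "City",
        "AdministrativeArea": "StateProvince",
        "PostalCode": "Zip",
    }
}

# Two cleansers that both point back to the same mapping by reference.
CLEANSERS = [
    {"name": "default", "outputMappingRef": "mapping1"},
    {"name": "other", "outputMappingRef": "mapping1"},
]


def apply_output_mapping(cleanser: dict, result: dict) -> dict:
    """Rename the fields of a cleanse result using the referenced mapping."""
    mapping = OUTPUT_MAPPINGS[cleanser["outputMappingRef"]]
    return {target: result[src] for src, target in mapping.items() if src in result}


sample_result = {"Locality": "Springfield", "AdministrativeArea": "IL"}
print(apply_output_mapping(CLEANSERS[0], sample_result))
```

Because both cleansers resolve the same reference, a change to `mapping1` takes effect for each of them without duplicating the mapping.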

Dan Gage: So here in this case we're being a little bit loose, but we're going to call verified anything that starts with the letter V followed by a dot, meaning any characters occurring any number of times; those are going to be classified as verified. But if we have an address verification code that's partially verified, P followed by a 2, 3, 4, or [00:34:00] 5, and it ends with a 90 or 100, then we're indicating here that we're still going to call that verified.

Dan Gage: So even if it's only partially verified according to Loqate, if Loqate has a high degree of confidence, 90 percent confident or above, then we're still going to allow that to be classified as verified. So this is where, when you're getting data for other countries, you're able to go in there and say: hey, for partially verified or even unverified addresses where Loqate is 100 percent, or even 95 percent, confident, we may still want to classify that as verified.

Dan Gage: And again, based upon the status of your verification mapping, you have the ability to control whether data is going to be returned back to your Reltio tenant. So in cases where records are getting partially verified, that information may not be returned to your tenant.
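
The regular-expression rule described above can be tried out with a small Python sketch. The two patterns follow the spoken description (starts with V, or P followed by 2 through 5 and ending in 90 or 100); the exact expressions in the shared configuration may differ, and the sample codes below are made up for illustration.

```python
import re

# Patterns reconstructed from the spoken description; treat as assumptions.
VERIFIED_PATTERNS = [
    re.compile(r"^V.*"),                 # anything beginning with V
    re.compile(r"^P[2345].*(90|100)$"),  # partial, but ending in 90 or 100
]


def classify(avc: str) -> str:
    """Classify an address verification code as VERIFIED or NOT_VERIFIED."""
    if any(pattern.match(avc) for pattern in VERIFIED_PATTERNS):
        return "VERIFIED"
    return "NOT_VERIFIED"


for code in ["V44-I44-P6-100", "P22-I33-P4-090", "P22-I33-P4-075"]:
    print(code, classify(code))
```

Tightening or loosening the second pattern is how the confidence threshold for partially verified addresses would be moved in this sketch.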

Dan Gage: So you can either adjust that verification status mapping, or you can just say: you know what, even if the Reltio default configuration [00:35:00] says that it's partially verified, I still want to take whatever data is coming back and then, based upon my output mapping, bring that data back in and enrich it. And of course, it will be associated with that Reltio crosswalk that's returned.

Dan Gage: So whether you're doing this for data that's being persisted within Reltio on a new Reltio crosswalk, or whether you're leveraging that API that's going to return that data, you're controlling whether or not the data will be returned via this return data by status parameter. All right, so based upon that, we're going to go into the Reltio Integration Hub, and we're going to look at that first process.

Dan Gage: In this case, we built out a recipe, and I'm just triggering it every 5 minutes. It's going to go out and look for a file. And I've just simply created a parameter here; it's called sample addresses 7. We're going to connect to our server, grab that file, parse it down, and then loop through each of those records.

Dan Gage: Now within Reltio Integration Hub, we do have the ability to optimize the task utilization [00:36:00] by using this batch option. So in the case where we're processing through those records, I'm indicating here via the options of Reltio Integration Hub that I want to pass 10 records at a time to that Reltio API. Where that becomes important is when you look at your Reltio tenant and your utilization thereof.

Dan Gage: As you see here, I'm calling that batch cleanse via Postman. I have two addresses, and you'll see that those two addresses are both being returned. From a Reltio entitlement standpoint, this counts as one API call. That becomes very important if you're processing large volumes of data: if you pass only one address per API call and you're processing a million records, then you'll consume one million of your allocated Reltio API calls.

Dan Gage: But by passing multiple records in a single API call, you're leveraging the Reltio service more efficiently, and it will affect your entitlement and your utilization quotas. So that's going to affect the [00:37:00] API quotas here, as well as your Reltio Integration Hub task quota. In this case, by batching those records together and sending multiple records via each API call, that counts as one API call, but it also only counts as one task.
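
The arithmetic behind that quota saving is easy to check. The sketch below is a generic batching helper, not part of any Reltio SDK; the function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence


def chunked(records: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive batches of at most `size` records."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def api_calls_needed(record_count: int, batch_size: int) -> int:
    """Number of calls when each call carries up to `batch_size` records."""
    return math.ceil(record_count / batch_size)


print(api_calls_needed(1_000_000, 1))   # one address per call
print(api_calls_needed(1_000_000, 10))  # ten addresses per call
print([len(batch) for batch in chunked(list(range(7)), 10)])
```

With one address per call, a million records cost a million calls; at ten per call, the same load costs a tenth of that.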

Dan Gage: Within the Reltio Integration Hub. So in this case, you'll see where I have the on-the-fly cleanser for Reltio, and I've mapped in that batch cleanse API. I imported that schema, and it's going to bring in the address input. Here we also brought in the country, and then the output. Here we're bringing in the data from that file, and we're just concatenating everything together.

Dan Gage: Address line one to_s, address line two to_s, and it's important to put the to_s on here, because if one of those fields is null, if it's missing, then the concatenation here may potentially fail. By using the default to_s [00:38:00] operation on a null value or a numeric value, it will still concatenate it together as a string to generate that full value.
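
The same null-safe concatenation can be expressed outside the recipe editor. This Python sketch mirrors the idea of converting every part to a string first; `to_s` and `build_address_input` are hypothetical helper names, and the sample values are made up.

```python
def to_s(value) -> str:
    """Convert any value to a string, treating a missing value as empty."""
    return "" if value is None else str(value)


def build_address_input(*parts) -> str:
    """Join the non-empty parts into a single address input string."""
    cleaned = (to_s(part).strip() for part in parts)
    return " ".join(part for part in cleaned if part)


# Address line two is missing and the postal code is numeric; neither
# breaks the concatenation.
print(build_address_input("123 Main St", None, "Springfield", 12345))
```

Skipping the empty parts also avoids doubled spaces in the value that is sent to the cleanser.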

Dan Gage: And then you can see in the response body, all the various different fields that could potentially come back are there available. And again, for our demonstration purposes, because we are passing multiple records over in that single API, because we want to be able to look at them in our results here, we are going to loop through those results.

Dan Gage: And again, this is for demonstration purposes; you could pass those records into a CSV, and you could also use the batch process as well. Again, those would be ways that you would optimize your Reltio Integration Hub task utilization. So if I just test the recipe, you can see that it's going to go pretty quickly here.

Dan Gage: It's going to go out, get that file, batch those records up, send them over to the Reltio API. And again, by clicking in here, I can see that the input is 7 records of a batch of potentially 10 records. [00:39:00] And the output is the records that can get moved through. And you'll see here that all of those addresses are being passed over and the results are coming back to provide those rich details.

Dan Gage: In this case, the Milton Street address in Pennsylvania. And again, for demonstration purposes, I've placed them into a list here where you can see the original data that we were passing over to Loqate and how Loqate passed that data back. So again, we'll provide a copy of this recipe that you can import into your instance of Reltio Integration Hub.

Dan Gage: But the process here, of course, is just connecting to that source data using any of the available connectors. You could be connecting to a SQL database, an Excel sheet, a Google Doc, Box, or any of a variety of different sources. So the Reltio API is going to be accessible regardless of where that data is actually being sourced.

Dan Gage: By using the Reltio [00:40:00] Integration Hub low-code, no-code environment, you're just mapping that incoming source data into those parameters. And again, you saw that where we were just concatenating those fields together from that input file. Whether that input is coming from a CSV file, as we're doing here today, or from a SQL database, or from your Salesforce environment, however you choose to get that data, what's passed over to the Reltio cleanser comes from a previous step within this recipe. So I'll stop here for this recipe.

Chris Detzel: Yeah, good. There are a lot of questions specifically about this. How can the cleanser output for address line one be changed? So, for example, to exclude the apartment number and store it in the address line two output.

Dan Gage: Yeah, and that's definitely a question that we get a fair amount. And the short answer is that Loqate's parameters are going to decide how that data is going to be brought back. So if we go back into our [00:41:00] Postman, right now I've got a select in here where you can see I'm selecting only the label to come back. But by removing that parameter, it's going to return everything back by default.

Dan Gage: And again, my token has timed out here, so I'm going to get that token again and re-request that data. And you'll see here, now we're actually getting all of those various different fields. You'll see that address line one does have that apartment concatenated to it. But if you want to get more granular by going down to other fields such as street, you can see that the street there was Dorchester Road, the street number was 6601.

Dan Gage: So if I don't want that apartment number to be there, I can simply look at the premise number and the street name, independent of that full address line 1.

Dan Gage: So the delivery address is going to be [00:42:00] all that information pre-concatenated. But if you look at those individual fields, and again, we can see this if we go back into the Reltio console where that data was brought out. Here, we have 6301 Milton Street, PA. If we use that address, and I'll just type it in.

Dan Gage: By default, the address line 1 is taking the delivery address. But you can see here that by looking at the premise number and the street name, which is Dorchester Road, and concatenating together only that street name and premise number, you can exclude that apartment number from that. So it would just allow you to concatenate that data a little bit differently in your mapping.
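
A minimal sketch of that alternate mapping, assuming the cleanse result exposes the premise number and street as separate fields. The key names and sample values below are hypothetical, chosen only to show the technique.

```python
def street_only_line(cleansed: dict) -> str:
    """Build address line 1 from premise number and street only.

    This skips the pre-concatenated delivery address, so any apartment
    or unit designator is left out of the result.
    """
    parts = [cleansed.get("premiseNumber"), cleansed.get("street")]
    return " ".join(str(part).strip() for part in parts if part not in (None, ""))


# Made-up cleanse result for illustration.
sample = {
    "deliveryAddress": "100 Example Road Apt 2",
    "premiseNumber": "100",
    "street": "Example Road",
    "subBuilding": "Apt 2",
}
print(street_only_line(sample))
```

The apartment value stays available in its own field, so it could be mapped to address line 2 instead of being dropped.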

Dan Gage: So by default, the mapping back to address line one is going to take your delivery address, which has got all those fields pre-concatenated. But by using an alternate mapping, you can control how that data is brought back.

Chris Detzel: What is the [00:43:00] optimal value of batch size here?

Dan Gage: Yeah, and that's definitely going to depend upon the size of your records. For addresses, the size is going to be relatively consistent, probably around 100 characters, which, once you format that into a JSON structure, is going to be maybe around 100K. So I think the numbers that I've seen, between 50 and 100, are going to be your ideal batch size. The only negative consequence of doing larger batches is that you run the potential for a timeout.

Dan Gage: So within Reltio Integration Hub, you do have some abilities to set some of those parameters around what your expected timeout is, whether you're running parallel tasks, and the performance of your actual Reltio tenant. But the short answer is I would start at around 50, and then if you go up to around a hundred, you're going to get a little bit better performance, but you need to put more error handling in there.

Dan Gage: And if I go back into my recipe here, you'll see that Reltio does support the ability to create error handlers. So in this case, by taking [00:44:00] those task steps that I was performing below here and dragging them up into that monitor section, I can monitor for errors such as a timeout and either retry that step or just log that step as having potentially hit an error.

Dan Gage: So the idea here is you can go a little bit larger and then account for error handling, or you can use a more conservative number where you're going to get a more consistent process.
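
For a scripted client, the retry-or-log choice described above might look like the following sketch. The `send` callable and the simulated timeout are assumptions for illustration; the recipe's monitor block itself is configured in the Integration Hub, not written in Python.

```python
import time


def call_with_retry(send, batch, retries=2, delay_seconds=0.0):
    """Call send(batch); on a timeout, log it and retry up to `retries` times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return send(batch)
        except TimeoutError as exc:
            last_error = exc
            print(f"attempt {attempt + 1} timed out")
            time.sleep(delay_seconds)
    raise last_error


# Simulated sender that times out once and then succeeds.
attempts = {"count": 0}


def flaky_send(batch):
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise TimeoutError("simulated timeout")
    return [f"cleansed:{address}" for address in batch]


result = call_with_retry(flaky_send, ["1 Main St"])
print(result)
```

A larger batch size makes a timeout more likely, so the retry budget and the batch size are worth tuning together.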

Chris Detzel: What is the upper limit of a batch cleansing address in a single payload? Is it size or number of arrays?

Dan Gage: Yeah, it's definitely going to be the total size of the payload. So if you have addresses that are very large and complex, if you're passing a lot of information over, then the size of that payload is going to be more restrictive than the number of records itself. So if you can pass a thousand records and it doesn't exceed [00:45:00] the optimal payload size, then Reltio is going to cleanse all 1,000 of those records and send them back in a single task or a single API call.
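
Since the limit is about payload size rather than record count, a client could batch by serialized size instead. This is a sketch under that assumption; the byte limit used below is arbitrary, because no specific figure was given in the session.

```python
import json
from typing import List


def batch_by_payload_size(records: List[dict], max_bytes: int) -> List[List[dict]]:
    """Group records so each batch's JSON encoding stays within max_bytes.

    A single record larger than max_bytes still gets a batch of its own.
    """
    batches, current = [], []
    for record in records:
        candidate = current + [record]
        too_big = len(json.dumps(candidate).encode("utf-8")) > max_bytes
        if current and too_big:
            batches.append(current)
            current = [record]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


records = [{"a": "x" * 10} for _ in range(5)]
print([len(batch) for batch in batch_by_payload_size(records, 50)])
```

Re-serializing the candidate batch on every step is simple but quadratic; for very large loads, a running byte count would be cheaper.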

Dan Gage: So again, it's something where you're going to need a little bit of trial and error, and if you want to post a message to the Reltio community, we can ask for a little bit more specific guidance from our product management team. But it's something you can definitely test out and validate, or you can work with your Reltio customer success team to get more details.

Chris Detzel: Great. And although we're going to share the recipe with everyone, is the recipe available in the community library?

Dan Gage: That's a great question. I can certainly work with Chris to have it shared there.

Chris Detzel: Not the online community, but the library within.

Dan Gage: Oh, yeah. So no, by default, I don't think it's in the community library here on the left-hand side. So we will put it into the Reltio community as a zip file to import. You can go into your recipe lifecycle management, and there [00:46:00] will be a package. It will give you the ability to import that package into a new folder, and you'll be able to get that zip, which you can download as part of the deliverables that you have.

Chris Detzel: And I'll send that in an email tomorrow. What order does the cleanser work in? So we also have noticed noise words, sorry.

Dan Gage: So absolutely. So yeah, the Reltio cleanser is going to run before matching takes place. If you're matching on address information, which is very likely to be the case, the Reltio cleanser is definitely going to be applied before any of the matching is done.

Dan Gage: You can have multiple sequences that set the order in which the cleansing is done, again, whether you're bringing that data in via the address input or using parameters like the country codes to determine which data is being processed. And again, I think that was in our top example here where we had that Loqate default country.

Dan Gage: So here we're only going to [00:47:00] process records from the United States, Canada, and France. But in the second one here, because we do not have that country code passed, it will attempt to cleanse all addresses via this cleanse configuration. Whereas if you wanted a specific cleanse configuration for US, Canada, and France that uses this classification for verified, but for the Dominican Republic you wanted to use a more stringent verification mapping, then you could simply set that up as an additional option here. You could just come in here and copy that section, put a comma, add a new one in, and then we could bring that in for the Dominican Republic.

Dan Gage: And in that case, I may take some of these other options here and move those into the partially verified category. So therefore, only an address verification code beginning with V for the Dominican Republic would be considered verified. Any other [00:48:00] codes that match these parameters would now only become partially verified for Dominican Republic addresses.
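
The chained, per-country behavior described here can be modeled in a short sketch. The configuration names, the country sets, and the patterns are illustrative assumptions; the real settings live in the tenant's cleanse configuration rather than in Python.

```python
import re

# Hypothetical chain, evaluated in order; the last entry has no country
# restriction and therefore accepts every address.
CLEANSE_CHAIN = [
    {"name": "us_ca_fr", "countries": {"US", "CA", "FR"},
     "verified": [r"^V.*", r"^P[2345].*(90|100)$"], "partial": []},
    {"name": "dominican_republic_strict", "countries": {"DO"},
     "verified": [r"^V.*"], "partial": [r"^P[2345].*(90|100)$"]},
    {"name": "catch_all", "countries": None,
     "verified": [r"^V.*"], "partial": [r"^P.*"]},
]


def pick_config(country: str) -> dict:
    """Return the first configuration whose country list accepts the record."""
    for config in CLEANSE_CHAIN:
        if config["countries"] is None or country in config["countries"]:
            return config
    raise LookupError(f"no cleanse configuration for {country}")


def verification_status(country: str, avc: str) -> str:
    """Apply the selected configuration's verification mapping to a code."""
    config = pick_config(country)
    if any(re.match(pattern, avc) for pattern in config["verified"]):
        return "VERIFIED"
    if any(re.match(pattern, avc) for pattern in config["partial"]):
        return "PARTIALLY_VERIFIED"
    return "UNVERIFIED"


print(verification_status("US", "P44-I44-P6-100"))
print(verification_status("DO", "P44-I44-P6-100"))
```

The same code lands in different categories depending on which configuration the country selects, which is the effect of moving a pattern from the verified list to the partially verified list.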

Dan Gage: Thanks, Dan. That's all the questions for now.

Dan Gage: Great. So let's move on to that last thing. I know we're running low on time, but in the scripting options here, you can see I have a script. It's just called address cleanse, a .sh file, and when I run that file, it's just going to echo those addresses back. And then it's going to say I'm done. So if we go out and look at that script itself...

Dan Gage: Here you can see the results. My computer has detected that the data has changed, so I can reload that. And here's that data that just came back. And if I ask my editor to parse it out, you can see that it's parsing out that address using that same information.

Dan Gage: Note here, with the address, reference data management is determining the ISO 2 and ISO numeric values, even though Loqate will be returning that data as well. Here, you can see that data is coming back from RDM. But the [00:49:00] script itself, again, I mentioned, and I'll see if I can increase the size here.

Dan Gage: So the first thing you'll notice is that this is not my actual password. In this script, you can provide your username and password. It's going to leverage the Reltio authentication service, and you can see a sample payload of what an access token might look like. Reltio is now using JWT tokens, so they're a little bit longer.

Dan Gage: But then, again, I'm not super efficient with my shell script, so I'm actually using a couple of iterations of parsing to get that JWT token from that authorization API and break it down, so that when I call that batch cleanse API here, you can see I'm calling the cleanse and I'm just passing the authorization bearer as that token that came from above.

Dan Gage: So you can see here, in this case, I'm reading the data from a [00:50:00] file called data.csv, and we can see that here's that data that's coming in. In this case, we're parsing out the file header, and we're just going to ignore it. We're going to bring in all that address line data that's coming from that file, and we're just going to bring it into that payload.

Dan Gage: We're echoing that address, which you saw coming onto the screen there at the top. And then we're just passing it over to that cleanse service. And if we scroll way to the right here, you'll see we're just appending the data into that results.json. So in this case, our scripting file is not making any effort to parse out the specific results.

Dan Gage: It's just taking the entire JSON result that was returned by Reltio and appending it into this file, so we can see that those three address results are just being returned as JSON. But if you get a little bit more sophisticated in your script here, you can parse out that data that's being returned from [00:51:00] Reltio.
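
The flow of that shell script (skip the header, echo each address, send it to the cleanse service, append the raw JSON) can be mirrored in Python. This is a sketch of the same steps, not the script shared in the session; the `cleanse` callable is a stand-in for the real authenticated API call, and `run_cleanse` is a hypothetical name.

```python
import csv
import io
import json


def run_cleanse(csv_text: str, cleanse, out) -> int:
    """Read addresses from CSV text, cleanse each, append raw JSON to `out`.

    `cleanse` is any callable that takes an address string and returns a
    JSON-serializable result. Returns the number of addresses processed.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # parse out the file header and ignore it
    processed = 0
    for row in reader:
        address = " ".join(cell.strip() for cell in row if cell.strip())
        if not address:
            continue
        print(address)  # echo the address, as the script does
        out.write(json.dumps(cleanse(address)) + "\n")
        processed += 1
    print("done")
    return processed


# Offline demonstration with a fake cleanse function and an in-memory file.
output = io.StringIO()
count = run_cleanse("address\n1 Main St\n2 Oak Ave\n", lambda a: {"input": a}, output)
print(count, "results written")
```

Passing the cleanse step in as a callable keeps the file handling testable without network access or credentials.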

Dan Gage: So in this case, instead of bringing that data into results.json, you could parse it out using something like awk, which is another command line utility. So for those of you that are very script savvy, you can get more sophisticated. For those of you who are less script savvy, you can just take this example that we are providing here.
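
As an alternative to awk, the appended results could be parsed with a few lines of Python. The sketch assumes one flat JSON object per line and uses made-up field names; the actual response structure from the cleanse API may be nested differently.

```python
import json
from typing import Dict, List


def extract_fields(results_text: str, fields: List[str]) -> List[Dict]:
    """Keep only the requested fields from each JSON line of results."""
    rows = []
    for line in results_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        rows.append({field: record.get(field) for field in fields})
    return rows


# Made-up results, one JSON object per line.
sample_results = (
    '{"AddressLine1": "1 Main St", "City": "Springfield", "Zip": "00001"}\n'
    '{"AddressLine1": "2 Oak Ave", "City": "Riverton"}\n'
)
for row in extract_fields(sample_results, ["AddressLine1", "Zip"]):
    print(row)
```

Fields that are missing from a record come back as `None`, so downstream steps can tell an absent value from an empty one.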

Dan Gage: You can put in your own credentials, and again, typically we would recommend that you leverage a service account. So within your security for that service account that's going to be accessing this, you need a minimum of create and update permissions on the entity type where that cleanse is being deployed.

Dan Gage: In this case, we're leveraging the default cleanser on the location object. So the user credentials that I have need to have a minimum of create and update permissions on the location entity, but they do not need any additional permissions. In cases where you're using a service account, rather than using the base location object, [00:52:00] if you created an alternate object that would be used by your cleansing service APIs, then you can restrict the permissions so that service account is not able to perform other actions within Reltio, but is just able to access that cleansing service.

Dan Gage: So I know we're towards the end of the time. We'll take a few more questions and then we'll respond to any additional ones through the community.

Chris Detzel: Can we reverse the order, if possible, to custom cleansers? It's back to the other question that Brian had.

Dan Gage: Yeah. Within the Reltio matcher, the short answer is no.

Dan Gage: So the Reltio cleansers are going to be applied before matching. So when I define my match rules, if my match rules are using a cleanser here, any cleanser, so if you're using a noise word cleanser or word replacement cleanser as part of your matching rule, the Reltio address cleanser would be applied first in creating that [00:53:00] record. So if we go and look at a record within this hub,

Dan Gage: oops.

Dan Gage: So as this data is brought into the system, in this case, you can see the data. So this matching that you're referring to would only be relevant for data that's being loaded and persisted within Reltio. The processes that we've talked about today are just bouncing data off of the Reltio Cleansing Service, via either Reltio Integration Hub, Postman, or the batch shell script. But for data that's being physically persisted in Reltio, here I have data that's originating from Salesforce, and that Reltio cleanser is appending this data.

Dan Gage: So these cleansed values would be applied first, and then within your match rule, any noise words or exclusions within the match rule would then be applied second as part of the matching process.

Chris Detzel: Great. Dan, that was amazing. So thank you so much for [00:54:00] providing this information. Look, we have another one next week, and you're going to be on it, so we're excited about that one. But thank you, everyone, for coming. So please, at the end, take the survey; your feedback is helpful. We have one of these every week around specific topics. So if you go to community.reltio.com and go to upcoming events under Events at the top, you'll see all the events coming up.

Chris Detzel: And this is recorded, and I will be sending this out early tomorrow. It should be in your inbox. So thank you, everyone, and thank you for attending today's session. And Dan, awesome as usual. So thank you. Thanks for the feedback already; it's great feedback. Yeah, we do. All right. Thanks, everyone.

Chris Detzel: Take care.

[00:55:00] I see a lot of people getting on early, so must be a lot of excitement here. It's good.[00:56:00]

I enjoyed it. And true, too.[00:57:00]

As I was saying, there's some early birds. So, welcome. Got about 10 minutes or so.

Welcome, Dan.[00:58:00]

How are you? We're both wearing our shirts. So, that's positive on the same page. Yeah.

Can you say something? I think you were a little light on, I don't know. Yeah, no, I'm sorry. I just, I mumbled. No, no, no, I just want to make sure you're mine. Yeah, I think it's good. Yeah, I think so.

So all these nice little new buttons over the last month or so from Zoom Workplace. Yeah.

Dan, did you know that we have three to four shows on Reltio Integration Hub in some form or [00:59:00] fashion? So that's quite interesting. You're doing two and then get another one right after 10, you know, what the. I think the Databricks kind of connector or whatever that's built upon Reltio Integration Hub as well, so it's interesting.

RIH is, uh, the way to go, I think. Oh, it's definitely, uh, a big part of our strategy on, uh, extending and orchestrating, uh, new processes to leverage the existing services, which is, will be a big focus of our session today. Good.[01:00:00]

Yeah, I changed up the slides a little bit just to look and feel a couple of them for mine. Okay. Yeah. I saw that on that two and three. Yeah.

A little bit more professional. It's like, I need to change it up.

I'm saying there's a lot of early birds and a lot of interest, I guess. [01:01:00] I just can't wait to get on.

So everyone will get started in about four minutes, five minutes or so.[01:02:00] [01:03:00]

Yeah, about 3 minutes everyone. So,[01:04:00]

well, we got about a minute left. So, thanks for getting on a little bit early. Some of you a lot early.

Welcome to, you know,

well, this gives me an opportunity to say good morning to Charita Phillips. Hi, [01:05:00] Charita. Morning. How

you doing. Yeah, very good. Thank you.

All right, well, we probably have 30 seconds so. But he's starting to pop on. So that's good, Dan. Thanks for sharing the screen here.

Chris Detzel: All right, well, why don't we go ahead and get started? So thank you, everyone, for coming to another Reltio community show. This one's called Enhancing Data Integration: Address Cleansing Techniques with Reltio Integration Hub. And so Dan Gage, he's a principal solutions consultant here at Reltio. So Dan, welcome back.

Dan Gage: Thanks, Chris.

Chris Detzel: You're welcome. So the [01:06:00] rules of the show: keep yourself on mute. All questions should be asked in the chat, or you can take yourself off mute and ask there. As usual, these are recorded and posted to the Reltio community, and I will send the follow-up to everyone that attended and registered.

Chris Detzel: So, next slide. As usual, we have a jam-packed summer full of community shows, and one or two that I still haven't even pushed out to the community yet. But today's show is enhancing data integration with Reltio Integration Hub, and then we have two more shows specifically around Reltio Integration Hub as well, next week and then the following. And then we have a show that, if you're a life sciences company, you'd be very interested in: patient centricity, a pharma industry trend, defining new ways of patient data management. [01:07:00]

Chris Detzel: And then on July 9th, we are very excited to show that we've got a new kind of product called Reltio Business Critical Edition. So it's enhanced security and resilience; more to come on that. And then on the 11th, really excited to show a new kind of integration with Reltio Data Pipeline for Databricks. That's really exciting. And then one on Improve Data Discovery with Reltio Integration for Collibra. So that should be really exciting. If you haven't signed up for those, please do.

Chris Detzel: Next slide. I'll post this in the chat, but we have a conference coming up called Data Driven 2024, and feel free to scan that, and you'll get these slides as well. And then I can get you 200 off, since you're part of the community; just put in the community code on there. [01:08:00] So do that, and then I'll put the information there in the chat. So, these are all the customers and/or companies that are presenting as of now. So, really excited. And that's all I have, Dan.

Dan Gage: Hey, thanks, Chris. So what we're going to be focusing on today is primarily around address cleansing. So the idea is when you have data that is brought into Reltio, Reltio is going to automatically cleanse and standardize that data using the default configuration uh, within the Reltio cleanser.

Dan Gage: But your existing license also entitles you to take external data and, if you choose not to load that data into Reltio, to leverage the address cleansing service through its APIs instead. We're going to go through a couple of different techniques today to bring that data into the Reltio service, cleanse and standardize it, and then bring it [01:09:00] back to your source via those APIs.

Dan Gage: A couple of things we'll talk about: one is Locate. The Reltio address cleansing service is powered by Locate behind the scenes. For any of you that weren't aware of that, it's completely transparent to you.

Dan Gage: Reltio maintains and operates that instance of Locate. Again, it's typically included as part of your license, whether you're using U.S.-based records or international records, depending on your entitlements. You can certainly work with your customer success team if you have any questions about what regions you're entitled to.

Dan Gage: But, um, uh, all U. S. based data would be included by default for all customers. So, um, you can also leverage the, as part of your testing process that we're going to talk about today. There is a public website here, support. locate. com. That will allow you to go through and test the locate service outside of Reltio.

Dan Gage: So if you're getting results that are inconsistent with what you're seeing in Reltio, you've got some techniques there that you can test the default configurations. And we'll look at that. [01:10:00] So what we're going to focus on today is configuring the address cleanser. And what you're going to do is you have the ability to have multiple different, what we call cleanse configurations within your Reltio tenant that gives you the ability to have different thresholds that are leveraged.

Dan Gage: So again, there's a default configuration that's typically used when data is brought into Reltio, but when you're leveraging the APIs, you explicitly have the opportunity to tell Reltio, I want to cleanse this record via an alternate configuration. Which uses different parameters, different thresholds, different mappings.

Dan Gage: And again, we'll go through some examples of that here shortly. The testing can be done directly through the Reltio console. So as you configure the Reltio cleanser to leverage your specific parameters, the locate service is still being used behind the scenes, but those parameters will be passed into that service.

Dan Gage: And again, you can do that directly through the Reltio UI. That service, of course, is going to be available via an API, so you [01:11:00] can also test that with tools like Postman. We're going to go through a process today where we use the Reltio Integration Hub to take an external file. In our scenario, we're going to grab that file from an SFTP site, pull it in, cleanse it, and, for demonstration purposes, just show you the results inside Reltio Integration Hub. But you can certainly append that data to the original source file or write it out to a new file based upon your business needs. And then you also have the ability to leverage those APIs through scripting tools like Bash or Python.

Dan Gage: And we'll show you an example of that today. As part of our community deliverables, we will provide you the Reltio Integration Hub recipe, as well as the Bash recipe that we're going to use here today. But again, I would emphasize that the Reltio Integration Hub is a tool that Reltio provides, whereas scripting tools like Bash or Python are something that would be supported by your organization.

Dan Gage: So we would provide the [01:12:00] API services and ensure that they're working correctly. But it would be up to your organization to decide how you do that scripting. So I will provide the example here today, but that's purely for guidance and direction. So looking at that cleanse service. So first thing that we're going to point out is configuration of the Reltio address cleanser is done through the JSON here today.

Dan Gage: So while there's a wide variety of configuration options that can be done directly through the Reltio console, such as adding attributes or defining match rules, the Reltio Address Cleanser is something that still requires you to do your base configuration in the JSON file itself. You would access that by going into your Reltio console Data Modeler, where you'll see two options on the left: one to download the configuration, which allows you to bring that JSON out of Reltio where you can tweak and change any of the configuration options, and one to import that configuration back into the instance. [01:13:00]

Dan Gage: And here at the bottom of the screen, we've got a link that's going to highlight the base cleansing services and the general parameters that are available. So we'll go into some specific examples here around defining some input mapping and some optional parameters, but again, all of that is going to be fully documented through that link on the Reltio documentation site.

Dan Gage: So the first thing we'll talk about is that the service Locate provides has default parameters that Locate can support, both for inputting the data that you're choosing to cleanse and for the data that's returned. The default things that you would certainly expect are the cleansed and standardized version of the address, city, state, and ZIP.

Dan Gage: But there are also additional parameters that can be brought back to enrich that record. So things like latitude, longitude, geocoding. Reltio also supports the ability to return a time zone, and to return census data with regard to a congressional district or demographic [01:14:00] data for the regional territory based upon that census evaluation.

Dan Gage: And then, if you're leveraging services like CASS, which is the U.S. Postal Service certification for advanced address cleansing, you also have additional fields like Residential Delivery Indicator that can be optionally provided if you're leveraging the CASS subscription. So what we're focusing on is the ability to map data from your existing tenant.

Dan Gage: So you can see here the attribute URI that's indicating where data is coming from inside the Reltio tenant when a record is created, and where the output that comes from the Locate service is going to go. So there are some Locate parameters like delivery address 1, delivery address 2, and time zone name.

Dan Gage: Those are the pieces of information that are going to come back from Locate, and this output mapping is going to allow you to define a default set of mappings between the Locate data and where you want to put that data in Reltio. [01:15:00] So if you're leveraging the Reltio velocity packs, again, these cleansers are going to be predefined.

Dan Gage: But in the case where you may choose to use an alternate address structure, or if you create a new entity type that you wish to do cleansing on, then the ability to use these output mapping files is very important. So one of the things I want to focus on here is the idea that within the cleanse configuration, you can define a global mapping.

Dan Gage: So in this case, I've called it mapping one, and it's an output mapping. And when I go to the next screen here, what you're going to see is you have the ability when you're creating a specific cleanse configuration. In this case, this one's called default. And you see the name that's highlighted in red here becomes very important because when you're leveraging those APIs, you're going to want to explicitly tell Reltio which cleanse configuration to use when cleansing the data that you're providing.

Dan Gage: And you see here at the bottom in the mapping, the input mapping is explicitly being defined that, uh, data is [01:16:00] going to come in via field called address input, and it's going to be passed to the locate parameter address. A value is going to come from the country field and be passed to the locate field country.

Dan Gage: And you'll notice the mandatory parameter for the address input is set to true, which means that this cleanse config will only be executed if an address input is provided. If no address input is provided, then this cleanse configuration will be effectively skipped, because it is a mandatory parameter.

Dan Gage: However, country is not a mandatory parameter. So if it's not provided, locate will make its best effort to estimate the country based upon a couple different things. So you have the ability you're going to see in the next screen where you can set a default country. But locate will also attempt to parse that country from the address input.

Dan Gage: One last thing to point out here is the idea that you're chaining together these cleanse configurations. So in your Reltio cleanse config, your [01:17:00] configurations are in a sequence, a chain of the different cleanse configurations that can be linked together. And then you'll see these parameters here.

Dan Gage: They say proceed on success or proceed on failure. So where you define multiple cleanse configurations, you have the ability to indicate whether or not it's going to leverage the first configuration. And if that first configuration does not have the required parameters, then it can automatically fall into the next configuration within that chain, which would still be within your default configuration here.

Dan Gage: So you have the ability to create multiple different configurations. And then within any configuration, you can chain together different mappings that allow you to look for data in different incoming fields.
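
To make the fall-through behavior concrete, here is a minimal Python sketch. It is not Reltio's implementation: the `CleanseConfig` class, the field names, and the mapping targets are all hypothetical, and it models only the one rule described in the session, that a configuration is skipped when a mandatory input is missing. The proceed-on-success and proceed-on-failure flags are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CleanseConfig:
    """Hypothetical stand-in for one cleanse configuration in a chain."""
    name: str
    # Maps incoming field name -> (cleanser parameter name, mandatory?)
    input_mapping: dict = field(default_factory=dict)

    def can_run(self, record: dict) -> bool:
        # A configuration is skipped when any mandatory input is missing or empty.
        return all(
            record.get(source)
            for source, (_, mandatory) in self.input_mapping.items()
            if mandatory
        )

    def build_params(self, record: dict) -> dict:
        # Pass only the populated, mapped fields on to the cleanser.
        return {
            target: record[source]
            for source, (target, _) in self.input_mapping.items()
            if record.get(source)
        }


def pick_config(chain: list, record: dict) -> Optional[CleanseConfig]:
    """Return the first configuration whose mandatory inputs are all present."""
    for config in chain:
        if config.can_run(record):
            return config
    return None


# Assumed field and parameter names, for illustration only.
chain = [
    CleanseConfig("default", {"AddressInput": ("Address", True),
                              "Country": ("Country", False)}),
    CleanseConfig("other", {"AddressLine1": ("Address1", True),
                            "City": ("Locality", False),
                            "Country": ("Country", False)}),
]

single_field = {"AddressInput": "6601 Dorchester #12", "Country": "US"}
parsed_fields = {"AddressLine1": "6601 Dorchester Rd", "City": "Anytown"}
```

With these sample records, `pick_config(chain, single_field)` selects the first configuration and `pick_config(chain, parsed_fields)` falls through to the second, because the mandatory single-field input is absent.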

Dan Gage: So when you look at the parameters that allow you to optimize how that data is going to be accessed, the first one that I want to focus on is return data by status. When you're tuning your address cleansing, by default it's only going to [01:18:00] enrich the records with additional data if the verification status is returned as verified.

Dan Gage: And for testing purposes, there may be additional reasons why you may want to enrich partially verified, unverified or ambiguous addresses to make sure that the verification status is being brought back. And in many cases, an ambiguous or partially verified address may be able to extract country or even city or state information from an address and do a partial enrichment.

Dan Gage: So that is something you'll want to consider as part of your, your cleansing strategy. And again, that's one of the reasons why. You can sometimes chain together multiple different configuration settings. To allow you to account for the, the different verification statuses that will be, uh, possibly suggested.

Dan Gage: So you'll see that you have some optional parameters in here around setting your default country and your licensed countries. It is your responsibility to make sure that [01:19:00] you're leveraging the records that you are entitled to access. United States data is going to be supported for all customers. For customers that are also leveraging data outside of that region, if your entitlement allows you to support that, you can go in here and add additional countries to your licensed countries configuration.

Dan Gage: If an address is not in a licensed country, it won't be cleansed. So if, for example, I was bringing in data from Brazil, and Brazil is not in my licensed country list, and I have an address that explicitly says that it's from Brazil, Reltio will not attempt to cleanse that address because it's not in our list here. So be aware of that as part of your configuration. You can, of course, have different configurations to support different countries, and that's again where that chaining together may be very relevant. Yes, please.

Chris Detzel: Can I ask two questions?

Dan Gage: Sure, please.

Chris Detzel: All right. How can we cleanse addresses for the India and China regions? We are using the Locate feature, but are not [01:20:00] satisfied with the results for countries like India and China. Can you help us with any of that? Is there another option?

Dan Gage: So a couple of things that I would emphasize is using some of these additional parameters.

Dan Gage: So you can see in the process here, there's a pre processing environment. There's enrichment, there's CAS certification, there's verification, geocoding, using these different parameters and then using primary aliases, using duplicate handling masks. So again, there's definitely some advanced parameters that you see.

Dan Gage: There's a link down here at the bottom; the first link says address cleansing parameters. So there are additional details out there where you may be able to tune the cleansing engine for certain regions to get better results. And in some cases, when you get into rural areas, the postal data is just not good enough, so you may always get only partially verified addresses or addresses verified only to a locality.

Dan Gage: And we'll see some definitions on the next slide of what level of verification is being returned. [01:21:00] But in some cases, you can enhance the results that you get for different countries. And again, this is where chaining together can be beneficial.

Dan Gage: One configuration could be US, Canada, and France, and you could have a different configuration that looks for country code IN for India, as well as China. Being able to use different parameters and different verification status mappings for those different configurations is definitely something that can be done to try and enhance your enrichment in those countries.
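
As an illustration of routing by country, the following Python sketch picks a configuration name from a country code and declines to cleanse unlicensed countries. The configuration names, the country groupings, and the licensed list are assumptions made for the example; the real values live in the tenant's cleanse configuration.

```python
from typing import Optional

# Hypothetical configuration names and country groupings, for illustration only.
CONFIG_BY_COUNTRY = {
    "US": "default", "CA": "default", "FR": "default",
    "IN": "asia_tuned", "CN": "asia_tuned",
}

# Countries the tenant is entitled to cleanse (assumed list).
LICENSED_COUNTRIES = {"US", "CA", "FR", "IN", "CN"}


def route_config(country_code: Optional[str], default_country: str = "US") -> Optional[str]:
    """Pick a cleanse configuration name, or None when the country is unlicensed."""
    code = (country_code or default_country).strip().upper()
    if code not in LICENSED_COUNTRIES:
        # Mirrors the behavior described in the session:
        # addresses from unlicensed countries are not cleansed.
        return None
    return CONFIG_BY_COUNTRY.get(code)
```

Under these assumptions, `route_config("in")` returns the tuned configuration, `route_config(None)` falls back to the default country, and `route_config("BR")` returns `None` because Brazil is not in the licensed list.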

Chris Detzel: Thank you. Can you quickly provide a real-world example where chained cleanse configurations are used?

Dan Gage: Yeah, and I think the example we just gave there is where you may want to use different parameters for different regions, and that's going to get into your verification status mapping.

Dan Gage: So you'll see here that the verification status that is assigned to a record is based upon this verification status mapping.

Dan Gage: If this [01:22:00] field is omitted from your cleanse configuration, it will use the Locate default mapping. But you have the ability here, based upon those address verification codes that Locate returns, to set up how that information is going to be scored. And so you'll see here that it's a four-part result in that address verification code.

Dan Gage: The first portion tells you the level of verification that Locate was able to do. Then there's the parsing technique that was used, whether or not it was able to pre-process that data to produce the results that were expected, and whether it was able to find postal code data. And, very importantly, this last number sequence is going to represent the match score, the level of confidence that Locate had in the cleansing that was done.

Dan Gage: And then, for the first two sections, the verification and the parsing, there are going to be three characters. The first one is going to indicate the verification status, so whether it's verified, partially verified, [01:23:00] ambiguous, and so forth. The second is going to tell you the level of verification.

Dan Gage: So 5 is going to be to a delivery point, which is specifically a sub-unit within a building. If you're in an apartment complex, it could be the apartment number. If you're in an office building, it could be the suite or unit number. In that case, if Locate is able to determine that specific delivery point, you will get a V5 or even potentially a P5.

Dan Gage: If it's able to discern that there is a sub-building designator, even if it's uncertain about other factors, you could potentially get a level 5 verification, indicating that it's identified it to a delivery point. Four is going to be to the premise level, which is basically your rooftop. So it knows that it's going to get delivered to the right building.

Dan Gage: But within that building, it may or may not have the additional details necessary to parse and identify a sub-delivery point. Now, in some cases, that could be [01:24:00] because it's just a residential address and it's a single-unit dwelling. But in cases where you have an apartment complex and an apartment unit is not specifically designated, it will simply indicate that it's a premise level.
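
The structure of the verification code can be sketched in Python as follows. This is a hedged illustration: the hyphen-separated layout and the sample value `V55-I55-P6-100` are assumptions about what the code looks like, and the lookup tables include only the status letters and levels named in the session. Consult the address verification code documentation for the authoritative definition.

```python
from dataclasses import dataclass

# Status letters and levels named in the session; anything else maps to "unknown".
STATUS = {"V": "verified", "P": "partially verified", "A": "ambiguous", "U": "unverified"}
LEVEL = {"5": "delivery point", "4": "premise"}


@dataclass
class VerificationCode:
    status: str
    level: str
    match_score: int


def parse_avc(code: str) -> VerificationCode:
    """Split a hyphen-separated verification code into the parts discussed above."""
    parts = code.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 hyphen-separated parts, got {len(parts)}: {code!r}")
    verification, _parsing, _postal, score = parts
    if len(verification) < 2:
        raise ValueError(f"verification part too short: {verification!r}")
    return VerificationCode(
        status=STATUS.get(verification[0], "unknown"),  # first character: status
        level=LEVEL.get(verification[1], "unknown"),    # second character: level
        match_score=int(score),                         # last part: match score
    )
```

For example, `parse_avc("V55-I55-P6-100")` yields a verified, delivery-point result with a match score of 100, while a code starting with `P4` would be read as partially verified to the premise level.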

Dan Gage: Now, if you're using CASS certification, you do have the ability to have parameters that tell you that this is a multi-unit dwelling. And if it is a multi-unit dwelling, but you only have a premise-level verification, then that does allow you to identify where you have verification that is potentially not detailed enough for assured delivery.

Dan Gage: So you can use things like the Reltio Data Validation Framework to identify records that have only been verified to a premise level. If the delivery point verification system indicates that it is a multi-unit building, you can leverage those parameters that are coming back from Locate to identify when a sub-unit designation should have been possible but is not currently present. You can identify that [01:25:00] with the data verification codes.

Dan Gage: And again, if anybody is looking for more specific details, I'd encourage you to post a message into the Reltio community and we'll respond back to you with additional details.
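
The premise-level check for multi-unit buildings could look something like the Python sketch below. It is a standalone illustration rather than a Data Validation Framework rule: the record layout, the `multi_unit` flag, and the sample code values are hypothetical, and the level is assumed to be the second character of the code's first part.

```python
def needs_subunit_review(verification_code: str, is_multi_unit: bool) -> bool:
    """Flag addresses verified only to the premise (level 4) in a multi-unit building."""
    verification_part = verification_code.split("-")[0]
    level = verification_part[1] if len(verification_part) > 1 else ""
    return is_multi_unit and level == "4"


# Hypothetical records: only "a" is premise-level AND in a multi-unit building.
records = [
    {"id": "a", "avc": "V44-I44-P6-100", "multi_unit": True},
    {"id": "b", "avc": "V55-I55-P6-100", "multi_unit": True},
    {"id": "c", "avc": "V44-I44-P6-100", "multi_unit": False},
]
flagged = [r["id"] for r in records if needs_subunit_review(r["avc"], r["multi_unit"])]
```

Records flagged this way are candidates for follow-up, since a sub-unit designation should have been possible but was not supplied.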

Chris Detzel: So there are a bunch of questions, but we'll get to those after some of this.

Dan Gage: That's fair. Yeah.

Dan Gage: So if there are questions around the configuration process, I think we can go ahead and take those now. And then we'll get into some testing and examples.

Chris Detzel: Okay. So does Locate consider the change of address in the US and return those address parameters after standardization?

Dan Gage: Yeah, so the national change of address registry is not currently leveraged by Locate using the Reltio default services. Reltio is exploring the ability to provide those change of address services as part of our platform, but it's not yet available. So today, it's only going to cleanse and standardize that address independent of who actually occupies that building, whether it's an [01:26:00] individual or an organization.

Dan Gage: Locate is just going to standardize based upon the building that exists, independent of the occupant. So the answer there would be no, it's not going to take into consideration change of address data. It's only going to take into consideration the building unit itself.

Chris Detzel: Cool. Thank you.

Dan Gage: Yeah, so keep those questions coming in and we'll keep addressing them where appropriate. And then certainly we'll wrap it up at the end and come back and address them all.

Dan Gage: So, one thing to point out: I mentioned the fact that when you're defining those cleanse configurations, Reltio's velocity packs are going to deliver two configurations by default. One, labeled default, is going to expect all of the information to be concatenated together into a single field called address input. The other common approach, which we call other, is where the information is already pre-parsed, [01:27:00] so that your source data can map it in as address line 1, address line 2, city, state, ZIP, postal code, whatever information is specifically broken out.

Dan Gage: If you already have that level of designation, sometimes that's a more accurate way of allowing Locate to find hints for parsing and identifying that address correctly.

Dan Gage: So, in situations where you have the address information in a single field, you would leverage that default configuration and just pass all that data into a single parameter.

Dan Gage: And where you have the ability to break it out, you'll be able to pass that in using an alternate configuration. And again, by default, Reltio is going to chain these two options together. It's going to check first to see if you have all that information concatenated together into the address input field.

Dan Gage: And again, that would be a mandatory field in that configuration. So if that field is not populated, it would then fail over to that other configuration, where it expects it to be in the individual fields. [01:28:00]

Dan Gage: So, again, just showing you here that in the Reltio data modeler, you have the ability to just copy and paste an address in there and run that test, where you can see, based upon your configuration, whether it's able to break out that data.

Dan Gage: And again, you'll see an example here where it says 6601 Dorchester #12. We were able to parse that out and say, yes, address line 1 should actually be 6601 Dorchester Road, apartment number 12. So that #12 is indicating an apartment unit, and you'll see it's also going to take that data and put it into a sub-building field.

Dan Gage: That was apartment 12. If this was an office suite, it may call it suite 12 or unit 12, depending upon the building designation. So again, just emphasizing that there's a wealth of additional information that can be brought back: things like census indicators, census codes, the sub-building indicators, the latitude, longitude, the ISO codes for your [01:29:00] country, ZIP+4 information.

Dan Gage: So, again, all this would be controlled through that mapping process of leveraging those parameters that are available from Locate. And again, many of these fields are going to be pre-mapped as part of your velocity pack. But if you're using an older configuration where they may not have been there, you can certainly go in and extend your existing environment and add those fields.

Dan Gage: The next option would be leveraging that API directly. Here you can see, in the Postman tool, we're just going in and leveraging the environment that you're accessing, whether it's your dev, test, or prod, and whether you're in a European tenant or a U.S.-based tenant; that would control the environment. And then, of course, there's your specific tenant ID.

Dan Gage: And then the URL that we're leveraging here is the batch cleanse. You'll see that there's a link here at the bottom to the Reltio developer page that gives you the Swagger definition of all the parameters that are available to be passed into that API, as well [01:30:00] as the expected schema. In this case, we're passing that information into an address input field.

Dan Gage: And on the right-hand side, you can see that it's parsing and enriching it. In this case, we're asking only that the label be returned, but you could certainly allow it to return each of the individual fields, city, state, ZIP, ZIP+4, as individual return parameters. And again, that's all done via the input here. It's actually a little bit off the screen on this URL, but we'll see some examples of that in the demonstration.
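
A request like the one shown in Postman could be assembled along these lines in Python. Everything specific here is an assumption: the URL shape, the placeholder cleanse path, the entity type, and the `AddressInput` attribute name are illustrative only, and the authoritative path and schema are in the Swagger definition on the Reltio developer page.

```python
import json

# Placeholder values; substitute the environment, tenant ID, and cleanse path from
# the Reltio developer documentation for your tenant. None are confirmed here.
ENVIRONMENT = "your-environment"
TENANT_ID = "yourTenantId"
CLEANSE_PATH = "<batch-cleanse-path>"


def build_cleanse_url(environment: str, tenant_id: str, cleanse_path: str) -> str:
    """Assemble the request URL from environment, tenant ID, and API path (assumed shape)."""
    return f"https://{environment}.reltio.com/reltio/api/{tenant_id}/{cleanse_path}"


def build_cleanse_body(addresses, entity_type="configuration/entityTypes/Location",
                       attribute="AddressInput"):
    """Wrap raw address strings in an entity-style JSON array (assumed schema)."""
    return [
        {"type": entity_type, "attributes": {attribute: [{"value": address}]}}
        for address in addresses
    ]


url = build_cleanse_url(ENVIRONMENT, TENANT_ID, CLEANSE_PATH)
body = build_cleanse_body(["6601 Dorchester #12"])
payload = json.dumps(body)  # JSON text to paste into Postman or send from a script
```

Keeping the URL and body builders separate makes it easy to reuse the same payload across dev, test, and prod environments by changing only the environment and tenant ID.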

Dan Gage: The Reltio Integration Hub gives you a low-code, no-code environment where you're orchestrating this process through the Reltio Integration Hub interfaces. So the first thing to point out is that the Reltio connector within the Reltio Integration Hub that connects to your tenant has predefined actions for many of the tasks that you would perform within the Reltio hub.

Dan Gage: So it would be things like [01:31:00] create entity, create relationship, looking for additional data, leveraging certain services. But the address cleansing service is not defined as a prebuilt action. So in order to leverage this service, you would use what is called a custom action within the Reltio Integration Hub.

Dan Gage: So you're using the standard Reltio connector, but you're choosing that custom action option. And then you would use a post method. And then you would bring in the URL. In this case, I'm using an environment called G U S sales. And my tenant ID begins with P L X. So you would put that path directly into that custom action path.

Dan Gage: And then you would also, because you're using a custom action, you would need to define the input and output parameters. So in this case, I would just get a sample schema that I'm going to use for passing that data in. And within the Reltio Integration Hub, I would use JSON to just provide that sample schema.

Dan Gage: And Reltio will generate the [01:32:00] array attributes and values so that you can use that drag-and-drop, low-code, no-code methodology within the Reltio Integration Hub to source the data from your incoming file into that structure. Likewise, you would take the resulting structure that Reltio returns that data in, and you would import that into your response body so that you can map that data back out to the appropriate place where you want to return that data.

Dan Gage: We're also going to see an example here where you can use scripting tools. In this case, I'm going to use just a command-line Bash shell script, and you can use the curl command to pass data into that same API, the batch cleanse API.

Dan Gage: You would provide a header parameter with the Reltio token. In the script example that we'll provide, I will include the sample code that requests a token specific to the Reltio [01:33:00] environment and then uses that token, concatenating it into the call that passes the address input data into that tenant.
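
The session's script uses Bash and curl; as an alternative illustration in Python, the sketch below shows how a token obtained earlier can be attached as a header when preparing the cleanse call. The URL, token value, and body are hypothetical placeholders, the bearer scheme is an assumption, and the request is only constructed, not sent, so the example runs offline.

```python
import json
import urllib.request


def build_cleanse_request(url: str, access_token: str, body: list) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the token in a header."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed bearer scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_cleanse_request(
    "https://example.invalid/cleanse",  # hypothetical URL
    "example-token",                    # token requested earlier from the auth service
    [{"attributes": {"AddressInput": [{"value": "6601 Dorchester #12"}]}}],
)
# To execute the call: urllib.request.urlopen(request)
# It is left out here so the sketch has no network dependency.
```

Separating request construction from sending also makes the script easy to test before pointing it at a real tenant.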

Dan Gage: So again, we'll see an example of that here in a few moments. And the last thing to focus on is just our demonstration. So before we jump into that demonstration, Chris, do we want to address any questions about the process?

Chris Detzel: Yeah, let's do it. So for the default and licensed country parameters, and this was kind of going back, I see that you have US and United States, and CAN and Canada. Is that because the list is trying to mimic all the variations that may appear in my data source?

Dan Gage: Yes, that's correct. When you're passing data into that country field, if the data that you've mapped in matches one of those existing values, then it's going to attempt to use that country in passing that data over to Locate. And again, where that becomes important is if you have multiple cleanse configurations that you have chained [01:34:00] together, and you want to use that country code to basically guide the Reltio cleanser to use a different configuration for different countries. You have the ability to define those there.

Dan Gage: Of course, that parameter can also leverage the Reltio reference data management solution to standardize that incoming value. So if you have country codes that are coming in via ISO numeric, for example a country code of 840 that represents the United States, being able to map that via RDM to US is supported.
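
The idea of mapping country variants to one canonical code can be sketched in Python as a simple lookup. This is only an illustration of the concept, not how RDM is configured: the table combines the variants mentioned in the session (US, United States, CAN, Canada, ISO numeric 840) with a few assumed extras.

```python
# Variants from the session plus a few assumed extras; in practice this lookup
# would be maintained in reference data management rather than in code.
COUNTRY_VARIANTS = {
    "us": "US", "usa": "US", "united states": "US", "840": "US",
    "ca": "CA", "can": "CA", "canada": "CA", "124": "CA",
}


def normalize_country(raw):
    """Return the canonical two-letter code, or None when the value is not recognized."""
    if raw is None:
        return None
    # str() lets numeric codes such as 840 be looked up like text values.
    return COUNTRY_VARIANTS.get(str(raw).strip().lower())
```

Here `normalize_country(840)` and `normalize_country(" United States ")` both resolve to `"US"`, while an unmapped value such as `"Brazil"` returns `None` so the caller can decide how to handle it.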

Chris Detzel: Can we somehow ask, in the config, to get back the Locate address ID of the Locate address reference data against which our address was verified?

Dan Gage: I don't think that's supported, to my knowledge. That's definitely something where, if you post that into the community, I can work with our product management team and find out if that detail is available. But to my knowledge, [01:35:00] it's not. The metadata that Locate is using is effectively behind the scenes, and it just presents the results.

Chris Detzel: Yeah, just go to community.reltio.com and post there. Will this data be persisted in the tenant? I noticed the API returns an entity URI.

Dan Gage: Yeah, that's a great question. So it is going to return a URI, but it is not going to persist the data. When you're leveraging that batch cleanse API, you're effectively doing something similar to an external match. You're allowing that data to come in, and Reltio is going to assign a URI to it just for the purposes of logging and tracking within the Reltio platform.

Dan Gage: But that URI would not be available in your tenant. If you attempt to access that record via the URI that got returned via that batch cleanse, you would not find that record stored within the Reltio platform. So this is just leveraging the API service to take the data outside of Reltio for your needs.

Chris Detzel: And last question [01:36:00] for now: can Locate provide the output in English after cleansing the address, if the input comes in German or non-English characters?

Dan Gage: Yes, so you do have that ability. There are some optional parameters that you can put in there. I don't have that in any of our examples here today, but Locate definitely does support that. If you're bringing in data in simplified Chinese or traditional Chinese characters, it will attempt to cleanse and standardize that using the same language or same character set that's coming in. But there are some optional parameters where, if the metadata is available, it will basically transcode that data into a Latin-based character set, if the metadata is supported.

Chris Detzel: It'd be interesting, because we had Locate's CTO on about a year ago to talk about their roadmap. So maybe I'll try to push to get them on again to see what that roadmap looks like.

Dan Gage: Sure.

Chris Detzel: A lot of interest, I think. Thanks, Dan. [01:37:00]

Dan Gage: All right, great. So let's take a look at this. So a couple things: some of those websites that I referred to along the way.

Dan Gage: The Reltio address cleanser you see here; again, that link was in the slides that will be provided to you.

Dan Gage: And you can see that configuring the tenant, the address cleansing properties, CASS certification, understanding those address verification codes, all of that is here. This is a very commonly used service within the Reltio platform, so it's well documented, and we provide you some direction to that.

Dan Gage: I mentioned that Locate service directly. So in any case where you may be cleansing an address and you're not getting the results that you expect from Reltio, you can always go directly to Locate. Again, this website is in the slides that we'll be providing, and you can just put in that data directly and have it passed to Locate, and it will attempt to parse and process that record using the default Locate [01:38:00] parameters. And again, in some cases you'll get an address that just doesn't cleanse, and sometimes you wonder: is it because something is not configured properly, or is it because it's just not a valid address? Going directly to Locate allows you to determine whether it is a configuration issue or whether it's just a bad address.

Dan Gage: You can also sometimes just take that address and send it over to Google. So if you bring in an address here, I sometimes just right-click and do a Google search for it and look on Google Maps to see whether or not that address is coming back. And that lets you know whether it's a configuration issue or whether that address just doesn't exist.

Dan Gage: So I did mention again, being able to take that address and put it directly into your Reltio console and being able to run that and seeing those results coming back. And again, you'll see that there's a wealth of additional data that's available that you may or may not be taking advantage of. So things like time [01:39:00] zones, transaction codes, MSA codes, all that data is available as part of that configuration.

Dan Gage: And we can also include this default configuration that we're using in the deliverables for the community show that you can download. So here you can see, if I zoom in on that a little bit, this sample where you're defining a default output mapping, where the parameters from Locate are being mapped directly into your Reltio data model.

Dan Gage: And then when you get into the information here, the infos are the individual cleansers that are being defined. So here's my default cleanser, and here's my other cleanser. And again, you saw in the default cleanser in those previous examples, the data was all coming in via address line, address input. And if that data from your original source is coming in via address line 1, address line 2, country, administrative area (your state or province), or city, you can see where the [01:40:00] parameters within Locate might be administrative area or locality, but in your Reltio tenant, it may be called city, state, province, et cetera. So you're able to map those based upon the data that is available to you as your input; the more data that you can pass into Locate, the more likely it is to cleanse and standardize that. And again, in this case, we're using the input mapping specifically for this cleanser, but we're using the output mapping reference that's pointing back to that mapping that was defined in the top section here, in the mappings themselves. So this is just helping you understand that you can define that mapping one time in the mapping section and then leverage it multiple times as your output mapping reference.
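The input mapping idea described here can be sketched in a few lines of Python. This is only an illustration of renaming source fields to cleanser parameters, not the Reltio configuration format shown in the session. `Locality` and `AdministrativeArea` are the Locate-side names mentioned above; the source-side column names and the `Address1`/`Address2` parameter names are assumptions.

```python
# Hypothetical mapping from source column names to cleanser input parameters.
# Only "Locality" and "AdministrativeArea" come from the session; every other
# name here is an assumption made for illustration.
INPUT_MAPPING = {
    "AddressLine1": "Address1",
    "AddressLine2": "Address2",
    "City": "Locality",
    "StateProvince": "AdministrativeArea",
    "Country": "Country",
}


def map_input(source_record, mapping=INPUT_MAPPING):
    """Rename source fields to cleanser parameters, skipping empty values."""
    return {
        target: source_record[source]
        for source, target in mapping.items()
        if source_record.get(source)
    }


# Sample record with made-up values; AddressLine2 is missing on purpose.
record = {
    "AddressLine1": "6301 Milton Street",
    "AddressLine2": None,
    "City": "Sample City",
    "StateProvince": "PA",
}
print(map_input(record))
```

Defining the mapping once as a dictionary and passing it by reference mirrors the point about declaring a mapping one time and reusing it across cleansers.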

Dan Gage: The parameters that we talked about: the verification status, that address verification code coming back from Locate. The verification status mapping is using a regular expression. So here in this case, we're being a little bit loose, but we're [01:41:00] going to call verified anything that starts with the letter V followed by a dot-star, meaning any characters occurring any number of times, so those are going to be classified as verified. But if we have an address verification code that's partially verified, a P followed by a 2, 3, 4, or 5, and it ends with a 90 or 100, then we're indicating here that we're still going to call that verified. So even if it's only partially verified according to Locate, if Locate has a high degree of confidence, 90 percent or above, then we're still going to allow that to be classified as verified. So this is where, when you're getting data for other countries, you're able to go in there and say, hey, for partially verified or even unverified addresses where Locate is 100 percent confident, or you can even go in here and say 95 percent confident, we may still want to classify that as verified. And again, based upon the status from your verification mapping, you have the ability to control whether data is going to be [01:42:00] returned to your Reltio tenant. So in cases where records are getting partially verified, that information may not be returned to your tenant.
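A verification status mapping of the kind Dan describes can be sketched as a regular expression check. This is a minimal illustration under assumptions: the exact pattern in the demo configuration is not shown in the transcript, and the sample code strings below are invented to exercise the rule.

```python
import re

# Assumed pattern: anything starting with V, or a P followed by 2-5 that
# ends in 90 or 100. The real configuration may be stricter or looser.
VERIFIED = re.compile(r"V.*|P[2-5].*(90|100)")


def classify(avc):
    """Return 'VERIFIED' when the whole code matches the loose rule above."""
    return "VERIFIED" if VERIFIED.fullmatch(avc) else "NOT_VERIFIED"


# Illustrative address verification code strings, not taken from the session.
for code in ["V44-I44-P6-100", "P45-I44-P6-90", "P45-I44-P6-85", "P15-I44-P6-100"]:
    print(code, classify(code))
```

Tightening the rule for a given country is then a matter of swapping the pattern, for example keeping only the `V.*` branch.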

Dan Gage: So you can either adjust that verification status mapping, or you can just say, you know what, even if the Reltio default configuration says that it's partially verified, I still want to take whatever data is coming back and then, based upon my output mapping, bring that data back in and enrich it. And of course, it will be associated with that Reltio crosswalk that's returned. So whether you're doing this for data that's being persisted within Reltio on a new Reltio crosswalk, or whether you're leveraging that API that's going to return that data, you're controlling whether or not the data will be returned via this return data by status parameter. All right, so based upon that, we're going to go into the Reltio Integration Hub, and we're going to look at that first process.

Dan Gage: So in this case, we built out a recipe, and I'm just triggering it every 5 minutes. It's going to go out and look for a [01:43:00] file. And I've simply created a parameter here; it's called sample addresses 7. We're going to connect to our server, grab that file, parse it, and then loop through each of those records.

Dan Gage: Now within Reltio Integration Hub, we do have the ability to optimize the task utilization by using this batch option. So in the case where we're processing through those records, I'm indicating here via options of Reltio Integration Hub that I want to pass 10 records at a time to that Reltio API. So where that becomes important is when you look at your Reltio tenant and your utilization thereof.

Dan Gage: As you see here, I'm calling that batch cleanse via Postman. I have two addresses, and you'll see that those two addresses are both being returned. So from a Reltio entitlement standpoint, this counts as one API call. That becomes very important if you're processing large volumes of data: if you pass only one [01:44:00] address per API call, and you're processing a million records, then you'll consume one million of your allocated Reltio API calls.

Dan Gage: But by passing multiple records in a single API call, you're leveraging the Reltio service more efficiently, and it will affect your entitlement and your utilization quotas. So that's going to affect the API quotas here, as well as your Reltio Integration Hub task quota. So in this case, by batching those records together and sending multiple records via each API call, that counts as one API call, but it also only counts as one task within the Reltio Integration Hub. So in this case, you'll see where I have the on-the-fly cleanser for Reltio, and I've mapped in that batch cleanse API. I imported that schema, and it's going to bring in the address input. Here we also brought in the country, and then the output.
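The quota arithmetic behind batching can be sketched as follows. This is a generic illustration, assuming each batch maps to exactly one API call and one task, as described above; the helper names are made up for this example.

```python
import math


def batches(records, size=10):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def calls_needed(record_count, size):
    """Number of API calls if every call carries up to `size` records."""
    return math.ceil(record_count / size)


# Seven records with a batch size of 10 fit into a single call.
print(len(list(batches(list(range(7)), size=10))))
# One million records: one per call versus ten per call.
print(calls_needed(1_000_000, 1), calls_needed(1_000_000, 10))
```

With one record per call, a million records consume a million calls; at ten per call the same load needs one hundred thousand.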

Dan Gage: Here we're bringing [01:45:00] in the data from that file, and we're just concatenating everything together: address line one to_s, address line two to_s. And it's important to put the to_s on here, because if one of those fields is null, if it's missing, then the concatenation operation here may potentially fail. So by using the default to_s operation on a null value or a numeric value, it will still concatenate it together as a string to generate that full value.
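The same null-safe concatenation can be sketched in Python. The `to_s` helper here only mimics the behavior described (a missing value becomes an empty string, a number becomes text); it is not the Integration Hub formula itself, and the sample values are invented.

```python
def to_s(value):
    """Mimic a to_s-style conversion: None becomes '', numbers become text."""
    return "" if value is None else str(value)


def full_address(*parts):
    """Join the non-empty parts with single spaces."""
    cleaned = (to_s(part).strip() for part in parts)
    return " ".join(part for part in cleaned if part)


# A missing second line and a numeric postal code (made-up value) both
# concatenate without raising an error.
print(full_address("6601 Dorchester Road", None, 12345))
```

Without the conversion, adding `None` or an integer to a string would raise a `TypeError` in Python, which is the failure mode the to_s call guards against in the recipe.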

Dan Gage: And then you can see in the response body that all the various fields that could potentially come back are available there. And again, for our demonstration purposes, because we are passing multiple records over in that single API call, and because we want to be able to look at them in our results here, we are going to loop through those results. And again, this is for demonstration purposes; you could pass those records into a CSV, and you could also use the batch process as well. But again, those would be ways that you would optimize [01:46:00] your Reltio Integration Hub task utilization. So if I just test the recipe, you can see that it's going to go pretty quickly here.

Dan Gage: It's going to go out, get that file, batch those records up, and send them over to the Reltio API. And again, by clicking in here, I can see that the input is 7 records in a batch of potentially 10 records. And the output is the records that get moved through. And you'll see here that all of those addresses are being passed over and the results are coming back to provide those rich details. In this case, the Milton Street address in Pennsylvania. And again, for demonstration purposes, I've placed them into a list here where you can see the original data that we were passing over to Locate and how Locate passed that data back. So again, we'll provide a copy of this recipe that you can import into your instance of Reltio Integration Hub. [01:47:00]

Dan Gage: But the process here, of course, is just connecting to that source data using any of the available connectors. You could be connecting to a SQL database. You could be connecting to an Excel sheet, to a Google Doc, to Box, to any of a variety of different sources. So the Reltio API is going to be accessible regardless of where that data is actually being sourced. By using the Reltio Integration Hub low-code, no-code environment, you're just mapping that incoming source data into those parameters. And again, you saw that where we were just concatenating those fields together from that input file. So whether that input is coming from a CSV file, as we're doing here today, whether it's coming from a SQL database, or whether you're connecting to your Salesforce environment, however you choose to get that data, what's passed over to the Reltio cleanser comes from a previous step within this recipe. So I'll stop here for this recipe.

Chris Detzel: Yeah, good. There's a lot of questions [01:48:00] specifically about this. How can the cleanser output for address line one be changed? So, for example, to exclude the apartment number and store it in the address line two output.

Dan Gage: Yeah, and that's definitely a question that we get a fair amount. And the short answer is that Locate is going to decide, based on Locate's parameters, how that data is going to be brought back.

Dan Gage: So if we go back into our Postman, right now I've got a select in here where you can see I'm selecting only the label to come back. But by removing that parameter, it's going to return everything back by default. And again, my token has timed out here, so I'm going to re-get that token and re-cleanse that data. And you'll see here that now we're actually getting all of those various different fields. You'll see that address line one does have that apartment concatenated to it. But if you want [01:49:00] to get more granular by going down to other fields such as street, you can see that the street there was Dorchester Road and the street number was 6601. So if I don't want that apartment number to be there, I can simply look at the premise number and the street name, independent of that full address line 1. So the delivery address is going to be all that information pre-concatenated. But you can look at those individual fields, and again, we can see this if we go back into the Reltio console where that data was brought out. Here, we have 6301 Milton Street, PA. If we use that address, and I'll just type it in. [01:50:00]

Dan Gage: So by default, the address line 1 is taking the delivery address. But you can see here that by looking at the premise number and the street name, which is Dorchester Road, and concatenating together only that street name and premise number, you can exclude that apartment number from that. So it would just allow you to concatenate that data a little bit differently in your mapping.
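The alternate mapping Dan describes, building address line 1 from the premise number and street name instead of the pre-concatenated delivery address, could look like this in Python. The field names and the apartment value are hypothetical stand-ins for whatever the cleanse response actually returns in a given tenant.

```python
def street_line(cleansed):
    """Build an address line from premise number and street name only."""
    parts = [cleansed.get("PremiseNumber"), cleansed.get("StreetName")]
    return " ".join(str(part) for part in parts if part)


# Hypothetical cleansed record: key names and the apartment are assumptions.
cleansed = {
    "DeliveryAddress": "6601 Dorchester Road Apt 1",
    "PremiseNumber": "6601",
    "StreetName": "Dorchester Road",
    "SubBuilding": "Apt 1",
}
address_line_1 = street_line(cleansed)
address_line_2 = cleansed.get("SubBuilding", "")
print(address_line_1, "|", address_line_2)
```

The apartment then goes to address line 2 rather than being dropped, which matches the question that was asked.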

Dan Gage: So by default, the mapping back to address line one is going to take your delivery address, which has all those fields pre-concatenated. But by using an alternate mapping, you can control how that data is brought back.

Chris Detzel: What is the optimal value of batch size here?

Dan Gage: Yeah, and that's definitely going to depend upon the size of your records. For addresses, the size is going to be relatively consistent, probably about 100 [01:51:00] characters, which once you format that into a JSON structure, is going to be maybe around 100K. So I think the numbers that I've seen are between 50 and 100 for your ideal batch size.

Dan Gage: So the only negative consequence of doing larger batches is that you run the risk of a timeout. Within Reltio Integration Hub, you do have some ability to set parameters around what your expected timeout is, whether you're running parallel tasks, and the performance of your actual Reltio tenant also plays a part. But the short answer is I would start at around 50, and then if you go up to around a hundred, you're going to get a little bit better performance, but you need to put more error handling in there. And if I go back into my recipe here, you'll see that Reltio does support the ability to create error handlers. So in this case, by taking those task steps that I was performing below here and dragging them up into that monitor section, it [01:52:00] allows me to monitor for errors such as a timeout and to either retry that step or just log that step as having encountered an error. So the idea here is that you can go a little bit larger and then account for error handling, or you can use a more conservative number where you're going to get a more consistent process.
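The retry-or-log pattern from the monitor section can be sketched generically in Python. This is not the Integration Hub error handler itself; `send` is a stand-in for whatever function submits one batch, and the retry count is an arbitrary assumption.

```python
import logging
import time


def call_with_retry(send, batch, retries=3, delay_seconds=0.0):
    """Try `send(batch)` up to `retries` times; log and return None on failure."""
    for attempt in range(1, retries + 1):
        try:
            return send(batch)
        except TimeoutError as error:
            logging.warning("attempt %d of %d timed out: %s", attempt, retries, error)
            time.sleep(delay_seconds)
    logging.error("giving up on a batch of %d records", len(batch))
    return None


# Simulated sender that times out twice before succeeding.
attempts = {"count": 0}


def flaky_send(batch):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return [record.upper() for record in batch]


print(call_with_retry(flaky_send, ["6301 milton street"]))
```

A larger batch size makes the timeout branch more likely, so the retry budget and the batch size are worth tuning together.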

Chris Detzel: What is the upper limit of a batch cleansing address in a single payload? Is it size or number of arrays?

Dan Gage: Yeah, it's definitely going to be the total size of the payload. So if you have addresses that are very large and complex, if you're passing a lot of information over, then the size of that payload is going to be more restrictive than the number of records itself. So if you can pass a thousand records and it doesn't exceed the optimal payload size, [01:53:00] then Reltio is going to cleanse all 1000 of those records and send them back in a single task or a single API utilization. So again, it's something where you're going to need a little bit of trial and error, and if you want to post a message to the Reltio community, we can ask for a little bit more specific guidance from our product management team. But it's something you can definitely test out and validate, or you can work with your Reltio customer success team to get more details.

Chris Detzel: Great. And although we're going to share the recipe with everyone, is the recipe available in the community library?

Dan Gage: That's a great question. I can certainly work with Chris to have it shared there.

Chris Detzel: Well, not the online community, but the library within.

Dan Gage: Oh, yeah. So no, by default, I don't think it's in the community library here on the left-hand side. So we will put it into the Reltio community as an importable zip file. You can go into your recipe lifecycle management, [01:54:00] and there will be a package; it will give you the ability to import that package into a new folder, and you'll be able to get that zip, which you can download as part of the deliverables that you have.

Chris Detzel: And I'll send that in an email tomorrow.

Chris Detzel: What order does the cleanser work in? We also have noise words.

Dan Gage: So absolutely. The Reltio cleanser is going to occur before matching takes place. So if you're matching on address information, which is very likely to be the case, the Reltio cleanser is definitely going to be applied before any of the matching is done.

Dan Gage: So you can have multiple cleansers, and the order in which the cleansing is done depends, again, on whether you're bringing that data in via the address input or whether you're using parameters like the country codes to [01:55:00] determine which data is being processed. And again, I think that was in our top example here, where we had that Locate default country. So here we're only going to process records from the United States, Canada, and France. But in the second one here, because we do not have that country code passed, it will attempt to cleanse all addresses via this cleanse configuration. Whereas if you wanted a specific cleanse configuration for US, Canada, and France that uses this classification for verified, but for Dominican Republic you wanted to use a more stringent verification mapping, then you could simply set that up as an additional option here. So you could just come in here and copy that section, put a comma, add a new one in, and then we could bring that in for Dominican Republic. And in that [01:56:00] case, I may take some of these other options here and move those into the partially verified category. So therefore, only an address verification code beginning with V for Dominican Republic would be considered verified. Any other codes that match these parameters now would only become partially verified for Dominican Republic addresses.
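Per-country cleanse configurations with different verification rules can be sketched like this. The structure is illustrative only and is not the Reltio cleanse configuration schema; the first-match-wins ordering, the status labels, and the sample code strings are all assumptions.

```python
import re

LOOSE = re.compile(r"V.*|P[2-5].*(90|100)")   # assumed default rule
STRICT = re.compile(r"V.*")                   # only V codes count as verified

# Ordered list: the first entry whose countries include the record wins.
# countries=None means "any country" (the catch-all configuration).
CLEANSERS = [
    {"countries": {"US", "CA", "FR"}, "verified": LOOSE},
    {"countries": {"DO"}, "verified": STRICT},
    {"countries": None, "verified": LOOSE},
]


def status_for(country, avc):
    """Classify a verification code using the first matching configuration."""
    for cleanser in CLEANSERS:
        if cleanser["countries"] is None or country in cleanser["countries"]:
            if cleanser["verified"].fullmatch(avc):
                return "VERIFIED"
            return "PARTIALLY_VERIFIED" if avc.startswith("P") else "NOT_VERIFIED"
    return "NOT_VERIFIED"


# The same illustrative code is verified for the US but not for DO.
print(status_for("US", "P45-I44-P6-100"), status_for("DO", "P45-I44-P6-100"))
```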

Chris Detzel: Thanks, Dan. That's all the questions for now.

Dan Gage: Great. So let's move on to that last thing. I know we're running low on time, but let's look at the scripting options here. So you can see I have a script; it's just called address cleanse dot sh, and when I run that file, you see it's just going to echo those addresses back, and then it's going to say I'm done. Before we go out and look at that script itself, here you can see the results. My computer has detected that the data has changed, so I can reload that, and you see here's that data that just came back. And if I ask my editor to parse it out, you can see that it's parsing out that address [01:57:00] using that same information. Note here that, with the address, reference data management is determining the ISO 2 and ISO numeric codes, even though Locate will be returning that data as well. Here, you can see that data is coming back from RDM. Now the script itself, and I'll see if I can increase the size here. So the first thing you'll notice is that this is not my actual password. In this script, you can provide your username and password. It's going to leverage the Reltio authentication service, and you can see a sample payload of what an access token might look like. Reltio is now using JWT tokens, so they're a little bit longer. But then, again, I'm not super efficient with my shell script, so I'm actually using a couple of iterations of parsing to get that JWT token from that authorization API [01:58:00] and break it down, so that when I call that batch cleanse API here, you can see I'm calling the cleanse and I'm just passing the authorization bearer as that token that came from above.
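Extracting the token and building the bearer header can be done in one parsing step when a JSON library is available. The sketch below assumes the authentication response is a standard OAuth 2.0 JSON body with an `access_token` field; the sample body is a placeholder, not a real token.

```python
import json


def bearer_header(token_response_body):
    """Pull access_token out of an OAuth-style JSON body and build the header."""
    token = json.loads(token_response_body)["access_token"]
    return {"Authorization": f"Bearer {token}"}


# Placeholder response body; a real JWT is considerably longer.
sample_body = '{"access_token": "aaa.bbb.ccc", "token_type": "bearer"}'
print(bearer_header(sample_body))
```

This replaces the multiple rounds of text parsing mentioned for the shell script with a single `json.loads` call.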

Dan Gage: So you can see here, in this case I'm reading the data from a file called data.csv, and we can see that here's that data that's coming in. In this case, we're parsing out the file header, and we're just going to ignore it. We're going to take all that address line data that's coming from that file, and we're just going to bring it into that payload. We're echoing that address, which you saw coming onto the screen there at the top, and then we're just passing it over to that cleanse service. And if we scroll way to the right here, you'll see we're just appending the data into that result.json. So in this case, our script is not making any effort to parse out the specific [01:59:00] results. It's just taking the entire result JSON that was returned by Reltio and appending it into this file, so we can see that those three address results are just being returned as JSON. But if you get a little bit more sophisticated in your script here, you can parse out that data that's being returned from Reltio.
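The flow of that script (skip the header, echo each address, send it to the cleanse service, append the raw response to a results file) can be sketched in Python. This is not the script shown in the session. The `send` argument is a stand-in for the actual HTTP call, and the payload shape, including the `AddressInput` attribute name and the `Location` entity type URI, is an assumption.

```python
import csv
import io
import json


def cleanse_file(csv_text, send, result_path):
    """Send each CSV row to `send` and append the raw JSON reply to a file."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # parse out the header row and ignore it
    processed = 0
    with open(result_path, "a", encoding="utf-8") as out:
        for row in reader:
            if not row:
                continue  # skip blank lines
            address = " ".join(cell.strip() for cell in row if cell.strip())
            print(address)  # echo the address, as the shell script does
            # Assumed payload shape for a single-record cleanse request.
            payload = [{
                "type": "configuration/entityTypes/Location",
                "attributes": {"AddressInput": [{"value": address}]},
            }]
            out.write(json.dumps(send(payload)) + "\n")
            processed += 1
    print("done")
    return processed
```

Passing `send` in as a parameter keeps the sketch runnable without credentials; in practice it would wrap an HTTP POST that carries the bearer token.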

Dan Gage: So in this case, instead of bringing that data into results.json, you could parse it out using something like awk, which is another command line utility. So for those of you that are very script savvy, you can get more sophisticated. For those of you who are less script savvy, you can just take this example that we are providing here. You can put in your own credentials, and again, typically we would recommend that you leverage a service account. So within your security for that service account that's going to be accessing this, you need a minimum of create and update permissions on the entity type where that cleanse is being [02:00:00] deployed. So in this case, we're leveraging the default cleanser on the location object. So the user credentials that I have need to have a minimum of create and update permissions on the location entity, but they do not need to have any additional permissions. So in cases where you're using a service account, rather than using the base location object, if you created an alternate object that would be used by your cleansing service APIs, then you can restrict the permissions so that the service account is not able to perform other actions within Reltio, but is just able to access that cleansing service. So I know we're towards the end of the time. We'll take a few more questions and then we'll respond to any additional ones through the community.

Chris Detzel: Can we reverse the order, if possible, for custom cleansers? It's kind of back to the other question that Brian had.

Dan Gage: Yeah. So within the Reltio matcher, the short answer is [02:01:00] no.

Dan Gage: So the Reltio cleansers are going to be applied before matching. So when I define my match rules, if my match rules are using a cleanser here, any cleanser, so if you're using a noise word cleanser or word replacement cleanser as part of your matching rule, the Reltio address cleanser would be applied first, in creating that record. So if we go and look at a record within this hub, as this data is brought into the system, in this case, you can see that this matching that you're [02:02:00] referring to would only be relevant for data that's being loaded and persisted within Reltio. The processes that we've talked about today are just bouncing data off of the Reltio cleansing service, via either Reltio Integration Hub, Postman, or the batch shell script. But for data that's being physically persisted in Reltio, here I have data that's originating from Salesforce, and that Reltio cleanser is appending this data. So these cleansed values would be applied first, and then within your match rule, any noise words or exclusions within the match rule would then be applied second, as part of the matching process.

Chris Detzel: Great. Well, Dan, that was amazing. So thank you so much for providing this information. Look, we have another one next week, and you're going to be at, so we're excited about that one. But thank you everyone for coming. So please, at the end, take the survey; your [02:03:00] feedback is helpful. We have one of these every week around specific topics. So if you go to community.reltio.com and go to upcoming events at the top, you'll see all the events coming up. And this is recorded, and I will be sending this out early tomorrow. It should be in your inbox. So thank you everyone, and thank you for attending today's session. And Dan, awesome as usual. So thank you. Thanks for the feedback already; it's great feedback. Yeah, we do. All right. Thanks everyone. Take care.


#CommunityWebinar
#Featured
