Reltio Connect


Real-Time Data Quality - Show

By Chris Detzel posted 06-20-2022 09:17

  






In this Reltio Community Show, you'll learn how you can get started with the Real-Time Data Quality tool by working with a dataset designed for this demonstration. We will showcase how easy it is to analyze and spot trends that would otherwise be a challenging and cumbersome task.

Transcript: 

Chris Detzel (00:05):

Hello, everyone. I'm going to go ahead and get started. My name is Chris Detzel. I am the director of customer community and engagement. Today we have, I would say special guest, but Michael, you've been on, this is your third time to do this, but it's a little different today. Usually we're doing some Q&A sessions. Today Michael will be talking about Real-Time Data Quality with Reltio. And so Michael is a senior director of AI and ML here at Reltio. And this is one of the features that came out in the, or one of the features that's coming out in the release on June 22nd. And so he's going to go into more of a demo and presentation around what it looks like, what you can do with it, and things like that.

 

Chris Detzel (00:51):

It's something that we're building upon here also in the future, so you should see that over the next several releases. We're certainly excited about here at Reltio. So as usual, please keep yourself on mute. All questions should be asked... There's no Q&A panel, so my apologies, I didn't change that, but in the comment section. The show obviously is going to be recorded and shared out as soon as I can get that out. And I haven't put all of July events out on there, but as you can tell, June has been just packed.

 

Chris Detzel (01:25):

Today we have the show on Real-Time Data Quality. On the 22nd we're going to go into more of the Reltio Integration Hub and how easy building recipes is. It's easy as pie, as a matter of fact. And then something a little different that I think a lot of you will like, and I really would love for you to go to this one: How to Build an MDM Practice from the Ground Up, because we want your feedback and even your questions around that. And then on the 28th... The 23rd is for those who want to become part of the CCAB, which is a kind of content architecture board that we have for select people who want to come in and be part of the documentation team to help us make our documents better. It's more of a closed group. It's one of our first closed groups. But if you're interested, let me know. I can get you in touch with the right person that manages that.

 

Chris Detzel (02:21):

And the last two things: one is the New Modern Reltio User Experience. We want to make sure that you're using that, that you understand how to use it, and we'll go into a deep dive there. And then the last will be a Reltio Community Show around UI Configuration, Self Service. And then I have three shows to book today that aren't on here at all. So that said, I'm going to stop sharing, but a lot of good stuff is coming out on the community, and thank you everyone for coming. And Michael, I'm going to give it to you.

 

Michael Burke (02:56):

Excellent, Chris. Thank you so much for having me again, really excited to be here and talk about data quality. And before I get started, I'm just going to share my screen here and we're going to walk through a couple quick slides. One moment. Can everyone see my screen?

 

Chris Detzel (03:17):

No. Not yet.

 

Michael Burke (03:28):

All right. How about now?

 

Chris Detzel (03:28):

Yep.

 

Michael Burke (03:29):

Excellent. Perfect. So before we jump into the demo, just really quickly, I want to share the Safe Harbor slide. We will be talking a lot about not only things that we're releasing in '22.2, but some of the things that we're working on moving towards the future. Really excited to give you guys a little sneak peek into that. So thank you for joining today. And we will walk through that shortly after this. So let's jump into the demo.

 

Chris Detzel (03:56):

Wow, that was like quick. Two slides you were right. All right.

 

Michael Burke (04:04):

So this is the data quality dashboard, and the reason we're creating a data quality dashboard in Reltio instead of using an external tool is that traditionally data quality has been something that's done once, as you're making the migration into an MDM. Traditionally we've seen customers who run their data through a data quality tool, and then they import it into Reltio, and then they're done, and quality checks stop there. The advantage of the real-time data quality platform is that we are going to be continuously monitoring customers' data quality. So any day, any change in your data upstream, any source change will automatically be reflected here in this data quality dashboard. And ideally, in the future, some of the things that we're going to bring to the table that we're really excited about are ways that we can monitor and alert on changes to this information automatically.

 

Michael Burke (05:05):

So getting started, this is an entity that we're looking at called Professional Association. We've got 1,097 active profiles, 0 inactive profiles, 92 attributes, which are these attributes on the left here, and then we have 6 source systems. And this first chart that we're looking at here is the consolidation rate of the entity Professional Association. And what that means is these are all of the input sources that are feeding this MDM for this entity. And what we're seeing here is that these data sources, RPA and DTPA, are consolidating down from 2.2 thousand records to 1.1 thousand consolidated Reltio profiles. And what that means is that likely these two data sources contain either supplemental or duplicative records. And in this case, it's likely supplemental because you can see that they both are consolidating down to 1.1, which means they're probably both merging into one Reltio profile. I'll stop there for a minute. Any questions about consolidation rate and how that works?
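
To make the arithmetic behind that chart concrete, here is a minimal sketch in Python. The transcript does not state how the dashboard computes consolidation rate, so the formula below (the share of source records absorbed by merging) and the function name `consolidation_rate` are assumptions for illustration only.

```python
def consolidation_rate(source_records: int, consolidated_profiles: int) -> float:
    """Share of incoming source records absorbed by merging (0.0 means no merges).

    Assumed definition for illustration; not taken from the product.
    """
    if source_records <= 0:
        raise ValueError("source_records must be positive")
    return 1 - consolidated_profiles / source_records


# Figures quoted in the demo: about 2.2 thousand records from the two
# sources consolidate into about 1.1 thousand Reltio profiles.
rate = consolidation_rate(2200, 1100)
print(f"{rate:.0%}")  # 50%
```

A rate near 50% for two sources of similar size is consistent with the speaker's reading that most records from one source merge with a counterpart from the other.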

 

Chris Detzel (06:18):

There's no questions as of now.

 

Michael Burke (06:21):

Excellent.

 

Chris Detzel (06:23):

Well, there is actually, sorry. Is there a way to reject records from coming into Reltio that do not meet certain criteria? So example reject all records that have names containing a special character.

 

Michael Burke (06:38):

There certainly are ways to reject records. This data quality analysis is currently happening once data has landed in the MDM. I think that's an area that we are considering and looking at and exploring, but to date, the quality analysis covers data that has already landed in the MDM.

 

Chris Detzel (06:59):

Is there a way to determine which records have multiple contributing sources?

 

Michael Burke (07:06):

At the record level, no, we've done this at a consolidated view to date, but if we drill into, for example, a specific record here, you can actually drill into that record to view that consolidation. And I believe we can see that through this sources view. So for example, for this record, American Association for Respiratory Care, you can see which sources came in for each attribute. But looking at that as a consolidated view would be tricky just because the sources could vary so much.

 

Chris Detzel (07:46):

Would there be a way to tie consolidation rates into savings from a data management standpoint?

 

Michael Burke (07:51):

That is a great question. We had a conversation earlier about what does data quality mean and the definition of data quality. And today some of the things that we're capturing and looking at are statistics about the data itself. But one of the ideas that we've been exploring is, can we tie that back to actual business value? And a good example of this was the marketing campaign example where, even though an email address might be valid, to a marketer downstream who's receiving data from the MDM, what matters may be whether or not these customers are actually opening emails, so are they the right emails? And so I think that when we talk about optimizations and data quality, in order to really monitor that full life cycle of business value, we have to go both upstream and downstream. And that's something that we're certainly exploring ways that we could do in the future.

 

Chris Detzel (08:48):

Great. So John from D&B says he's interested to know if the analysis is against just the Reltio tenant, or the D&B tenant, or both?

 

Michael Burke (09:00):

The analysis is on both actually. So we're looking at any data source coming in.

 

Chris Detzel (09:07):

I'm not sure if this is going to be for today, but it would be interesting to see how data quality rules can be set up.

 

Michael Burke (09:15):

So we can show that today. And we will do a brief overview of what we call custom data validation functions, which are custom rules, so that you can have your own parameters set up, monitored, and tracked.

 

Chris Detzel (09:26):

And the last question, then I'll let you keep going. Is there a way to export the dashboard data?

 

Michael Burke (09:34):

There is a way. There are two ways. Today, currently we have not published the APIs because this is still something that is in its infancy, but there are ways to utilize and leverage those APIs through the GUI. And then you can also export the output of these automatically by drilling into each additional data source. And we can walk through that quickly later today.

 

Chris Detzel (09:56):

Great. Thanks, Michael.

 

Michael Burke (09:58):

No problem. So jumping into the custom data validation functions, this second box right below consolidation rate, is profiles with invalid data. And what this list is, and right now there's only one simple validation profile, is a list of all the custom rules that are created and associated with every attribute in this entity. And so this is the space where, if you did have multiple rules, you would be able to look at a holistic view of every custom definition and rule that's associated with each attribute in this entity.

 

Chris Detzel (10:36):

Hey Michael, can you clarify something real quick? I apologize. So the entity shows that there's 92 attributes, but the right list indicates there are only 40.

 

Michael Burke (10:46):

It's a great call out. So right now we are hiding attributes that don't have any data in them. And so if you uncheck that, you'll see a one-to-one match to the attributes.

 

Chris Detzel (10:58):

I just don't want anybody to be confused. Thanks.

 

Michael Burke (11:03):

I see another question's popping up. I love all the questions. This is great.

 

Chris Detzel (11:06):

I can't get off. Would that also cover address standardization? And if so, would that be available only for Loqate implementations? That's a good question.

 

Michael Burke (11:17):

It's a good question. So I believe it's available for both. You can have other address standardizations. There are some challenges with reference data, and there's some ways that we handle it a bit differently within the data quality platform. And we can dive into that a little bit later on today as well.

 

Chris Detzel (11:36):

Great. Giving you a lot to do. Thanks, Michael.

 

Michael Burke (11:39):

Excellent. So this right-hand column here is the attributes pane. And you can see that this has every attribute listed today that has data in it. And so we can click into this acronym field, as an example, and one of the things you'll notice is that there are these little icons too: this is included in a match rule, and it also has data validation functions. And let me jump back quickly. There are ways to filter for specific reference data, data that has validation functions, which are these custom rules, and also information that's included in a match rule or reference data manager and also required. So this is a great way to quickly sort and search if you do have hundreds of attributes associated with a specific entity.

 

Michael Burke (12:27):

So if we dive into the acronym attribute, one thing you'll see is there still are 1.1 thousand profiles. This data type is a string. We've identified that this data type's associated with a match rule and has validation functions. This attribute has a fill rate of 98%, meaning that 98% of these users have data or these records. And 22 or 2% have missing attributes, acronym attributes. So one of the cool things that you can do here is you can actually drill into any of these charts, and certainly the fill rate, and see what is that missing, who are those missing users. And we can do that simply by selecting the acronym. And you can see that all of these have the missing value for an acronym. So really cool way if you're trying to, and let me just jump back there for one more moment, if you're trying to analyze which users or records are associated with a certain validation rule or flag, you can drill into it like this. And then you can just save that search and export that data to do further analysis.
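
As a rough illustration of the fill-rate figure quoted above, the following Python sketch computes the fraction of profiles with a populated value and lists the ones that are missing it. The function names and the sample data are hypothetical, and treating blank strings as missing is an assumption; the transcript does not describe how the dashboard handles blanks.

```python
from typing import List, Optional, Sequence


def fill_rate(values: Sequence[Optional[str]]) -> float:
    """Fraction of profiles whose attribute value is present and non-blank."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if v is not None and v.strip())
    return filled / len(values)


def missing_indices(values: Sequence[Optional[str]]) -> List[int]:
    """Positions of profiles with no usable value (the 'drill in' list)."""
    return [i for i, v in enumerate(values) if v is None or not v.strip()]


# 1,097 profiles with 22 missing acronyms, matching the numbers in the demo.
acronyms: List[Optional[str]] = ["AAO"] * 1075 + [None] * 22
print(f"{fill_rate(acronyms):.0%}")   # 98%
print(len(missing_indices(acronyms)))  # 22
```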

 

Michael Burke (13:42):

So jumping back to the acronym page, we also have this chart for uniqueness, and this is really talking about, are the records unique or not? And so in this circumstance, 89.8% are unique. 10.2% are not. And if we jump down to the chart below that, you can see through the frequency analysis which records actually are not unique. And so you can look at this and say this acronym, AACC, is unique to one record. And you can drill in and see that. But if you look at some of the other records, for example, AAO, you can see that this is not unique because there are five records that share this AAO acronym. And it looks like they're all associated with the same organization as an abbreviation. So this is a really powerful tool, especially if you have some sort of standardized records that you are expecting to either be unique or not be unique: you can quickly identify which ones need to be looked at and drill into the actual data to see if there's a correction or something else that needs to be changed within the data set itself.
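
The uniqueness and frequency charts can be approximated with a simple counter. This is a sketch under stated assumptions: the transcript does not define uniqueness precisely, so here it is taken to mean the share of populated records whose value appears exactly once. The sample list is made up, apart from the AAO value being shared by five records as in the demo.

```python
from collections import Counter
from typing import Iterable, Optional


def frequency_analysis(values: Iterable[Optional[str]]) -> Counter:
    """Count how many records share each non-missing value."""
    return Counter(v for v in values if v)


def uniqueness(values: Iterable[Optional[str]]) -> float:
    """Fraction of populated records whose value appears exactly once (assumed definition)."""
    counts = frequency_analysis(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / total


# Illustrative sample only: AAO is shared by five records.
acronyms = ["AACC", "AAO", "AAO", "AAO", "AAO", "AAO", "AARC", None]
counts = frequency_analysis(acronyms)
shared = {value: n for value, n in counts.items() if n > 1}
print(shared)                          # {'AAO': 5}
print(f"{uniqueness(acronyms):.1%}")   # 28.6%
```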

 

Chris Detzel (14:56):

Hey, Michael real quick, or a few questions actually.

 

Michael Burke (14:59):

Sure.

 

Chris Detzel (15:00):

How do you enable the DQ dashboard? Now that's not available till the 22nd, is that right or...

 

Michael Burke (15:06):

Yeah. So a great thing to point out is that the data quality dashboard is going to be automatically enabled for everybody, and it is completely free and included with the base package of Reltio. So we're really excited to be able to do this. We think that data quality not only enables a better story for you to understand and make the quality of your data better, but it also helps us as kind of a guiding process to be able to understand what's going on in your tenant and how we can also assist you to make the product work as best as possible.

 

Chris Detzel (15:38):

Great. June 22nd, just to make sure that that's there. Can we set the validation criteria to indicate the RDM values not resolved as one DQ validation parameter?

 

Michael Burke (15:52):

So to date, we cannot do that. We are actively investigating more around RDM and how we can set specific customizations on those filters. So more to come on that. But I think that's TBD for now.

 

Chris Detzel (16:07):

And I think if it's not there and we're just researching it, it might be really good for you to go, not you Michael, but to go into the Ideation Portal and put that in as an idea, unless you're already working on it.

 

Michael Burke (16:24):

Absolutely. And Chris, maybe, that's something we could send out at the end because I don't know if everybody has that link.

 

Chris Detzel (16:29):

I will. I will. Trust me.

 

Michael Burke (16:29):

But the Ideation Portal is huge. And any ideas that come into there, we automatically get notified about those ideas and they can be upvoted. So that's a big driver to our roadmap. We do talk to individual customers, but if you do submit ideas through there, anybody can upvote them. And it's a great way for us to help prioritize what to work on next.

 

Chris Detzel (16:49):

We have 63,000 profiles in our dev tenant. How long will a view like this one take that you're sharing?

 

Michael Burke (16:59):

It's a great question. So initially it may take a while to first consolidate and generate this report because obviously we have to do all of this processing on the back end to generate these statistics. One of the things that we are working on is a daily cached snapshot of this information so that it will be instant. But for right now it is something that loads, so it may take 5 to 10 seconds to load up.

 

Chris Detzel (17:27):

Well, that's not that long. Can we initiate an activity based on the various fields being reviewed? So for example, would we be able to launch a data steward activity based on the data being reviewed?

 

Michael Burke (17:42):

I would need to check with engineering on that. I think that you can, but it's a great question, and I'll have to get back to you on it.

 

Chris Detzel (17:51):

More questions, lots. Is the DQ dashboard up to date dynamically, so real time? Or do we have to run anything to make sure that the dashboard metrics are up to date?

 

Michael Burke (18:04):

Anytime you load the data quality dashboard, we are loading all this information and processing it in real time. So that was one of the big feats that we wanted to make available to everybody here is this instantaneous ability so that if you make a change, something happens within your data store, you can go to the dashboard and see those changes reflected quickly.

 

Chris Detzel (18:25):

Sandro, feel free to unmute yourself. Yes, absolutely.

 

Sandro (18:29):

Sorry, Michael. Good material by the way. One of the challenges that we have is that the quality of the data's very dependent on the crosswalk type, the relationship, essentially, and so depending on the relationship, the quality of the data can be really good, then it's not so good for other kinds of relationships. Is there any way, or is that something that you guys are planning, that will allow us to filter that so that we can assess the quality of the data based on relationships or crosswalk types?

 

Michael Burke (19:01):

That is a great question. We are actively working on how to address relationships. I think that if you think of this as the first iteration of data quality, we're in this crawl-walk-run stage. With relationships, there's a vast amount of work to be done there, and to make that searching and those statistics effective, and not overload any of the existing processes, there's a lot to be done. So yes, that's definitely on our roadmap. It is not available today.

 

Sandro (19:29):

No, thanks. Thanks, Michael.

 

Chris Detzel (19:29):

Thanks for the questions. So can you see uniqueness for specific identifier types, such as, see uniqueness for social security number or for TIN?

 

Michael Burke (19:43):

That's a great point. And let me just actually jump over to another tenant where I have some sample social security numbers loaded in, obviously fake, but this is something really cool that we can do. With social security numbers, we should assume everyone has a unique one. So you will be able to see if there were duplicates for whatever reason in your data store. But one of the other really interesting pieces that we've launched recently is pattern analysis.

 

Michael Burke (20:10):

And so this will allow you to say we don't just want to look at the frequency, but we want to look at what is the format of the information coming in. And so you can see this most common format is number, number, number, number, number, like 3-2-4, which is a standard social security format, and then all of these other records, where spaces might be included, or letters in some circumstances. And again, this is demo information, but this is a really good way to be able to flag, if you have a standardized pattern for how information should be coming in, that there are issues that need to be worked on that your standard data quality rules may not pick up.
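
One common way to do this kind of pattern analysis is to reduce each value to a shape string and count the shapes. The sketch below assumes a mapping of digits to `9` and letters to `A`; the dashboard's actual pattern notation is not given in the transcript, and the sample values are fabricated placeholders, not real identifiers.

```python
from collections import Counter
from typing import Iterable


def to_pattern(value: str) -> str:
    """Replace digits with '9' and letters with 'A'; keep every other character."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequencies(values: Iterable[str]) -> Counter:
    """Count how many values share each shape."""
    return Counter(to_pattern(v) for v in values)


# Fake sample values: two in the expected 3-2-4 layout, one with spaces,
# one with a stray letter.
samples = ["123-45-6789", "987-65-4321", "123 45 6789", "12E-45-6789"]
for pattern, n in pattern_frequencies(samples).most_common():
    print(pattern, n)
```

Values whose shape differs from the dominant one are the candidates to review, which is the kind of issue a plain frequency count would not surface.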

 

Chris Detzel (20:51):

Great. Thank you. And I have some more. I will provide the link to the Ideation Portal after all these questions. Can data quality results be filtered for selected entity records? So example I'm interested in the US records only.

 

Michael Burke (21:09):

So filters is another thing that we are working to add right now. There are ways that you could, if you were looking at a specific attribute, export that information, and then filter it yourself to date. But there is no global filter other than the ones provided here on the attribute list.

 

Chris Detzel (21:30):

If somebody wants more filters and things like that, that could be another idea. Is there a way we can track consistency of attributes on profiles that have more than one source system?

 

Michael Burke (21:44):

So another great question. So tracking... Can you repeat the question?

 

Chris Detzel (21:51):

Is there a way we can track consistency of attributes on profiles that have more than one source system?

 

Michael Burke (21:59):

Got it. There isn't today in the data quality dashboard. But I believe that you can do that at the individual record by looking at those historical changes to the survivorship. So that pane that I showed you before of an individual record, I think, that you can actually look at the historical changes there.

 

Chris Detzel (22:20):

There's a comment that you probably didn't see, but you mentioned that it should only take 4 or 5 seconds, but they're seeing it actually taking 10 in their environment. This is live stuff. Will the DQ dashboard have permissions for attribute-level visibility, so can we limit which roles can see what data?

 

Michael Burke (22:45):

Everything on the data quality dashboard you can control through permission configurations. Actually controlling individual graphs, it may not be able to do that, but I think that you can do it at the attribute and entity level.

 

Chris Detzel (23:02):

So somebody, it mentions, global filters are key to adoption of the capability.

 

Michael Burke (23:09):

Absolutely. We completely agree. I mean, I think that it's not only the ability to filter, but also the ability to group. So those are the two pieces that, there are ways that you can do some filtration, which I can dive into at the attribute level and individual entity level, but that global filter is something that is very high up on our priority list.

 

Chris Detzel (23:32):

And then last question, but let's do this one. We have seen some challenges in reviewing the different relationship types. Can we define our own DQ rule across entity and relationship type?

 

Michael Burke (23:52):

Could you elaborate on that one? I don't know who gave it, but-

 

Chris Detzel (23:56):

If you want to open it up, if not, we'll just keep going.

 

Speaker 4 (24:01):

So, Michael, I wanted to check if we can define our own DQ rules across entities and relationships. So for some entity, if we wanted a mandatory relationship to be present, can we do that? Can I define my own?

 

Michael Burke (24:22):

Got it. So I think what you're saying, data validation functions can help with that. But to date, we can't go across entities, so it would have to be within the same entity.

 

Speaker 4 (24:34):

Okay.

 

Chris Detzel (24:36):

Also, Mohamed, if you want to speak up, you mentioned, I didn't answer the question, so maybe we just didn't understand it properly.

 

Mohamed (24:45):

My question is actually related to if you have, let's say, RDM values on profiles and we have the same values in multiple source systems that have the same profile. Let's say, for example, an organization record is in Salesforce and then it is in SAP, and this organization has a certain RDM value. And I want to understand if we can track on this data quality dashboard, in real time, whether the value of this RDM value in Salesforce and SAP is the same? Do you know what I'm saying?

 

Michael Burke (25:37):

Yes. So to date, no, because we are looking at the consolidated view for these attributes. I think that there are ways that you could do that through search. But it would be challenging for us to create a unified kind of representation of that for the data quality to date.

 

Mohamed (25:56):

I mean, it's just that what we are looking for is consistency of certain attributes on our profiles across multiple source systems. And I think that's one of the biggest measures of MDM, so it would be a good idea to bring it to data quality, I guess.

 

Michael Burke (26:18):

Absolutely. I mean, being able to see that consistency across sources... We can definitely look into that. And again, I highly recommend, these are some great comments, really appreciate it. If folks could post those to the link that Chris is sending out later, it would really be beneficial for us to dive in. And maybe even meet and have a deeper discussion on how we can partner in some of these things.

 

Chris Detzel (26:39):

Absolutely. And there's two more questions then I'm going to let you get back to the demo. Are you using new or classic UI? So we've had issues with the new UI, with privacy maybe, for profiles, maybe it's profiles, with invalid data. Is this a known issue? We do have a support ticket open.

 

Michael Burke (27:02):

Got it. So data validation profiles, we'll have to check with the engineering team on that. But this data quality dashboard is supported on both the classic and new UI.

 

Chris Detzel (27:16):

And is the out-of-the box DQ analysis available for relationship level attributes as well as via reference attributes? I think you answered that.

 

Michael Burke (27:24):

It is top on our list. But it is not supported to date. It's coming shortly.

 

Chris Detzel (27:30):

All right. And Mark just mentioned, it looks like they're on the classic mode, maybe in the call. All right, go ahead.

 

Michael Burke (27:36):

Excellent. So let's dive in a little bit more on how do we create data validation profiles, for those that haven't done this before, and how that would appear on this data quality dashboard. So if we jump into any attribute that doesn't have data validation profiles, you're supplied with a link that will jump you right to the data validation profiles for this entity, Professional Association. And you have the ability here to create custom data validation functions. And for this example, we were looking at acronym to do a number of different things and really represent the types of attribute flags that you would like to see on an individual attribute and also across multiple attributes of the same entity.

 

Michael Burke (28:27):

So what does this mean? In this circumstance we're looking at the attribute acronym, we've created this title called ASA, and in the description we're flagging that we want to identify all acronyms with ASA, and we've selected the acronym from the attribute list. We've said that it equals. And then we've just provided the string ASA. When you save and run this (and you will have to go and revalidate your data to have these update regularly), you will then have this chart appear under the profiles with invalid data. And you'll see that the title here, ASA, is what we set in the data validation profile. And that actually matches, to kind of confirm that, what we have here on the frequency analysis, which is for customers or profiles that were flagged. Any questions on this?

 

Chris Detzel (29:26):

Yes. Can the user save filters like country or list of countries to be used every time he or she opens a DB?

 

Michael Burke (29:38):

So we can create filters in the data validation profiles. And what you would want to do if you were looking at something like country is use the "in list" function here, and then provide an array of those countries, or codes, or abbreviations, whichever way you want to represent it.
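
The two rule shapes described in the demo, an "equals" check on acronym and an "in list" check on country, can be modeled as simple predicates. This is only a conceptual sketch in Python: `ValidationRule`, `equals`, `in_list`, and `flagged` are hypothetical names, and the real data validation functions are configured through the Reltio UI rather than written like this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Profile = Dict[str, Optional[str]]


@dataclass
class ValidationRule:
    title: str        # label shown on the chart, e.g. "ASA"
    attribute: str    # attribute the rule inspects
    check: Callable[[Optional[str]], bool]  # True means "flag this profile"


def equals(expected: str) -> Callable[[Optional[str]], bool]:
    """Flag values exactly equal to the expected string."""
    return lambda value: value == expected


def in_list(allowed: List[str]) -> Callable[[Optional[str]], bool]:
    """Flag values that appear in the supplied array."""
    allowed_set = set(allowed)
    return lambda value: value in allowed_set


def flagged(profiles: List[Profile], rule: ValidationRule) -> List[Profile]:
    """Return the profiles that the rule flags."""
    return [p for p in profiles if rule.check(p.get(rule.attribute))]


# Made-up profiles for illustration.
profiles = [
    {"Acronym": "ASA", "Country": "US"},
    {"Acronym": "AAO", "Country": "CA"},
    {"Acronym": "ASA", "Country": "FR"},
]
asa_rule = ValidationRule("ASA", "Acronym", equals("ASA"))
country_rule = ValidationRule("North America", "Country", in_list(["US", "CA"]))
print(len(flagged(profiles, asa_rule)))      # 2
print(len(flagged(profiles, country_rule)))  # 2
```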

 

Chris Detzel (30:07):

Great. Thank you. That's all the questions.

 

Michael Burke (30:12):

So finally, a few other charts here I just wanted to explore quickly. This is the source systems. So this will tell you for this individual attribute, what are the sources that are contributing to it. And then beneath that is this length statistics chart. And this will show you a distribution of the length of these, in this case, acronyms or names that we're looking at. And so you can see that like the majority of names are between 36 and 43, and 28 and 35. And at the low end you have 4. And at the upper end you have 157. And so this is kind of a good way to manually spot anomalies. Should a name be 157 to 164 characters? That might be a red flag. And we can drill into that and actually see what those names are. And so in this circumstance, it looks like it's a company. It probably is okay to be displayed on the list. Very long name though. Any questions there?
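
A length distribution like the one on that chart can be sketched by bucketing string lengths into fixed-width bins. The bin width of 8 is inferred from the ranges quoted above (36 to 43, 28 to 35), and starting the bins at the shortest length is an assumption, so the bin edges here may not line up exactly with the dashboard's.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def length_histogram(
    values: Iterable[Optional[str]], bin_width: int = 8
) -> Dict[Tuple[int, int], int]:
    """Bucket string lengths into fixed-width bins keyed by (low, high) inclusive."""
    lengths = [len(v) for v in values if v]
    if not lengths:
        return {}
    start = min(lengths)
    bins = Counter((n - start) // bin_width for n in lengths)
    return {
        (start + i * bin_width, start + (i + 1) * bin_width - 1): count
        for i, count in sorted(bins.items())
    }


# Placeholder names of the lengths mentioned in the demo: 4 at the low
# end, 157 at the upper end, and a cluster around 28 to 43.
names = ["A" * 4, "B" * 30, "C" * 38, "D" * 40, "E" * 157]
for (low, high), count in length_histogram(names).items():
    print(f"{low}-{high}: {count}")
```

A sparsely populated bin far from the main cluster, such as the one containing the 157-character name, is the sort of outlier the speaker suggests drilling into.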

 

Chris Detzel (31:24):

Nope.

 

Michael Burke (31:26):

All right. So let me jump in then and give you a little bit of a sneak peek into what's coming.

 

Chris Detzel (31:36):

This is later, like several months down the road, right?

 

Michael Burke (31:40):

Several months, plus. These are things that we're working on and kind of the areas that we think we're going to be able to drive the most value for customers moving forward. So again, Safe Harbor, we don't have any timelines associated with this work. We're working towards it as strategic insights right now. And we will release more information about the things that we're building as they come along and come to fruition.

 

Michael Burke (32:05):

So the longer-term goal that we have for real-time data quality is really to harness these insights to drive both upstream and downstream business impact. And there's three main areas that we see that we can drive the most value compared to most other data quality tools. One is this idea of being fully real time and completely integrated. And we talked about this idea of traditional data quality processes, being something where you load data into a tool, you do your munging, and then you export this information to an MDM or other source.

 

Michael Burke (32:38):

And we really believe that this idea of real-time monitoring, merging with the MDM, creates a much more complete and holistic picture of what kind of value you're delivering to your downstream customers. And also where issues may reside in changes to data from upstream customers. We're also thinking and ideating around this idea of being able to give one-click remediations to certain problems and also rapid issue detection.

 

Michael Burke (33:06):

This second column is really this idea of being able to provide benchmarks. And so this is industry specific insights around global trends in data that will allow you to find better key performance indicators and also increase your effectiveness. It is so hard right now to be able to benchmark or understand where you are compared to the industry. Or where you are compared to the global average. Is your data really good? It might look good to you, but you really have limited context into what that means in the larger scope of things. So our goal is to be able to report out statistics about industries, to our customers, and be able to give back these insights in a way that is meaningful and valuable.

 

Michael Burke (33:48):

And then finally, this third pillar is recommendations. And this is the idea that you can make more informed decisions effortlessly using AI. And our goal is to be able to help you identify bad data, detect anomalies, and reduce manual effort. So how are we going to do that? If you look at today, this is kind of a mock of the entity level and what we just walked through in the demo.

 

Michael Burke (34:14):

But in the future, where we're trying to head to, is this idea of being able to capture all of this information in time series. So no longer will you just look at the snapshot of data from today, but you'll be able to see this whole story of how your data quality has transitioned over time. And also you can compare this to what actions have you taken and how have those created an impact on quality.
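
As a rough illustration of the time-series idea described here, the sketch below computes an email fill rate for each daily snapshot and keeps the results as a history, instead of looking only at today's value. The `fill_rate` helper, the `snapshots` data, and the record layout are all assumptions made up for this example, not part of the Reltio product or API.

```python
from datetime import date


def fill_rate(records, attribute):
    """Fraction of records with a non-empty value for the attribute."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return filled / len(records)


# Hypothetical daily snapshots of the same set of entities.
snapshots = {
    date(2022, 6, 1): [
        {"email": "a@example.com"},
        {"email": ""},
        {"email": None},
        {"email": "d@example.com"},
    ],
    date(2022, 6, 2): [
        {"email": "a@example.com"},
        {"email": "b@example.com"},
        {"email": None},
        {"email": "d@example.com"},
    ],
}

# One (day, fill rate) point per snapshot, ordered by date.
history = [(day, fill_rate(recs, "email")) for day, recs in sorted(snapshots.items())]

# Change in quality between the first and last snapshot.
change = history[-1][1] - history[0][1]
```

With a history like this, the effect of an action taken between two snapshots shows up as the change in the metric across those dates.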

 

Michael Burke (34:39):

And then finally, this idea of being able to add monitoring and benchmarking will allow us to give you a lot more insights into an expected value where your data should be trending to and alert you when there's any kind of volatile change to that. So in this example, a customer has loaded some bad data into the fill rate of email addresses. And we have this dotted line, which was our forecast of where we expected the data quality to climb to, and this light blue, wider line is actually the threshold of what would trigger an alert.
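
The forecast line and alert threshold described here can be sketched in a few lines. This is only a minimal illustration under assumed names: `check_fill_rate`, the `tolerance` parameter, and the sample numbers are hypothetical, and the real forecasting model is not described in the talk.

```python
def check_fill_rate(attribute, observed, forecast, tolerance):
    """Return an alert message if the observed fill rate drops below
    the forecast minus the allowed tolerance, otherwise None."""
    threshold = forecast - tolerance
    if observed < threshold:
        return (
            f"{attribute} fill rate {observed:.0%} fell below forecast "
            f"(alert threshold {threshold:.0%})"
        )
    return None


# Assumed example values: forecast of 90% with a 5-point tolerance band.
alert = check_fill_rate("email address", observed=0.62, forecast=0.90, tolerance=0.05)
no_alert = check_fill_rate("email address", observed=0.88, forecast=0.90, tolerance=0.05)
```

In this sketch a small dip inside the band stays silent, and only a drop below the band produces a notification.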

 

Michael Burke (35:19):

And so when this data quality dropped, we instantly give you a notification that says your email address fill rate fell below forecast. And when you click into that, the idea would be that you're presented with this option out-of-the-box to enrich your data fill rate with a third party enrichment tool. Maybe this was data that came from an acquisition or a place where you don't have control of restoring that data on your own. We've just provided a solution that will ideally bring your data quality back up to the expected range. And something that would've taken 100s of lines of code, or somebody to make and create an API connector to connect and enrich this data manually, has now been done in a matter of seconds with a click of a button. So that's all we have today. Any questions on this?

 

Chris Detzel (36:08):

I mean, I think we all like buttons that just do it for us, so it's awesome. So somebody said, "Very nice. It's a great presentation." And somebody said, "This would be a game changer when it releases. Amazing stuff." So if we could have this tomorrow, it'd be great. Any other questions? No. Well, Michael, this is really good. I really appreciate you coming. Everyone thank you for coming to this community show. And there is a question. Can you give some examples? Do we have any examples or anything like that?

 

Michael Burke (36:52):

Of enrichment or...

 

Chris Detzel (36:58):

There are some more questions. I don't know, Georgie, if you want to speak up, what are you looking for as examples?

 

Michael Burke (37:05):

Absolutely. So I think that enrichment is an easy one. Being able to identify an issue and provide enrichment. But there are other areas where machine learning might be more prevalent. For example, what if you had data that was just swapped between columns? That would be an area where we might be able to identify that for you, that somebody manually put first name in the last name and last name in the first name. And we could identify that and say, "Hey, there's these records or these anomalies that have been detected, it looks like somebody changed the schema." I think these are the areas that we're trying to get to, is much more around automation and understanding of the context of data and how that changes over time.
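
To make the swapped-column example concrete, here is a simple frequency heuristic rather than the machine learning approach the talk alludes to. It flags a record when its first-name value appears more often as a last name in the dataset and its last-name value appears more often as a first name. The function name `find_swapped_names` and the sample records are assumptions for illustration only.

```python
from collections import Counter


def find_swapped_names(records):
    """Return indexes of records whose first and last names look swapped,
    judged by how often each value appears in each column overall."""
    first_counts = Counter(r["first_name"] for r in records)
    last_counts = Counter(r["last_name"] for r in records)

    suspects = []
    for index, r in enumerate(records):
        first, last = r["first_name"], r["last_name"]
        first_looks_like_last = last_counts[first] > first_counts[first]
        last_looks_like_first = first_counts[last] > last_counts[last]
        if first_looks_like_last and last_looks_like_first:
            suspects.append(index)
    return suspects


# Hypothetical records; the last one has its columns swapped.
records = [
    {"first_name": "Maria", "last_name": "Lopez"},
    {"first_name": "Maria", "last_name": "Chen"},
    {"first_name": "David", "last_name": "Lopez"},
    {"first_name": "David", "last_name": "Chen"},
    {"first_name": "Lopez", "last_name": "Maria"},
]

suspects = find_swapped_names(records)
```

A heuristic like this only works when most of the data is already correct, which is one reason a learned model that understands the context of the data would be more robust.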

 

Chris Detzel (37:50):

And then somebody says, "Nice. DQ is great initiative to bring inside the tenant. So it's really cool." So question, can the end user filter in the analysis to be performed only on OVs or are these all source values? So similarly, can the end user add filter for inactive entity records? That's two questions actually.

 

Michael Burke (38:14):

So right now, these kinds of global filters we don't support. But it's something that we're actively working on and is high on our roadmap.

 

Chris Detzel (38:24):

Good. Is the DQ feature, will that be part of the 2022.3 release?

 

Michael Burke (38:32):

We have a high order of things that we're trying to get done. And we're really excited to work on all of these projects. But I cannot guarantee that anything is going to be done in '22.3. At this point, I think that as we get closer to '22.3, there will be announcements on what's coming. And we're certainly working towards a lot of these goals.

 

Chris Detzel (38:51):

So what Michael presented on today is going to be released on the dashboard and things like that. That's going to be released on the 22nd. So that'd be good to go. Now, all this other stuff that he said, future, is just future. And Georgie wants to know what data can be enriched?

 

Michael Burke (39:14):

Sure. I mean, I think that there are 100s of integrations, if you look at the Reltio Integration Hub as an example, of opportunities for us to enrich data. And correct me if I'm moving off in the wrong direction here, but our idea is can we identify the opportunities where data needs to be enriched ahead of time, before the user has to recognize it, or get a complaint from a downstream customer? And by having these integrations, we can immediately say, "Let's pull in this data from this other source and have that mapping, that one-to-one mapping." So for example, that enrichment of email address, I brought up in there, the tool of Clearbit. Clearbit has a ton of profile information on customers from first name, last name, phone number, email. And all of this stuff is available from a third party enrichment tool. I hope that was helpful. Let me know if I'm off a little bit here.

 

Chris Detzel (40:12):

That was good. Another question. So currently the analytics is performed on the operational value only, is that right? Or is it both OV and the source values?

 

Michael Burke (40:23):

Hey Alexey, do you want to jump in? I just saw that you joined. Since you're here.

 

Alexey (40:31):

Actually I'm just answering in the channel. So the analytics is performed on both OV and source values.

 

Michael Burke (40:39):

Yeah.

 

Chris Detzel (40:41):

Great. And I'm putting this link in the chat because we are doing a Reltio Integration Hub show, building your recipes is as easy as pie, since Michael mentioned that. So that's the link... Let me put it for everyone for it to be helpful. There's a link there. So it looks like there are no other questions. And Alexey, thanks for coming on. That was really cool.

 

Chris Detzel (41:09):

We all help each other. But everyone, thank you so much for coming. Next week, we're taking off, I'll be on vacation, so we're not doing a show. But we have lots of shows coming up after that. And hopefully this was helpful. We're certainly excited about it. We do have another question. I'm going to go ahead and read it, Michael, since we have some time.

 

Michael Burke (41:29):

Sure.

 

Chris Detzel (41:31):

So Mohamed says, "For example, if the information on company records in D&B changes, will this new capability easily be able to identify this change and have us enrich our records on demand?"

 

Michael Burke (41:43):

So that's interesting, the idea of historical change. We were thinking more of change from the source system, but that's a great idea. And again, another thing that we could certainly add to our roadmap as we build this out.




#dataquality
#communitywebinar
#Featured