Reltio Connect


How Google Is Improving their Data Quality - Show

By Chris Detzel posted 10-27-2022 14:56

  


Data leaders must take pragmatic and targeted actions to improve their enterprise data quality if they want to accelerate their organizations’ digital transformation. During this virtual session, Michael Burke, Senior Director of AI and ML at Reltio, asks Google’s Matthew Cox, Enterprise Data Management Lead, a series of questions about how Google is improving its data quality. We also welcomed questions from the community.

Below are some questions that were asked during the session:


- How do you and your team think about Data Quality?
- How do you communicate to the business that improving data quality impacts business decisions?
- Perfection is hard to come by. Can you talk about what a “good enough” standard of data is for Google, knowing that the responsibility for defining what counts as “good” lies with the business?
- How do you go about establishing a Data Quality standard across the organization?


Transcript

Chris Detzel (00:07):

Welcome to another Reltio Community Show. My name is Chris Detzel, and we have a couple of special guests on our call today. Our first special guest is Matthew Cox from Google, an enterprise data management lead. It's our first time having him on one of our shows. Welcome, Matthew.

 

Matthew Cox (00:26):

Thank you, Chris. Happy to be here.

 

Chris Detzel (00:28):

And then there's Michael Burke, who's not new to the Community Show, although it's been a little while since we've had him on. He's the senior director of AI and machine learning here at Reltio. Welcome, Michael.

 

Michael Burke (00:42):

Thanks, Chris.

 

Chris Detzel (00:44):

And then, of course, me. Before we get started, just some ground rules. Please keep yourself on mute. These are live lines, so we don't want lots of background noise. Ask your questions in chat, or feel free to take yourself off mute. Note that this time we might not be able to answer some of the questions due to confidentiality, so keep that in mind when you ask. The Community Show will be recorded and posted to the Reltio Community. As for our shows, we're really excited about, of course, today's: How Google Is Improving Their Data Quality. Also ... Uh-oh, we have some background noise there. Okay. On November 3rd, Venki, our product leader, will be talking about managing your core data as a product. And then we have a Qlik case study with Joe DosSantos on November 8th. He is the CDO at Qlik, so I'm really excited about that.

 

(01:49):

And then on November 10th, we have a Q&A show, somewhat similar to this one, on data quality management and the commercial pharma MDM landscape with Takeda. And then, on December 1st, we have a more in-depth show on how data stewards, analysts, and business users can get the most out of the new Reltio UI. As most of you know, we have a new UI. It's already available, but I think February is when we'll move everyone over to it. Is that right, Michael?

 

Michael Burke (02:27):

I think so, yeah. I think February is the go-live date for that.

 

Chris Detzel (02:32):

Great. So without further ado, I'm going to turn off my video and audio, let Michael take it over, and let Matthew be the really smart guy and answer a bunch of questions. So thank you again, Matthew.

 

Michael Burke (02:49):

Yeah, Matthew, thank you so much for joining. We really appreciate your time. To start off, could you tell us a little bit about yourself, your role at Google, and maybe something personal? Detzel mentioned something about a farm. I'd love to hear more about that.

 

Matthew Cox (03:05):

Sure. I think we've got an echo going on, so if everyone could mute themselves, that would help, please. I've been in the high-tech data industry for over 20 years, so I've had the joy of seeing the many stages through which data, data management and, in this case, data quality have evolved. It's been an exciting industry to be a part of, certainly to see us reach so many new heights going into 2023. I've been at Google for two years now; we call them Google-versaries. If you ever see people on LinkedIn celebrating their Google-versary, I just finished my second one. It's been a really amazing experience to be part of Google and to see the variety of businesses, customers, data, data elements, data environments and data sources that we get the pleasure of dealing with.

 

(04:11):

So it's been quite an adventure getting my arms around what our data is and how we organize it, to manage it and improve it over time. The organization I work in is part of what's called the core organization within Google, and we provide a horizontal set of services.

 

Speaker 4 (04:32):

No, I think you got it, man. I'm good.

 

Matthew Cox (04:34):

I'm good, too. Thank you, though. And we provide that horizontal set of services across Alphabet, so our remit really is all of Alphabet. If you understand anything about Alphabet, Google is part of it, but a variety of different businesses are incorporated under Alphabet; we call them Other Bets. So you'll hear names like Waymo, DeepMind, YouTube and all the others out there. Our remit really covers all of those organizations, which is pretty amazing.

 

(05:05):

That remit and that service cover a number of things. I'm sure all of you are familiar with data governance and compliance. Compliance is certainly becoming a much more prominent part of our existence these days, as we think about data and interacting with third parties. Then there's master data, spanning data sources. In particular, I'm responsible for our master data within our SAP environment. So there's MDG, if you're familiar with SAP MDG, and the core master data set that drives so much of SAP, but also multiple other platforms that we have in place. Obviously, Reltio is one of them.

 

(05:44):

I'm also responsible for all of our third-party interactions. I believe in leveraging third-party data authorities to help us improve both our data structure and our data quality. So we work with a number of third parties, leveraging them as external authorities, to help us build the best relationship with our data and understand it as well as we can.

 

(06:10):

And finally, I have functional responsibility over the applications, capabilities and products that we deliver through our data organizations. We're not only doing data services; we also deliver internal products across Alphabet. So I have functional responsibility for overseeing those, plus an architecture board that decides how they get structured, organized and delivered across the enterprise.

 

(06:35):

So that's my Google world and Google existence. As for a couple of personal things: number one, I'm a military veteran, so I get involved with a lot of veteran affairs and veteran actions. Staying close to that community is something I've enjoyed throughout my life. And secondly, the ranch, yes. I became a blissfully happy ranch owner as of April of this year. I'm based out of Dallas, for any of you who are in Texas.

 

(07:04):

We recently purchased a ranch about two hours east of Dallas, and we've spent the last six months working on it. My son, his wife and their children actually run the ranch for me. We've added cattle, goats, chickens, ducks and all varieties of different animals. Sometimes I feel it's more like a zoo than a ranch, but it's been a really amazing process to go through. I try to head up there at some point every weekend. It has really blessed my family: all of my kids and grandkids get a chance to get out there, spend time, and get away from the rush and sprawl of the urban environment for a little while.

 

Michael Burke (07:48):

That's so incredible. That's really cool. As somebody who grew up way out in the country, I think a lot of people who work in tech need that kind of escape from the tech-verse we're all in right now, and dare I say the Metaverse? But with all this-

 

Matthew Cox (08:08):

Real quickly on that: the interesting thing I think a lot of people don't realize is how much data you can generate from a farm or ranch. There's a lot more tech than people realize that you begin to collect data from and work with on a farm, in a more rural environment. What I'm finding particularly intriguing is how to look at data through an agricultural lens. I've always seen it through a corporate lens, so this has given me a whole new one, and I'm intrigued by how I can apply it to the corporate environment as well.

 

Michael Burke (08:37):

And there are so many amazing emerging companies starting out right now that are really in that agri-science of data, right? That's incredible.

 

Matthew Cox (08:47):

Robotics. Robotics and ML. For those of you who are Star Wars fans, sometimes I laugh looking at all the droids they had, and I'm sitting here wondering how I can add more robotics to the ranch. So it's a fun little space for me.

 

Michael Burke (09:03):

That's great. That's really great. So thinking about your role at Google, which is probably one of the most expansive roles in the master data management space, among many other things, how does your team think about data quality, especially at this scale?

 

Matthew Cox (09:25):

Well, first of all, what I try to explain to everyone is that data quality is really a journey. It's an unending race without a finish line. You can't pick a point in time and say, "We're done with data quality. In Q4 2023, we're going to be done; project's complete." I try to get people to understand that it's a continually evolving environment, where we have to look at how the business changes, how data changes and how our tools change, and we're constantly measuring. But for us, it really starts with a data quality strategy. I know a lot of times people think of data quality as measures or metrics: What's the standard? How do I measure against it? We look at it more as a strategy: how do we make data quality part of the life of what we do, both in our projects and in our interactions with our internal customers and our third parties?

 

(10:19):

So we sit back and say, "Let's determine a level of maturity. What does maturity mean for us?" And then, "What is expected from the organization: our lines of business, our internal business partners, but also, again, compliance?" We see compliance playing more and more of a role in defining data quality. How do we put a strategy around that? What kind of North Star can we set for data quality so that, while we don't have an end state or a finish line, we continually strive to reach the best point we can from a measurement standpoint? Typically, when we think about data quality, we first want to assess and get a good grasp of where we are today. Then: what do we think that North Star is, how do we begin to mature toward it, how do we document how we can get there, and which regulatory bodies may have input into it?

 

(11:14):

Are there policy makers internal to the organization? What are the data consumers actually looking for? What are their goals? For the most part, we are a processor, so we have consumers and we have producers. What are their expectations? And then, as part of that, we get to the metrics. We define the strategy, assess and define our maturity, establish that North Star, and then look at metrics we can use to measure our success. A metric isn't necessarily the final goal; it's how we watch, monitor and measure our progress toward that North Star. That's how we think about data quality. It's a fluid activity that continues to evolve within the organization, but we always want to tie it to every aspect of how we engage across our organization.

 

Michael Burke (12:03):

That's really interesting. So would you say that data quality is really embedded far beyond your team within the organization? And that being said, do you run into similar challenges that I think many organizations face, of communicating how improving data quality, or specific actions to improve data quality will have an impact on the business?

 

Matthew Cox (12:27):

Absolutely. I've never been in an organization where data literacy, and an understanding of its impact, was consistent across all the entities. So from my perspective, what we do is really try to speak the language of the business: get into their world and make connections to the measures they have. If you think about sales and marketing, finance, or operations, all of them have different aspirations, different measures, and different goals they're trying to achieve.

 

(13:02):

So one of the things we try to do is go into their realm, understand what they're looking for, and then create targeted examples of the connections data can make to improve their specific objectives and goals. We have a thing called OKRs at Google, so we try to partner with our business organizations on those OKRs and say, to your point about embedding, "How can we embed ourselves into your OKRs and your priorities?" That way we can help them establish those. When they feel that level of commitment, and we can also speak in their terms, it really helps us translate the typical or traditional means of dashboards and metrics into something they understand, and data literacy ultimately improves across the org.

 

(13:53):

So between making connections to their business goals and pointing out where process barriers occur in their business needs and workflows, we've really helped them get involved, understand, and want to be active in helping us improve data quality. Because, quite honestly, I don't see one group as being responsible for improving data quality. It's an enterprise-wide, ecosystem effort, and you really have to marshal everyone toward that objective. But you have to translate it in a way that means something to them, not just throw up a dashboard and explain why a particular third-party product or platform isn't performing well. That doesn't get you very far. You've got to explain the impact of that data and of the quality coming out of it, and how applying a data quality lens to their activities and processes can actually help them move toward their milestones.

 

Michael Burke (14:46):

That's really interesting, and I think that when you talk about your stakeholders defining milestones, obviously everybody is searching for perfection in data quality, and building the most robust and most accurate data sources to support specific business needs. How do you define good? And when you're working with your stakeholders, are you advising them on how they can make their systems better?

 

Matthew Cox (15:14):

So, good enough: this is always an interesting question. To me, good enough is a moving target. I don't think you ever set good enough once and then just rinse and repeat; it's an evolving behavior. What I've seen historically, in the different organizations I've worked in, is that it's a balance between driving data improvement and the effect you have on the business process.

 

(15:41):

So a lot of times, when we try to define good enough, I'll bring together the producers, the processors and the consumers: who's producing the data, who's processing it, and who's consuming it? That collaboration defines good enough. Is what the producer is enabling or delivering effective enough for the process it needs to fulfill? Is it complete enough? Is it broad enough? Is it of a high enough standard, from a literacy standpoint, for the consumer who's trying to leverage it?
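As a rough illustration of how "complete enough" might be checked once producers, processors and consumers agree on a bar, here is a minimal Python sketch. The field name `tax_id` and the per-consumer thresholds are hypothetical; the session does not describe Google's actual metrics or tooling.

```python
def completeness(records: list[dict], field: str) -> float:
    """Share of records where `field` is present and not blank."""
    if not records:
        return 0.0
    filled = 0
    for record in records:
        value = record.get(field)
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(records)


# Hypothetical bars agreed in a producer/consumer council: each consumer
# group can set a different "good enough" level for the same field.
THRESHOLDS = {"compliance": 0.99, "finance": 0.98, "sales_marketing": 0.90}


def fit_for_purpose(records: list[dict], field: str,
                    thresholds: dict[str, float] = THRESHOLDS) -> dict[str, bool]:
    """Report, per consumer group, whether the field meets that group's bar."""
    score = completeness(records, field)
    return {group: score >= bar for group, bar in thresholds.items()}


# 95 of 100 records carry a tax ID: good enough for sales, not for compliance.
records = [{"tax_id": "X1"}] * 95 + [{"tax_id": ""}] * 3 + [{}] * 2
print(fit_for_purpose(records, "tax_id"))
# {'compliance': False, 'finance': False, 'sales_marketing': True}
```

Keeping the thresholds as data rather than hard-coding them fits the point that good enough is a moving target the group can revise.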

 

(16:11):

And that's where I end up with good enough because, at least in my experience, if I talk to different areas, whether functional groups or business lines, all of them have a different passion for what good enough actually means. A salesperson, someone in compliance, and someone in finance each have a very different lens on good enough. What I've typically done is bring those parties together, because in a lot of cases they're all consuming and producing at the same time. I'll bring them together into a forum or a council and begin to expose, and this goes back to your last question, what data quality effects are in place. I'll just give you an example.

 

(16:53):

Think of duplicate records. Most people would say, "Hey, having duplicate company records is a problem, because that's a data integrity challenge." And I would say, "Absolutely. We don't want to be creating duplicates." But there are times when, with certain business processes, the constraints a strict no-duplicates rule would place on them don't help the process. It becomes a barrier to the business process, or to the ability to act on whatever is necessary from a workflow standpoint. So there are times when you say, "Well, good enough is that we have some level of duplicates, but there's a process through which we cleanse them out over time," and so on. For me, it's about having a very open, collaborative conversation with those producers and consumers to make sure we're aligning on what that service level should be across all three parties. That's where I've found success in getting them involved.
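The duplicate example can be made concrete with a small sketch: measure the duplicate rate and compare it with a tolerated level agreed with producers and consumers, rather than demanding zero. The match key below (lowercased name with a few legal suffixes stripped, plus country) is a deliberately crude, hypothetical rule; an MDM platform such as Reltio applies its own configurable match rules, which this does not model.

```python
import re
from collections import Counter

LEGAL_SUFFIXES = re.compile(r"\b(inc|llc|ltd|corp|corporation)\b")


def match_key(record: dict) -> tuple[str, str]:
    """Hypothetical match key: normalized company name plus country."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = LEGAL_SUFFIXES.sub("", name)
    return " ".join(name.split()), record.get("country", "")


def duplicate_rate(records: list[dict]) -> float:
    """Fraction of records that are extra copies of an already-seen match key."""
    if not records:
        return 0.0
    counts = Counter(match_key(r) for r in records)
    extras = sum(n - 1 for n in counts.values())
    return extras / len(records)


def within_tolerance(records: list[dict], max_rate: float) -> bool:
    """'Good enough' here means the duplicate rate stays at or below the agreed level."""
    return duplicate_rate(records) <= max_rate


records = [
    {"name": "Acme Inc.", "country": "US"},
    {"name": "ACME, Inc", "country": "US"},
    {"name": "Globex LLC", "country": "US"},
    {"name": "Initech", "country": "DE"},
]
print(duplicate_rate(records))          # 0.25
print(within_tolerance(records, 0.05))  # False
```

The tolerance is the negotiated service level; a separate cleansing process would then work the rate down over time.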

 

Michael Burke (17:40):

Really, I think the shift to realizing that you have to make compromises to solve a specific business need is fascinating. When you're working with these individual stakeholders, you must also have to balance that with the need for standardization. How does Google employ standards and keep the balance between just getting things done and meeting a customer's specific needs, versus saying, "This is a definition everyone needs to follow"?

 

Matthew Cox (18:11):

Yeah, a lot of times I'll go back to that same kind of body. In this council, this group of stakeholders, we always start with the stakeholders: What is their perspective? What do they care about? And then we'll promote a standard. Looking across the different businesses, we'll decide, for a particular data element, what North Star we have to achieve, and that depends on the expectations of each line of business. For instance, finance or compliance will typically have a much higher bar for what that data quality element needs to be.

 

(18:49):

But part of it is also our ability to access some sort of reference standard. Think about ISO standards, right? For some things there's a very definitive standard that I can pull against and refer to, and I should make that the standard that everyone is typically on a path to achieve. But there are other areas where there's no reference standard, so we try to put a data creation standard in place. We say, "Listen, you need to collect this amount of information," everyone agrees on that standard, and there's some minimal viable set of data that we bring in. That's where we set the data quality bar and the standard for it. But a lot of times, the challenge is setting these very hard lines.
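Where no reference standard exists, a data creation standard can start as a minimal viable set of required fields. The sketch below assumes a hypothetical three-field minimum for a new company record; the field names are illustrative only.

```python
# Hypothetical minimal viable set for creating a company record.
MINIMUM_FIELDS = ("legal_name", "country", "address_line1")


def missing_fields(record: dict, required: tuple[str, ...] = MINIMUM_FIELDS) -> list[str]:
    """List required fields that are absent or blank, in the standard's order."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def meets_creation_standard(record: dict) -> bool:
    """A record passes only when nothing in the minimal set is missing."""
    return not missing_fields(record)


draft = {"legal_name": "Acme Inc.", "country": "US"}
print(missing_fields(draft))           # ['address_line1']
print(meets_creation_standard(draft))  # False
```

Returning the list of missing fields, not just a pass/fail flag, gives the producer something specific to fix.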

 

(19:31):

So what we'll try to do is create a transition, almost like an on-ramp, where we say, "Listen, our North Star for, say, collecting a phone number is this standard," and then map out the high-level steps on a timeline to achieve it. To the point we talked about earlier about business impact, the key is a partnership between the processing group, which is us, and those producers and consumers. We set a goal, but it's not tomorrow; we're not going to put a crazy standard in place overnight. We give them an opportunity to engage, on-ramp and onboard, and we set those timelines. I've found that to be the most important part of the process for getting them involved.
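The on-ramp idea can be sketched as a series of interim targets on the way to a North Star. Here the North Star is assumed to be "every phone number is stored in E.164 format" (the ITU-T international format: a plus sign followed by at most 15 digits). The quarterly targets are hypothetical, and the regular expression checks only the shape of a number, not whether it is valid or assigned.

```python
import re

# E.164 shape: "+", a country code starting 1-9, at most 15 digits in total.
E164 = re.compile(r"\+[1-9]\d{1,14}")

# Hypothetical on-ramp toward the North Star of 100% conformance.
ONRAMP = {"Q1": 0.60, "Q2": 0.75, "Q3": 0.90, "Q4": 1.00}


def conformance(phones: list) -> float:
    """Share of phone values that match the E.164 shape exactly."""
    if not phones:
        return 0.0
    ok = sum(1 for p in phones if isinstance(p, str) and E164.fullmatch(p))
    return ok / len(phones)


def on_track(phones: list, period: str) -> tuple[float, bool]:
    """Compare current conformance with the interim target for `period`."""
    rate = conformance(phones)
    return rate, rate >= ONRAMP[period]


phones = ["+14155550100", "+442071838750", "(415) 555-0100", "4155550100"]
print(on_track(phones, "Q1"))  # (0.5, False)
```

Measuring against the interim target for the current period, rather than the final standard, is what lets producers onboard gradually while progress stays visible.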

 

(20:11):

And then we have measures throughout the cycle to show how we're achieving it, while staying pragmatic. With so many business changes and so many compliance changes coming through, that pragmatic, practical approach is going to be really important for maintaining the relationship with your business stakeholders. Let them know that we're both trying to achieve something, they're trying to win more, and here are the measures we look at over time to see whether we're reaching that North Star.

 

Michael Burke (20:38):

And when you're working with these customers and stakeholders, there are obviously going to be different levels of scale. Individuals and individual organizations are going to be doing their own munging and work on this data before it makes it into your ecosystem. How do you go about things like profiling at the global level?

 

Matthew Cox (21:01):

We profile quite a bit of the data flowing into our environment, and let's separate that into a few areas. We profile the source data that's coming in; think of the first-party data you're creating. As we receive sources, we profile them, and I'll talk about why we do that in a minute. We profile the work that we do against that information. I'm sure many of you are familiar with data stewardship activities, where we have manual intervention in the process. We measure that as well, to see the impact of the stewarding we did, because manual intervention doesn't always lead to a positive outcome; sometimes it leads to a negative one. So we try to measure that and profile the results. And then we also profile our third-party data. I work with a lot of external data authorities, but I don't assume that everything they send us is perfect.

 

(21:51):

They have the same data quality challenges we do as they work through their processes. So for me, profiling is really, really important, because that's how we ultimately monitor and measure the input going into the overall engine. We use some in-house models against those data sets to ask, "Is there drift? Are we seeing patterns that are leading us toward a lower level of data quality? Is something coming through that we need to go interact with? Or do we need to go back to the business and introduce some sort of alteration or change?"
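The in-house drift models mentioned here aren't described in the session. As a stand-in, the sketch below uses a simple z-score test (a swapped-in technique, not Google's method) to flag when a profiled metric, such as a field's null rate, moves well outside its recent history. The field name and the three-standard-deviation limit are assumptions.

```python
from statistics import mean, stdev


def null_rate(records: list[dict], field: str) -> float:
    """Profiled metric: share of records where `field` is missing or empty."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) in (None, "")) / len(records)


def drifted(history: list[float], current: float, z_limit: float = 3.0) -> bool:
    """Flag drift when `current` is more than `z_limit` standard deviations
    away from the mean of previous profiling runs (a simple z-score test)."""
    if len(history) < 2:
        return False  # too little history to estimate normal variation
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # history is flat, so any change is a shift
    return abs(current - mu) / sigma > z_limit


history = [0.020, 0.025, 0.018, 0.022, 0.021]  # null rates from earlier loads
today = [{"postal_code": "75201"}] * 92 + [{"postal_code": ""}] * 8
print(null_rate(today, "postal_code"))                    # 0.08
print(drifted(history, null_rate(today, "postal_code")))  # True
```

The same check can run separately on first-party sources, on post-stewardship results, and on each third-party feed, which matches the three profiling points described above.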

 

(22:29):

But we also use the profiling and the measures from it as evidence to take back to all those different groups. So I'll go back to the first-party sources and talk to one group. I may even sit down and say, "Okay, sales and marketing, you're giving us this. We see the data quality levels decreasing, and we're having these problems and those problems." And we leverage our different stakeholders, like operations. I don't have to be the forcing function. I allow that peer pressure across our PAs and our partners to work through those challenges, so they can look to each other and say, "Listen, I need you to step up here. I need you to alter this, or I need you to do that." We look at the overall business process. So profiling is very important to me, because it becomes a point of conversation and a point of measurement that we can put in front of those stakeholders, to let them understand the current status of who's contributing what, and when, into the overall data landscape.

 

(23:22):

From a 3P standpoint, it's also a really important measure because we have SLAs. I want to understand, and I actually have comparative metrics across our different 3Ps, to explain what I'm seeing from them and what I'm expecting from them. Are our match rates right? Is the enriched data coming in correct? Am I seeing their data slowly begin to drift toward problems? For me, it's almost like flying blind if I don't have data profiling going on. If I'm not watching what's coming in, I can't understand why I'm seeing the results that come out. If I have dashboards showing issues but no way of articulating them back to the source, the authority, or the process that generated them, I can't follow that breadcrumb trail, that data lineage, back to where I can actually take corrective action. So for me, profiling is a pivotal part of what we're doing across our data landscape.
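Comparative 3P metrics, such as match rate against an SLA, can be sketched as follows. The vendor names, batch counts and SLA levels are hypothetical, and "match rate" is assumed here to mean matched records divided by records submitted for matching; actual contracts may define it differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VendorBatch:
    vendor: str     # hypothetical third-party data provider
    submitted: int  # records sent for matching
    matched: int    # records the provider returned a match for


def sla_report(batches: list[VendorBatch], sla: dict[str, float]) -> dict[str, dict]:
    """Aggregate match rate per vendor and compare it with that vendor's SLA."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for batch in batches:
        totals[batch.vendor][0] += batch.submitted
        totals[batch.vendor][1] += batch.matched
    report = {}
    for vendor, (submitted, matched) in totals.items():
        rate = matched / submitted if submitted else 0.0
        report[vendor] = {"match_rate": round(rate, 3), "meets_sla": rate >= sla[vendor]}
    return report


batches = [
    VendorBatch("vendor_a", 1000, 930),
    VendorBatch("vendor_a", 1000, 950),
    VendorBatch("vendor_b", 500, 400),
]
print(sla_report(batches, {"vendor_a": 0.92, "vendor_b": 0.85}))
```

Aggregating per vendor before comparing smooths out single-batch noise; tracking the same report over time is one way to spot a provider slowly drifting toward problems.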

 

Michael Burke (24:15):

And enforcing that accountability. When you've probably got thousands or tens of thousands of sources rolling up into these profiles, how do you communicate with both your upstream and downstream stakeholders? Do you do that through dashboards? It's probably an automated system, but what does that look like?

 

Matthew Cox (24:33):

For the most part it is dashboards, and the dashboard piece is what we're moving toward more and more. Interestingly enough, some of the communication actually comes back to us, because as we feed information back into source platforms, whether it's a sales and marketing platform or a finance platform, evidence comes through in both arenas. It comes from the dashboarding and the reviews we do, and it comes from what they see when we push data back into those platforms. So it's really a matter of saying, "Listen, let's look at the myriad ways this data is being formatted and delivered back into those partner platforms, and then use the dashboards as a key indicator of what we're seeing over time."

 

(25:16):

So from a dashboarding standpoint, it's really important to have that roll-up sequence. At the same time, there's both the reality and the perception that come through when this data is delivered back into those platforms, which we're always having to deal with. So there's always that line of communication: "Hey, I'm seeing this in my finance platform, I'm seeing this in my sales and marketing platform. How is this being affected by what's coming into that process or element?" And then, "What can we do to help improve that over time?" Those all come back as driving questions, which I think is amazing, and I would certainly suggest that everyone in this community be very open to those communications, those collaborations, and that feedback. Personally, I want to hear it.

 

(26:03):

Think about being at a restaurant: if the food or the service isn't good, I want to be able to tell someone there, "Hey, I'm having issues here," because how else would they know there are challenges if that information doesn't come back? So I welcome, and I'm very open to, our internal partners and even some industry best-practice third parties coming in to give us opportunities, reviews and critiques so we can constantly improve the overall engine. But at its base, it comes down to dashboards, because we use them as a great visual measure: a data visualization of the core competency our data quality process is delivering today.

 

Chris Detzel (26:43):

Michael, do you mind if I jump in just a little bit?

 

Michael Burke (26:46):

Yeah, jump in. Go for it.

 

Chris Detzel (26:46):

We have some questions that I think are very interesting. One question from the audience is: how are you improving data quality, and how do you see augmented data quality as the next level of data quality solutions?

 

Matthew Cox (27:04):

So, augmented data: I'm going to assume that means augmenting with third-party data sources. If that's the path, then I'm a big proponent of leveraging third-party authorities to augment, enhance or enrich our data, because there's a lot of vetting. There are a lot of trusted third parties out there that can help you define your hierarchy, define clusters of connections between different company records, and identify connections between contacts, and from contacts to accounts to companies. So for me, there's a richness of augmentation you get when you add third parties to the mix that goes well beyond what you understand from a 1P perspective. If that's the perspective, then yes, I'm very active in that. We've signed some very large agreements with third parties with the idea that, as great and as big as Google is, we can take insights and improvements from third parties to help augment our data. So it's certainly part of our data strategy.

 

Chris Detzel (28:06):

Thank you. Two more questions, and then I'll let Michael get back to his. Do you manage Reltio as a product with a prioritized backlog or business user stories, or are changes prioritized only via OKRs? Or is there a center of excellence?

 

Matthew Cox (28:24):

Wow. Can I say all four? I think it's a mix of all of them. It's really important, because it gets back to the notion that it's a shared resource. Your MDM platform, Reltio in this case, is a shared tool across the ecosystem. So do we have milestones based on product? Absolutely. We're very actively involved with Reltio and Reltio leadership, which is one of the reasons we're here, and we look forward to the new capabilities that come through. So we do have a product roadmap. How does it get implemented? Do we look at the processing that comes through the platform and at the user journeys, whether it's a supplier domain or a customer domain, and at how they're trying to evolve? Absolutely. We factor all of that in.

 

(29:08):

Do we look at our matching and survivorship rules and how they play into operational excellence? Absolutely. That's a key part of it as well. And do we look at industry trends, like ML and other things that are coming in, and at how we make overall improvements and extend the platform for better use across the enterprise? And I think the other part is balance. I've got to balance how we leverage Reltio in a way that serves that general ecosystem. I could perfect it for one group, but that may actually make it a negative for another. If I make the rules too strict because of compliance, does that actually hurt my sales and marketing activity? So it's really a series of levers. That's a great question, because they all become levers, and the question is knowing which lever to pull, based on the OKR and the objective, so you can meet those accordingly.

 

Chris Detzel (29:58):

Another question came in. Just so you know, some of the questions are not specific to data quality, and I'm not going to ask those particular ones, but-

 

Matthew Cox (30:07):

If they're about the ranch, I'm happy to answer that, by the way.

 

Chris Detzel (30:14):

That's a good one. So is your team leveraging Reltio for data quality, like profiling, cleansing, prior to sending to downstream systems?

 

Matthew Cox (30:22):

The answer would be yes. And plus, we're using the dashboards in order to represent the different data quality attributes that we have. So yes, we're using Reltio for all those pieces. Now, we augment with other 1P technology that we're using internally, but yes, we leverage those pieces within Reltio.

 

Chris Detzel (30:40):

And are you using AI or ML as part of your cleansing activities with Reltio?

 

Matthew Cox (30:45):

So we do some ML prior to bringing the data into Reltio. So we'll take signals from ML to try and enrich our matching and survivorship within Reltio. So yeah, we use ML as part of our process, and we're certainly working both with Reltio and their ML objectives, but then also things we can use internally to optimize that sequence.

 

(31:08):

I mean, it's really about modeling. In my mind, we're really trying to move away from very rigid survivorship rules, where you set a survivorship rule and that's just the way it is. We're trying to make it more dynamic and adjusted in real time, based on what we're seeing, and ML models help us do that. So if we see one source drifting, or one attribute drifting, we move to another source, all as part of the model, without having to go back and adjust rules on some sort of periodic basis. I think there's a lot of opportunity there, both for data going into Reltio and for how Reltio processes what's been sent into it.
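To make the idea of drift-driven survivorship concrete, here is a minimal sketch. It is not Google's or Reltio's implementation; the source names, trust scores, and the disagreement-rate drift measure are hypothetical assumptions. Instead of a fixed source-priority rule, each source's trust for an attribute is discounted by how much it has drifted, and the surviving value comes from the source with the highest adjusted trust.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Hypothetical per-source quality signal for one attribute."""
    base_trust: float   # prior confidence in the source, 0.0 to 1.0
    drift: float = 0.0  # 0.0 = stable, 1.0 = fully drifted

    @property
    def effective_trust(self) -> float:
        # Trust shrinks as the source drifts, so no manual rule change is needed.
        return self.base_trust * (1.0 - self.drift)


def estimate_drift(source_values: list[str], consensus_values: list[str]) -> float:
    """Share of recent records where a source disagrees with the consensus value.

    A simple stand-in for a model-based drift signal (an assumption, not a real API).
    """
    if len(source_values) != len(consensus_values):
        raise ValueError("source and consensus samples must be aligned")
    if not source_values:
        return 0.0
    mismatches = sum(a != b for a, b in zip(source_values, consensus_values))
    return mismatches / len(source_values)


def survive(values: dict[str, str], stats: dict[str, SourceStats]) -> str:
    """Return the value from the source with the highest drift-adjusted trust."""
    winner = max(values, key=lambda source: stats[source].effective_trust)
    return values[winner]
```

With a hypothetical `crm` source at trust 0.9 and no drift, its value survives. Once `estimate_drift` reports that half of its recent values disagree with consensus, its effective trust falls to 0.45, and a hypothetical `vendor_3p` source at trust 0.8 with 0.1 drift takes over, with no rule rewrite.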

 

Chris Detzel (31:45):

Okay, thanks. There are some other questions, but I'm going to hold off on asking those. Michael, if you want to go ahead. Thank you.

 

Michael Burke (31:51):

Yeah, no I mean, I think that it's so interesting, this shift that we're seeing more and more across organizations, of moving from rules to models, and the sheer complexity of data and how it's grown over the past five years even. We're seeing more and more edge cases that can't be represented by rule based systems. Speaking of that, go ahead.

 

Matthew Cox (32:13):

Just to add to that, Mike: the other piece is that I want my rules to be dynamic, based on my source changes. We're constantly improving sources, and we're constantly working to get better 3P engagement, or to bring in alternative 3Ps. I really want the data to stand on its own, not my interpretation of what I've seen plus some sort of periodic review. I really want the system to learn. And again, it's based on a journey: there's a data journey with every source.

 

(32:47):

There's a data journey with every 3P, there's a data journey with our processes, and I want the system to be able to respond and react to that as we move through the process. I don't want a situation where we learn something, then have to come back and spend a whole lot of time updating a hard-and-fast rule set, and then go back and learn again. The time to measure, the time to value of seeing how data quality is incorporated into the overall process, is much improved when I can make it dynamic through models rather than a rigid rule set. And that's what we certainly see as the future.

 

Michael Burke (33:15):

Absolutely. I couldn't agree more. On that note, when you use machine learning and modeling and bring in all of these sources, there's a balance between strictly regulated and deregulated models, and using models to help make proactive decisions. How do you communicate these with your governance board, and what does the conversation look like when you're trying to assign access, or talk about data quality or changes to data quality? What does that look like from a Google perspective?

 

Matthew Cox (33:47):

So on the broader side, we do have data governance boards. We don't necessarily call them data governance boards, but we have forums in which we review. Our real approach to those governance boards is, again, about strategy. Where are we trying to go? What are we trying to achieve? In those environments we'll talk about specific changes, whether it's a model we're putting in place, a rule change we're putting in place, or a new compliance requirement that's coming through. We begin to discuss how we're shaping data quality, and how we're shaping the changes we're making from a data processing or data compliance standpoint. And how is that moving us forward toward our data strategy North Star?

 

(34:31):

That's really important, because a lot of times our stakeholders, and I go back to data literacy here, are not living in our world as data processing or data enablement experts. We really have to translate how these changes, whether model updates, source changes, or data quality improvements over time, are moving us toward a strategy, so they begin to see the needle move. Because in most cases I've seen, what they really want to understand is, "All that's great, and I really appreciate that you're standardizing this attribute, but what does it mean to me? Does it help us achieve a level of compliance we didn't have before? Does it help us drive toward an OKR, or a business process, or some sort of understanding we didn't have before?"

 

(35:19):

So we really envelop data quality into data strategy, and talk about how data quality moves us forward, rather than just saying, "Okay, let's talk data quality." I think a lot of times there's a challenge around how data quality translates into understanding. If we can embed it into strategy and into data processes, it's much easier for them to consume. That's what I see in our governance board meetings. We have architecture boards, governance boards, functional boards; we go through a number of boards. But I find that the more we can translate data quality into the overall roadmap and overall strategy, the easier it is for them to consume and understand the impact.

 

Michael Burke (35:56):

It's amazing how much alignment, I'm sure, has to take place to drive those changes. And it's interesting how you've set things up with this embedded mindset, which lets you create alignment faster. That brings us to another point that I think a lot of people are interested in, and probably why they came to this meeting: scale. Google is likely operating at a bigger scale than anybody else in the industry. So what differentiates how Google operates compared to the average enterprise?

 

Matthew Cox (36:28):

It's interesting, because we talk a lot about Google scale. Let me break down some things about how Google operates from that standpoint, because we do have scale. But what makes us a little bit different from most of the enterprise organizations I've been around, whether as a consultant or as an employee, is the variety of ways in which we go to market and the different types of products we have. We have some organizations that are pure B2C, some that are B2B, some that are mixed, and some that go through partnership channels. We have various levels of data and data expectations between them. And again, Google is a very entrepreneurial company, so there are multiple business types.

 

(37:16):

So for me, scale of processing is one part. But really, it's the variety of data, the variety of data expectations, and the variety of data quality being generated by those different organizations and the different types of businesses they go to market with; to me that's probably what makes it. If there's a scale, it's the scale of complexity and the scale of variety we have to deal with within one organization. That's probably the biggest challenge for us. As for the sheer scale of processing, we have Google Cloud, so I can scale and scale and just continue to add virtual machines. But for me, it goes back to what we talked about before with the stakeholders. The challenge is aligning all these different organizations, with various expectations and various goals, toward some kind of common standard, common activity, and common measurement. That, I think, is the Google scale challenge we have to deal with.

 

(38:16):

And since our remit is across all those organizations, getting that alignment tends to be one of the biggest challenges I face. That's where some of the work we've done in the last two years is a real achievement: beginning to mold enterprise-level views, enterprise-level standards, and enterprise-level agreements. That's probably what I'm most proud of in the last two years: getting alignment across such different types of organizations, and bringing that down into a data landscape we can produce for them in a way that's consistent across the ecosystem. So that, to me, is the Google scale challenge. More than volume, it's complexity and inconsistency.

 

Michael Burke (38:55):

It's so interesting, too, because I think even in much smaller enterprises, we all struggle with that alignment. Can you give any advice, additional considerations, or points that data stewards and folks who own smaller master data management platforms could use to gain alignment across their business stakeholders? Is there any kind of common template you follow for these types of procedures?

 

Matthew Cox (39:19):

It's hard to say there's a template. In a lot of cases, there are forums in which you can communicate. What I have found in my time is that, just as we talked about embedding data quality, it's about embedding yourself into these groups as well. Whether you're small, medium, or large, the scale really doesn't matter. The question is, are you in the middle of their world? Do you understand what their passion is? Do you understand what they're trying to achieve? And are you able to bring those groups together in a way that says, "Listen, we're really trying to achieve a global outcome, a collaborative outcome"? That applies at any tier. I've seen it in all the different forums. It's really about dialogue.

 

(40:02):

It's funny, because we talk a lot about data models and technology, but when you're trying to make progress toward some of these data objectives, it really comes down to relationships in a lot of cases. For me, first and foremost, it's about relationships. Then it's about understanding the North Star, and then come the templates and everything else that follows. But it's about getting into their business and letting them see that you care about how they're successful, versus just, "Hey, I'm meeting my data objectives." It's really about meeting their goals and then looping your goals into that. That's how I've found success over time in my career.

 

Michael Burke (40:37):

Really interesting. We have a few questions from the chat here. One of the first ones was: how do you monitor data quality after profiling? Once data's in production, do you use reactive or proactive approaches? And if you're proactive, do you partner with any sources to help build and maintain that data quality monitoring?

 

Matthew Cox (41:00):

Absolutely. Data quality is an ever-evolving thing. You can't fire and forget, per se, with your data quality profiling. There are a couple of things we do. In our ML models, we do drift comparisons: how does the data that's actually being stored compare with the data that's being referenced? You have to monitor over time. So number one is looking at your 1P data, and making data profiling a live, embedded activity. But it very much applies from a third-party standpoint, too. One of the challenges with engaging a lot of third-party authorities is: is the data getting stale? You have to build a monitoring capability into your process. With our 3Ps, we have a monitoring sequence, so they tell us in real time when something has changed within their dataset, and then we have to flow that through.

 

(41:51):

That's a really great question, because it's not enough to just say, "Thank you, third party, I see a change, and I made the change in the dataset you sent me." I now have to make that change retroactively available in all the platforms I originally put that information through. If I don't send it all the way through, I'm now inconsistent, I'm stale, and things begin to break down. I'm drifting myself, because I'm not continuing to update the dataset.

 

(42:19):

Now it gets a little complicated, because there are certain groups that don't necessarily want real-time changes. Think about territory changes in sales. They don't want to know every time something changes, because they have a plan. So again, you've got to understand the persona, the expectations, and the goals of the different subscribing consumers of data. But absolutely, you need to be monitoring that data after profiling and after receipt from third parties. Otherwise, your data is eventually just going to continue to fade and drift from your standard.
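A minimal sketch of the monitoring loop described here, under stated assumptions: records are plain dictionaries keyed by a hypothetical record ID, the third-party feed is a reference snapshot, and each downstream subscriber is flagged as wanting changes in real time or batched for a planning cycle, as in the sales-territory example. It is illustrative only, not Google's pipeline.

```python
from dataclasses import dataclass, field


def changed_keys(stored: dict[str, dict], reference: dict[str, dict]) -> list[str]:
    """IDs of records whose stored attributes no longer match the reference feed."""
    return sorted(k for k in stored if k in reference and stored[k] != reference[k])


def drift_rate(stored: dict[str, dict], reference: dict[str, dict]) -> float:
    """Fraction of shared records that have drifted from the reference."""
    shared = [k for k in stored if k in reference]
    if not shared:
        return 0.0
    return len(changed_keys(stored, reference)) / len(shared)


@dataclass
class Subscriber:
    name: str
    realtime: bool  # False for consumers who work to a fixed plan
    pending: list[str] = field(default_factory=list)
    delivered: list[str] = field(default_factory=list)


def propagate(stored: dict[str, dict], reference: dict[str, dict],
              subscribers: list[Subscriber]) -> list[str]:
    """Apply reference changes to the stored copy and route them downstream."""
    changes = changed_keys(stored, reference)
    for key in changes:
        stored[key] = dict(reference[key])  # refresh our copy so it doesn't go stale
    for sub in subscribers:
        # Real-time consumers get changes immediately; planners get them later.
        (sub.delivered if sub.realtime else sub.pending).extend(changes)
    return changes


def release_pending(sub: Subscriber) -> None:
    """Deliver held changes at the start of a subscriber's planning cycle."""
    sub.delivered.extend(sub.pending)
    sub.pending.clear()
```

Computing `drift_rate` before `propagate` gives the drift metric; after propagation the stored copy matches the reference again, and the planning-cycle subscriber receives the same changes only when `release_pending` is called.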

 

Michael Burke (42:50):

And going off of that, do you collect feedback? I'm assuming you do from these third parties, these downstream consumers. And if so, how does that make it back into the mix, right? Because your downstream consumers could be different from the source system providers upstream, right?

 

Matthew Cox (43:06):

Correct, correct. There are really two parts to that. When it comes to our first-party data, I'll bring those consumers and producers together. We have that conversation, and it's a chance for them to communicate their expectations. In several cases it's more like, "Here's what I'm experiencing, and here are the challenges I have. How can you help me as a producer? Is there an additional piece of information? Is there a level of standards you can apply? Is there a process change that can keep me from having to fix or change something downstream?" So we have those conversations. But one of the very important things to do, and this is part of third-party management, since I manage the relationships with our third-party data providers, is taking that same conversation back to the third party.

 

(43:48):

So I'll take our consumers and have a direct conversation with the third-party agencies as well, to give direct feedback: "Hey, I'm seeing match rates are down, and I'm finding this data's been stale. I know this company moved, I know they're out of business, and I'm not seeing it in the data." I try to create as many channels for that conversation as I can, so that the consumers receiving everything feel they have a voice: first, a voice in how the process is fulfilled, but also a lesson learned, or a retrospective, that goes back to those parties.

 

(44:17):

Because from my perspective, I expect improvements from my third parties. I do. I expect them to take that feedback, and I want to see and understand material changes through the process. At the end of the day, I'm here to serve my consumers. I'm a processor: I gather things from producers and fulfill them for consumers. And those consumers are who I'm ultimately serving, because they're the ones seeing the benefit, or lack of benefit, of what I'm producing through this processing.

 

Michael Burke (44:47):

Really, really interesting. And so with providing that feedback back to the consumers, do you also track OKRs associated with the performance of those third party data providers?

 

Matthew Cox (44:58):

Absolutely. Absolutely. Match rates, enrichment rates, monitoring rates. We monitor the updates we receive: what do those look like? I can't get into too much detail, but we have a very broad dataset that we've incorporated 1P and 3P data into. And we have different levels of measures. We measure what we're receiving from third parties. We measure how we're articulating that, and how we were actually able to combine that information. And then we receive input from the consumer.

 

(45:29):

So it's like a data lineage metric. You're actually looking at that lineage and how it flows. How am I seeing it as it's segmented, and how am I measuring it as it's segmented across the organization? That's a really important part. Otherwise, it's eventually going to be like a black box, and you're not going to see into what's happening. You'll see some changes happening at the consumer, the producer is claiming certain things, whether it's a 3P or a 1P, and you're in the middle going, "I don't know what happened." For me, I want to have as much evidence as I can throughout that process. At the end of the day, the goal is not to point fingers; the goal is to solve the riddle and fulfill the expectation. That's why we track and profile.
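One way to read the "data lineage metric" idea is as stage-to-stage rates along the pipeline, so that a drop in, say, match rate shows up at a specific step instead of disappearing into a black box. The sketch below uses hypothetical stage names, made-up counts and baseline rates, and a simple fixed tolerance; it is not a description of Google's actual metrics.

```python
def stage_rates(counts: list[tuple[str, int]]) -> dict[str, float]:
    """Rate at which records carry over from each pipeline stage to the next."""
    rates: dict[str, float] = {}
    for (prev_name, prev_n), (name, n) in zip(counts, counts[1:]):
        rates[f"{prev_name}->{name}"] = n / prev_n if prev_n else 0.0
    return rates


def regressions(current: dict[str, float], baseline: dict[str, float],
                tolerance: float = 0.05) -> list[str]:
    """Stages whose rate fell more than `tolerance` below the baseline."""
    return [stage for stage, rate in current.items()
            if stage in baseline and baseline[stage] - rate > tolerance]


# Illustrative numbers only: received from a 3P, matched, enriched, consumed.
counts = [("received", 1000), ("matched", 800), ("enriched", 600), ("consumed", 600)]
baseline = {"received->matched": 0.9, "matched->enriched": 0.76, "enriched->consumed": 1.0}
print(regressions(stage_rates(counts), baseline))  # ['received->matched']
```

Here only the match step is flagged, which is the kind of evidence that turns "match rates are down" into a specific conversation with the right producer rather than finger-pointing.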

 

Michael Burke (46:07):

So you've obviously got a super complex role ahead of you between relationships that you're managing, data quality, OKRs, governance. Tell me a little bit more about your farm, how you track data quality there. You said that you've been implementing a lot of tech. I'm throwing a curve ball, now.

 

Matthew Cox (46:24):

Yeah, you are. Yeah, yeah. Yeah.

 

Michael Burke (46:27):

Tell me about it.

 

Matthew Cox (46:28):

So we do two things. We're doing measurements on weather: there are data sources you can use to track weather, so we're pulling that information in and gathering it against our crops and what we're seeing in production. We also have metrics about our animals. With the animals we're receiving, are we having any issues with health? Has there been a die-off? Is there anything problematic for that particular species, or from that particular source? Because, very much like from a data standpoint, we receive our products from other organizations. We receive our ducks, our chickens, our pigs, and our cows, because we're literally creating all of this now. Eventually we'll produce our own, but for now we're measuring that. How good is their health when received? If we bought from a particular source, how has the animals' health been?

 

(47:25):

So we're pulling that through. We're pulling barometric data and weather information, and we're beginning to track with drones, which is really kind of fun. We have drones fly over the property and take video that we track, because we're looking at energy consumption, and at consumption of propane and other things. It's really interesting how all of those become data elements that come in, and you begin to understand how weather affects things, and how this affects that, as we make plans for 2023. With my cow and my bull, I'm a producer; I've been a producer. I'm all of it in one, so I get to critique myself on the farm. It's pretty fun.

 

Michael Burke (48:00):

All right, Detzel, next talk that we do, we have to go to Matt's farm and just get a tour of this. I think the whole audience, if you're interested in this, please comment in the community chats, but we should totally do something. I think this is just super interesting, and seeing a little plus one here. That's great. If there are no other questions from the chat, that's all. I don't know if you had anything you wanted to wrap up with, but Matt, thank you so much. This is super cool and your insights not only at Google, but on your farm, are really intriguing, and I think definitely eye opening. So, thank you. We really appreciate the time.

 

Matthew Cox (48:36):

You bet.

 

Chris Detzel (48:36):

Yeah, this was really, really good. Thank you, Matt, and thank you, Michael, for the questions. And to the audience, thank you so much for participating. At the end, make sure you take the survey on different topics, how you liked it, and those kinds of things; whenever you leave this, it will pop up automatically. Thank you, Matt, and thank you, Michael. I'll stay on for the next minute or two, just in case there are other questions or thoughts. But wow, this was really good, and I can't top that last question, so we'll leave it there.




#dataquality
#google
#Featured
#communitywebinar