

Data Mesh, Data Fabric and the role of MDM in Modern Data Management - Show

By Chris Detzel posted 10-14-2022 14:23

  
Data Mesh, Data Fabric and how Master Data Management fits into these architectures


Find the PPT:  Data Mesh, Data Fabric and the role of MDM in Modern Data Management

As big organizations become more agile, centralization is becoming more of a thing of the past. We are starting to hear terms like Data Mesh and Data Fabric. Ansh Kanwar, SVP, Technology, Reltio, will help answer questions around terms like Data Mesh and Data Fabric and how MDM fits into these architectures. This will be a Q&A session, and we welcome questions from the Community.

Some questions will include:

What is Data Mesh architecture?
What is Data Fabric?
How do the two intersect?
How does MDM fit into the data mesh architecture?
What types of orgs are using data mesh and why?
What questions do you have about Data Fabric, Data Mesh and the role of MDM in Modern Data Management?

Check out the Reltio Community and ask your questions there:
https://community.reltio.com/home



Transcript

Chris Detzel
(00:00:08):

Well, everyone, thank you for coming to another RELTIO Community show. My name is Chris Detzel, and I'm the director of customer community and engagement. We have a special guest today, Ansh Kanwar. He's an SVP of technology here at RELTIO, and our topic is Data Mesh, Data Fabric, and the role of MDM in Modern Data Management. And we're really excited about this show because we've not done one on it as of yet. So super excited. Next slide. It's funny, me telling Ansh next slide. So as usual, keep yourself on mute. All questions should be pushed into chat. I will make sure to ask Ansh any questions that you might have around the topic after the presentation so that we can just get some of that done. And then even after the presentation, if you want to take yourself off mute, please do that as well.

 

(00:01:10):

So this is a conversation and we want to get some of your questions and answer some of those the best we can. As usual, this will be recorded and posted to the RELTIO Community, hopefully by the end of day today, and/or by Monday. And next slide. So in addition to this particular show, we have several shows coming up. We do have a technical show coming up on RELTIO Name Cleansers for Matching. That's going to be next week, on Wednesday. We do have three shows coming up that our customers are going to help host, and one of them is how Google is improving their data quality. That's going to be Matthew Cox and Michael Burke here at RELTIO, so Matthew Cox is Google.

 

(00:01:58):

And then on November 3rd, we have Manage Your Core Product as a Product, which our product leader, Venki, will be sharing some stuff around. And then we have a case study with Qlik, Driving Qlik's MDM Program Success. You don't want to miss that either. And then we have another customer doing Data Quality Management, Commercial Pharma MDM Landscape with Takeda. So lots of really great stuff coming up with the RELTIO Community. And then who is Ansh? Ansh, should I just kind of talk a little bit about that or do you want to say something about who you are?

 

Ansh Kanwar (00:02:40):

Oh, I can take it over from here. No problem.

 

Chris Detzel (00:02:42):

Awesome, to you bro.

 

Ansh Kanwar (00:02:45):

Thanks Chris. Little bit about myself, I'm not going to read my bio, but I head up technology for RELTIO and it's been a great pleasure and a privilege to be part of this big data management space for the last year and a half. Previously, a lot of my background has to do with everything SaaS, so delivery via the cloud in sort of high performance kind of spaces. And here I'm responsible for engineering for our core platform. So I know Chris has built me up a little bit like an expert, but I just want to come clean.

 

(00:03:27):

I am learning as much from you as you may be from me or this presentation. Really the goal of our community is to bring all of you in as practitioners, as people who are really close to the wire, who live and breathe the concepts or the pain points that we're going to talk about every single day. And we would love nothing more than to have your opinion, your point of view on some of these topics. Today's presentation or today's topic is a little bit different. It's fairly broad. We're talking about not specific products, definitely not vendors, well a little bit of RELTIO, but not generally about vendors. But we're really talking about broader industry concepts and emerging concepts. I have a few. If I come to a talk like this, one of the things I really want out of it is these breadcrumbs that I can go and follow up on in terms of references or things. So we've tried to pepper our conversation with that so it's not a lecture, but really a starting point for a journey of discovery for you and for all of us together.

 

(00:04:43):

So with that preface, let's get into it. And I was, when I started thinking about this, I wanted to step all the way back to first principles. We as practitioners in this space, vendors, practitioners, people who think about data every day we wake up, in the end, what do we want? We want data to drive actions. Between those two end points, there are a couple of steps that we go through either explicitly or implicitly in, once we receive the data, we somehow either impute or compute meaning to that data. And to do that, we need to understand the context from which that data has arrived.

 

(00:05:37):

We all talk about insights, we talk about going from data to insights to actions. But I think this critical step of deriving meaning, contextual meaning that sometimes is implicit in conversations, I just wanted to say that out loud. That leads to insights, and insights lead to action. To me, insights really are, if I do this based on my understanding of the meaning, something else will change. So a simple example we've laid out here is the element of data you receive is we sold some units of some product. The context which is important to understand what that event means is what are my inventory levels for that particular product? What is the threshold at which we reorder that from our suppliers? What is the price in the market right now, if you were to order?

 

(00:06:33):

Is there a spot market that we could leverage? Then, in terms of deriving insights from this, we can say, given the context, given the meaning, we can either decide that the next best action is to place an order or to delay and deal with absence of inventory for a little bit because that is the better ROI. And then in terms of action, once the next best action is clear, you go execute that action. And the beauty of what we do is that every single action will create more data, will create more events, will create this feedback loop that in an ideal world, we're going through this feedback loop thousands, millions of times, an hour or a minute, just depends on the context. And that continuously provides value to our businesses and helps them solve the problems of either growth or driving efficiency or ensuring compliance or any other sort of business objectives.
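
Editor's sketch: the data, context, insight, action loop described above can be written out in a few lines of Python. This is only an illustration of the reasoning in the inventory example. All names (`SaleEvent`, `ProductContext`, `max_unit_price`, the action labels) are hypothetical and were not part of the talk, and the price ceiling is an assumed stand-in for "delaying is the better ROI".

```python
from dataclasses import dataclass


@dataclass
class SaleEvent:
    """The raw data: some units of some product were sold."""
    product_id: str
    units_sold: int


@dataclass
class ProductContext:
    """The context needed to give the event meaning (all fields hypothetical)."""
    inventory_level: int      # units on hand before the sale
    reorder_threshold: int    # reorder once stock is at or below this level
    supplier_price: float     # regular supplier price per unit
    spot_price: float         # current spot-market price per unit
    max_unit_price: float     # assumed ceiling above which delaying is better ROI


def next_best_action(event: SaleEvent, ctx: ProductContext) -> str:
    """Derive meaning from the event, then an insight, then a recommended action."""
    remaining = ctx.inventory_level - event.units_sold  # meaning: new stock level
    if remaining > ctx.reorder_threshold:
        return "no_action"
    # Insight: stock is low. Decide whether ordering now is worth it.
    best_price = min(ctx.supplier_price, ctx.spot_price)
    if best_price > ctx.max_unit_price:
        return "delay_order"
    if ctx.spot_price < ctx.supplier_price:
        return "order_from_spot_market"
    return "order_from_supplier"
```

Executing the returned action would itself produce new events (an order placed, stock received), which is the feedback loop described in the talk.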

 

(00:07:42):

So the reason I laid this out is as we talk about, we'll get very specific on data meshes, data fabrics and so on, but it's this focus on meaning that really is driving a lot of those architectural concepts. So what we're trying to do is fairly simple. It's like you want to get slim, well eat less, exercise more, right? Sounds pretty simple, but really the devil's in actually the execution, and you all know perhaps even better than me, all of the complications that get in the way when you're trying to achieve that set of actions that lead to business outcomes. Data's growing. Mountains and mountains of data everywhere siloed. It follows Conway's law. Your data is siloed according to your organizational structure.

 

(00:08:38):

More and more we recognize this, that this data is always moving, it's always morphing. We are transforming it, systems are transforming it, and invariably a lot of it is moving to the cloud, whether that's cloud data warehouses, cloud data lakes, SaaS solutions or platform solutions or solutions like RELTIO. There is a secular and massive movement to the cloud. All of your enterprises are also demanding speed from you. It's a competitive advantage if you can get from that data to that insight and the action faster than your competitor. And your innovation teams are demanding the ability to experiment faster. It's not that once you set up a flow from data to insight, you are done. It really is about experimenting, especially in the context of machine learning, about being able to tweak models so that you can constantly optimize or you can constantly feed that growth engine.

 

(00:09:48):

And the third big complication from my point of view, which vendors like ourselves don't help sometimes, is buzzwords. Everywhere there's buzzwords, especially as we started getting deeper into some of the data fabric, data mesh type concepts, terms overlap. Sometimes the same term means two different things, or sometimes two different terms may mean the same thing. So it is confusing, it is evolving, but I think the core concepts are clear enough. And as we go backward from vendor materials especially to some of the origin materials, I think the intent of these approaches becomes very, very clear and what they're solving for becomes very clear. And hopefully I can walk you through some of that today. So with that, let's go back a little bit to meaning. And I love this quote by Reinhold Niebuhr, who says, "Every time I find the meaning of life, they change it."

 

(00:10:51):

And that really is a tongue-in-cheek reference to how hard it is to find meaning in our data that is consistent across an organization or across an industry. And the key insight, especially that the data mesh folks had, was that meaning requires a deep understanding of the context or the domain. To be able to get to the right answer, sometimes you require years and years of understanding of the specific sub-industry, the vertical, the company, and to be able to put it all together to then move on to the insights and actions. So keep that in mind. So talking about source materials, the most interesting document in this space is How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. This is from Zhamak Dehghani, who originated this idea of the data mesh. But I think the problems that she highlighted really are general enough.

 

(00:12:06):

You can level them up and you can think of those as the problems that need to be solved in the context where we were coming from on-prem data warehouses and data marts to data lakes, which were much less structured and much bigger and starting to move to the cloud. To then, well, that set of technologies didn't really solve all the problems the way it was built to. And what are those failure modes? In that space, the failure modes then led to tall promises with the actual business outcomes not matching up to those promises. So that's where this started. The promises of the data lake, what else needs to happen to be able to fulfill them at scale. And that scale piece is very important because where this started was big data, really big data. But as we can discuss perhaps in a little bit, the implications of these architectural patterns actually apply to even the smallest shops that are trying to derive value from data.

 

(00:13:24):

So three problems. First one has to do with that big, in the big data. The focus of these data warehouses, data lakes is to aggregate all the data into one location. Which is fine, as a central location where you could do things that is fine, but the operational systems have spent the last 15 years going in the opposite direction. Operational systems have become more and more domain specific. There is a billing system, an accounts payable, an inventory system. All these systems, we've been constraining the boundaries of these systems so that their behavior can be more predictable, the interfaces that they present can be more predictable. And most importantly, the investments that we make either in buying these systems or in building these systems, those can be, the ROI on that can be easily determined. So this, the opposite direction of, it was the opposite direction from centralization. And so the motion of going from these operational systems into the data warehouses, data lakes by design was a little orthogonal.

 

(00:14:44):

The other thing that was not really fully aligned was we broke down generally getting data into these data lakes into three steps. There's the ingestion step, processing that happens within the analytics engine, and then data is made ready to be served out through whatever interface. And if you think about it, it doesn't matter what domain you are servicing, every domain basically needs to go through the pipeline in those three steps. So it became a bottleneck where any change needed to go through the whole system as opposed to being confined to a particular domain and having to create as little change in the rest of the system. The third source of this unfulfilled promises has to do with people. It has to do with, we then mapped teams roughly speaking to teams that were pre the data lake, big data teams that were hyper-specialized in the mechanics of the data lake and then teams that consumed outputs from the data lake.

 

(00:16:01):

And this is the single biggest factor, connecting it back to meaning, who derives meaning from all the data in that central data lake, that falls to this big data team. The problem is the big data team, they're not domain experts, they're platform experts. They know how to run the data warehouse. So again, this pull into opposite directions where this specialized ownership of the technology was at odds with the need for a domain specific understanding of meaning. What is the data that's coming in? And teams adapted. There are organizational structures, sort of multilayer structures where you have a platform team. And on top of that, you effectively have domain teams and think of the discussion that is to come as more of a formalization and a recognition of the need for that. Okay, so we're going to take a little bit of a pause here because I described three different problems. And what we would love to know is you as practitioners, do you see this in your organization? Are you experiencing this today? These sets of issues today that the distributed data architectures I'm going to talk about they try to solve for.

 

Chris Detzel (00:17:29):

So I'm trying to put the poll out and it's saying I'm logged in some other place, so I want to apologize.

 

Ansh Kanwar (00:17:40):

No, I see it up here on my screen.

 

Chris Detzel (00:17:42):

Oh, you do? Okay.

 

Ansh Kanwar (00:17:43):

Yeah, yeah. We'll give you another 10 seconds here.

 

Chris Detzel (00:17:50):

And maybe you'll have to, or maybe you did it and then-

 

Ansh Kanwar (00:17:54):

No, I didn't touch it. It magically happened, Chris.

 

Chris Detzel (00:17:59):

I mean, I pushed something, but it looks good. Okay, good. It worked for you. That's good.

 

Ansh Kanwar (00:18:04):

So we have right 10 more seconds. We have 36 responses from 70 participants.

 

Chris Detzel (00:18:13):

Great.

 

Ansh Kanwar (00:18:14):

All right, we can close the poll now. So of 37 responses, 38 responses that we got, 62% of you identify with these issues. 62%. That's amazing. That's amazing. 35% of you said not sure which we could count either way, and only 5% said no. So there you go. That's a demonstration A, of the power of our community, but also of a validation of this really being an issue that is worth solving and that really is what we're going to talk about next. Ending the poll here and sharing the results out with all of you. All right, so let's share-

 

Chris Detzel (00:19:07):

I can share the... Okay, I don't see the share, but maybe everybody else does. I don't know.

 

Ansh Kanwar (00:19:13):

Can someone confirm that you're able to see the results of the poll?

 

Raj (00:19:18):

Yes, we can.

 

Ansh Kanwar (00:19:19):

Okay. Raj, thank you very much. I'm going to stop sharing and continue back with the screen. Okay, so let's descend one level deeper now, data mesh. This concept started in 2019 with a paper by Zhamak, and the definition that she provided early on is data, or actually this is more contemporary, but it says, "Data mesh is a decentralized socio-technical approach in managing and accessing analytical data at scale." It sounds good, but I like the next definition a little bit better, which is also by Thoughtworks. So it's kind of the same group of folks, but I think for at least the way my brain works, the next one is easier for me to absorb. So, "Data mesh is an analytical data architecture and operating model where data is treated as a product and most importantly is owned by teams that most intimately know and consume that data."

 

(00:20:27):

So the second part of that definition is what Zhamak means by the sociotechnical approach. This is a pattern, an organizational pattern, as much as it's a technology pattern that implies cultural change and a very specific approach to data ownership and processes that are built around that data. Specifically I think, when I was reading through it really seemed to me like the transformation we went through in software development maybe 15, 20 years ago as agile was becoming normalized and absorbed. That's very much the approach here where people who know the work need to own the work. That's sort of my take on it. I think, I am not so sure that I would classify the data mesh as an analytical data architecture. I think architectures are emerging. I think there's sort of a variation of event based data meshes and other data meshes that other types of data meshes that will emerge and maybe reference architectures will emerge.

 

(00:21:42):

But I think just as the original concept, I think it's too much to call it a data architecture specifically. But there are principles, some very, very clear principles that define a data mesh. And here are the four. First is domain ownership. The book reference there is Eric Evans' book, Domain-Driven Design, one of my favorite books of all time in computer science. And this was I think the precursor to a lot of microservices and sort of decomposition of monoliths into business domains that then you could connect very easily to business value as opposed to more sort of inward focused designs from the past. So domains, as I said previously, they offer the sort of bounded context. You're solving a very specific problem and a very specific interface to other domains. A team is mapped to a domain. A domain is not a thing that floats out there by itself. It is-

 

Speaker 4 (00:22:51):

Here's what I found...

 

Ansh Kanwar (00:22:56):

I apologize. I think it was my watch. So a team owns a domain. A team owns a domain and the stack, that's the point to the right of that, is they own a domain as a product. And we'll talk about that here in a second. And these teams are responsible for serving their data sets in a consumable manner. It's not a central team somewhere. It is the domain team's responsibility to ensure a high quality experience for whoever is consuming that data. And so this inverts the flow from being a push, where as a domain team or as an organizational unit you're just pushing data into the central data warehouse, to one where you are now the owner of your data domain and you are essentially publishing a product that others can pull as needed.

 

(00:23:57):

Again, maps really well on the development side, the Git model of distributed source control, where a pull really is the basic unit of interaction between different systems. Second principle, data is a product. So data sets in simple terms, it's data sets that are exposed via APIs including the metadata that comes with that product. And sometimes may even, sorry with that dataset sometimes may even include statistics or details, quality data about that dataset. So dataset, descriptive metadata, perhaps statistics about usage, freshness or other aspects of that dataset and encapsulate all of it together that's delivered, that's data as a product.

 

(00:24:49):

And then on top of that, you apply product thinking to this product. So you make sure that your product is discoverable by the right consumer. So the technical solve for that is publish it into a data catalog, or the low end version of it, stick it on a wiki somewhere, but somehow everybody in the organization has to understand how to discover that data. And once they discover that data, it has to be addressable. How do I connect to it? In other terms, what is the address, the URL type or whatever else is needed to get to that data, those components. That product has to be trustworthy. So as you grow your data as a product capabilities, you are able to apply service level objectives about the freshness of the data, let's say. You're able to provide clarity on the provenance, be clear about the lineage so others can make smart decisions based on where the data came from.

 

(00:25:56):

And you are clear about other aspects of quality which will help others make decisions as they consume your data. This data is self-describing, both syntactically and semantically, but also an advanced data product will bring with it some samples, some sort of examples of how to consume that data, either maybe in documentation or maybe in code. And then finally, aspects which are common between data meshes and fabrics, sort of these global standards around interoperability, some agreement on formats and security and acceptable levels of documentation and so on. And then observability, which kind of wraps around this, which allows an operations team to really be able to monitor these data products as they're kind of proliferating throughout the company. Our friends at Monte Carlo have some really good write-ups about that if you're interested.
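
Editor's sketch: one way to picture the "data as a product" properties listed above (discoverable, addressable, trustworthy, self-describing) is as a small descriptor published into a catalog. The Python below is a minimal illustration under assumed names. `DataProduct`, `Catalog`, the example address, and the one-hour freshness objective are all hypothetical and do not correspond to any specific catalog product or to anything shown in the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataProduct:
    name: str                   # discoverable: the name listed in the catalog
    owner_domain: str           # the domain team accountable for the product
    address: str                # addressable: where consumers connect
    schema: dict[str, str]      # self-describing: column name -> type
    description: str            # self-describing: what the data means
    lineage: list[str]          # trustworthy: upstream sources (provenance)
    freshness_slo: timedelta    # trustworthy: maximum acceptable data age
    last_updated: datetime
    sample_query: str = ""      # optional example of how to consume the data

    def is_fresh(self, now: datetime) -> bool:
        """Check the freshness service level objective at a given time."""
        return now - self.last_updated <= self.freshness_slo


class Catalog:
    """A toy in-memory catalog; a wiki page would be the low-end equivalent."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, keyword: str) -> list[str]:
        kw = keyword.lower()
        return sorted(
            name
            for name, p in self._products.items()
            if kw in name.lower() or kw in p.description.lower()
        )


catalog = Catalog()
calls = DataProduct(
    name="call_center.calls",
    owner_domain="call-center",
    address="https://data.example.com/call-center/calls",  # placeholder URL
    schema={"call_id": "string", "customer_id": "string", "duration_min": "int"},
    description="One row per inbound call with its disposition",
    lineage=["telephony-system"],
    freshness_slo=timedelta(hours=1),
    last_updated=datetime(2022, 10, 14, 9, 0, tzinfo=timezone.utc),
)
catalog.publish(calls)
```

An observability layer, in this picture, would amount to periodically calling something like `is_fresh` across every published product and alerting the owning domain team when an objective is missed.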

 

(00:27:02):

The third aspect is self-service data platforms. And this is the tricky bit. I think this is the judo bit that all of this needs to be built on some platform. That platform may be a data lake, it may be something like Databricks, it could be a number of, the tech stack could be a number of different things as long as these sort of high level principles are implemented on that data platform. And I think the best, most interesting at least description that I've seen is thinking of the data platform as this sort of low level infrastructure plane where you store your data and you have a query engine and you optimize the heck out of those queries and whatnot. Above that, you have a productivity plane for all the folks that the domain developers and owners, product owners who are publishing their data products into this platform.

 

(00:28:05):

So now I can imagine it's a combination of maybe a data catalog and a data warehousing solution, data lake or something like that. And then another plane on top of that, which really allows for this discovery, for somebody to browse and understand data sets before they engage with them fully, and perhaps this layer enforces governance and other security policy. There's an interesting area that I'm not very familiar with yet, which is data contracts. It's super interesting. I'm exploring that right now. I would be happy to share with all of you what I discover there, but that could be a very interesting way for that supervisory and integration plane to be able to understand what contracts are implied by different data products. And then finally-

 

Chris Detzel (00:28:57):

Ansh?

 

Ansh Kanwar (00:28:58):

Yeah.

 

Chris Detzel (00:28:58):

Quick question, and I know I promised to do it at the end, but it seems very relevant to this. What's the definition of domain? Sometimes the same data is shared across domains, like HCP data is shared in commercial for incentive compensation and in R&D for the clinical trials domain. So can you explain that a little bit?

 

Ansh Kanwar (00:29:23):

Yeah, absolutely. I think a domain is a fuzzy concept. It really is defined by what your goal is and to get to that goal, what are those discrete units that can be simple enough to understand and process and explain the ROI on versus a different domain that they can talk to through an interface. So I have some examples here in a second and give me maybe three minutes to explain the federated governance piece and we will come back to that because that is a very important question and I think the way Zhamak has broken it down into different levels of domains, I think that's a very interesting way to look at it. So federated governance. So this is no different than any other conversation about governance across a complex data landscape. So our end goal is to effectively be able to operate this mesh. And that comes from the definition of SLOs and what's acceptable in terms of documentation.

 

(00:30:37):

The quirky thing here, or not quirky, maybe wrong word, sorry, the specific thing to data mesh is that the governance body is not a top-down body. The recommendation is to form that based on the different domain owners, product owners, and sort of a guild model, if you will, from the Spotify lingo. So getting to your question, Chris. So the domains, they are divided up at least from a data mesh perspective into source aligned, consumer aligned and aggregate domains. So the source aligned domains are sort of what we've been talking about so far. That is what comes from the operational systems. So a call center is producing data about the calls that it is receiving, who is calling, how long the call was, what was the disposition of the call, and so on and so forth. That's a domain. Your accounts receivable, your invoicing folks, they're producing constant data about when invoices came in, when they were closed out, when they were paid out, and so on and so forth.

 

(00:31:43):

Your shipment, invoicing, fulfillment, they're producing data about your warehouse when stock was received, when it went out, and so on and so forth. So these are source aligned domains and what they do is they give you this nice boundary that maps directly to the operational system. On the very top are consumer aligned domains, and these are the ones that may be tricky because we normally don't think of funnel analytics as a domain. But you can, because again, go back to the definition of what's in data as a product. It has to do with a data set that may be computed or derived from any of these other domains. It may have source data that takes in enrichment from third party data sets, but it is a distinct product and that really is your funnel analytics domain, if you will. You can think of revenue recognition as a separate domain. You can think of this dashboarding trending that helps with long-term planning as a separate consumer aligned domain.

 

(00:32:58):

So it's a little bit of a specific definition for the data mesh, but I like this mental model for thinking about domains, generally speaking. Now the interesting thing from our point of view, from a RELTIO specific point of view, is that aggregate set of domains in the middle, which really allow you to derive meaning at one level that is above the operational systems because the operational systems by design, they have a fragmented view of the world. So I think this then helps you stop worrying about data fragmentation as a problem and start seeing it as a pattern. Yeah, there is fragmentation, it is by design because people can be focused and can think about the call center related activities or the invoicing related activities, but then there is an aggregation layer that sort of brings all of it together and think of that as a set of data products in itself.

 

(00:33:54):

And a couple of examples here are some sort of a segmentation or recommendation engine that may be based on machine learning or something like a customer 360 view that can be built out of a product such as RELTIO or a platform such as RELTIO. And these products are ready to go. They're always available up-to-date for further systems that consume this data and really are the closest to solving for that business initiative. Yeah. Does this answer your question, Chris? I just want to take a second to go through that.
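
Editor's sketch: the three layers of domains described above (source aligned, aggregate, consumer aligned) can be illustrated with a toy Python example. Everything here is hypothetical: the sample records, the field names, and the assumption that both source systems already share a clean `customer_id`. In practice that shared identifier is exactly what a mastering step has to produce, and this sketch does not attempt to show how any MDM platform does that.

```python
from collections import defaultdict

# Hypothetical outputs of two source-aligned data products.
call_center_calls = [
    {"customer_id": "c1", "duration_min": 12},
    {"customer_id": "c1", "duration_min": 3},
    {"customer_id": "c2", "duration_min": 7},
]
invoices = [
    {"customer_id": "c1", "amount": 250.0, "paid": True},
    {"customer_id": "c2", "amount": 90.0, "paid": False},
]


def build_customer_360(calls, invoice_rows):
    """Aggregate domain: combine source-aligned products into one view per customer."""
    view = defaultdict(lambda: {"calls": 0, "call_minutes": 0, "open_balance": 0.0})
    for call in calls:
        record = view[call["customer_id"]]
        record["calls"] += 1
        record["call_minutes"] += call["duration_min"]
    for inv in invoice_rows:
        if not inv["paid"]:
            view[inv["customer_id"]]["open_balance"] += inv["amount"]
    return dict(view)


def customers_with_open_balance(customer_360):
    """Consumer-aligned product: a derived data set built on the aggregate view."""
    return sorted(cid for cid, rec in customer_360.items() if rec["open_balance"] > 0)


customer_360 = build_customer_360(call_center_calls, invoices)
follow_up_list = customers_with_open_balance(customer_360)
```

The source systems stay fragmented by design. The aggregate function is the only place that needs to know about both of them, and the consumer-aligned function only needs to know about the aggregate view.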

 

Chris Detzel (00:34:33):

I think so. There is another question kind of on the same lines and maybe you answered it here. Do domains correlate to master data domains as we know them? Or what is the relationship between them? Maybe you answered that here in a minute.

 

Ansh Kanwar (00:34:48):

Yeah, they correlate actually very, very nicely. For data that can benefit from being mastered, that is core data, definitely the domains can map one to one and the data sets can be, within a domain, a model the way we understand a model in the MDM space. That's the data that is shared within that data domain as a product.

 

Chris Detzel (00:35:16):

And just some thought provoking stuff here with maybe some of your thoughts around it. Sandeep says, "There are organizations which are aligning to the intelligence enterprise paradigm. So fundamentally they no longer organize themselves around functional silos and instead align teams on an end to end business process. How would data mesh fabric lend itself to that?" Your thoughts on that?

 

Ansh Kanwar (00:35:49):

To me, that business process is a beautiful example of a domain because a team is limited to, not limited by, the specific problem they're solving. And therefore if you look at it from a data analytics perspective, they're producing a data set that is very aligned with the problem they're solving. And that is the definition of domain in that context. And what something like a mesh or fabric solves is one level above. How do you bring all of that domain specific data together while at the same time leaving the control down to those end-to-end teams so that they can manage their data because they know what is best for that data? I think it's actually perfectly aligned with that organizational structure.

 

Chris Detzel (00:36:36):

Yeah, thank you. And then last kind of question here, if we aggregate data from multiple domains and create a product, so i.e., master data, how does it align with distributed governance?

 

Ansh Kanwar (00:36:49):

Yeah, actually we had a good conversation with Manish Sood, our founder and CEO, on this topic a few months ago. This notion of governance, how does it work with mastered data and then are the two concepts compatible with each other? And I think in terms of governance, we really want to be clear about who has access to which data, under what circumstances, why, and so on and so forth. And it's not an either or. Where data is in a distributed environment, you need to make sure that you have governance across all of that. And for certain types of data, if they are centralizing, as we talk about core data here in a second, it's significantly easier to apply governance in that context because in an ideal scenario, what is actually moving around in the enterprise are references to some of this very, very key core data.

 

(00:37:58):

And the data itself doesn't move as much. The data stays in, let's say the data mastering system or master data management system and references to that are what are circulated through these other systems. So from that point of view, the data virtualization, which I guess is another term that's being used across data mesh, data fabric, data hub, and so on. This sort of data virtualization concept means that we are able to think of governance through metadata. Governance of data where it is, but for very specific domains, core data as an example, bringing it together and applying this concept. It reduces data movement, which is one of the patterns. It's sort of underlying in the conversation we're having is that there isn't as much data movement in these virtualized data scenarios. It is more about discovering the data where it is and being able to apply governance there.
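
Editor's sketch: the idea that references circulate while the core data stays in one place can be shown with a very small Python example. `MasterDataStore`, its `upsert` and `resolve` methods, and the record shapes are hypothetical stand-ins, not the API of any real MDM system.

```python
class MasterDataStore:
    """Stand-in for a master data management system holding the core records."""

    def __init__(self):
        self._records = {}

    def upsert(self, entity_id, attributes):
        self._records[entity_id] = dict(attributes)

    def resolve(self, entity_id):
        # Return a copy so callers cannot mutate the mastered record in place.
        return dict(self._records[entity_id])


mdm = MasterDataStore()
mdm.upsert("cust-001", {"name": "Acme Corp", "country": "US"})

# Downstream systems keep only the reference, not a copy of the attributes.
invoice = {"invoice_id": "inv-42", "customer_ref": "cust-001", "amount": 1200.0}
support_ticket = {"ticket_id": "t-7", "customer_ref": "cust-001"}

# A correction is made once, in the master store...
mdm.upsert("cust-001", {"name": "Acme Corporation", "country": "US"})

# ...and every system that resolves the reference sees the same current value.
invoice_customer = mdm.resolve(invoice["customer_ref"])
ticket_customer = mdm.resolve(support_ticket["customer_ref"])
```

Because the attributes live in one place, access rules for that core data can also be enforced in one place (for example inside `resolve`), which is the sense in which governance becomes easier for this slice of the landscape.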

 

(00:39:09):

So in that sense, in simple terms, think of MDM as another piece in this bigger data fabric approach. It makes certain aspects of data governance much easier, but it doesn't take away the burden of needing to govern the more distributed fabric or mesh. Yep.
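
The pattern described in this answer, where references to core data circulate while the records themselves stay in the master data system, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `MasterRef`, `MasterDataStore`, the role names, and the sample record are invented for illustration and are not Reltio's API. It only shows why access rules are easier to enforce when there is a single place where a reference gets resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterRef:
    """Pointer to a mastered record; it carries no attribute data itself."""
    domain: str
    entity_id: str


class MasterDataStore:
    """Hypothetical stand-in for an MDM system that holds the core records."""

    def __init__(self):
        self._records = {}
        self._grants = {}  # role -> set of domains that role may read

    def put(self, ref, record):
        self._records[ref] = record

    def grant(self, role, domain):
        self._grants.setdefault(role, set()).add(domain)

    def resolve(self, ref, role):
        # Governance is checked here, at the one place the data lives.
        if ref.domain not in self._grants.get(role, set()):
            raise PermissionError(f"{role} may not read {ref.domain}")
        return self._records[ref]


store = MasterDataStore()
customer = MasterRef("customer", "c-001")
store.put(customer, {"name": "Acme Corp", "country": "US"})
store.grant("billing", "customer")

# A downstream system keeps only the reference, not a copy of the record.
invoice = {"invoice_id": "inv-42", "customer": customer}

resolved = store.resolve(invoice["customer"], role="billing")
```

A role without a grant for the `customer` domain would get a `PermissionError` from `resolve`, regardless of which downstream system holds the reference.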

 

Chris Detzel (00:39:30):

Yeah. Thanks Ansh. That's really good.

 

Ansh Kanwar (00:39:36):

All right, so I promised some breadcrumbs. So on the data mesh topic, I think I've gone a little bit longer on that than I intended, but please do go look at Data Mesh Learning if you haven't. The thing I love about what they've collected are these user journey stories. Those are stories from practitioners such as yourselves as they really grappled with these concepts and got to their answer for what the data mesh means to them. The other thing which I'm in the process of reading is this, Building an Event-Driven Data Mesh. It's an early release if you have the O'Reilly subscription; it's mostly done at this point. A very, very interesting take on the data mesh, which is, fast forward to the future, everything that's coming into the mesh is coming in through an event-driven mechanism and is held. That's where the data is held, and all the computation happens within the domains from that input source. So perhaps for a future conversation.

 

(00:40:41):

All right, data fabric. So this is where the contrast part begins. So for the data mesh, we said context comes from SMEs, it comes from people who've been working in that space for a while. What do we have a massive shortage of? Experts. Data experts. And so the data fabric really is, from my point of view, a response to that scalability challenge. You can create an organizational pattern if you have enough people, enough SMEs to apply to the problem and the domains, but what if you don't? So the data fabric comes at it from a mechanical point of view, and it really highlights the role of metadata, and active metadata specifically, in being able to solve a lot of these problems that in the data mesh case get solved organizationally and through process. So Noel Yuhanna came up with this term all the way back in 2005. Talk about being able to see the future.

 

(00:41:50):

And only in the last couple of years has this really been picked up, and people have started to realize how interesting this could be. And I think it has to do with machine learning as it has advanced over the last 17 years. It's made it possible for us to do a lot of these things automatically that we couldn't in the past. So his definition is a data fabric "...automates the ingestion, curation, transformation, governance, integration of data across disparate data in real time and near real time." Again, very key phrases: real time and near real time. The other is from Gartner's technology trends: data fabric is their trend number one in terms of trends to watch for 2022 and beyond. And they claim that the concept of data fabric can reduce data management efforts by up to 70%.

 

(00:42:50):

A lot of this needs to be borne out. Data fabric, from my point of view, is very early in its conception and blueprinting and definitely in its application. The thing that goes in its favor is that it is an aggregation of technology. It's not a big bang approach. It doesn't require you to change, or at least doesn't mandate that you change, your organization or your culture to map to that sort of sociotechnical contract. And in that sense, the analyst work that I've read really highlights that as the reason that the data fabric will become more and more real: there are products to be sold in this space and an approach where we can incrementally get to this realization of a data fabric.

 

(00:43:44):

This box in the middle looks remarkably like the data lake or the data warehouse boxes that we've drawn in the past. But the difference is: bring us all of your data from anywhere, but don't bring the data. Don't actually move the data. That's the difference. Don't actually move the data. Tell us about the data, give us the metadata about that data, and let us catalog it using our data fabric component; the data catalog becomes a component of the data fabric. Let us catalog all of your metadata to understand where things are located, where data is located, what is the meaning that you are trying to associate with that data, what is the utility?

 

(00:44:23):

And in that sense, the data catalog is a slim third of a data product. And then that metadata, it's not just collected once. It's actively managed, so that metadata is updated with the same level of quality that we thought of data in the past. The description of the data is updated with the same level of clarity and quality and iterations. And so if we have that understanding of where the data is, what it is, then we can build these knowledge graphs that ultimately help us with pattern identification. Where are the hotspots? What data sets are being most used? Therefore, what should we go optimize? Where should we spend energy as humans? Because a lot of what is happening in this virtualized landscape is automated and can be understood by machines because of metadata. And those three blue boxes really are, to me, what separates this notion of data fabric from things that we've done in the past.
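
As a rough illustration of the idea above (catalog the metadata, leave the data in place, then look for hotspots), here is a minimal Python sketch. The catalog entries, locations, consumer names, and access log are all made up for the example, and counting accesses is only a toy stand-in for the knowledge-graph and machine-learning analysis a real data fabric product would perform.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Metadata about a data set; the data itself is never copied here."""
    name: str
    location: str  # where the data lives
    meaning: str   # what the data is supposed to represent
    owner: str


# Hypothetical catalog: three data sets that stay in their source systems.
catalog = {
    entry.name: entry
    for entry in [
        CatalogEntry("customers", "mdm://core/customer", "Mastered customers", "data-office"),
        CatalogEntry("orders", "warehouse://sales/orders", "Customer orders", "sales"),
        CatalogEntry("tickets", "saas://support/tickets", "Support tickets", "support"),
    ]
}

# Hypothetical usage metadata: (consumer, data set) pairs from access logs.
access_log = [
    ("churn_model", "customers"),
    ("churn_model", "orders"),
    ("billing_report", "customers"),
    ("billing_report", "orders"),
    ("support_dashboard", "customers"),
    ("support_dashboard", "tickets"),
]


def hotspots(log, top_n):
    """Return the most frequently accessed cataloged data sets."""
    counts = Counter(dataset for _consumer, dataset in log if dataset in catalog)
    return counts.most_common(top_n)


top = hotspots(access_log, 2)
```

Here `top` ranks `customers` first and `orders` second, which is the kind of signal the speaker describes using to decide where humans should spend optimization effort.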

 

(00:45:31):

The data stays where it is, it is distributed, and yet we know more about that data in real time than we've ever known in the past. And we can derive actions from analyzing that metadata. Other parts of the data fabric are the things that we're more familiar with, such as master and reference data management; the need for that clean connected data doesn't go away. The need for loading data in, in certain scenarios, although it's metadata that we're talking about now, that data prep and orchestration still exists. Certain variations of the data fabric will actually cache some of this data within the fabric. And so it becomes this hybrid solution. And then of course, security and governance.

 

(00:46:24):

The other thing that analysts point out is that it wasn't enough to have data delivered via one means. If it was a SOAP interface, then another team or another consumer wanted a REST API, or if it was a REST API, somebody wanted a JDBC connection. And so the data fabric sort of normalizes all of those sets of integrations and, sorry, delivery ability to the consumers on the right-hand side. So as I said, it's an early concept. Right now, it's in that phase where there's a lot of excitement about it. And I feel over the next 12 to 24 months, this will get crystallized around the blue boxes here, with that being the net new from my point of view. And the components in gray will be fitted into that fabric. Of course, the data security and governance, that will also be the net new piece on the framework side that now goes across the whole fabric.
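
The delivery point above, one data set exposed through whichever interface a consumer asks for, can be sketched with a small adapter registry. This is a simplified assumption-laden example: JSON and CSV renderers stand in for the SOAP, REST, and JDBC interfaces mentioned in the talk, and the `adapter` and `deliver` helpers and the sample rows are hypothetical.

```python
import csv
import io
import json

ADAPTERS = {}


def adapter(fmt):
    """Register a function that renders rows in one delivery format."""
    def register(fn):
        ADAPTERS[fmt] = fn
        return fn
    return register


@adapter("json")
def to_json(rows):
    return json.dumps(rows)


@adapter("csv")
def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def deliver(rows, fmt):
    """Serve the same rows through whichever registered format is requested."""
    if fmt not in ADAPTERS:
        raise ValueError(f"no adapter registered for {fmt!r}")
    return ADAPTERS[fmt](rows)


rows = [{"id": 1, "name": "Acme Corp"}, {"id": 2, "name": "Globex"}]
as_json = deliver(rows, "json")
as_csv = deliver(rows, "csv")
```

Adding another delivery style means registering one more adapter; the data set and its producers do not change.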

 

Chris Detzel (00:47:34):

Hey Ansh, I want to get to some questions around this because people are starting to ask and challenge you a little bit. So where is data stored in data fabric?

 

Ansh Kanwar (00:47:50):

Yeah, that's one of the assertions actually, that the need for a data steward is diminished, if you will, because of the ability to derive a lot of, is the data healthy? Is it valid? Those sorts of answers from AI and machine learning based systems. So this sort of knowledge graph pattern identification, that also applies to the data steward's role. And there is an assertion that we wouldn't need that much active stewardship because a lot of that would get taken care of by the fabric.

 

Chris Detzel (00:48:46):

Okay. How do you differentiate the metadata management and data catalog and knowledge graph? It's a big question.

 

Ansh Kanwar (00:49:01):

Yeah. So knowledge graph really goes across all of the metadata that has been collected. To me, the metadata management piece is really about the level above the data catalog, where it's not just understanding where the data is, how it's distributed, but it really is metadata being managed as a product just given the concepts we've talked about. That's not the terminology they use, but really metadata being managed as a product. The knowledge graph to me is a derivation of that. And you can stick that into the metadata management. That's a vendor specific discussion. You can stick the knowledge graph into active metadata management, which is fine, or you can pull that out as something that you implement yourself. The important thing is on that knowledge graph, what applications do you build? And that's where the value is. Those models stuck on top of the knowledge graph that are then signaling action and signaling action now in the metadata sense, not in the data sense. That make sense?

 

Chris Detzel (00:50:09):

Yep. Is there a unified technology layer that runs across the catalog? So knowledge graph and metadata management, or do we need a fabric technology to unify the information across the three blue boxes?

 

Ansh Kanwar (00:50:26):

More and more vendors, and I promised I'm going to not name vendors here, but do your own research. Definitely vendors are unifying more and more of these boxes into, the blue boxes specifically, into what they present as the data fabric. Now of course there are players who are coming from the metadata, active metadata, which is a separate category, and they start from that point and they're absorbing the knowledge graph piece and the machine learning modeling piece. And there are others who started from a data catalog point of view and they're moving up in this diagram to then be able to also absorb knowledge graphs. I would highly encourage everybody to look at the latest announcements from Google in this regard. And also Microsoft's doing some really great work around Microsoft Purview and especially as it integrates with the Azure ecosystem.

 

Chris Detzel (00:51:26):

So one comment, and Gaurav, I'll get to your question at the end: API and API metadata should be considered as well, to integrate inbound and outbound? Kind of a comment, I think.

 

Ansh Kanwar (00:51:38):

Yeah, that's a great point. Absolutely. A hundred percent, a hundred percent.

 

Chris Detzel (00:51:42):

Yeah, keep going.

 

Ansh Kanwar (00:51:43):

That's the thinking, where really the interactions themselves are producing what in the past used to be this exhaust, right, log data and these things that we just used to throw away for the most part. But that is valuable and that kind of feeds back into the fabric. Absolutely.

 

Chris Detzel (00:52:00):

Okay.

 

Ansh Kanwar (00:52:00):

Should we keep going?

 

Chris Detzel (00:52:05):

Let's keep going. I do have a question at the end though, but keep going.

 

Ansh Kanwar (00:52:09):

All right. We want to do another quick poll here. Given the discussion so far, where is your organization currently in your journey to a distributed data management pattern or architecture such as the ones we've discussed, a data mesh or data fabric? Are you currently in implementation? Are you actively evaluating for implementing this in the next 12 months or are you not in the market? And just curious.

 

Chris Detzel (00:52:35):

Ansh, you'll have to start that poll. I've got some error on mine.

 

Ansh Kanwar (00:52:38):

Okay, no problem. Let's do poll number two. Launch. Should be up.

 

Chris Detzel (00:52:48):

I like the poll aspect of it. Do more of these. So by the way, Gaurav does mention, he goes, so it's around that data stored. He said, "My question was more around where data is stored." He said, "I think the answer was more about data steward."

 

Ansh Kanwar (00:53:11):

Oh, I see. Where data is stored. Data is stored where it's stored. That's a very important concept. This is about a virtual layer on top of data, wherever it's stored, therefore you're minimizing that sort of data movement. Yeah.

 

Chris Detzel (00:53:27):

Okay.

 

Ansh Kanwar (00:53:30):

All right. So about 45 seconds in, we have 26 responses. Response rate has gone down a little bit. We'll give another 10 seconds if you could please click. All right, we're a minute in. I'm going to stop. We got 28 responses and I'm publishing that right now. Wow. 60% roughly say that either you're currently in implementation or actively evaluating. That is pretty phenomenal. That is pretty amazing.

 

Chris Detzel (00:54:05):

That is.

 

Ansh Kanwar (00:54:12):

Okay. I'm sharing the results. We can take a look. And let's get back to the presentation here. Thank you for that. And a little bit, sort of give back on the data. This is thanks to David Menninger and Ventana Research. They asked a very similar question, which is virtualized access to data lakes. And the answer they got back was currently included in their data lake, in their technology roadmap for 27% of the respondents and 46% intended to include it in the future. This is from 2019, so it's good to see the current distribution in this community right now. But a shout out for David Menninger and Ventana, they've published some pretty fantastic work on some of these topics. So do check it out.

 

(00:55:10):

Okay, so let's explore the third aspect of what I want to talk about, which is RELTIO. In this age of data virtualization, where does an MDM platform fit? And really we think of RELTIO as significantly more than your traditional MDM in the cloud. Always connected, always up to date, real time operational MDM system. And in that context, Manish Sood, our CEO and founder, just addressed the Gartner conference, and this is what he said: "Core data is information about customers, vendors, locations, assets, suppliers, amongst other things. This is data that every organization runs on." And what we're seeing is a full circle where MDM has evolved from a reluctant spend to an indispensable spend. And in fact, the drive towards a mesh or the drive towards a fabric only highlights that, because what you're exposing when you invest in these arenas is what your data quality is.

 

(00:56:13):

Either data as a product or through metadata. What is the quality of your data set, or data sets as an aggregate across the company? And the need for certain core domains to be mastered just becomes very, very obvious. So we're seeing that cycle, which means that this high quality, actionable information to make sound business decisions, satisfy customers, and create more enterprise value, that equation just gets even more important, because poor data equals poor decision making. That has always been true for the past 50 years and will be true. And so what we deliver, we think of it as this core data as a product to our customers, to all of those who are customers, so that you can then very easily publish it internally. If you think about the domains as they're made available through the MDM system via an API, there's very little work involved in publishing those data sets into the world.

 

(00:57:17):

Some of the work that we're doing right now is really around, or early product work I should say, is thinking through how do we take the metadata that we also have in copious amounts and share that with some of the data catalog and other similar systems. And around that, I'd love to get feedback from you either here or offline on the direction that you would want to see our product take. Almost out of time here, but wanted to share a little map of how we think about plugging in. If you think about where does MDM plug in, we already are at the center of consuming a lot of first party data from all of your environments, whether it's person, organization, supplier, product, location, and connecting that, creating that foundation for that clean connected data, which is already mapped to these domains. I think there was a question about this earlier, and making it available for your key initiatives.

 

(00:58:27):

The outcomes on the right are the things that you are trying to drive and the basis for making decisions on clean connected data. That's what we provide you for those initiatives. And point being that this doesn't really go away. We've also put in a quick comparison of our traditional data warehouse, data lakes, data mesh, data fabric and so on. I'm not going to go through this right now just in the interest of time, but happy to discuss it offline or if you guys want to reach out to us via email or through the community actually. So to summarize, right, data virtualization, whether it's data mesh, data fabrics, some variant of the two that's here to stay. We think that this actually encapsulates the last 40, 50 years of computational data creation and data processing. We think it's a good trend.

 

(00:59:19):

Do remember these patterns started with data analytics, although the fabric is leaning more into the operational use cases. Mesh and fabric are complementary; they offer concepts that are complementary. And we think that there will be more products that come out of the data fabric space, alongside the organizational and cultural thinking that comes from the data mesh concepts. We think those two can then work very nicely together. Solutions like RELTIO Modern MDM are really about that operational and real time use case. And regardless of where this space moves, we think that mastered core data is foundational, and it's an element that powers either of these approaches and, in the end, the business initiatives that you are trying to land. I think that's all we have time for. Chris, back to you.

 

Chris Detzel (01:00:18):

Yep. Ansh, thanks so much. If you want to stop sharing. So thank you everyone for coming. Please take the survey at the end, so when you click leave you'll get a survey that pops up. We want to get better for you. We do have several community shows coming up very soon, so make sure you check out community.reltio.com. Ansh, thank you so much for your time and great presentation to really kind of push this out. It was, wow, I learned a ton today and I hope that every one of you did as well. So thank you everyone. That concludes today's session.

 

Ansh Kanwar (01:00:54):

Awesome. Thanks Chris. Take care everybody.

 

Chris Detzel (01:00:56):

Take care everyone. Bye-bye.




#datafabric
#masterdatamanagement
#communitywebinar
#Featured
#datamesh