Check out the PPT here: Event-driven-integration_with_Kafka.pptx

Transcript:
Matt Gagan (02:35):
... slide here. So the topic of today's session is to talk about event-driven integration with Reltio. And there's a very specific focal point here, which is that the flavor of the month out there in the streaming world, it seems, is Kafka. And so I'm going to tackle, or try to tackle, the Kafka topic head on. So let's move forward with this. Our proposed agenda today is to recap, to bring everybody up to the same level with regard to Reltio's overall approach to integration.
Matt Gagan (03:17):
And then to take a specific focus on event-driven, and how Reltio provides support for that. And then dive one level deeper and be specific about how Reltio can interact now and in the future, with Kafka. We're going to have a specific time, I would think, at the end for questions. But also, if there are questions that come up as we go through all of this, please put them into the chat. Chris will pick up on them and will attempt to interrupt me. And we'll see if we can get those questions answered in the livestream here.
Matt Gagan (03:56):
So thinking about the integration overview that I was promising there. When we think about Reltio and we start at the heart of this diagram here, Reltio has got its RDM and its MDM layers and capabilities, built on those modern technologies that many of us are familiar with: a number of multimodal components, including a NoSQL database, an enterprise search capability, and a graph capability to connect data to one another in an open-ended way.
Matt Gagan (04:26):
All of these capabilities are then exposed functionally to an additional open-ended set of layers of functional components that customers can build themselves or leverage as Reltio deliverables. That middle connecting layer is a very, very extensive and adaptive set of REST-based APIs, as well as proactive notifications outbound from Reltio in the form of the streaming APIs.
Matt Gagan (04:55):
So it's on top of these mechanisms that Reltio delivers functional, adaptive capabilities in the form of the user interfaces that you may be familiar with, for data stewardship, for business users to be able to work with data. As well as, from an administrative self-service point of view, to control and configure the platform on an ongoing basis, so as to incrementally develop additional value over time.
Matt Gagan (05:20):
And then you'll see that we're calling out some very specific connectors that Reltio is making available for frequently seen requirements. So many of our customers need to integrate with Salesforce. Reltio provides a Salesforce connector, but we're not here to talk about the details of that today. Similarly, many of our customers working with B2B data, need to enrich their information.
Matt Gagan (05:45):
Using Dun & Bradstreet, for which we provide a connector. And then, in a very open-ended way, there's the ability to import batches of data from CSV or from structured documents. And conversely, in a very highly controlled way, to be able to output CSV and structured documents using the Reltio Exporter. So here you have the Reltio Data Loader and Exporter. But what about everything else?
Matt Gagan (06:09):
So building this out a little bit further, customers have synchronous needs. So think about this in the context of an engagement application. Maybe it's a loyalty application. Maybe it's some digital application that customers are using, or a web portal wanting to interact with your organization. Customers are leveraging synchronous mechanisms so that, in real time, they can get answers to important questions from Reltio.
Matt Gagan (06:41):
And they're using tools like we see depicted here, MuleSoft or other integration platforms that provide these capabilities, to put in place an API-led set of services which, on the one hand, talk to specific applications that have the answers to those questions. And on the other side, provide reliable, reusable services so that those applications can get the answers that they want, in a repeatable, easy-to-manage way.
Matt Gagan (07:09):
And so of course, these components are using Reltio's different REST APIs to get answers to questions like, who is this customer? What are their consents? What are their preferences? What does an extended profile or a 360-degree view look like from Reltio's point of view? And we've covered this in some detail in a previous session, and you'll be able to find the recordings, as well as a blog post that starts to talk about this API-led approach to integration.
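As an illustration of the synchronous pattern described here, below is a minimal sketch of a "who is this customer?" lookup in Node.js. The hostname, tenant ID, filter syntax, and attribute name are placeholder assumptions, not code from the session; check them against the Reltio API documentation for your tenant.

```javascript
// Minimal sketch: synchronous customer lookup against Reltio's REST API.
// The base URL, tenant ID, and filter grammar below are illustrative.
const RELTIO_BASE = 'https://<environment>.reltio.com/reltio/api/<tenantId>';

async function findCustomerByEmail(email, accessToken) {
  const url = `${RELTIO_BASE}/entities?filter=equals(attributes.Email,'${email}')`;
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${accessToken}` }, // OAuth bearer token
  });
  if (!res.ok) throw new Error(`Reltio search failed: ${res.status}`);
  return res.json(); // matching profiles, e.g. for a 360-degree view
}
```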
Matt Gagan (07:36):
What we're here to talk about today, though, is use of Kafka and alternatives in asynchronous communications. So this is more about not getting the burning answer to my question, which is how you would characterize the synchronous integrations. But rather, from an asynchronous point of view, it's a case of, here's what just happened. If anybody needs to know about it, they can come to me and find the answer to that question.
Matt Gagan (08:03):
Meaning a customer's loyalty points just got updated. Maybe some apps out there need to know that a customer just changed their address. It would be great if they only had to do that in one place. So here's a place where that up-to-date and approved address can be found, so that subscribing systems can come in, take advantage of that, and do that very, very quickly. So Kafka is near real time in its ability to keep applications all on the same page, singing from the same hymn book, if you will.
Matt Gagan (08:34):
When we think about asynchronous, what are Kafka and these alternative mechanisms? They're just a very, very modern, very, very rapid and more extensible way of doing asynchronous updates. And if you think about the legacy, think 10, 15, 20 years ago, what did this look like? It was batch files. So you would just output a batch file and be very, very asynchronous. Here's everything that didn't just happen.
Matt Gagan (09:00):
It happened sometime in the last 24 hours, maybe in the last week. And here's an opportunity for us to take those data and push them proactively in certain directions to make sure that eventually, everybody's on the same page. But Kafka represents an opportunity to do things much more quickly. So Reltio customers are using combinations of these two paradigms, these two ways of using Reltio's integration capabilities, to solve their important business problems, operational and otherwise.
Matt Gagan (09:33):
And then, from an analytics support point of view, Reltio does provide at least one more collection of capabilities in the form of outbound connectors, supporting some Reltio-delivered capabilities in the form of Reltio Analytics and Reltio Data Science. But also, for customers wishing to bring their own Snowflake environments, there's a Snowflake connector that operates in pretty fast batch operations to refresh copies of Reltio data in that environment, as well as these others.
Matt Gagan (10:05):
So it's at this point that we're going to start to jump off, and we'll talk about streaming in general. And then, I'll take a specific look at Kafka. So let's look at streaming capabilities and streaming support generally within Reltio. This diagram at the bottom is really something that we could read from left to right where, inbound, there'll be new data, information coming into Reltio.
Matt Gagan (10:30):
Reltio is going to do all of its good stuff, generate events. And those events, you have an opportunity to place onto a number of different outbound message services. So currently, Reltio supports SQS and SNS for the AWS side of things. In the Google world, we support Google Pub/Sub. And for Azure-oriented customers, we support Azure Service Bus. So there's a notable gap here. We don't yet support Kafka directly as part of this set of overall pre-packaged capabilities, but it is on the Reltio roadmap.
Matt Gagan (11:09):
What you also see a depiction of here is the fact that, in that self-service environment that I was talking about, you have controls available. And those controls provide some ability to decide and determine which types of events make it onto these queues, and to which queues certain of those events actually get directed, from among all of the possible queues that you might have available to you.
Matt Gagan (11:39):
What's also notably absent here: we don't see SQS, Google Pub/Sub, or Azure Service Bus on the left-hand side of this diagram, upstream. That's really indicative of the fact that Reltio doesn't have a built-in event consumer. So our support for events is natively just outbound. Meaning if you wanted to stream events into Reltio and have Reltio respond accordingly, you're going to need to build or leverage a component that sits and performs the role of a Reltio consumer.
Matt Gagan (12:10):
So it's going to take from any of those types of queues that we may think of. It's going to have a listener specific to consuming messages over that type of message bus or queue. And it's going to then take the opportunity to package the information that it found up in a form that is compatible with Reltio, and then squirt that information into Reltio, so Reltio can be updated and kept in sync with what's going on out there in the ecosystem and connected systems.
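As a concrete illustration of that component, here is a minimal sketch of a custom Reltio consumer in Node.js using the kafkajs library (the demo later in the session uses Node.js scripts in a similar role). The broker address, topic name, entity type, attribute mapping, and Reltio URL are all illustrative assumptions:

```javascript
// Minimal sketch of a custom "Reltio consumer": listen to a Kafka topic,
// repackage each message into a Reltio-compatible entity payload, and POST
// it into Reltio. Names, URLs, and the mapping are illustrative.
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'reltio-consumer', brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'reltio-inbound' });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'reltio-input', fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const src = JSON.parse(message.value.toString());
      // Package the information up in a form compatible with Reltio.
      const entities = [{
        type: 'configuration/entityTypes/Individual',
        attributes: {
          Email: [{ value: src.email }],
          Phone: [{ value: src.phone }],
        },
        crosswalks: [{ type: 'configuration/sources/Loyalty', value: src.id }],
      }];
      // Post into Reltio so it stays in sync with the ecosystem.
      await fetch('https://<env>.reltio.com/reltio/api/<tenantId>/entities', {
        method: 'POST',
        headers: { Authorization: 'Bearer <token>', 'Content-Type': 'application/json' },
        body: JSON.stringify(entities),
      });
    },
  });
}

run().catch(console.error);
```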
Matt Gagan (12:46):
So, moving forward. So, thinking specifically about Kafka. Without a current Kafka capability, this is what Reltio customers are doing in order to integrate with Kafka, both upstream and downstream. So on the left-hand side, as well as on the right-hand side. Just like we discovered, there would in this case need to be an ability to read from Kafka topics using a Reltio Kafka consumer. And on the outbound side, the information is going to take a hop.
Matt Gagan (13:17):
We're going to naturally publish whatever it is that the customer wants to see published onto a variety of queues of their choosing. And then, there's a need to plug in a Reltio Kafka producer that can take that information and turn it into whatever Kafka-native form is required for other systems to be able to consume it and leverage it, and be kept on the same page.
Matt Gagan (13:46):
In this case, that extra hop out of Reltio sounds like a disadvantage. But from an enterprise architecture point of view, there are some opportunities that it does present. And these are, in fact, some things that you might decide to do even if you were only using SQS throughout your ecosystem. It provides an opportunity for some additional filtering and routing of events to different enterprise queues, based on what's coming out of Reltio.
Matt Gagan (14:14):
So updates to customers' addresses, updates to customers' loyalty, and updates to product information. These may need to go onto different queues, different topics, downstream. So you may want a component like this producer sitting there doing that anyway. Also, when Reltio pushes its messages onto these queues, it's going to be in a Reltio canonical form, which may not be exactly what you need it to look like for consumption by all of your different systems across the enterprise.
Matt Gagan (14:47):
And therefore, there's probably an opportunity, or even a need, here to take the Reltio form and turn it into that canonical enterprise format that Reltio doesn't fully natively represent. And then, in the case of some of these outbound messages, the payloads would be very large if, in certain situations and for certain types of messages, we were to try to push all the information that you might ever need.
Matt Gagan (15:15):
And I'm thinking specifically of situations like a merge that takes place between, let's say, two large business records or profiles inside of Reltio. If we were to put what was merged and what it turned into all in one payload, that would probably be too much overhead. So there's an opportunity here. When Reltio places a message natively onto any of these mechanisms, it's going to say that profile A matched with and merged with profile B.
Matt Gagan (15:43):
You have an opportunity then to come back in through that front door and get the information that you want using Reltio's APIs, in a synchronous way. So you'll ping it and you'll wait for the response. And then put the corresponding message onto the appropriate, in this case, Kafka topics, so that you've got everything that a consumer application is going to need in order to be able to react appropriately and be updated accordingly.
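Putting those pieces together, here is a minimal sketch of the outbound hop: a custom Reltio Kafka producer that polls Reltio's outbound SQS queue, calls back through the "front door" for the full profile when it sees a compact merge notification, and publishes the result to a Kafka topic. The queue URL, the event field names (type, uri), the ENTITIES_MERGED event name, and the endpoints are illustrative assumptions:

```javascript
// Minimal sketch of a custom "Reltio Kafka producer": SQS in, Kafka out,
// with a synchronous API callback to enrich compact merge events.
const { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } = require('@aws-sdk/client-sqs');
const { Kafka } = require('kafkajs');

const sqs = new SQSClient({ region: 'us-east-1' });
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/<account>/<reltio-outbound>';
const producer = new Kafka({ clientId: 'reltio-producer', brokers: ['localhost:9092'] }).producer();

// Come back in through the front door for the merged profile (synchronous).
async function fetchProfile(entityUri) {
  const res = await fetch(`https://<env>.reltio.com/reltio/api/<tenantId>/${entityUri}`, {
    headers: { Authorization: 'Bearer <token>' },
  });
  return res.json();
}

async function poll() {
  await producer.connect();
  for (;;) {
    const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
      QueueUrl: QUEUE_URL, MaxNumberOfMessages: 10, WaitTimeSeconds: 20,
    }));
    for (const m of Messages) {
      const event = JSON.parse(m.Body);
      // Route/enrich: merges get a full profile read; other events pass through.
      const payload = event.type === 'ENTITIES_MERGED' // event name is an assumption
        ? await fetchProfile(event.uri)
        : event;
      await producer.send({
        topic: 'reltio-output',
        messages: [{ key: event.uri, value: JSON.stringify(payload) }],
      });
      await sqs.send(new DeleteMessageCommand({ QueueUrl: QUEUE_URL, ReceiptHandle: m.ReceiptHandle }));
    }
  }
}

poll().catch(console.error);
```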
Matt Gagan (16:16):
So from a demonstration point of view, which is where I'd like for us to go next, we're going to tell a story where an upstream system produces an event based on the creation or the update of a customer in its own context. And then we're going to have Reltio, via a Reltio consumer, pick up a message representing that customer and their information, go ahead, and ping the Reltio APIs, using this Reltio consumer that I built.
Matt Gagan (16:50):
And then Reltio will go ahead and update or create a customer as necessary, based on the information that it encounters in that payload. And then, of course, having done so, Reltio will respond and push an outbound message, stating and representing what this customer looks like, which I'll take and put into a Kafka topic. And then we will see what that message looks like, by putting in place a consumer on behalf of, let's say, a downstream system that needs to be kept on the same page as, effectively, this upstream system and Reltio together.
Matt Gagan (17:29):
So in this demonstration then, you can really think of this as two flows in one. There's demo part one, there's demo part two. And now when we get to it, what does the demo look like? Actually, I'm using a terminal on my local machine here. So we're going to see code. I'm going to place some JSON on behalf of this system into a Reltio input Kafka topic. We're then going to see that that gets picked up, and that it gets turned into a payload to push into Reltio.
Matt Gagan (17:59):
And then we should see Reltio respond to that with how it's interpreted that payload. And then very shortly afterwards, Reltio is going to place a message onto its own native SQS outbound queue. And I'm going to use this Reltio producer to read that, turn it into a correspondingly more compact Kafka message, push that onto a topic, so that we can then finally read that. And then if we wanted to, potentially post that into the application that it's operating on behalf of.
Matt Gagan (18:34):
That's what the demonstration is going to look like. And it's at this point, I'd ask if there are any questions that we need to deal with at this point. Or shall I go ahead and start to show these terminal windows?
Chris Detzel (18:46):
Hey, Matt. Before you do, we have one question. Maybe two. Say we're planning to process the data from a Kafka consumer, and the data has all the columns, but we need only four columns, and the column names need to be changed for format. Is that possible? I don't know if you got that.
Matt Gagan (19:08):
Yeah. That's going to be the responsibility of this component here that is consuming from Kafka on behalf of Reltio. So inside of here, you're going to put whatever logic is required, that says, I don't care about these other elements of that payload.
Matt Gagan (19:24):
I just need to focus on these elements here and their corresponding values. And then necessarily, I think in almost all cases, you'll be turning that payload into a Reltio-specific shape in terms of its JSON, so that you can then simply post it into Reltio as an update. And Reltio will do the rest.
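A minimal sketch of that mapping logic, assuming four hypothetical source columns and illustrative Reltio attribute names:

```javascript
// Keep only the four fields you care about, rename them, and shape the
// result as Reltio entity JSON. All column and attribute names here are
// hypothetical; everything else in the payload is simply ignored.
function toReltioEntity(row) {
  return {
    type: 'configuration/entityTypes/Individual',
    attributes: {
      FirstName: [{ value: row.fname }],      // source column "fname"
      LastName:  [{ value: row.lname }],      // source column "lname"
      Email:     [{ value: row.email_addr }], // source column "email_addr"
      Phone:     [{ value: row.phone_no }],   // source column "phone_no"
    },
  };
}
```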
Rob (19:47):
Hey, Matt. This is Rob. I have a quick question. Could you go back a couple of slides? I think one more. Correct. Yeah, thank you. So in this case, Reltio is all REST API based. And these two squares, Reltio Kafka consumer and producer, from your perspective, is that a producer that we would build? Or is that a Reltio Kafka producer?
Matt Gagan (20:21):
I would say you would build a Reltio Kafka producer. What I should have actually stated, even at the outset of this presentation, is there's a certain color coding going on in all of my schematics here. The blue is Reltio and orange is your own infrastructure, your own stuff. And so the onus then is on you or your partners to put in place, using whatever mechanisms you find most appropriate.
Matt Gagan (20:52):
The consumer, the producer, whatever it is that you need in order to take what Reltio natively is able to do and turn it into what you actually need for downstream consumption. And I probably should have focused on this at the time as well. But when we think about what your options are in deploying these middleware components, and effectively, that's what they are, you've got a plethora of different choices.
Matt Gagan (21:16):
You've got your standard integration platforms, things like MuleSoft. MuleSoft itself can listen to Kafka topics and turn those into API calls into Reltio. So there's an opportunity to use that on both sides of this. It can listen to SQS and produce Kafka messages as well. You've got Kafka-specific vendors out there, like confluent.io, with a whole plethora of different libraries and capabilities that customers can stand up to perform the same roles.
Matt Gagan (21:50):
Or when we get to my demonstration very, very shortly, you're going to see that I use JavaScript. And I created a number of just little scripts that I can fire up in my terminal on my Mac here, and have them perform these roles of listening and then producing, listening and then producing in these two different cases here.
Rob (22:11):
So from your experience with respect to Kafka, not to pick on the Kafka stuff too much, but what you're showing here is basically going from an existing pub/sub, topic-based model technology, SQS/SNS, Google, Azure Service Bus, et cetera, that already gives us topic-based durable subscribers and all those sorts of features that Kafka has. What are your customers benefiting from by leveraging Kafka like that?
Rob (22:38):
Is it because they already have an investment in Kafka, and they can just essentially integrate with that existing footprint? Or is there something else that I'm missing?
Matt Gagan (22:49):
I think it's a number of things at the same time. I think Kafka is definitely flavor of the month. And I think that there are probably some very good reasons for that. It has open-source origins, and so I think that's attractive to organizations. It has capabilities beyond pure streaming and messaging, where the assumption is that a message is ultimately going to die and not continue to exist. Kafka topics are very durable if you want them to be, and can even be considered rich, highly extensible databases.
Matt Gagan (23:32):
So what that means is, at any later date, should you so wish, you'd be able to come back into a Kafka topic and understand and retrieve a message, plus all of its additional metadata, that you had previously acted on, let's say, weeks ago. To reevaluate what it was that you'd consumed and why you consumed it, or even just use it as a database in its own right. And I know that there are organizations out there doing that.
Matt Gagan (24:04):
That's not necessarily my purpose and what I'm trying to convey in the value of Kafka to Reltio customers right now. But I know beyond Reltio, it's a capability that I think is causing customers to invest in it as an infrastructure, because they can do so many different things with it. And therefore, it's a skillset, a set of invested-in components, that they, with very little additional investment, should be able to leverage directly with their Reltio subscription.
Chris Detzel (24:41):
Again, Matt, we have a lot of other questions coming in. I'm not going to ask all those questions yet. I want you to go into the demo, because that's the meat of this. So let's keep going. And then whenever you get done with the demo, then I'll start asking more questions. Is that fair?
Raul (24:58):
Hey, Chris. This is Raul. Can I ask one question, maybe? Is that okay? Just one?
Chris Detzel (25:02):
That's right. Yep. Yep.
Raul (25:03):
Yeah. Yes. Matt, so this diagram that you're showing, can you explain how this will work with a managed Kafka on Azure, or a managed Kafka on AWS? Are we still going to write the JavaScript that will pull the data from a queue source such as Azure Service Bus? How will that work?
Matt Gagan (25:26):
Yeah, thanks for the question. I'm not intimately familiar with exactly what componentry is provided for you, in the context of those different managed Kafka environments. So I'm not therefore sure exactly what you will have, instead of the JavaScript examples that I'm going to use today to show you.
Matt Gagan (25:50):
But I think the important point here is there's a lot of flexibility available to you, in terms of what you choose to put in place here or here, whether it's part of your managed packaged capabilities, part of a separate investment in integration technology, like MuleSoft. Or whether it's something that you want to go ahead and custom build, which, of course, is what we're going to see in my demonstration today.
Raul (26:17):
Okay. Okay. Go ahead.
Chris Detzel (26:19):
Great. And I promised to ask those questions here later, but let's keep going, Matt. Thanks.
Matt Gagan (26:24):
All right. Thanks. So let's accelerate back to where we were. So we're going to see these kinds of outputs in my terminal windows. And so now, I'm going to pivot away and show you what that looks like. First, let's go to Terminal. So this should look somewhat like the depiction we just saw. So on the left-hand side, top left here, this is where we're acting on behalf of system A.
Matt Gagan (26:49):
I'm going to post some JSON that should concisely represent a new customer that just got created or updated in the context of system A. What we're going to see on the right, top right, is that message being picked up. And then you'll see the interaction with Reltio in the form of JSON back and forth, as it will spew into this window here. Once Reltio has consumed it, dealt with it, Reltio is going to place a message and we'll see that SQS message represented here.
Matt Gagan (27:21):
And then we'll see what we turn it into and place it onto the Kafka topic on the outbound side. And then we'll see this window pick up that message. And obviously, that's an opportunity then for us to be able to push that into system B, given the kind of context that we're trying to paint in this demo here. So hopefully, that all makes sense. Basically, the data is going to come across here, do a U-turn, and come back more or less in the direction that it came from.
Matt Gagan (27:48):
And the bridge between each of these two windows is a Kafka topic. This one is inbound, this one is outbound. So let's go and get some data to put into this top window here. So I'm going to go into my line of code here. This is JSON, and it represents that there's a source called loyalty that's got a new ID representing a new customer, with a phone number, with an email address, and then a few address components: address line one, city, and state.
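For reference, the pasted message might look something like the JSON below. The "loyalty" source, the listed fields, and the name and street address that appear later in the demo come from the session; the field names, ID, phone, email, and city/state values are illustrative reconstructions.

```json
{
  "source": "loyalty",
  "id": "LOY-000123",
  "firstName": "Robert",
  "lastName": "Graham",
  "phone": "+1 415 555 0100",
  "email": "robert.graham@example.com",
  "addressLine1": "6116 Durban Road",
  "city": "Madison",
  "state": "WI"
}
```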
Matt Gagan (28:20):
And I'm simply going to copy that JSON and go back to my terminal, and paste that into this window here. When I hit return, that's when it's going to actually launch it onto the Reltio input topic. And you'll see the message got picked up. This is the inbound payload to Reltio, and this is Reltio responding with what it just created in its world. And then here on the bottom, we'll see a message get posted into SQS. And you saw a quick response there.
Matt Gagan (28:59):
And this has already updated on this outbound Kafka topic. So I don't know if you picked up on the details there, but the address is 6116 Durban Road. I remember seeing that in what I pasted here. So this data has all done a full round trip and made it quite quickly to system B from system A, which is what we're really trying to say here. And if I change gears and take a look at Reltio, the actual Reltio environment that we just posted these data into, let's come back over here.
Matt Gagan (29:35):
Refresh the dashboard and this Reltio demo tenant here. And what we can see is that Robert Graham was created. This was our target record just now, with this timestamp. Now, if I go into this profile to look in some more detail, we'll see that the fragments of data, like the mobile number, have gone through a cleansing and standardization process.
Matt Gagan (30:01):
The address, similarly, has been verified. The ZIP+4 has been generated, plus some metadata indicating important information, plus the latitude and longitude. And then if I expand this structured attribute for email, we can see that this appears to be in a valid format at a private domain, this fictional domain here. So this is the representation of a customer on the Reltio side.
Matt Gagan (30:28):
And all of that code that we saw in those various windows just now was really all of the mechanics of the exchange of information to and from Reltio, and then pushing that inbound or outbound on those Kafka topics. So hopefully, that illustrates the speed, the responsiveness. And in my case, I was using some custom JavaScript to power these various perspectives here, really actually just this one here and this one here. I was using Node.js as an available JavaScript technology to put in place these two mechanisms.
Matt Gagan (31:13):
And so if I switch back to the slide deck we were operating from here, I guess this is the point where that's pretty much everything that I was proposing to demonstrate. The rest is, I could put more data in there; we'd see some different reactions, different responses. I saw that there was actually a potential match that was generated for the record that we just created, which I didn't dwell on. But this is all Reltio good stuff that I expect most of us understand.
Chris Detzel (31:43):
Yeah. Thanks, Matt. And I'm going to go ahead and start asking questions, since we have a slide right for it. So, do you know the timeline? What is the timeline for the Kafka connector on Reltio's roadmap?
Matt Gagan (31:56):
There isn't a timeline that I can currently share. So I was talking to product leadership last night and we're still not in a position to concretely say exactly when that's going to be.
Chris Detzel (32:06):
Good. Is Reltio planning to address the challenges in the current architecture via the proposed Kafka connector?
Matt Gagan (32:16):
I guess we'd need to get into the details of, what are considered to be challenging in this case?
Chris Detzel (32:22):
Yeah, it makes sense. Do we have any limit on reading records from a Kafka topic?
Matt Gagan (32:33):
It's an interesting question, of course, because Reltio in any API call that you make inbound, which I think is what the context is for the question here, will invite you to take advantage of the performance benefits of packaging up a lot of proposed updates to customer data, and bundling them into a single API call.
Matt Gagan (32:57):
So we'd like to think that the sweet spot, at least in batch scenarios, is about 100 profiles at a time that could be updated. And of course, massively in parallel with Reltio. When we constrain ourselves to operating purely based on sense-and-respond detection of individual Kafka messages, we're missing an opportunity there and we're creating more API calls.
Matt Gagan (33:24):
But it would be driven by the use cases that you're looking to support. And by that, I mean if getting the information in Reltio, and therefore out of the back door as quickly as possible is the objective, to service some real burning use cases, that means Reltio needs to be on the same page as some upstream application.
Matt Gagan (33:47):
And Kafka is going to be the mechanism we choose to make that happen, then that's the cost of doing it. And we would say, that's fine. And there's not going to be any limit that I know of. Reltio's APIs are designed to scale to massive numbers of inbound posts, which, of course, is what this implies, in a very, very short timeframe. So I don't know what the saturation should look like.
Matt Gagan (34:16):
But based on your use cases, Reltio has been designed to scale. So it's about letting us know, making us understand what that looks like. And then we can scale accordingly in the context of your specific Reltio environment, to make sure that we're there for you and we can meet those expectations.
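To connect that back to the consumer pattern: below is a minimal sketch of micro-batching toward the roughly 100-profiles-per-call sweet spot mentioned above. The endpoint, token, and flush interval are placeholder assumptions.

```javascript
// Buffer already-mapped Reltio entities from consumed Kafka messages and
// flush them as a single entities POST when the batch fills or a timer fires.
const batch = [];
const BATCH_SIZE = 100; // the suggested sweet spot for batch scenarios
const FLUSH_MS = 2000;  // assumption: flush at least every two seconds

async function flush() {
  if (batch.length === 0) return;
  const entities = batch.splice(0, batch.length); // drain the buffer
  await fetch('https://<env>.reltio.com/reltio/api/<tenantId>/entities', {
    method: 'POST',
    headers: { Authorization: 'Bearer <token>', 'Content-Type': 'application/json' },
    body: JSON.stringify(entities), // one API call carries the whole batch
  });
}

// Call from the Kafka consumer's eachMessage handler.
function enqueue(entity) {
  batch.push(entity);
  if (batch.length >= BATCH_SIZE) flush();
}

setInterval(flush, FLUSH_MS); // small batches still go out promptly
```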
Chris Detzel (34:38):
Great. And they're pouring in a lot. So if the input event has both master and transactional info, will the Kafka consumer separate them, using a JSON builder to create the right JSON?
Matt Gagan (34:52):
So, this again, sounds like it's an inbound opportunity and context that we're talking about. Since we're talking about custom building a Reltio consumer, then that Reltio consumer, as it interprets each message, can be built in a way that interprets part of it as master data, any other part of it as contextual or transactional, or what we like to call in Reltio, interaction data.
Matt Gagan (35:23):
And by splitting those into two, you can decide which bits you want to use right now, which bits you might discard, or where you need those data to go in Reltio. So where the master data represents an update to a profile in Reltio, you'll post that to that API call. And then separately, you'll post the transaction as an additional record in Reltio's interactions, which will be joined to that particular profile.
Chris Detzel (35:52):
Okay. So another question. How would you synchronize out-of-order time series data? So say, interactions or a series of changes for the same customer, originating from multiple input systems, coming to Kafka, then the consumer, then Reltio.
Matt Gagan (36:13):
That sounds like on the inbound side as well. You're going to want to use the timestamps as a mechanism that you would put into the payloads, that represent when in the context of those upstream contributions we should represent the information as being known and being factual. So that as you seek to interpret those in the context of your Reltio consumer, you've got an opportunity to reorganize and decide in what sequence you'd want to push them into Reltio.
Matt Gagan (36:50):
But that itself implies some buffering of the data, because you're going to want to wait for a short timeframe, short periods of time, until all the information that you think might need to be available is available, so that you can do that slight reorder. And then decide what to push into Reltio.
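A minimal sketch of that buffer-and-reorder idea, assuming upstream producers stamp each payload with a timestamp field and that a five-second window is long enough for your sources; the postToReltio helper is hypothetical:

```javascript
// Hold incoming events briefly, sort by the payload timestamp, then push
// them to Reltio in the intended sequence.
const WINDOW_MS = 5000;
let buffered = [];

function onUpstreamEvent(evt) {
  buffered.push(evt); // evt.timestamp: when the fact became known upstream
}

async function postToReltio(evt) {
  // Hypothetical helper wrapping the entities POST sketched earlier.
}

setInterval(async () => {
  const ready = buffered;
  buffered = [];
  ready.sort((a, b) => a.timestamp - b.timestamp); // restore intended order
  for (const evt of ready) {
    await postToReltio(evt);
  }
}, WINDOW_MS);
```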
Chris Detzel (37:09):
And then Steven asked a question. Is the intent with the roadmap to replace the first hop, so the Pub/Sub or SNS, with Kafka directly? And if so, when will it be available?
Matt Gagan (37:24):
Excuse me. The information available to me right now is that the intention is to bring Kafka into the mix, up to the same level of capability that we currently provide for SQS, Google Pub/Sub, and Azure Service Bus. Meaning, whatever controls are available to you right now for those mechanisms will be at feature parity with our outbound support for Kafka as a native output mechanism as well.
Chris Detzel (37:54):
Great. So another question. Can we aggregate the events Reltio creates in SQS? So, a merge of entities sends out create and change events, that kind of stuff.
Matt Gagan (38:08):
So I know that this is definitely a challenge for customers right now, because there isn't really a FIFO, a first-in-first-out mechanism, necessarily available in the context of something like SQS. Which means that as you pick those messages up, you may receive them out of sequence.
Matt Gagan (38:29):
And so this, on the outbound side of Reltio, sounds like a very similar problem to the one that we were just talking about the solution to on the inbound side. And it really requires, again, that you buffer for a short period of time until you understand everything you think you need to know, before you send that information wherever it needs to go next, beyond Reltio.
Dmitri (38:50):
Matt, just a comment on this other question, if I may. This is Dmitri. On the FIFO, it's not just about SQS, because the events themselves at the platform level are generated asynchronously. And that's due to the scalable nature of the Reltio platform, because events are being processed by multiple nodes in parallel at the same time.
Dmitri (39:14):
But you can use Kafka specifically. Once you stream from SQS to Kafka, you can create a producer that will aggregate. And you can use the timestamp that every event has as a sort key to sort the aggregated values. And you can aggregate everything for the last five seconds, for example. In this way, you sort them in Kafka and, as a result, get them sorted, right?
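A minimal sketch of the aggregation Dmitri outlines, assuming each Reltio event carries a uri and a millisecond timestamp; the five-second window, topic name, and broker address are assumptions:

```javascript
// Group events per entity for a short window, sort each group by the
// per-event timestamp, and emit one consolidated Kafka message per entity.
const { Kafka } = require('kafkajs');
const producer = new Kafka({ clientId: 'aggregator', brokers: ['localhost:9092'] }).producer();
// producer.connect() is assumed to have been awaited at startup.

const WINDOW_MS = 5000;
const pending = new Map(); // entity URI -> events collected this window

function onReltioEvent(evt) {
  if (!pending.has(evt.uri)) pending.set(evt.uri, []);
  pending.get(evt.uri).push(evt);
}

setInterval(async () => {
  for (const [uri, events] of pending) {
    events.sort((a, b) => a.timestamp - b.timestamp); // restore true order
    await producer.send({
      topic: 'reltio-aggregated',
      messages: [{ key: uri, value: JSON.stringify(events) }], // one per entity
    });
  }
  pending.clear();
}, WINDOW_MS);
```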
Matt Gagan (39:42):
No, that's a great point. And I oversimplified my response there, but you're absolutely right. So yeah, Reltio is an asynchronous platform. There is no guarantee that any particular event is going to complete in advance of any other, given the distributed nature of the way those components work internally.
Matt Gagan (40:08):
But that's a great solution also, that you propose there to leverage those timestamps, which will be down to the millisecond level in terms of their detail.
Suicante (40:20):
And Dmitri, good question. [Suicante 00:40:22] here. Would we be able to do this on SQS, the aggregation on SQS itself? Or does Kafka provide an ability that SQS does not have today?
Dmitri (40:34):
Yes, you're right. Kafka has this capability and SQS does not have this capability today. So we'll have to use Kafka for that.
Suicante (40:42):
Okay, because we are struggling with that. When a merge happens with an entity, Reltio throws out 15 different events. The merge could happen between three or four organizations and result in 15 events. Pushing them all the way back to the system is just too much traffic. So we want to aggregate and then pass one single message, giving them a view of what happened, what exactly happened.
Dmitri (41:09):
Yes. Yeah. Yeah. And this is a very, very common question from our customers and partners. And sometimes, there is a perception that using FIFO, first in, first out, is the solution here. But you need to realize that what the platform puts in first is not guaranteed to really be first.
Dmitri (41:27):
So you have to sort by timestamp. And Kafka is the best solution right now. There are best practices I've found in places like Stack Overflow, specifically for Kafka, on how to create producers that aggregate and sort by timestamp. So that's already a known use case for Kafka. And I would suggest you use that, yes.
Matt Gagan (41:49):
So in fact, it sounds like another opportunity missing from my original list: even with our native support for things like SQS right now, a customer may wish, pragmatically, to use Kafka as the downstream mechanism. Because now, your producer that is hopping the data from one mechanism to the other can leverage exactly what it is that Dmitri is explaining to us here.
Dmitri (42:15):
But that's good, Matt, because that gives some room for my webinar in August.
Matt Gagan (42:22):
Excellent. And if it wasn't obvious to everybody, Dmitri is one of our engineers here at Reltio, a senior member of our staff here on the engineering side.
Suicante (42:32):
Dmitri, would you be able to share some more details on it? We don't want to bring in a new component now, Kafka particularly, because our solution is almost in a locked-down state. Would you be able to share some more details on what our other options would be to reduce the traffic coming out, without Kafka? Maybe not in this call.
Chris Detzel (42:58):
That's actually a good question to post on the community. So if you go to community.reltio.com, post that and I'll have Dmitri and/or both Dmitri and Matt answer that. If you do that, I'll make sure that it gets answered.
Suicante (43:13):
Will do that.
Chris Detzel (43:15):
So, by the way, we have a few more questions. So keep pushing them in, but I do like feedback. And I share all feedback as I get it, especially if you put it in the chat. I can show people, "Hey, this is really good." So please post your feedback in the chat as we keep going.
Chris Detzel (43:35):
So, Matt, any use case guidelines for what events to capture, and how to support timing of separate events for specific entities, and also separate but related domains? So, any aggregation approaches to reduce traffic?
Matt Gagan (43:56):
I'm not sure that I fully understand that question.
Chris Detzel (44:00):
Mark? Let's see if Mark can open up.
Mark (44:04):
Yeah. It ties in a little bit with what Suicante was asking. We're struggling with the events. We're struggling with applications that have dependencies on, let's say, an organization and a contact, with the relationship there.
Mark (44:22):
And what we're really getting at is trying to get things back into a business transaction from the events. We're aggregating them, collating them. That's where we're struggling, and there doesn't seem to be really anything out there to give us any kind of guidelines or use cases for this.
Matt Gagan (44:47):
Mark, is this on the outbound side?
Mark (44:49):
Yeah, it's totally on the outbound side. I'm sorry.
Matt Gagan (44:50):
Okay. Okay.
Suicante (44:52):
Totally on the outbound side. Yes.
Mark (44:53):
Yeah. So we've been struggling to figure out which events to go with. We think of them kind of, "They're okay." And it really is what you were just talking about earlier. [inaudible 00:45:07]. To get them in the right sequence, to try to get this into a near real-time flow.
Mark (45:15):
And also, to reduce the traffic on the applications, because as things get remastered, when we're in coexistence mode, there can be a lot of shuffling that has to happen with the apps that we're getting the events from, which were actually the sources to begin with. We just want to try to minimize the chatter, because this is [crosstalk 00:45:38].
Matt Gagan (45:38):
Right. In addition to using a mechanism like Dmitri was explaining before, that to some extent buffers and allows an opportunity to reorder, correctly order, I should say, how you are then going to be able to interpret those. The only other thing I can think of that can potentially help here is leveraging the APIs, rather than simply listening to the event stream and trying to work out what's noise and what's out of sequence.
Matt Gagan (46:11):
Meaning, when I see an event that I do choose to allow through, using my available controls, and I detect that something material has happened to a profile that I should care about, that I use the APIs to come back in and get the representation of that object in whatever ways I'm concerned about, whether it's relationships or whether it's the object itself, so that I can take that, package that up and send it where it needs to go.
Matt Gagan (46:39):
So, seeing the event stream as a trigger, rather than necessarily as the place to get the answers to all of the questions. It sounds like there's potentially a different way of thinking about how to use the event stream.
Mark (46:54):
And again, I don't want to get into too much detail [inaudible 00:46:57] you're able to necessarily [inaudible 00:47:00]. But you can be really, really [inaudible 00:47:03] post something in the community, if there's any kind of follow up that we can get, it'll really, really help with it.
Chris Detzel (47:10):
Hey, Mark, I like that idea. I am the community manager, but post that in the community and your questions on that, and we can see how much follow up we can push out. Certainly we can answer some of those questions just by a reply. But there might be some more opportunity that we can help you out even more, if that makes sense.
Mark (47:30):
Okay. Yeah, that would be terrific because it's good. I think yeah, we'll be looking into that.
Chris Detzel (47:35):
Yeah. Just go to community.reltio.com. Post. Remember, your first post is going to be moderated, but I'll go approve it. And then we'll have somebody answer that. And more specifically, and most likely, it'll be Matt or somebody else. So I really do appreciate that.
Mark (47:52):
Thank you.
Chris Detzel (47:54):
So another question, Matt. And this is like an ask me anything. It's like you present a little bit of stuff and we get a lot of good questions. So it says, can I-
Matt Gagan (48:03):
I should have presented much, much more, and then there'd be less time for questions.
Chris Detzel (48:08):
I love the questions. It's my favorite part, actually. Can I use a Reltio API call to retrieve a party ID golden record, based on any source party ID which belongs to this party? Party all around?
Matt Gagan (48:24):
Yes. So there's an API call that you can make in Reltio that we call get by crosswalk. And in Reltio, the crosswalk really is the combination of source plus unique key in the context of that source. So when you do that, you'll be using the crosswalk as the mechanism through which to identify the overall profile, as consolidated as that may be in Reltio.
Matt Gagan (48:51):
And of course, with that payload, you're then in control of what comes back to you. Do I get all of the other crosswalks, so I get all of that correlated knowledge? Which attributes do I want to see as part of that payload being returned to me as well? So yes, is the answer.
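A minimal sketch of that lookup; the exact endpoint shape below is an assumption to verify against the Reltio API documentation for your tenant:

```javascript
// Resolve the consolidated golden profile from a source system's own ID
// ("get by crosswalk"). URL shape, source type, and token are illustrative.
async function getByCrosswalk(sourceType, sourceId, accessToken) {
  const url = `https://<env>.reltio.com/reltio/api/<tenantId>/entities/_byCrosswalk/` +
              `${sourceId}?type=configuration/sources/${sourceType}`;
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!res.ok) throw new Error(`Crosswalk lookup failed: ${res.status}`);
  return res.json(); // consolidated profile, including correlated crosswalks
}
```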
Chris Detzel (49:08):
Yeah, I like that. It was quick and easy. Nothing's really easy, right? Another question. So, challenges like support for advanced filters, and the ability to modify Reltio's default payload. I don't know if that's a question, but that sounds like a challenge that [Tipac 00:49:25] is having.
Chris Detzel (49:27):
And then Sandro asks, isn't an outbound API from Reltio to Pub/Sub constrained by a limit of 300 records? I'm guessing the Reltio consumer would need to incorporate pagination for those use cases. Is that the case?
Matt Gagan (49:46):
I don't know of any limits on the outbound streaming side of things. The limits that I am aware of are specific to what queue mechanisms or streaming mechanisms you're using. There are limits to payload size, and I'm thinking about difficulties customers have encountered with very large, consolidated records in Reltio needing to be posted in their entirety onto an SQS queue. And the size of the message is prohibitive, running into the limit of SQS, and therefore you've got a message that doesn't really have integrity.
Matt Gagan (50:25):
So for those reasons, upstream in your control center, you do have the ability to configure what happens in those scenarios. And that, I believe, can also now be conditional on the payload size. So it's not a one-size-fits-all "don't publish anything onto the queue but the entity ID": you can publish the payload if it fits, and hold it back otherwise.
Chris Detzel (50:53):
Okay.
Matt Gagan (50:55):
But stepping back into the actual question that was asked, in terms of numbers of records that can be posted onto these mechanisms? I'm not aware of a limitation.
Chris Detzel (51:08):
That's a good job.
Dmitri (51:08):
I can add to that real quick, if you have just one moment.
Chris Detzel (51:13):
Yeah.
Dmitri (51:13):
Yeah. Thank you, Chris. So just so you know, in the current release we are working on, which will be available later in October of this year, we're adding some additional capabilities which will make the recommended workflow with queues look like this: you can stream events with their payload onto the outbound queue.
Dmitri (51:40):
Our statistics say that for most of our customers, the absolute majority of events will fit the size limitation of SQS, which is currently about 256 kilobytes. But we've added an additional flag. So basically, you just listen to the messages from the queue, all of them, and you stream all of them.
Dmitri (52:04):
But if for some reason there was a single event that was too big to go through the queue, you will still receive it, without the actual payload, but with a flag telling you that this event was too big to go through the queue. So you will have to do one additional get call through the API to get that specific payload.
Dmitri (52:27):
And you can work in this manner. You will receive 99.9% of your messages with their payload normally. And in case there's something too big, you'll just do an additional get call and get that.
Mark (52:38):
Is the get call for the entity, or is the get call for what would have been in the event?
Dmitri (52:44):
For the entity. For the entity. And some additional functionality, I'm going to talk about additional functionality here a bit, but it will be available as well in this release: there'll be streaming of the delta payload only. So only the things that changed in the entity will be streamed as a payload, structured in the same manner as the activity log, the same type of JSON.
Dmitri (53:08):
It'll be documented. It's something we're working on right now. And I think I'll be able to talk more about this later in August. But anyways, just so you're aware of where we are moving with this.
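A minimal sketch of the consuming pattern Dmitri describes for that upcoming capability. The flag and field names below (payloadTooLarge, uri, object) are hypothetical placeholders; the real names will be in Reltio's release documentation.

```javascript
// Most events arrive with their payload inline; if one was too big for the
// queue, a flag tells you to fetch the entity with one additional GET call.
async function handleQueueEvent(evt) {
  if (evt.payloadTooLarge) { // hypothetical flag name
    const res = await fetch(`https://<env>.reltio.com/reltio/api/<tenantId>/${evt.uri}`, {
      headers: { Authorization: 'Bearer <token>' },
    });
    return res.json(); // the full entity, fetched through the API
  }
  return evt.object; // inline payload; field name is also an assumption
}
```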
Matt Gagan (53:18):
That's excellent, Dmitri. Thank you. And just to recap, I believe you stated that that's a future capability that's going to be in the next Reltio release, due in October. Is that correct?
Dmitri (53:30):
Yes.
Matt Gagan (53:31):
Okay. Thanks.
Chris Detzel (53:33):
So another question. What kind of monitoring and reporting is available for events published and consumed?
Matt Gagan (53:44):
So, as part of your Reltio console tenant management capability available via that application, you do have the ability to monitor your various queues. So you can see queue sizes, queue rates.
Chris Detzel (54:03):
Okay. Good. As of now, we have just about five or six minutes left. One, I would say please add: was this helpful? Did you like it? What kind of webinars would you like to see in the future? [Jadon 00:54:20] mentioned maybe a webinar on match scores based on a comparator or match rules. That's a good one. But I'm always open for ideas.
Chris Detzel (54:34):
And then someone is like, "Hey, I just want a good webinar around, let's get on the phone and just talk about Reltio in general." So maybe we could do like a panel of experts, four or five people like you, Matt, Dmitri and some others. And people could just start asking us a bunch of questions about certain things. That could be fun.
Chris Detzel (55:01):
But as of now, if there are no questions, I am going to mention a webinar about advanced survivorship. We do have one coming up on survivorship in August. So let's come to that. And if there are more things that you want to talk about with survivorship, then we can also go more in depth. Joel is going to do that one as well.
Chris Detzel (55:27):
That's April 26th, I believe. So if you go to community.reltio.com and click on the events and upcoming events, you'll see that. I like those. So definitely survivorship is one we're covering. If there are no questions, please post quickly in the chat. Let us know how you liked it. Two is, I did post something there.
Chris Detzel (55:44):
If you want to rate and review us, Reltio, on Gartner, click on the link in the chat and we'll send you a little goodie. Just send me a note of, "Hey, I took it." You don't have to show me your score or what you rated us, but I'll make sure to get something sent out to you. So thank you everyone, for coming.
#CommunityWebinar