Reltio Snowflake Connector - Show

By Chris Detzel posted 02-23-2023 17:30

Are you a data scientist or business analyst who is in charge of managing and analyzing data in your organization’s Snowflake data platform? If you’d like a better way to report accurate insights and critical updates quickly and efficiently, then this is the topic for you! The Reltio Snowflake connector delivers clean, merged, and validated profile data to your Snowflake data platform, so that you always have accurate data to inform your analytical and reporting decisions.

Ensure that your Snowflake tables and views are always in sync with your Reltio data model to give you a seamless Reltio-managed onboarding experience and an automated Reltio data event streaming process. Instantly convert Reltio data updates into Snowflake tables and views in near real-time using Snowpipe. Reduce development costs and enhance operational efficiency by eliminating the need for manual data exports and custom programming when moving data into the Snowflake Data Cloud with the Reltio Snowflake Connector. Join us for a discussion of the connector, a review of its architecture, a demo of how it works, and Q&A.

For questions, ask the Community: https://community.reltio.com/home

Transcript found here:

Christopher Detzel (00:07):

Welcome everyone, and thank you for coming to another Reltio Community Show. I'm Chris Detzel and I am the director of our customer community and engagement here at Reltio. We have Jon Ulloa. He's a principal product manager here at Reltio, and he'll be talking today about our Snowflake connector and the upgrades and things like that. So really excited about that, from our new release. And Jon, this just came out yesterday, officially? Is that right?

Jon Ulloa (00:36):

Yeah, it's hot off the press, Chris. That's right. Yesterday was the release of the Snowflake connector.

Christopher Detzel (00:41):

Awesome. So we're here to talk a little bit about that. So the rules of the show today is just keep yourself on mute. All questions should either be asked on chat, or feel free to take yourself off mute and ask those questions. The call is going to be recorded, and being recorded as we speak. I should have that out by the end of this week or early next week. So take the survey at the end of the call. The Zoom popup will show up. It's only three questions. We always want your feedback.

(01:10):

As a matter of fact, because of your feedback, we have added a show or two. So, we have some shows coming up. Today's show is the Reltio Snowflake connector. And then Jon will be back with us next week on the Reltio Google BigQuery connector. That also came out yesterday. And then we'll have an ask me anything to continue the show that we had on how to search my data with the Reltio APIs. There were so many questions, that we're going to answer some of those questions on an ask me anything, but we want to take additional questions on there as well.

(01:47):

And then on the 15th, we're looking at understanding Reltio API performance and what that means. And then this was a suggestion from one of our community members, to say, "Hey, look, we need to know more about crosswalks. What are they? How do you use them?" So we'll be going deep into the ABCs of crosswalks, understanding their purpose and their use. So I'm going to stop sharing, once I find the stop sharing button, right there. And Jon, I'm going to hand it over to you.

Jon Ulloa (02:18):

Absolutely. Thanks, Chris. There was a comment on my audio. I just want to make sure that you guys can hear me okay.

Christopher Detzel (02:23):

It sounds great. Yeah.

Jon Ulloa (02:25):

Awesome. Great. All right. Let me go ahead and share my screen.

Christopher Detzel (02:33):

I love the background, by the way.

Jon Ulloa (02:35):

Oh, thanks. Yeah, I love Chicago. I need to make it back there. It's been a while. Okay, can you guys see my screen?

Christopher Detzel (02:45):

Yeah, sure can.

Jon Ulloa (02:47):

Alrighty, jumping into Snowflake here. So let me start with some slides first, to give you some background on the connector itself. Oops. Share screen. And we'll jump into a demo. What I wanted to do today is go over what this product does, who it's for, what problems we're trying to solve, what are some use cases that we can help identify within the Reltio community here. I want this to be an open session here. So as I go through this, feel free to put some questions in chat, feel free to interrupt, ask some questions. I want to make this a two-way street.

Christopher Detzel (03:25):

Awesome.

Jon Ulloa (03:26):

The whole intent here is to get you guys familiar with the new connector that was just launched yesterday. And then we have some time for a demo. So, I'd like to show you what benefits and features we're actually showing in near real time, and get your feedback on that.

Christopher Detzel (03:43):

Hey, some folks are saying they're having some problems with the mic. I can hear you just fine. So I don't know. Gino, are you able to hear him okay? I just want to make sure.

Gino (03:54):

So just to compare and contrast, Chris, you sound like you're standing right next to me. Jon sounds far away, like he's in an echo chamber.

Christopher Detzel (04:03):

Okay. I don't know if there's something else you can do, Jon, just to... I don't know. I feel like he's... To me, it's fine.

Gino (04:12):

But I don't think it's terrible. I think it's adequate.

Christopher Detzel (04:14):

Okay. Let's keep going.

Jon Ulloa (04:16):

Yeah, yeah, no problem. I tried to move a little bit closer to the mic here. Hopefully that helps a bit.

Gino (04:20):

Oh, that's a lot better. Thank you.

Jon Ulloa (04:22):

Oh, awesome. Okay, great. Yeah. So, here I have the slide, three major challenges that we see with data and the industries that we're in here today. The first thing is that it's really hard to get that single source of truth. So this is the Reltio bread and butter aspect of identifying, finding, deduplicating, enriching data, and being able to trust that data and making sure that it's actually accurate. Data is coming in from all different sources, and it's really hard for companies these days to organize that, get that single version of truth to be able to tell a story about how that data impacts whatever business initiative you're working on.

(05:01):

And then the second thing is that when we talk about data and the influx of data, it's a lot, and it's going to continue to grow and grow. And as data continues to be consumed at exponential levels, customers are trying to understand actually what's going on with their products. And businesses are trying to be able to communicate that. And the big thing is getting that feedback from the customers on their products, and being able to organize this data and stay up to date, especially in a digital economy. We're going to talk about eCommerce transactions. It's important to be able to get the latest and greatest data, to be able to accurately reflect what the customer is experiencing in that digital ecosystem, in a digital experience at that point in time. It's more critical nowadays than it was 10 or 20 years ago. So the speed of data is absolutely important.

(05:54):

And then the third thing is that there's a lot of integrations that are going on, especially when we think about large organizations that are trying to have a data strategy, trying to do data analytics. Lots of decisions need to be made in terms of where it's stored and if it's decided to be stored in a centralized warehouse. There's lots of data that needs to be integrated with your first party data, third party data integrations, CDPs, MDM platforms, all sorts of different integrations that are hitting this warehouse. And they're constant. And there's lots of vendors in this space right now, and it's sometimes overwhelming for customers to be able to manage that and to be able to understand exactly what do I need to do, what's your product, what are you going to offer me? When you talk about time to value, being able to get set up from zero to one very quickly, that becomes more and more important for customers these days.

(06:46):

So, the vision that we have with this connector is to essentially be able to take the bread and butter of Reltio in terms of cleansing, deduplicating, enriching data, and be able to get that into a customer's Snowflake instance very rapidly, very persistently, and being able to expose the key data types that Reltio provides to customers today to either do analytics on Reltio to ensure that you're getting the best value from Reltio, and being able to understand, for example, the match rules that are being applied in production, or to be able to take this single source of truth that Reltio is giving you and pass it downstream to other applications or business users for their own data science models, for their own analyst reports, what have you. And to be able to get that into Snowflake persistently and rapidly and uncovering all the key data types is the goal of what we're trying to do with this connector.

(07:45):

And so this slide is really... There's a lot going on in the slide though. What I wanted to cover here is exactly a more tactical view about how the Snowflake connector works. So when you think about the Reltio platform, there's lots of integration points in terms of how you can load your data. You can use the Data Loader or a bunch of [inaudible 00:08:04] APIs to be able to get data from legacy systems, from third party connectors, from other enterprise applications, what have you. Reltio does its magic, cleans that data up, deduplicates and enriches it, and then it gets exported up to Snowflake.

(08:21):

The key difference here that we're trying to communicate with this new connector is that, as events are occurring in Reltio, we're listening to those events and capturing those changes in near real time. And so whatever you would see in your Snowflake warehouse would be representative of what you see in the Reltio UI when it comes to an event change. I'll go into more specifics as I talk about the architecture, but I think that's a really powerful thing that I want to call out here, because when we think about getting value from that data, it's becoming the fastest data that you can actually consume. But then, we also have different integration points to be able to visualize that data, and to be able to understand exactly what you're working with and what's actually most important.

(09:04):

And I have here a screenshot of the recent Snowsight integration that we did within the Snowflake connector. So what this product does is not only integrate into your warehouse, where we manage that schema out of the box for you to be able to work with the Reltio data in near real time, but we also take it a step further and demonstrate some applications in Snowflake's native BI tool, which is called Snowsight, to be able to understand exactly what's going on with Reltio data. So there's an example of a match KPI report. You can understand how many entities with no merges exist, how many potential matches exist, what are the match rules for those potential matches. And then from a merge perspective, was it an auto merge, was it a manual merge? So being able to represent this in reporting formats is something that we document and showcase as part of the product, and we think that it's extremely important.

(09:57):

When we talk about key Reltio data types, we're talking about entities, relations, interactions, matches, and merges. This is what we came out with with the GA release yesterday, and this is representative of the key data types that our customers at Reltio are interested in today. And when we talk about near real time applications, Chris, we can spend probably another community session on just going over the different use cases of near real time. But there's aspects from a healthcare life sciences perspective, from when you're talking about a medical device that's helping manage somebody's insulin levels, and they fluctuate daily, there's lots of spikes. A customer needs to know that, and that's near realtime information that's needed. When we talk about financial services and the trading desk and being able to process transactions and get a time stamp on that for auditing purposes, or maybe know your customer or anti-money laundering concerns about validating a new account.

(10:56):

When you're in these experiences and you're trying to get validation that that account was opened, there's lots of backend processes that need to happen from a financial services organization to be able to go ahead and say yes or no. And in this economy, when everybody is expecting things to be done very quickly, that's now becoming an expectation: it happens in real time, and I get approved for the account that I just opened. And so there's lots of different use cases out there, especially from insurance. Think about insurance and getting a quote on a claim. So when you're shopping for insurance, being able to get that quote in real time after providing this information is very important.

(11:37):

But there's lots of risks that need to be measured by the insurance company, lots of business rules that need to be evaluated from an underwriting perspective. How do we go ahead and communicate to the customer that we can proceed on this journey, giving them the validation but at the same time protecting that business? And this is a near realtime expectation that's becoming more and more common.

(11:59):

One other use case I'll talk about, and then I'll move on, I promise, but I'm just so fascinated with the near realtime component and what this connector can do. So, when you think about logistics automation, when you think about warehouses that are out there these days, I walked into a warehouse about 20 years ago, and this is in Equinor. My uncle was helping manufacturing our warehouse. And part of my job a couple years ago was to do some software automation on robotics and warehouses. And I got a chance to walk into a warehouse about three years ago, and I barely saw any people. Everything is being done by robots these days. So when you think about getting a pallet off a crate and transferring a parcel to a chute, passing a parcel down a conveyor belt and putting it into another conveyor belt, there are robots that are essentially doing these things now.

(12:54):

And when you think about the demand for eCommerce and package delivery, Prime Day, holidays, these facilities are all powered by robotics, and they're pumping 30,000 to 60,000 packages a day. And that can be even more in real time. So consider the cost of one package slipping up: a robot may miss the ID of the package or not understand how to handle it, and the software is not smart enough to recover, so that routine, that flow, stops for 30 minutes. That costs the company roughly around $5,000, because you are essentially preventing multiple packages from getting through, getting delayed, and then risk of [inaudible 00:13:39], and then not to mention the overhead of shipping.

(13:41):

So when you think about near realtime importance of data, and having that robot understand and being able to adapt to the different variations of the packages that are coming in, and being able to share that knowledge to other robots, in another facility that's on the same line, the same make and model of that manufacturer in near real time to prevent that $5,000 mistake, and could be more, that's extremely important. So when we think about near real time and being able to promote data into the warehouse, the Snowflake warehouse of the customer, and that's the central analytics repository, to be able to harness that energy and give that data to the people that can make these decisions is extremely important.

(14:28):

All right. So now I'm going to talk a little bit about the Snowflake architecture here. So on the left side, where we see a separation of the Reltio environment and the customer environment, we have the MDM platform service, and this is where our customer tenants are located, and lots of activity is going on in these tenants. There's an entity created, there's a new data source loaded, there's a relation that's been deleted, match and merge events going on all the time. And what this connector is doing is listening to these activities. So every single event that occurs within the customer's tenant, we're able to track that, and we feed that into a microservice. This is the Snowflake connector, which essentially transforms those events into compatible tables and views for Snowflake, and it's packaged as compressed JSON.

(15:17):

And so what we do from there is we send that into an intermediate step, which is a cloud storage bucket for a customer. So what you see on the screen here, it's saying an S3 bucket, and that's for AWS customers. But I want to be very clear about, especially with the release that happened yesterday, is that this is not only for AWS cloud users with Snowflake environment in there. It's also for Azure and GCP. So, you can go ahead for Azure customers, you can go ahead and replace that S3 bucket with a blob storage. And then for GCP you can replace that bucket with a GCS bucket.

(15:56):

And so what we're doing here is we're making this compatible with all the major cloud providers, to be able to provide this event flow. And so just for this example here, we have a cloud storage bucket, and we pass that compressed JSON to that bucket. And so we have documentation for a customer to set up a queue and to set up a Snowpipe to essentially listen to these events that are landing in this cloud storage and pass them to their Snowflake in near realtime. And so that Snowpipe element facilitates that near real time transformation into a variety of data tables that we provide out of the box. And so that's essentially what's happening here. So we're converting all these events into easily digestible data tables for our customers to query in near realtime.
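
As a rough illustration of the packaging step described above, the sketch below serializes a few change events as newline-delimited JSON and gzip-compresses them, which is the kind of file that could land in a cloud storage bucket for Snowpipe to pick up. The event fields (`type`, `uri`, `timestamp`), the sample values, and the function names are assumptions made for illustration; they are not the connector's actual format or API.

```python
import gzip
import json


def package_events(events):
    """Serialize events as newline-delimited JSON and gzip-compress them."""
    lines = "\n".join(json.dumps(event, sort_keys=True) for event in events)
    return gzip.compress(lines.encode("utf-8"))


def unpack_events(blob):
    """Reverse of package_events: decompress and parse each JSON line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Hypothetical events; real payloads will differ.
sample_events = [
    {"type": "ENTITY_CREATED", "uri": "entities/abc123", "timestamp": 1677000000000},
    {"type": "RELATION_REMOVED", "uri": "relations/def456", "timestamp": 1677000005000},
]

payload = package_events(sample_events)   # bytes that would be written to the bucket
restored = unpack_events(payload)         # what a downstream reader would parse
```

Newline-delimited JSON is used here only because it is a common layout for semi-structured loads; the session does not say which layout the connector itself uses.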

(16:46):

And just wanted to touch on permissions and security aspects of this. So, all we need to essentially get this data into your cloud storage bucket is this one single write permission into that bucket. From there, when you see that dividing line of the customer environment, that is all internal communication from your own cloud storage bucket to your own Snowflake. So there's no vendor concerns about directly inserting things into your Snowflake. We have a very relaxed policy: all we need is that write permission into your cloud storage bucket. And this has been widely adopted by several customers, with little friction in terms of IT organization pushback. That's one of the reasons that we designed it this way, so that we can comply with any sort of IT policy that may make this a little bit difficult, but this is something that was purposely designed to make this a lot more seamless and a lot more aligned to each company's IT policy, if that makes sense.

(17:54):

Okay. I'm going to pause here. Any questions so far, Chris, or are we good to keep on moving?

Christopher Detzel (17:59):

Yeah, at the moment I think we're good.

Jon Ulloa (18:01):

Okay, awesome. So I'm going to go into this, a little bit of the demo. We're actually going to see some live data, but I wanted just to present the schema. So I mentioned some data tables, from a Snowflake perspective. We have a landing table. This is where all the information is first put into the customer's Snowflake environment. And it's going to include all sorts of entity, relation, interaction, match, and merge events, and everything is... This is the starting point. You're going to be able to identify if it's an entity type, if that's an organization or contact. And then more importantly, when was the event, when did the event occur, what was the timestamp, and who did that event. And then a variety of JSON arrays for attributes and crosswalks for that specific entity, for example.

(18:50):

And so everything here, this is the start of where the data seeps in. And as part of the product, we have a variety of staging tables. These say, "Views." We offer views too, but we recommend tables and we provide essentially scripts for this, to organize the data into these key data types. So we have entities, relations, interactions, matches, and merges tables. And within the entity table, for example, you are able to see the attributes and crosswalks associated to that entity. And that's key. And you're still representing that event created time, that event updated time. And more importantly, what you'll see in this demo is that we look to represent the latest object version for querying purposes. So that landing table has that historical version management of maybe there was a bunch of changes to a specific entity, and that's all tracked there.

(19:45):

But what we're trying to do with this table is represent the latest version of that entity for the latest, most, I guess relevant or accurate aspect of that event, for querying purposes. So it's going into this near realtime fashion, that we're also representing the latest object version for analysis and insights.
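
To make the "latest object version" idea concrete, here is a minimal sketch that reduces a list of landing rows to the most recent row per object. The row shape (`uri`, `timestamp`, `attributes`) and the function name are hypothetical; the real landing and staging tables are defined by the scripts shipped with the connector.

```python
def latest_versions(landing_rows):
    """Keep only the most recent row per object URI, by timestamp."""
    latest = {}
    for row in landing_rows:
        current = latest.get(row["uri"])
        # Replace the stored row only when this one is strictly newer.
        if current is None or row["timestamp"] > current["timestamp"]:
            latest[row["uri"]] = row
    return latest


# Hypothetical landing rows: the same entity changed twice.
landing_rows = [
    {"uri": "entities/abc123", "timestamp": 100, "attributes": {"Name": "Acme"}},
    {"uri": "entities/abc123", "timestamp": 250, "attributes": {"Name": "Acme Corp"}},
    {"uri": "entities/xyz789", "timestamp": 180, "attributes": {"Name": "Globex"}},
]

current_entities = latest_versions(landing_rows)
```

The landing rows stay untouched, so the full change history remains available while queries read only the reduced view.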

Christopher Detzel (20:08):

Hey, Jon.

Jon Ulloa (20:09):

Oh, go ahead.

Christopher Detzel (20:10):

I have a few questions now. So, can we define the response time for near realtime? I'm interested to know if the interface to Snowflake can be plugged into a UI.

Jon Ulloa (20:21):

Yeah. So essentially what we guarantee is 400 objects per second. So that's roughly 1.5 million records an hour that's essentially processed into Snowflake. So you think about that, near realtime can happen in a couple of minutes, if that helps explain that.
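
As a quick check of the arithmetic behind that figure, the sketch below converts the quoted 400 objects per second into an hourly rate (1,440,000, which rounds to the "roughly 1.5 million" mentioned) and estimates how long a burst would take at that steady rate. The burst size of 50,000 objects is a made-up example, and the helper name is an assumption.

```python
OBJECTS_PER_SECOND = 400  # throughput figure quoted in the session
SECONDS_PER_HOUR = 60 * 60

# 400 objects/second sustained for one hour.
objects_per_hour = OBJECTS_PER_SECOND * SECONDS_PER_HOUR


def minutes_to_process(object_count, rate=OBJECTS_PER_SECOND):
    """Estimated minutes to process object_count objects at a steady rate."""
    return object_count / rate / 60


# Hypothetical burst of 50,000 changed objects.
backlog_minutes = minutes_to_process(50_000)
```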

Christopher Detzel (20:40):

Yeah, I love the stats, man. Is it a bidirectional connector? So can data flow from Snowflake to Reltio?

Jon Ulloa (20:46):

No, that's a really great question. The release that happened yesterday is starting out with one direction, the export direction, but on our roadmap, we're hoping to make this a bidirectional feed, so that we can import and export from Snowflake directly. There are some bidirectional aspects today. So from the analytics attributes perspective, this is something that's usually powered by the customer. So let's say a customer would want to create maybe a target score for an entity, like a contact. So part of my demo today is going to be a B2B lead score case. So for people that are looking at prospects to engage in B2B sales, you have certain information about a prospect, but there's also maybe a score or a likelihood that they would convert for your product.

(21:41):

You can essentially put that score in an analytics attribute here. It'll calculate that, pass it back into Reltio, and then on your next export you'd be able to see that analytics attribute lead score being populated in a [inaudible 00:21:55] fashion. So that's something that's available today. But when we talk about full scale, like import and export, from Snowflake specifically, that's on the roadmap [inaudible 00:22:06]. Hopefully that answers your question.

Christopher Detzel (22:07):

And just two more quick questions and we'll move on. So Snowflake schema has to match Reltio data structure, is that right, and say L3 data model for pharma?

Jon Ulloa (22:18):

Yeah, right. So the schema structure should match one to one to Reltio, so it's a JSON kind of structure. And so that's the way the out-of-the-box schema works. And so we designed it so that it is one-to-one to Reltio, and this helps for validation and verification of counts, understanding the exact mapping, that you've got the right data from Reltio, and that's the way that it's designed. But it's flexible: once it gets into the data table, you can map it to whatever schema you want from there.

Christopher Detzel (22:50):

Okay. And then they keep coming in, but I'm going to ask in a minute, because I think we should move on, but one other question that was directly to me, was yesterday's release the first Snowflake connector release, or was it more of an enhancement?

Jon Ulloa (23:05):

Yeah. That's a good question. This is a new separate product. We do have a legacy Snowflake connector out, but that is a product that is being deprecated, and we sent out some notification on that to our industry legacy customers. This is a new product because it essentially has a different underlying architecture on how it works. So the way to think about it is that we had a legacy product out there. It was more of a pull. So customers had to essentially use an API to call into Reltio manually, or maybe they could schedule some incremental batch jobs, or what have you. But there needed to be a pull from Reltio to go ahead and do that.

(23:45):

This product is essentially constantly pushing the data so that the customer doesn't have to worry about stuff like that. So this is technically a new product.

Christopher Detzel (23:55):

Great. I have a couple more questions, but let's keep going and then I'll ask those here shortly.

Jon Ulloa (24:00):

Yeah, sure. No problem. Yeah. Again, let's talk about the relation table here. The main difference that we'll see is that there's a start and end object here. So you're understanding the lineage of the relationship, where it started, where it ended. That's the key thing you want to remember about that table. From an interaction perspective, we have members of the interaction. So, if there's an entity that made a payment to another organization, another entity, that would be captured here as members, to be able to track that interaction.

(24:29):

And then from a match perspective, we have here potential matches, not matches, manual matches, and then a version of a timestamp. So when we think about potential matches, being able to export that full set of matches is definitely a use case that we're trying to solve for. I believe in the Reltio platform you can see potential matches as a small subset, but you'd be able to get all the potential matches here, and be able to classify them, and then do some troubleshooting. If you wanted to maybe go to production and you're in the test environment, and you're trying to see if these potential matches are actually true, whether they're working with your match rules, and tune them that way, that's a really popular use case for this table.

(25:12):

From a merge perspective, this is understanding what had already happened. So that validation of here are the match rules that I applied, here's a winner and loser, here's a merge tree kind of information, here is a match rule associated to that, and here is the timestamp that occurred. So this is just about optimization of Reltio performance here with these two tables.

(25:36):

Okay. I talked about the features and benefits and I'll just maybe end it here and then go into that demo, because I know we're running a little low on time, but when we talk about an out-of-the-box, Reltio-managed pipeline, as soon as this is configured and set up, after the initial data load, the customer doesn't need to touch it. This is managed by Reltio. So if you perhaps make a schema change, you add a new attribute into your data model on Reltio, that's automatically picked up. So, there are really not a lot of user actions to be taken here, as this is more of a consistent, persistent pushing of the data rather than a pull. And so that's something I definitely want to highlight here.

(26:21):

In terms of data types supported, we're starting with these five data types that we discussed. But on the roadmap here, we're going to add more and more data types like activity log, history, workflow, things of that nature to really get that 360 view. But we have these key data types to start with right now, with the release that just happened yesterday. I think I talked about near realtime quite a lot. So I think I'll let that feature speak for itself. But yeah, we also touched upon the fact that this is compatible with Snowflake on Google Cloud, on AWS, and on Azure.

(26:54):

And the fact that we definitely put security first, so we work very closely with our Snowflake partners here and Snowflake engineers to understand exactly how to make this as cost-efficient as possible. When we're streaming a bunch of data, you're going to get a lot of updates. So it's all about setting time to live policies on your storage buckets that we provide some recommendations for. It's all about using some features within Snowflake, time travel, archival processes, to be able to manage that data flow. And this is also why we present the latest object version, because you only want to understand what the latest change is, and you have a historical table to look at that kind of stuff. But for now, we want to keep it lean in terms of respecting your cost elements of producing this and putting this in your Snowflake.

(27:41):

So that's something, I definitely wanted to cover that. So Chris, I don't know if you want to pause for a couple more questions, and then we can go into the demo if that works?

Christopher Detzel (27:50):

Yeah, works for me. Let me get to that question. Do we provide an out-of-the-box report or reporting templates?

Jon Ulloa (27:59):

Yeah. I mentioned the Snowsight integration earlier, right? It's a native BI tool for Snowflake. Sorry, were you saying something, Chris?

Christopher Detzel (28:08):

No, no, no, that makes sense.

Jon Ulloa (28:12):

Oh, okay. Yeah. Yeah. We do have a connection to that. And so what we do is we have a documented reporting template, that essentially shows how to make a basic match KPI report. So understanding those potential matches, the match rules associated to that, and then drills down to the entity ID. So if you were to add a new potential match into Snowflake, it would essentially show up in that report. And so the whole goal is that we're providing a template for customers to visualize that data, to understand, okay, this is what it is for this specific use case of understanding match rules, and optimizing that.

(28:53):

But the goal is to provide that as a template so that you can use that template for other use cases that you may have, and that's what we're providing with the connector today.
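
The kind of aggregation behind a basic match KPI report can be sketched as follows: count potential matches per match rule, and count entities that have no potential matches at all. The row shape (`entity_id`, `potential_matches`, `rule`), the rule names, and the function name are invented for illustration; the documented Snowsight template works against the connector's own match table.

```python
from collections import Counter


def potential_matches_by_rule(match_rows):
    """Count potential matches per match rule across all entities."""
    counts = Counter()
    for row in match_rows:
        for match in row["potential_matches"]:
            counts[match["rule"]] += 1
    return counts


# Hypothetical rows from a match table; rule names are made up.
match_rows = [
    {"entity_id": "e1", "potential_matches": [{"entity_id": "e2", "rule": "NameAndAddress"}]},
    {"entity_id": "e3", "potential_matches": [
        {"entity_id": "e4", "rule": "NameAndAddress"},
        {"entity_id": "e5", "rule": "EmailExact"},
    ]},
    {"entity_id": "e6", "potential_matches": []},
]

rule_counts = potential_matches_by_rule(match_rows)
entities_without_matches = sum(1 for row in match_rows if not row["potential_matches"])
```

Because each count still traces back to an `entity_id`, the same rows support the drill-down from a match rule to the individual entities described in the answer.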

Christopher Detzel (29:02):

Great. Is the linked flag in the event indicating merge or a relationship, or maybe something else?

Jon Ulloa (29:10):

Yeah. The link is connecting two events together, and that's used to essentially help with the processing of maybe the relation table. So if it's a linked event to another linked event, how do we get the start and end objects to be put in that relation table? That's predominantly used as a flag to help organize those scripts that we provide down the line, if that makes sense.

Christopher Detzel (29:37):

Great. And I don't know if you know this answer, but is Redshift Connector an opportunity in the future for Reltio? Do you know?

Jon Ulloa (29:43):

That's definitely on our roadmap, so I appreciate that. So we started with the two most popular connectors. So when we think about dropping data into analytics warehouses that our customers are using today, Google BigQuery, Snowflake, Databricks Delta Lake, those are all connectors that are extremely popular. But we are aware that Redshift is out there. We're aware that Azure Synapse Analytics is out there too. So these are things on our roadmap that we're definitely targeting. Yeah, if the person who asked wants to discuss that more with me, I'm happy to schedule something separately, but no, I appreciate it.

Christopher Detzel (30:18):

Great. Two more questions and then we can go to the demo, because I know people are excited about that. Is this an extra add-on, or do we need to add on our org subscription? How does that work?

Jon Ulloa (30:31):

Yeah. So this is an additional subscription. So this is something that I would recommend to talk to your AEs about, if you're interested in the demo and in this connector. Yeah. It would require [inaudible 00:30:45].

Christopher Detzel (30:45):

So two people asked me that question, so boom. Perfect. And then can we include audit logs as part of the data loading to Snowflake?

Jon Ulloa (30:54):

Yeah. That's a good question. So I mentioned that we're launching our connector. We launched our connector yesterday, and we're starting with some key data types. One of the data types on our roadmap in the future is audit data. So for audit logs, that is planned for a future release.

Christopher Detzel (31:12):

Great. Let's keep going, and we'll keep piling in these questions, and then I'll start asking them here shortly again.

Jon Ulloa (31:29):

Okay, sounds good. All right. I assume you guys can still see my screen?

Christopher Detzel (31:31):

We can.

Jon Ulloa (31:32):

Awesome. I have here a connected Snowflake instance here with our connector here, and I have a variety of queries to show you a little bit more about that schema that we were talking about with the data table. And let me just go ahead and start running this, so it will give you a sense of what schema we're working with here. Okay. By the way, I don't know if you guys have started using the new Snowflake UI, but I converted it about a month ago and it's really good, it's really snazzy. So if you guys are considering doing that, I was a console UI person for a while, but this is super helpful to be able to understand JSON comparisons across. It keeps on the same line, which is great. And the only UI, Chris, I can think of that's close to this is the new Reltio UI, so the shameless plug for Reltio UI.

Christopher Detzel (32:34):

Well, shameless or not, people are having to use it, you know what I mean? So that's exciting. Everybody's changing their UI for better experience.

Jon Ulloa (32:43):

Excellent. Yeah, excellent. Yeah. We have this data table right here, and we have a URI. And I mentioned this version. So this is going to be tracking all the versions of events that are associated with the particular object, whether that's an entity, relation, interaction, match, or merge. And I'll show you through the demo how this is facilitated through an event change that we'll create in the Reltio UI today. But this is an important feature to be able to represent that latest object version.

(33:10):

And then we have this associated timestamp of when this event occurred. And then we talk about these deleted and linked flags. So, we want to identify if the entity was deleted, and we're going to go ahead and put that here. We want to identify if there are entities that have a relation, and we show that here. Then there's the object type: is this an entity, relation, interaction, or merge? And then the type within that, so within an entity: individual, organization, contact. Think about that. And then we have all the packaged JSON in terms of analytics attributes, crosswalks, things of that nature that are packaged here at the start of this landing table. So I just want to be clear, that this is a table that's used to facilitate the main tables that you would use to query against.
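To make the landing-table idea concrete, here is a minimal Python sketch of how several versioned events per object can be reduced to one "latest version" row with an active flag, which is roughly what the downstream tables and views are described as doing. The `LandingRow` fields are assumptions modeled on the columns mentioned in the walkthrough, not the connector's actual DDL, and the connector's own scripts do this in Snowflake rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class LandingRow:
    # Field names are illustrative assumptions based on the columns described.
    uri: str
    version: int
    timestamp: int      # event time, epoch milliseconds
    deleted: bool
    object_type: str    # "entity", "relation", "interaction", "match", "merge"
    payload: dict       # packaged JSON (attributes, crosswalks, ...)


def latest_versions(rows, object_type):
    """Keep only the highest-version row per URI for one object type."""
    latest = {}
    for row in rows:
        if row.object_type != object_type:
            continue
        current = latest.get(row.uri)
        if current is None or row.version > current.version:
            latest[row.uri] = row
    return latest


def to_view(rows, object_type):
    """Build view rows: one per object, with an 'active' flag derived from 'deleted'."""
    picked = sorted(latest_versions(rows, object_type).values(), key=lambda r: r.uri)
    return [
        {"uri": r.uri, "version": r.version, "active": not r.deleted, "payload": r.payload}
        for r in picked
    ]


rows = [
    LandingRow("entities/A", 1, 1000, False, "entity", {"LeadScore": 80}),
    LandingRow("entities/A", 2, 2000, False, "entity", {"LeadScore": 93}),
    LandingRow("entities/B", 1, 1500, False, "entity", {"LeadScore": 70}),
    LandingRow("entities/B", 2, 2500, True, "entity", {}),
    LandingRow("relations/R", 1, 1200, False, "relation", {}),
]
entity_view = to_view(rows, "entity")
```

Here `entities/A` surfaces only at version 2, and `entities/B` is kept but marked inactive because its latest event was a delete.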

(34:03):

All right. So this is the entity view here. And so here we have the version, which is going to be the latest version of this particular entity, and the type of that entity. And then we have the JSON within attributes and crosswalks here, and that analytics attribute, so that write-back functionality, that's optional for customers to pursue. They're going to use this, but that is possible and available. Then who it was created by, when it was created, updated by, and then updated time, and then if it's active. So we want to, again, measure these things based on deletion events or inactivity. If they're active, we want to be able to have a flag to classify that stream for analytics purposes here.

(34:47):

And so go down just very quickly to relations, and same kind of structure as entities. And then we have the start and end object here. So understanding where this started from an object URI perspective and the crosswalks, and then where it ended is important. And that's what we're trying to show here from a relation perspective.

(35:17):

Interactions, again, the member function here, this is the difference in the table. So who are the associated members in a particular interaction? And this is something that Reltio helps track and helps you understand it. And then, again, active matches, important to represent that, in terms of what's relevant today, instead of having information on a deleted entity and matches for that, that entity was deleted. So we want to maybe focus on active, which is true.

(35:51):

And then we have this potential match. So we have the entity ID, and then we have this potential match that's associated with that entity ID, and then we have the match score as well, and then the match rules as well, if there are any. And that's going to be classified in this column. And then, if it's not a match, like this one is, we want to call that one out. And if it was a manual match, not based on any kind of match rules, that was done maybe by a data steward, we're also calling that out here. And then the effective timestamp, and then [inaudible 00:36:22] which is true.

(36:24):

And then the last thing, from a data schema perspective, is showing the merge key, and then what actually happened based on those naturals, the winner ID, the loser ID, the timestamp, the match rules, and the URIs associated with that. You receive... There's an automatic batch by [inaudible 00:36:42] number, and that's mentioned here for analytics purposes. And then if it's an auto batch based on a rule, or if it's scheduled, or if it's a manual match, and then again, if it's active.

(36:54):

So hopefully this gives you a preview of the data schema and just with real data to be able to understand that. But I think some things I also want to call out is that a lot of customers have asked, "Okay, how do I break out some of these things? How do I work with this data that you give me? I want to go further, I want to dig deeper." And so based on our documentation that we released, we have that ability to do that. And so this is just showing an entity.

(37:23):

So I mentioned that we have entities by types, and it depends on the customer data model. In this particular tenant we have organizations, we have individuals, we have employees, we have contacts. So I just want to focus on contacts. So I'm just going to go ahead and limit a view on contacts. And we have queries to show you how to isolate that, if that's something of interest.

(37:43):

Now, other customers are asking if we want to be able to understand, okay, now I have my focus entity type that I want to analyze and query against, but I don't want to worry about OV false or any kind of noise. I trust the Reltio data. This is telling me operational value is true and I want to use this for analysis. I don't want to worry about the junk. And so this is essentially a query, a stored procedure that we've provided documentation for, to essentially focus on only OV values.

(38:19):

So if you look at this right here, I don't know if you saw before in the attributes, I'll go back just to show you, but there is a lot of OV true information. And then depending on how many events and how many sources are conflicting with an entity, there will be a lot of different addresses, for example, here. But here, we're just seeing the single source of truth address that we defined on the Reltio platform through customer roles as well, to understand that this is what I care about and this is what I want to go ahead and analyze against. And it's the same for crosswalks as well.
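The operational-value (OV) filtering described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the documented stored procedure: it assumes each attribute holds a list of candidate values and that each value carries a boolean `ov` flag, and the sample attribute names and values are made up.

```python
def ov_only(attributes):
    """Keep only values flagged ov=True; drop attributes left with no values."""
    result = {}
    for name, values in attributes.items():
        kept = [v for v in values if v.get("ov") is True]
        if kept:
            result[name] = kept
    return result


# Hypothetical attribute JSON for one entity with conflicting source values.
attributes = {
    "FirstName": [{"value": "James", "ov": True}],
    "Address": [
        {"value": "1 Old Street", "ov": False},
        {"value": "2 New Street", "ov": True},
    ],
    "Nickname": [{"value": "JB", "ov": False}],
}
filtered = ov_only(attributes)
```

After filtering, only one address remains, which corresponds to the single "source of truth" address in the demo; values that lost survivorship are still present in the raw JSON but are left out of the result.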

(38:57):

And so the last thing that a lot of customers also ask is, "How do I essentially get rid of that JSON in the attributes and crosswalks? I need to flatten that. That's super annoying." Definitely understand the pain point there. So we're coming out with documentation on essentially how to do this. So this is essentially taking certain attributes from this contact view with the OV values. So, I already know I want to focus on contacts, I already know I want to focus on the single source of truth, and now I just want to be able to see these attributes spread out into columns and flattened out, so I can understand exactly what I need to focus on from a lead score perspective, for a variety of prospects, for example.

(39:45):

And so this is all flattened out for analysis for this B2B kind of lead score use case right here. And that's what we want to show, is that the data that we deliver is flexible. You can create custom views on that. And the most interesting thing is that once we create an entity, and we'll do this in a second, everything flows down, and you'll see that everything gets an update, even the custom views. Any questions before, Chris, or can I keep on going?
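For readers who want to picture the flattening step, the following Python sketch turns nested attribute JSON into one flat, column-style row per contact. It is an assumption-laden illustration rather than the documented approach: the `ov` flag, the attribute names, and the entity ID `abc123` are all hypothetical, and in practice this would be done with a view inside Snowflake.

```python
def flatten(entity_id, attributes, columns):
    """Return one flat row: the first OV value per requested column, or None."""
    row = {"entity_id": entity_id}
    for col in columns:
        values = [v["value"] for v in attributes.get(col, []) if v.get("ov")]
        row[col] = values[0] if values else None
    return row


# Hypothetical contact attributes, loosely following the lead score demo.
contact = {
    "FirstName": [{"value": "James", "ov": True}],
    "LastName": [{"value": "Bond", "ov": True}],
    "Department": [
        {"value": "Sales", "ov": False},
        {"value": "Operations", "ov": True},
    ],
    "LeadScore": [{"value": 80, "ov": True}],
}
columns = ["FirstName", "LastName", "Department", "LeadScore", "Stage"]
flat_row = flatten("abc123", contact, columns)
```

Columns that are requested but missing from the JSON, such as `Stage` here, come back as `None` so that every row has the same shape.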

Christopher Detzel (40:17):

Yeah, there's some questions. So you want to just answer those now, I guess. Okay, let's do it. Can we include audit log as part of the data loading to Snowflake? I think we asked that already.

Jon Ulloa (40:28):

Yeah. Yeah. That's a future-

Christopher Detzel (40:30):

Yeah. What are some specific business problems this can solve upon implementation?

Jon Ulloa (40:37):

Yeah. That's a good question, and let me go through this demo here because this is the lead score demo. Let me set up the situation a little bit more. So, I have created here an entity contact kind of list here, and I'm only curious about OV values, and I'm looking at the attributes. And I have a variety of attributes here that I have listed, but this is for essentially sales or field team to be able to understand, okay, who do I need to target that is interested in my product, and based on their interests and what stage they're in, if it's an opportunity, if it's a qualified sales lead, if it's a technical evaluation, be able to get a pulse check into how they're doing. And there's a lead score associated to this.

(41:21):

So when I create another entity, and let me get through this demo here, but we'll show you that, if there's a change in this score and somebody is trying to target a certain subset of a highest scoring aspect of it, that data immediately gets updated for a sales team to be able to work off the latest list. And this is just one example here. I do have more examples that I go through actually in our academy training, so I recommend that you check that out as well. And I'm sure I'll be on here more to maybe discuss more use cases in general. But hopefully this will be good for this demo here.

(42:01):

So let me just go... Oh, yeah, let me just go ahead and continue and let me show you what I'm talking about here. All right. I have here, this is the tenant that's connected with Snowflake Connector. And what I want to do here is I want to create a new profile, and I'm going to make it a contact because that's what I have that table for. All right. And then I'm going to go ahead and say I want to target somebody, James Bond. And for anybody curious, James Bond, I'm a Pierce Brosnan fan, that's my Bond. So we're going to go ahead and make this a lead source, which this is a referral, and this is the employee that's tracking this one down. The department that this James Bond in, let's call that operations. And the score right now, let's just call it an 80. Put in some just random attributes here. Senior director. Okay. Stage.

(43:27):

Let's see. Lead status: active. Then let's just call this [inaudible 00:43:40] for now. Okay. And so I'm creating this entity right now. So I'm looking this up. Okay, I think we should be good here. I'm going to save it. And we have this new entity created. So I'm going to take this entity ID. I'm going to move away from the Reltio screen here, and I'm going to put this into the data table. So I want to show you just how quickly this is going to get updated. And so when I go ahead and run this query, this is the data table, this is a landing table that first projects an entity update. So I just created a new entity. And we want to see after I run this, if it's going to show up.

(44:31):

Let's see. We've got here... Look at that. So this is the entity that was just created and let's just look at this JSON here. Yeah. I see operations value. This is James, this is the exact thing I just created, James Bond. And so that's great. That was super fast. So I want to go ahead and continue to show how it flows down here. Here. I'm just going to [inaudible 00:45:05] all this stuff [inaudible 00:45:08] understand this.

(45:00):

Okay, so entities, this is the table that we provide. We provide a script, that you run it off at this table. Let's see if it shows up there. It does. It's the same scheme, attributes, crosswalks that are now broken up. And this is the same kind of information that we just tracked in there. Let's do it for this custom view, which is also important. We want to show that even if you're creating custom views on something that Reltio provides, it's still showing up. This is something that Reltio documents, on how to do something, but it doesn't necessarily provide out-of-the-box. This is something that you would create on your own and this is filtered by contact.

(45:50):

And then let's continue to go on the OV selection. Okay. So now we have no more OV true rate information. This is the single source of truth just for this demo here. This is an entity created. So in real life there will be multiple versions of this, but it would be the latest that would be represented here, and we'll do an entity change up and show you what I'm talking about. But this is essentially what this does for you here.

(46:22):

And then last but not least, we have this [inaudible 00:46:26] attribute demo here, and it shows up. So you see James Bond, 34, male, senior director of operations. This is a referral, Sam Smith, technical evaluation. All these things I've gotten to show up. And so when you think about, again, providing a list of lead score for a sales or field team to focus on, you now have this being showcased here. Let me see if this is... Yeah. This is... Let me see. One second. Spreadsheet. [inaudible 00:47:21].

(47:28):

There we go. Sorry. Yeah, he was 80, so that was a greater than 80 thing. But if it's greater than 79, 80, or above, we see James Bond here, and all the associated lead scores of these candidates. So you're shortlisting these candidates, and then this person that was just a new entity that was created in the system is now a part of that list. So you see the value of near real time, is that the salesperson is now enabled with another person to contact.
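The slip in the demo (a contact scored exactly 80 disappearing under a "greater than 80" filter) comes down to a strict versus inclusive comparison. The Python sketch below shows the difference on a few made-up rows; apart from James Bond's score of 80, the names and scores are hypothetical.

```python
# Hypothetical flattened contact rows; only James Bond's score comes from the demo.
contacts = [
    {"name": "James Bond", "lead_score": 80},
    {"name": "Contact A", "lead_score": 95},
    {"name": "Contact B", "lead_score": 62},
    {"name": "Contact C", "lead_score": None},  # not scored yet
]


def shortlist(rows, min_score):
    """Contacts with lead_score >= min_score, highest score first."""
    scored = [r for r in rows if r["lead_score"] is not None]
    hits = [r for r in scored if r["lead_score"] >= min_score]
    return sorted(hits, key=lambda r: r["lead_score"], reverse=True)


# Strict comparison: a score of exactly 80 is excluded.
strict = [
    r["name"] for r in contacts
    if r["lead_score"] is not None and r["lead_score"] > 80
]
# Inclusive comparison: a score of exactly 80 is included.
inclusive = [r["name"] for r in shortlist(contacts, 80)]
```

With the inclusive threshold, the newly created contact joins the shortlist as soon as its row reaches the table.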

(47:54):

And so another thing I want to showcase here is the time travel feature that works with the Reltio data that we provide. So if I look in the past minute of activity with this entity here, you'll see that it was created. This existed a minute ago because I just created it. But if I go back 10 minutes, you're going to see that it's not there, because there was no entity. So, when you think about managing different versions or updates of an entity, being able to track that lineage of that history, this is a pretty neat time travel feature that you can utilize with the Reltio data that is piped in through the Snowflake Connector, to be able to understand different changes of an entity that's associated to that.
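Snowflake exposes Time Travel through the `AT` and `BEFORE` clauses on a table reference, for example `SELECT * FROM my_table AT(OFFSET => -60*10)` to read a table as it was ten minutes ago (the table name is a placeholder). The Python sketch below is not Snowflake code; it only models the "as of" semantics of the demo under the assumption that each row records when it was created and, optionally, deleted.

```python
from datetime import datetime, timedelta, timezone


def as_of(rows, when):
    """Rows that existed at `when`: already created and not yet deleted."""
    return [
        r for r in rows
        if r["created_at"] <= when
        and (r["deleted_at"] is None or r["deleted_at"] > when)
    ]


# Fixed reference time so the example is deterministic.
now = datetime(2023, 2, 23, 17, 0, tzinfo=timezone.utc)

# Hypothetical rows: one older entity and one created two minutes ago.
rows = [
    {"entity_id": "older", "created_at": now - timedelta(days=3), "deleted_at": None},
    {"entity_id": "james-bond", "created_at": now - timedelta(minutes=2), "deleted_at": None},
]

one_minute_ago = [r["entity_id"] for r in as_of(rows, now - timedelta(minutes=1))]
ten_minutes_ago = [r["entity_id"] for r in as_of(rows, now - timedelta(minutes=10))]
```

The entity created two minutes ago is visible one minute back but not ten minutes back, mirroring what the demo shows for the newly created contact.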

(48:48):

So now what I'm going to do next is just quickly change the score. I'm going to edit this here. I'm going to make it 93. And what caused the 93? Maybe it went to the [inaudible 00:49:10] technical evaluation was great, and then maybe now they want an add-on service or something, and it's 200,000. That's what drives the lead score, just for simplification purposes. And I'm going to go ahead and save this update, and then I'm going to go ahead and see what happened again with this update. So I'll do through this one, entities. Got it. Sorry. It's the same ID.

(49:42):

All right. So we're going to see again how quickly this shows up. Actually, I'll start with the data table because I want to show you the version and stuff. All right, I'm going to go ahead and enter this query. And then we're going to hopefully see two different versions of the event. There's the entity that was just created and then I just made some updates to it. So let's see if it shows up here.

(50:05):

It did. So we see two versions occurring here. In this version, we just changed a bunch of things. We'll see value scores. So the lead score is 93. This is 80. So you can see that there are different versions of these events that are categorized in this landing table. The most important thing here is that we want to show the latest version. So this is the most recent entity update. And we see, again, that this is going to be the change in the lead score to 93.

(50:51):

And then when you think about showing this again on the latest list, it's going to be flexible to show up on this one too.

Speaker 4 (51:01):

A small question here. Hello. The time travel, are you using the Snowflake standard feature, or you're bringing in the data into the table so that... because Snowflake only has a maximum time travel of 90 days backwards, right? That's my understanding.

Jon Ulloa (51:17):

Yes, that's right. This is a Snowflake feature. Yeah, it's 90 days for time travel.

Speaker 4 (51:22):

And just a small followup question. When these API calls which are happening for the Snowflake Connector from Reltio, we have a limit of one million queries, one million calls in a day. Does Snowflake... Does that count to the daily quota, or that is not included in the quota?

Jon Ulloa (51:41):

For your Snowflake-specific quota?

Speaker 4 (51:46):

No, no. So if you go in your initial security diagram, in which you show Reltio publishing to an SQS queue and an S3 bucket in your VPC, on the left side, these are calls happening. These are API calls happening, and we have typically on the base Reltio one million calls. So are those calls going towards the quota? Because if I do a mass upload, I'll basically suck up my entire quota for the day, right?

Jon Ulloa (52:19):

Yeah. No, that's a good point. I don't think so, but let me go ahead and confirm on that, because I understand that concern and I don't think it should be counted. But let me go ahead and confirm that for you.

Gino (52:28):

No, outbounds are not generally considered API calls.

Jon Ulloa (52:36):

I appreciate that. Thank you, Gino. Okay, one last thing. If we have some time I can go over an API, but I'm not sure if we want to just maybe pause it, Chris, if we want to answer questions, or I can just quickly demo an event monitoring API real quick.

Christopher Detzel (52:52):

Yeah, we have several questions, and maybe I can push those out to the Reltio community and just have you go answer those directly. Is that fair?

Jon Ulloa (53:00):

Yeah, I can do that. Whatever works.

Christopher Detzel (53:02):

Okay. I'll push out the questions to Community.Reltio.com, and then he'll answer those specific ones, and I'll hopefully do that by Monday or Tuesday.

Jon Ulloa (53:13):

Okay. That works. So let me go ahead and quickly look at this entity monitoring feature that we have here. I want to go with this data table and I'm going to copy a timestamp to this one. All right. Copy this timestamp here.

Christopher Detzel (53:43):

By the way, when you were talking about time travel, I thought you were joking at first and then it's like I guess I've heard that correctly.

Jon Ulloa (53:50):

It's a real thing.

Christopher Detzel (53:50):

Yeah.

Jon Ulloa (53:55):

That's what Reltio is doing, some time travel. Yeah, we're not releasing that. Just to be clear. All right. Yeah. I have the GET status here, and what I'm going to do is... This is best practice: before you start doing validation after a major sync, you run this to see if there are no events still occurring. Refresh. One second. Sorry.

(54:26):

All right. I can start taking some questions as I refresh this token, if that works, Chris?

Christopher Detzel (54:31):

Yeah, so why is S3 needed in the middle instead of directly reading from AWS SQS?

Jon Ulloa (54:41):

It's essentially a security measure so that customers, if they wanted to encrypt data at rest, they can. And so by putting it into a storage bucket, and then also connecting to Snowpipe, that's one of the requirements to do that. And so that's essentially why we designed it that way.

Christopher Detzel (55:00):

Great. And Brad asked, for the entity JSONs, do they contain just their survive values or are OV values included in the attributes?

Jon Ulloa (55:12):

Entity JSONs? They're both. Both the values are included. The query we're showing is just to isolate OV, and the raw JSON in both of them would be included.

Christopher Detzel (55:27):

Yeah, and I think this was answered by Gino, but just to make sure, is there a mechanism on the Reltio site to delete data from these shared Snowflake tables for GDPR purposes, if required?

Jon Ulloa (55:42):

Sorry, the question was tracking?

Christopher Detzel (55:45):

Is there a mechanism-

Gino (55:47):

Yeah. So Jon, the question is, is there something that will... If you do a GDPR delete, will it automatically delete the data out of Snowflake? And I think the analogy is it's the same as you would have to delete data from SAP or Salesforce or whatever. Reltio does not actively go out and delete data in other applications in a GDPR delete. You need a workflow there to manage that. Reltio is going to have all the crosswalks for you, but we don't actively delete that data from other places.

Jon Ulloa (56:19):

That's correct.

Christopher Detzel (56:22):

Great, thank you, Gino. Can you link the OV true stored, is it procurement documentation, or PROC documentation?

Jon Ulloa (56:34):

Are we true to POC? So PCR, what was that documentation?

Christopher Detzel (56:37):

Sorry, can you link to the OV true stored PROC documentation?

Jon Ulloa (56:45):

PROC documentation. I'm going to have to get back to you on that one.

Christopher Detzel (56:49):

All right. Yeah. And there's several more questions. I don't know, we only have like one minute. So, how about one more question?

Jon Ulloa (56:55):

Yeah, sure.

Christopher Detzel (56:57):

Are these tables created during the data ingestion, or do we need to pre-create these tables using DDL provided by Reltio before we start the data ingestion?

Jon Ulloa (57:12):

Before we start the data ingestion? Sorry, can you repeat that question?

Christopher Detzel (57:15):

Yeah. Are these tables, are they created during the data ingestion, or do we need to pre-create these tables using DDL provided by Reltio before we start the data ingestion?

Jon Ulloa (57:27):

Yeah, they need to be created before the actual data ingestion. So there's documentation that goes step by step through the configuration. We recommend some validation to be done, just re-indexing a simple entity and making sure that everything is going forward with the configuration. And then after you create these table structures, then you do an initial batch load. So hopefully that answered your question.

Christopher Detzel (57:50):

Yeah. Great. Jon, unfortunately we're out of time, but I am going to post all these questions to the community in the next few days and hopefully we get those, or Jon, I'll have you help answer those, or if there's others that can help answer those as well. But this is really good. Hopefully you guys liked this. Please take the survey at the end, the popup. Really appreciate you guys coming, and until next week, when we do the Google BigQuery one. That one I'm really excited about, Jon. So, he's going to be back.

Jon Ulloa (58:23):

Yeah. Sounds good. All right, guys. Take care. Thank you for your time.

Christopher Detzel (58:26):

Thanks, everyone. Take care. Bye-bye.

#CommunityWebinar
#snowflake
#connector
#Snowflakeconnector

#Connectors
