Summary:
The Reltio Community Show is hosted by Chris Detzel, director of customer community and engagement, with special guests Abhradeep Sengupta, Senior Product Manager at Reltio, and Dmitry Blinov, Principal Product Manager at Reltio. The show covers the topic of How to Search Data with Reltio APIs. There will be a follow-up session in a few weeks, and the call will be recorded and posted to the community. There will also be a survey at the end of the show, and a discussion thread will be posted on the Reltio community for further questions.
In the storage architecture, there is a compute plane which contains all the data compute and processing capabilities, but doesn't persist any data. On the very bottom is the data plane, where all the storages such as primary data storage, the Elasticsearch index, matching storage, and activity and history storage are located. The way the top layer interacts with the bottom layer depends on the type of search you're doing. For example, when you do a UI search, it will hit the Elasticsearch index first, where only references to the data are stored. When you request an object directly by crosswalk, you go directly to the primary storage to fetch the object. When you search for potential matches, you hit the matching storage, which only contains references to the data. When you search for activities or history, it fetches the data from activity or history storage.
The text discusses the capability of searching for a specific entity type or all types through both the UI and API in the system, with the option to make a specific entity type the default. It also touches on the idea of a faceted search through the UI, where the number of entities found can be broken down by their prefix value distribution. The discussion also mentions the idea of fuzzy search, which isn't currently available through the UI but is a popular use case for customers through the API to handle spelling errors or variations of a search term, and how the platform defines the amount of fuzziness allowed based on the search term length.
This is an explanation of advanced search and type-ahead search in a software system. In the advanced search, there is a rest API call to filter entities using multiple attributes, such as last name, email, and so on. Advanced search can handle complex queries by combining multiple operators. In the type-ahead search, there is a UI component in which the user can search for different instances by typing in the top right corner. The type-ahead search provides suggestions in a drop-down list as the user types.
Working in the same environment, the speaker demonstrates the saved search APIs by retrieving the list of saved searches and updating one of them, and explains how creating and deleting a saved search work without executing those calls.
An entity can be fetched by crosswalk using the combination of the crosswalk value (the ID provided by the source) and the source type, which must exactly match the source name defined in the tenant's data model. This search hits the primary storage directly and returns the entity, including its ID and source type.
The process of searching entities through API involves an initial request to get the first set of entities based on a filter, followed by subsequent iterations using a cursor value to get the next set of records in a loop until it reaches the end of the cursor. The first iteration must have a filter and the subsequent requests only need the cursor value without the query or request body, which is handled automatically.
A relationship in the text refers to an entity that contains names and descriptions and can be either undirected, directed, or bidirectional. It has start and end entities and can have base attributes, directional context, and attributes. The APIs such as find tree and find connections allow one to search through the entities and relationships, building a tree or graph of a specified depth to find the desired information.
The performance of an API is a crucial factor in determining the latency of the platform. Factors that can impact the latency include data volume, architecture, and attributes of objects being searched or saved, among others. The benchmark documentation provides information on how certain conditions such as the number of filter conditions and crosswalks can affect the latency.
Dmitry discusses the possibility of implementing relevance-based weighted search with weight scoring and API responses, but mentions that it is not currently available.
Transcript:
Chris Detzel (00:00:07):
We're going to go ahead and get started, and thank you everyone for coming to another Reltio Community show. My name is Chris Detzel. I'm the director of our customer community and engagement. Today's topic and show will be, how do I search my data with Reltio API? Special guest, Abhradeep Sengupta. He's our senior product manager here at Reltio. Dmitry Blinov, he's a principal product manager at Reltio and he's been on some shows in the past.
(00:00:37):
And so just some rules of the show. One, keep yourself on mute as usual. All questions should be asked in chat or feel free to take yourself off of mute and ask. We will have a follow-up session in two or three weeks on a ask me anything around this topic, but that's not to say we still can't ask questions here. The call is going to be recorded and posted to the community and at the end of the show please take the survey at the end. It's a Zoom pop-up.
(00:01:08):
Just to give you a glimpse of a look forward, today, we do have our show on, how do I search my data with Reltio API? We have a new update on our Snowflake Connector, which if you were there for the release, you've probably heard that. We'll go deep into that. That's on the 23rd, so that's two weeks from now. And then right after that, the next week, we'll have a Reltio Google BigQuery connector show and go deep in there. And then just added, a couple of more is Ask me Anything, How to Search my Data with Reltio APIs? I will be posting a question or discussion thread on the Reltio Community, sending it out to you guys to ask your questions so as you think of some today, ask them today, but also think of some in the future. And then on the 15th, Understanding Reltio API Performance. So as you can tell, lots of API calls, literally. So I'm really excited about that. I'm going to stop sharing my screen and let Abhradeep introduce himself and share his screen.
Abhradeep Sengupta (00:02:19):
Yeah. Thanks, Chris. So I'm Abhradeep as Chris already mentioned. I'm working as a senior product manager in Reltio and taking care of few modules like user interface, RDM, search and all that. So I'll start sharing and Dmitry will start the discussion, and we'll take it forward from there.
Dmitry Blinov (00:02:38):
Thank you, Abhradeep. And just to introduce myself real quick. I'm a product manager at Reltio. I've been at Reltio many years, and my area of responsibility is platform core, API and data plane. Okay, how to search my data with Reltio API. So next slide. Today we will talk about our platform and the value that our platform brings as an API-first platform, and how that impacts and streamlines how we work with the platform overall. And we will touch base on the infrastructure architecture supporting search, also the underlying layer, specifically because that brings the understanding of topics like what is my optimal API to use for a specific scenario, and why should I expect certain latency and performance from [inaudible 00:03:36] search APIs? Things like that.
(00:03:38):
And then Abhradeep will walk you through the specific scenarios for API and UI search. And then we'll talk a little bit more about things like searching activities and getting an entity directly by [inaudible 00:03:52] crosswalk. And then we'll talk about searching relations and references and touch base on relationship search, searching by graph APIs like building the tree hops, finding hops or connections for specific profile. And at the very end, we'll talk a little bit about API performance in terms of search scenarios specifically. Next slide.
(00:04:17):
The first introduction: Reltio helps unlock and accelerate value from your data, and this is one of our primary goals. It provides a 360-degree view by unifying customer, supplier, product, location data, transactions and interactions. And on the right-hand side here we see the diagram, [inaudible 00:04:43] level diagram of connections, but that's a good reflection of how objects are usually referenced and related to each other in your data model. Reltio provides instant access to timely, trusted information across all systems. You can see relationships among people, organizations, products, locations, suppliers and so on. And rich customer, supplier and product insights by rapidly enriching data from external sources. And obviously [inaudible 00:05:13] is a major component in all of these scenarios and the value that Reltio brings. Next slide.
(00:05:20):
So Reltio is an API-led platform. What it means specifically is that normally all interactions with the platform go through the Reltio REST API; even our own UI and our own connectors always work through the Reltio API. So every time you do a search, it doesn't matter whether this is the type-ahead search or you get [inaudible 00:05:46], or you do a filtered search, or you search something in Reltio UI, every time the first layer you hit is the Reltio REST API layer. Next slide.
(00:06:03):
And in terms of architecture, we can talk about this at a high level as three layers. So there is a control plane that basically has the console UI and all the control tools. It doesn't have any data or any data processing capabilities. Then there's the compute plane, which doesn't persist any data but contains all the data compute and processing capabilities; when you do a data load, or you extract your data, or you search your data, or you do any API request, that goes through the compute plane. And then at the very bottom you have the data plane, and this is where you have all your storages like primary data storage where your profiles are stored, and also the Elasticsearch index, matching storage, activity and history storage, and other types of storages.
(00:06:50):
And what's important to note here is how the top layer interacts with the bottom layer. So when you do a UI search or you do a type-ahead search or you do a filtered search, the first storage it'll [inaudible 00:07:04] or use will be the Elasticsearch index, where we do not store any actual data, but instead we search references to the data in the primary storage mapped to the index, to the filtered search results basically. And then you can also request an object directly by [inaudible 00:07:26] crosswalk. In this scenario you go directly to the primary storage, you avoid going to the index, you just get or fetch the object directly from the primary storage.
(00:07:38):
When you search for things like potential matches, you will hit the matching storage, which again doesn't contain any actual data. It contains references, match tokens basically, to find data in the primary storage. And when you search for activities or history in the UI, it fetches this data from activity or history storage specifically. So it is important to understand that whenever you do a search, internally it might work slightly differently depending on the scenario. Just to give you one idea: if you know the crosswalk or object ID, it makes much more sense to just fetch it directly from the primary storage rather than do a filtered search where you say, well, my entity ID is this or my crosswalk is this, because you'll hit one less service and it will be more optimal and fast. And that's all I wanted to tell about the high-level architecture and how it is applied to the [inaudible 00:08:48] platform use. So next Abhradeep will walk you through the search scenarios.
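The routing Dmitry describes can be summarized as a small lookup table. This is only an illustrative sketch: the request labels and the `Store` names are made up for this example and are not Reltio API identifiers; the mapping itself follows what is said in the session.

```python
from enum import Enum


class Store(Enum):
    """Storages in the data plane, as named in the session."""
    SEARCH_INDEX = "Elasticsearch index (references only)"
    PRIMARY = "primary data storage"
    MATCHING = "matching storage (match tokens only)"
    ACTIVITY_HISTORY = "activity / history storage"


# Hypothetical request labels mapped to the first storage each one hits.
FIRST_STORE = {
    "ui_search": Store.SEARCH_INDEX,
    "type_ahead_search": Store.SEARCH_INDEX,
    "filtered_search": Store.SEARCH_INDEX,
    "get_by_entity_id": Store.PRIMARY,
    "get_by_crosswalk": Store.PRIMARY,
    "potential_matches": Store.MATCHING,
    "activities": Store.ACTIVITY_HISTORY,
    "history": Store.ACTIVITY_HISTORY,
}


def first_store(request_kind: str) -> Store:
    """Return the storage a given kind of request reaches first."""
    return FIRST_STORE[request_kind]
```

Read this way, the advice in the talk is that a lookup by known ID or crosswalk skips the index entirely, which is why it is described as hitting one less service.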
Abhradeep Sengupta (00:08:53):
Yeah. Thanks, Dmitry. So I'll just minimize and go out of the slideshow mode because I would like to show you on the UI what the search looks like and then some of the APIs from Postman, right? So there are different searches available in Reltio UI. Majorly, if I have to divide it, there are three: faceted search, then advanced search, and then global search. We also call faceted search a quick search. And then the user has the option to save searches and access them through API, right? So if I go and talk about it from the UI perspective, this is the quick filters or the faceted search. So here you can choose entity types and then you can choose one of the faceted attributes from that entity type. And then you can see the distribution of unique values, and the moment you select it, it basically hits an entity filtering kind of call. And then based on that the main panel refines the result set.
(00:09:56):
So you have 316 contacts and if you hit Mr, then it'll only show filter out of those three. So that is what quick filters, or faceted search, are; it works on the faceted objects only. And then you have advanced search. This is one of the most common use cases of the search. You are basically creating a query defining different filters and [inaudible 00:10:18] you can add or union different sets of queries and apply them on different entity types. And based on that you get back the result set. So that is called advanced search. And then at the top, in the horizontal bar, top right corner, we have something called global search. This is more of a type-ahead search. So as the user starts typing, it starts returning the result set. The uniqueness here is that depending on the search term, it starts showing up things from different entity types.
(00:10:52):
So this is in the backend using type ahead search bar across the different [inaudible 00:10:58] types that is present in your tenant. And then you already have an option to save your searches. You can save it for yourself or you can have it shared across different users, and then you have APIs to manage this. So I'll go back to the PPT and we'll show all this in action. So when I am talking about faceted search, so faceted search is basically nothing but the UI. The UI calls this API to get the information that I was showing of like for a prefix, what are the different unique values and what are the count for each of those values? This is mostly used by the UI, and then you can have multiple facets to enable search. So let me get the authorization and then I'll show.
Chris Detzel (00:11:57):
And entity type, an advanced search so that we don't need to select entity type always when multiple entities are enabled.
Abhradeep Sengupta (00:12:08):
Yeah, so we'll go to the advanced [inaudible 00:12:10] search and we can say that we want to search only for this, whether we can make it a default. Dmitry, do you want to answer that? We have an option for making a default, we can define it as a specific entity type.
Dmitry Blinov (00:12:22):
Yeah, the question is Chris, the question is are we talking about UI or API? If we're talking about UI then I think Abhra is right and we have this type ahead search, but we also have advanced search where you can plan any type of search. So the short answer is yes, you can search without selecting the specific type. It can select all types and that definitely yes through both UI and API. But how exactly you can do that, we can probably take a look at this.
Chris Detzel (00:12:54):
Yeah, thank you.
Abhradeep Sengupta (00:12:57):
This is basically the faceted search that I was showing from the UI, so this is the same thing that is used behind the scenes. The API gets called and you can see this chooses, okay, I want to search on contact entity type and I want a faceted search on prefix and type. So because I have chosen contact, this only gives how many contacts there are, and then for those contacts, how my prefix values are distributed. So that's mostly what the faceted search is when the UI uses it.
(00:13:30):
The next one that I'm going to touch base on is fuzzy search. So fuzzy search is something that we don't have available directly in the UI today, but it is something that a lot of our customers use, in terms of having the fuzziness through API while searching for an entity. This is helpful for use cases where the user is not very sure what the exact search term is, so fuzziness helps, and minor spelling errors or typographical errors, those kinds of issues, can be avoided. And so basically behind the scenes, the Reltio platform uses certain kinds of algorithms, and depending on the search term and the length of the search term, it defines the variations of fuzziness that are allowed; the longer the search term is, the more variations can be allowed.
(00:14:21):
So for this example, what we are trying to do is basically I'm searching with fuzziness on the attribute last name with S-M-I-T, right? So my data has a person whose last name is Smith and I'm not sure how he spells it, right? So I leave it there and call for a fuzzy search, so that I can get this correct. So just for an example, I'll go to a different tenant and then I'll do the authorization and then, sorry I have too many here. Yeah, so fuzzy. As you can see in the fuzzy, I'm looking again for entity type contact and then I'm saying, "Okay, give me all the fuzzy matches where the last name attribute closely matches S-M-I-T," and then max 200 is the number of records that I'm okay to have returned.
(00:15:19):
And as you can see it returns. And then if I go right, the name actually is Steve Smith. So though I did not give S-M-I-T-H like the exact [inaudible 00:15:30] but it returns based on the fuzziness. So that's how the fuzzy thing worked from the API perspective. We have a plan in our roadmap to bring this into UI as based on multiple customers' request. So we'll work on that. As of now, this is limited to API only.
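The session does not spell out the rule that ties allowed fuzziness to term length. As a hedged illustration only, the sketch below uses the thresholds documented for Elasticsearch's `AUTO` fuzziness setting (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two) together with a plain Levenshtein distance. Whether Reltio applies exactly these thresholds is an assumption, and the function names are made up for this example.

```python
def allowed_edits(term: str) -> int:
    """Edits allowed for a term, using Elasticsearch AUTO thresholds (assumed)."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, value: str) -> bool:
    """True if value is within the edit budget implied by the term's length."""
    return levenshtein(term.lower(), value.lower()) <= allowed_edits(term)
```

Under these assumptions, the four-letter term `smit` gets a budget of one edit, which is enough to reach `Smith` (one inserted letter), matching the behavior shown in the demo.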
Chris Detzel (00:15:46):
Can I ask a few questions or should we keep going and then ask?
Abhradeep Sengupta (00:15:54):
Yeah, so let's-
Chris Detzel (00:15:57):
Sorry, Dmitry, are you saying something?
Dmitry Blinov (00:16:00):
Abhradeep, after you. Yeah, I can answer those questions [inaudible 00:16:02] so maybe we can quickly [inaudible 00:16:04].
Chris Detzel (00:16:04):
Okay. Thank you.
Abhradeep Sengupta (00:16:05):
Yeah, let's move ahead a little bit. So the next one is the complex filtering based advanced search. So this is by far probably the most used scenario for our customers and they use it for different use cases. So if you are searching from let's say calling this API from a third party application and trying to figure out the list of entities based on a certain criteria, this is the API that gets mostly called out. So it basically allows multiple search operators like the way we form SQL queries, right? You have filter conditions and then you have separate operators. It supports a lot of operators as you can see from the UI, if I go to UI and have this thing. So basically the operators is here, so I can have an attribute here and then I get a option here depending on which attribute I'm talking about.
(00:17:00):
So that is one option. So you have all the supported operators: contains, starts with, and then greater than, less than, all these things. And you can basically combine this with fuzzy as you see in the example at the top, right? You can extend the query that we used last. So we can say a person having a last name fuzzily matched with S-M-I-T along with an email that starts with Robin, or along with an email that contains Reltio. So we basically want to know, in my customer base, how many are closely matching Smith or S-M-I-T? And then how many of them have an email coming from the Reltio organization, or belong to the Reltio organization? So for that kind of query we can join these query predicates with the and operator and then run the query. And also, you'll see there we have a crosswalk count. In the advanced search you'll get a lot of variations.
(00:18:00):
I'm not going to discuss each and every variation. We have a list of ... All these are documented in our help portal so you can go and check out different variations. But you can basically combine multiple things together and form a complex query, even like entities having a crosswalk count equivalent to [inaudible 00:18:18], right? So let's see one in action. So I will look for multiple attribute based filtering. So again this is a GET REST API call and this is the [inaudible 00:18:33] that the environment from in Postman, it's a parameter environment that I'm tagging from. So I'm saying, "Okay, it has to match S-M-I-T as fuzzy last name. Along with that you have to have an email that starts with Robin, right?" And then this is the filter, with this filter I'm basically, yeah.
(00:18:57):
So as you can see it again returned back all the entities. It is a copy of Steve so we executed a cloning operation on the Steve Smith profile that we saw earlier and then we updated the email to something that starts with Robin. So it basically gives you out another profile which has a last name close to Smith, but then it also uses a email that starts with Robin. So this is probably a very simplistic example, but you can understand we can have all these operators combined and different filter conditions joined together to form a complex query in terms of ... And this is how exactly the advanced search that we see in the UI works, right? So this is the API behind the advanced search.
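The combined query from the demo (entity type Contact, fuzzy last name `smit`, email starting with `Robin`, joined with `and`) can be assembled as a filter string. This is a minimal sketch: the helper names are made up, and the exact filter syntax and attribute paths (`attributes.LastName`, `attributes.Email`, `configuration/entityTypes/Contact`) are assumptions modeled on what is described in the session, so check them against the Reltio documentation and your tenant's data model.

```python
from urllib.parse import parse_qs, urlencode


def predicate(operator: str, attribute: str, value: str) -> str:
    """Render one filter condition, e.g. fuzzy(attributes.LastName,'smit')."""
    return f"{operator}({attribute},'{value}')"


def all_of(*predicates: str) -> str:
    """Join conditions so that every one of them must hold."""
    return " and ".join(predicates)


def search_query(filter_expr: str, max_results: int) -> str:
    """URL-encode the filter and result limit as a query string."""
    return urlencode({"filter": filter_expr, "max": max_results})


# Hypothetical attribute paths; adjust to the tenant's data model.
FILTER = all_of(
    predicate("equals", "type", "configuration/entityTypes/Contact"),
    predicate("fuzzy", "attributes.LastName", "smit"),
    predicate("startsWith", "attributes.Email", "Robin"),
)
QUERY = search_query(FILTER, 200)
```

Building the expression from small predicates keeps each condition readable and leaves the URL encoding to the standard library rather than to hand-written escaping.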
(00:19:50):
And then we go to global search or type-ahead search. So this is the one that we were talking about. If I go to the UI, at the [inaudible 00:20:02] right-hand corner you are basically typing something, and as the user searches there's a dropdown that comes in front and shows different entity types and different entity instances for all these types. So that's kind of a type-ahead search. That's the API again for users, while from the UI we call this API and get that result set. Here there are options: both POST and GET methods are possible. Reltio recommends using the POST method because then you have the condition for the search passed in the body, and when you have a long and complex kind of example or use case, it helps to form that query and call that POST API instead of GET. So let's see if I have an example for this one.
(00:21:03):
Yeah. So this is the one that I have in my PPT basically. So again, as you can see what I'm trying to do it here is basically I am saying that the filter is that everything that starts with the word copy. So I'm basically trying to say if you are aware about it, somebody has created multiple copies of a same profile through cloning process and they have named labels that starts with copy. So that's the filter criteria. And then in the select option you have what other attributes that you want to select as part of the return and then you have all the other parameters like activeness is active and then ovOnly you want to search [inaudible 00:21:53] by OV.
(00:21:49):
So all those different parameters and options that you have available, you can get those, again, from our documentation. In the interest of time I have again picked a very simple and straightforward example. So if I click this I'll get all these different profiles. So as you can see, some of them have the word copy in the label itself and some of them have the word copy in different attributes. So that's how the type-ahead search works. Because I only have examples here on the contact entity side, most of the entities returned are of contact type, but because it's part of a global search, if I had copied some organization data, it would've been a mix of records coming from organizations as well as contacts.
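To illustrate why POST is convenient for long conditions, the sketch below builds (but does not send) a request whose search condition travels in a JSON body. The parameter names `filter`, `select`, `activeness`, and `ovOnly` are the ones mentioned in the session; the endpoint path, the body layout, and the sample filter are assumptions for illustration, not a confirmed Reltio request schema.

```python
import json
from urllib.request import Request


def build_search_request(base_url: str, filter_expr: str,
                         select: list[str], max_results: int = 10) -> Request:
    """Build a POST request carrying the search condition in the body.

    The path "/entities/_search" and the body field layout are assumed.
    """
    body = {
        "filter": filter_expr,
        "select": ",".join(select),
        "activeness": "active",   # only active profiles
        "options": "ovOnly",      # search by operational values only
        "max": max_results,
    }
    return Request(
        url=f"{base_url}/entities/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and a hypothetical filter for labels starting with "copy".
request = build_search_request(
    "https://example.invalid/api/tenant",
    "startsWith(label,'copy')",
    ["label", "uri"],
)
```

Nothing is sent here; passing `request` to `urllib.request.urlopen` with valid authorization headers would be the next step in a real client.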
(00:22:59):
Now I also showed you some of the options to save searches, right? A saved search is basically where a user defines a query, and it's one of the queries that he often executes. So he wants to save it instead of forming it every time. So maybe he has created an advanced search through different query filters and then he wants to save it and keep it as personal, or he can share it across all the users. So when you do these things from the UI perspective, these APIs get called, and I'll show some of the examples.
(00:23:43):
So the GET API is basically this will return you all the saved searches that is available in the tenant and then to create, update, delete, you can basically modify or delete anything. I'll not delete, but what I'll show is basically how to get again through a very basic, simple example how to get the list of saved searches and then I'll try to update one. So all this will have a object entity ID and based on that we'll try to update that particular one. So if I go back here, this is basically it says, "Okay, give me all the saved searches that I have." Okay, so I'm again on the same environment, I'm executing this and as you can see, right? This is a URI of saved searches and it says that, okay, this is a name of the saved searches, so save search, try API updated version 3.0.
(00:24:35):
So this is what he is working on and then I'm not going to create or delete anything, but what I'll do, I'll just do a harmless update of one of the searches that we have saved. So remember this ID and then I'll go to the update saved search. Now this is basically a PUT update, the PUT method is being called, and then I have all the details in the body. If you remember, right, this is the ID that I got back last time. So let's say I'm basically updating the name. So in the interest of time, I'm not updating anything else but I'm changing the name to version 4.1. And so this is when I hit this, and as you see it returns a response, updated version 4.1, for that particular saved search. I'll go back once more to the GET call and then I'll try to see if that shows up here also.
(00:25:45):
So this is the updated response. So basically in real time almost when you call the API change something in the saved search and then the next moment you hit the GET API and you see the updated changes. So this is how the save update and GET works. Delete will be similar. So I have delete example also I'll not execute it, but you can give the ID or the URI of that particular saved search and then delete it. Similarly, you can post something to create a saved search, in the body you have to define all the details of that particular search as you get back into the GET response. From there you can form the body, post it to create a new saved search. So this is how the saved search will be defined and then I'll be going to the next one.
(00:26:38):
And I think we have a lot of items to cover. That's why I'm going a little bit fast and we'll take all the questions. I see a lot of questions. So this is the one that Dmitry was talking about, GET by entity ID and crosswalk. These are kind of direct hits to the primary storage; it goes there and picks up the data. And then in the right-hand box we have the options there. The examples that I have given are very basic, straightforward, simple examples. And then you have options which are optional, not mandatory, in most of these APIs, but with them you can further refine the results that you want to get back. So when I am saying entity ID, I basically give a specific entity ID and the GET call returns the full set of that entity's details in JSON format, whereas I also have the option to define a select clause, kind of like in SQL.
(00:27:37):
So the second example that I see here basically says, "For this entity ID, give me only these three attributes." And this one at the top will give you all the attributes, all the details of that particular entity basically. Similarly, you can search by crosswalk also. So as we know, the uniqueness of a crosswalk is basically the value provided by the source, the source ID, and the source type of that crosswalk. So these two mostly form the uniqueness of the crosswalk, so we can find it. So let me go into the API first. I'll try to hit those APIs and then I will go to the UI and check that my APIs are working fine and that it matches whatever is there in the UI. And that should match, that should tally, because the UI behind the scenes is basically using the same logic or the same APIs. So I'll go here and then I'll try to get entity by ID. So this basically returns the full response of all the entity details. I hit this one and this is basically the URI of that particular entity. And as you can see this one is returning an organization and then all the organization details, trade style, value and everything, organization type. So the entire detail is here. If the application which is using this API does not need all this information, you can use the select format of the thing.
(00:29:22):
Similarly, I have the same example of entity by crosswalk, right? This is the example that I was showing in the PPT. So this is the entity crosswalk value and this is the source type. So this exactly has to match how the source has been defined with the physical name in the tenant that when we create in the console, in the data model and all that. So this has to exactly match that particular source name and then the source value. So if we hit this API, we will be getting back a response and this should be a typical one entity. So as I see this first name is Donette, and then you can see it here, right? Donette and Foller. So the person is Donette and Foller. So it has this ID. Now to tally this against the UI that we have. So I will go to the UI now and look for Donette Foller.
(00:30:20):
So let's go here and then I will see, so this is the global search, from here I see Donette Foller. You go to this person's profile, you go to sources, and this is basically the full form of that source. So that's why I was saying that how it is defined as a physical name at the source level in [inaudible 00:30:46] data model is important. And then you can see that this is the value, this is the ID the crosswalk provided, and this is the type of the source for this particular crosswalk. So we basically searched by this combination and got back this entity ID. If you can see, 129207-1. And if I go back to Postman I have 129207-1 and source type is equal to BvD. So this is how the search by crosswalk works, by hitting the primary storage directly.
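The two direct lookups shown in the demo, by entity ID (optionally with a select list) and by crosswalk value plus source type, can be sketched as URL builders. The helper names are made up, and the path segments (`/entities/{id}`, `/entities/_byCrosswalk/{value}`), the `type` query parameter, and the `configuration/sources/` prefix are assumptions based on the session; the crosswalk value `129207-1` and source `BvD` are the ones from the demo.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def entity_url(base_url: str, entity_id: str,
               select: Optional[list[str]] = None) -> str:
    """URL for fetching one entity; select narrows the returned attributes."""
    url = f"{base_url}/entities/{quote(entity_id, safe='')}"
    if select:
        url += "?" + urlencode({"select": ",".join(select)})
    return url


def crosswalk_url(base_url: str, crosswalk_value: str, source_type: str) -> str:
    """URL for fetching an entity by crosswalk value and source type."""
    path = f"{base_url}/entities/_byCrosswalk/{quote(crosswalk_value, safe='')}"
    return path + "?" + urlencode({"type": source_type})


BASE = "https://example.invalid/api/tenant"  # placeholder host
BY_CROSSWALK = crosswalk_url(BASE, "129207-1", "configuration/sources/BvD")
BY_ID_SELECTED = entity_url(
    BASE, "abc123", ["attributes.FirstName", "attributes.LastName"]
)
```

As noted in the talk, the source type has to match the source name defined in the tenant's data model exactly, so it is passed through unchanged apart from URL encoding.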
(00:31:25):
One more thing that I will be discussing after this is search the activity log. So again, when we are talking about activity log, there are options to search how do you search different activity logs? So we have not, again, I'll go back to the UI and show how the activity log looks like. So at the dashboard level we have all the activity logs here. So you have basically all the activities that is happening across the tenant for a specific time period. And then this basically in the UI you are showing this activity log and there are multiple options to get this back from backend to API. So we'll discuss some of these APIs and then the challenges and the performance related stuff. So if I now go back to my presentation. So this is again used by the UI but a lot of time a lot of downstream system or third party applications are interested in terms of more from a governance perspective or auditing perspective what is happening into our master data hub, who is doing what, that kind of questions and at what point of time?
(00:32:48):
So those are pretty important from a compliance perspective sometimes. So this is pretty important API I would say. So we have an option of getting activities. So this is basically get all the activities. It's huge so over the time it can grow really big and that might have some performance issues. So other options, what are the other options? So you have an option to filter activities also and it works like any other API pretty fast. So you can have a filter based on a username. [inaudible 00:33:22] from Reltio what has she done in a definite time period? So there are default time period and then there are time periods that you can define also. And one more thing is from a performance perspective you can also have, if you feel that the performance is not up to the [inaudible 00:33:41] because there are certain limits of 10,000 I think items that gets returned.
(00:33:45):
So you have an option of running it through the cursor and scan APIs, which return results through a cursor. We'll discuss the cursor-based APIs for entity search, but a similar concept is applicable to the activity log for better performance. And then you also have an activity export API, which is here if you see, right? It is the export API. So if performance is an issue with the regular activity log API, you can always fall back onto a scan cursor-based API or an export API. But for this example we'll show a normal filtered activity log, right? So we'll go to Postman again and I'll try to find the activity log API.
(00:34:41):
Sorry, just [inaudible 00:34:46]. Yeah, so this is the one. What I'm trying to check here is, as I was showing, what are the things that have been done by this person in my tenant? So that's the [inaudible 00:35:02] query that I have, because I know she has recently done something in my tenant and I want to check what activities this person has done. As you can see, we have called it through the REST API with the GET method, and then we can see who the user is. So I get back the user that I sent through the filter, and there is a task ID associated with it. On this date, at this time, she set up some periodic task [inaudible 00:35:32] which has this ID, right? And you see there is only one activity that has happened through Jyoti, right? So this is the detail that is there for Jyoti. Now this is again a very simple filter, but you can have a much more complex filter to filter activities here.
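A minimal sketch of how a filtered activity request like this one might be assembled. The `/activities` path, the `filter` and `max` parameter names, the `equals(...)` and `gt(...)` filter functions, the `user` and `timestamp` field names, and the example user are all assumptions for illustration. Only the idea of filtering by username and time period, and the limit of roughly 10,000 items, come from the session.

```python
from typing import Optional
from urllib.parse import urlencode


def activity_filter(user: str, since_ms: Optional[int] = None) -> str:
    """Compose a filter expression for one user, optionally bounded in time."""
    clauses = [f"equals(user,'{user}')"]
    if since_ms is not None:
        # Only activities newer than the given epoch-millisecond timestamp.
        clauses.append(f"gt(timestamp,{since_ms})")
    return " and ".join(clauses)


def activities_url(tenant_url: str, user: str,
                   since_ms: Optional[int] = None, max_items: int = 100) -> str:
    """Build the GET URL, keeping max_items within the 10,000-item limit mentioned."""
    if not 0 < max_items <= 10_000:
        raise ValueError("max_items must be between 1 and 10000")
    query = urlencode({"filter": activity_filter(user, since_ms), "max": max_items})
    return f"{tenant_url}/activities?{query}"


print(activities_url("https://example.invalid/reltio/api/myTenant", "some.user@example.com"))
```

A more complex filter would simply add clauses to the list before they are joined with `and`.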
(00:35:54):
Next, we did discuss the cursor, right? The cursor-based filters. This one is something that improves performance, and we call it the scan API, as you can see here, right? How it works: we probably know or understand the cursor. It's basically an in-memory kind of storage where you have a pointer which iterates over a set of results, returns them to the user, keeps an index there, and then repeats the process. So basically there are multiple iterations over the results. If the result set is too big, you define through different parameters, okay, this is my limited set of records, maybe 100 or 200, that I want to send across in a single response. And then the cursor index loops and iterates through all the results until it reaches the end of the cursor, right?
(00:37:03):
This is kind of a loopy process, but for larger results it works better in terms of performance than a single hit, whether you are searching entities or activities. Now, how does this work? In the initial request, as you can see here, I'm asking, okay, give me all the entities that are based out of California. Let's say I'm talking about some customer or supplier data. So the initial request gives me all the records that are based out of California. And I say, "Okay, give me the first hundred records," so it returns the first hundred records and, along with that, a cursor value, which is the cursor I have to hit next. And then in the subsequent request body, I think you can see in the subsequent request, I'm not sending the query anymore.
(00:38:02):
So the query is required in the first request and not after that, whereas the request body is not required in the first request. But in the iterative process, in the subsequent loops, I need to send which cursor to go to in order to fetch the next hundred records. And that's where this value becomes important. The rest, going to the 101st record and then picking up the next hundred records, is all done by [inaudible 00:38:30] behind the scenes. So this is the concept of search with a cursor, and the first iteration will always have to have a filter; without that the scan request will not work.
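The loop described above can be sketched as follows. To keep the example runnable without a tenant, the scan endpoint is replaced by an in-memory stand-in whose cursor is just an offset. The response shape (`objects`, `cursor`), the filter string, and the function names are assumptions for illustration, not the actual scan API contract.

```python
from typing import Callable, Iterator, Optional


def scan_all(fetch_page: Callable[[Optional[str], Optional[str]], dict],
             filter_expr: str) -> Iterator[dict]:
    """Yield every record by following the cursor until a page comes back empty."""
    # First request: the filter is sent, there is no cursor yet.
    page = fetch_page(filter_expr, None)
    while page["objects"]:
        yield from page["objects"]
        # Subsequent requests: only the cursor is sent, not the filter.
        page = fetch_page(None, page["cursor"])


def make_fake_backend(records: list, page_size: int):
    """In-memory stand-in for the scan endpoint; the cursor is just an offset."""
    def fetch_page(filter_expr: Optional[str], cursor: Optional[str]) -> dict:
        if cursor is None and filter_expr is None:
            raise ValueError("the first scan request needs a filter")
        start = int(cursor) if cursor is not None else 0
        chunk = records[start:start + page_size]
        return {"objects": chunk, "cursor": str(start + len(chunk))}
    return fetch_page


records = [{"id": i, "state": "California"} for i in range(250)]
fetch = make_fake_backend(records, page_size=100)
result = list(scan_all(fetch, "equals(attributes.State,'California')"))
print(len(result))
```

With 250 matching records and a page size of 100, the loop makes four requests: three that return data and a final one that returns an empty page and ends the iteration.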
(00:38:45):
So this is entity search at a high level, a very high level. I have tried to cover most of the APIs with very simple, no-frills scenarios and use cases. I have not gone deep into all the different parameters and options that a user has. We have all the details available in the documentation, and I have a list of documentation links where you can get much more detail, which we can share through the community. I'll stop here on entity search and I'll request Dmitry to go over the rest of the presentation: relationship search, the new APIs that we are planning, and the performance part of it.
Dmitry Blinov (00:39:26):
Thank you, Abhra. Yeah, one more thing I'd like to mention is that activity search is obviously also available through the UI, right? We didn't specifically show it, but you can do the same type of filtering with activities through the UI and find all of your activities in the main dashboard. So, searching relations and references. We talked about entities and entity-related scenarios, and now let's talk about relations a little bit. Next slide.
(00:39:59):
Okay, so relations search. You can enable relation search on your tenant as well. The quick thing to know about this is that it requires a separate index. At the bottom here you can see the data plane layer we already discussed, and specifically elastic search. The elastic search would contain different indexes, an entity index and a relation index. They are tracked separately, updated separately, they interact with primary storage separately, and you need to enable the relation index specifically on a tenant. It's not enabled by default; it is enabled on demand.
(00:40:58):
You can search relation records up to a maximum of 10,000 items with a single request. That's a similar request to the search API Abhra was walking you through. It's just [inaudible 00:41:13] keyword entities you use relations, and you can do the same type of filtering. You can use ordering, you can define the maximum number of items returned, and so on and so forth. You can do select, you can use options, you can do sorting and ordering. And for relation search the result would be [inaudible 00:41:37] that contains all of your relation objects with their attributes and crosswalks. So that's, yeah, high-level relation search. Let's switch to the next slide.
(00:41:55):
Abhra, I feel like we missed one slide here in the very beginning. I was going to talk about relationships, maybe not. Let me check real quick whether we missed anything. No. Okay. It will be later. It's fine. Hops, connections, search relations and APIs. So if you think about your data structure and keep in mind the inventory or the one present, you can see on the slide, you have your entities of different types and you have relationships between the entities, where one entity would be the start object and another one the end object for a specific relation. So when we talk about the type of search where you need to find hops between two entities or connections between two entities, that's not the type of index search we were talking about. These APIs work slightly differently. Specifically, we have three of them: find tree, find connections, and find hops. And the relationship itself, like an entity, is an object that contains a name and description. It contains a direction, which can be undirected, directed, or bidirectional. And it obviously contains start and end entities, basically the entities this relation is between.
(00:43:27):
And it also has base attributes and directional context. And when you search through your entities and relations, for example with the find tree API, which is your tenant, your entity URI, and then _tree, you can look into our documentation for the parameters available there. But the basic thing to know is that first it will find the starting entity object, then it will find its parent object along with the relationship between them. For that parent object it will find all the child objects, all of its children, and so on and so forth. And eventually it will build the entire tree this starting object is attached to. So it'll be a walk through the entities and objects rather than an index search in this case.
(00:44:21):
Find connections. It'll again find the original entity. Then it will find all of its connections, all of its relations to the next-level entities, and you can define the depth you want to find connections for. So it'll build you a graph of a certain depth. Find hops works slightly differently. It will first find all the objects and then all the hops between them. And again, you can define the depth. So for example, you want to go just one level up and one level down, or you want to go a level down and then a level left and right from this object. You define this with the depth you want to go to when you are searching for your hops. But the principle for these APIs stays the same. You find the initial set of objects, or the first starting object, and then, to a certain depth and under the conditions that you defined in your API request, you find the relationships and objects that build the graph with that original object. We don't have search for connections or hops in the UI, I should say. Relation search is available through our UI, but hops and connections are not. These same APIs are used in the UI when you build the graph view; basically the graph view is built based on these three APIs. Next slide.
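The depth-limited walk described above can be illustrated with a toy breadth-first traversal over an in-memory list of relations. This is only a conceptual sketch of walking from a starting entity out to a given depth; the entity URIs, relation types, and function name are made up, and it says nothing about how the platform implements these APIs internally.

```python
from collections import deque

# Toy relation list: (start entity, relation type, end entity). All names are made up.
RELATIONS = [
    ("person/1", "HasContract", "contract/9"),
    ("person/2", "HasContract", "contract/9"),
    ("person/2", "WorksFor", "org/5"),
    ("org/5", "ParentOf", "org/6"),
]


def find_connections(start: str, depth: int) -> dict:
    """Return {entity: hops from start} for everything within `depth` relations."""
    neighbors = {}
    for a, _relation_type, b in RELATIONS:
        # Treat relations as traversable in both directions for this sketch.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the requested depth
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


print(find_connections("person/1", depth=2))
```

With a depth of 2, the walk from `person/1` reaches the shared contract and the second person attached to it, but stops before that person's organization.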
(00:45:57):
So we continue working in this same direction in terms of providing [inaudible 00:46:04] of finding connected objects. And the next API we are going to roll out, between now and the summer, is in 23.2, which is in June. Next is find connected parties: basically do an index search with a filter, object filter conditions and everything, and find the original set of objects. For example, find an individual, then find all the connected parties, meaning, for example, find the contract for this individual and then find all the individuals attached to that contract. This type of relationship API allows you to define everything: the types of relations you're going to use, the types of entities you're going to use in this search, things like that. The example I just provided is for a specific domain, but the API is universal. You can define any data types, any relation types.
(00:47:03):
You can find connections starting, again, with a search request, even a fuzzy search. So you can fuzzy search by phone number, find the person, and find all of its connections. You can do a direct search by crosswalk and then again find all of the connections for that object. The advantage of these APIs is that you do all of that in a single API call, so you save a lot of network latency. You don't need to do a lot of API calls; a single API call does the search and you get the full result for the conditions you defined. And then search before save is another scenario we are looking at. It's very popular today. It's part of a lot of implementations our customers did. We can do this through various ETL tools, but we don't have a direct API for that. So oftentimes, before making a decision whether to save or update an object or not, you need to first do a search and, based on the result of the search, do this update. Search before save is something we also plan to roll out. Next slide.
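The search-before-save pattern mentioned here can be sketched with an in-memory store standing in for the tenant. The store, the URIs, the choice of `email` as the matching attribute, and the function name are hypothetical. The planned API was not described in detail, so this only illustrates the decision logic of search first, then update or create.

```python
def search_before_save(store: dict, record: dict, match_key: str):
    """Update the first record sharing match_key's value, else create a new one."""
    # Step 1: search for an existing object with the same matching attribute.
    for uri, existing in store.items():
        if existing.get(match_key) == record.get(match_key):
            # Step 2a: a match was found, so update it instead of creating a duplicate.
            existing.update(record)
            return "updated", uri
    # Step 2b: no match was found, so save the record as a new object.
    uri = f"entities/{len(store) + 1}"
    store[uri] = dict(record)
    return "created", uri


store = {"entities/1": {"email": "a@example.com", "name": "Ann"}}
print(search_before_save(store, {"email": "a@example.com", "name": "Ann Lee"}, "email"))
print(search_before_save(store, {"email": "b@example.com", "name": "Bob"}, "email"))
```

Done client-side, this takes at least two round trips per record, which is the latency a single server-side call would save.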
(00:48:13):
So let's talk about performance a little bit, and before I go there, just one thing to kind of wrap up. We didn't talk about all types of searches. I'll name two specifically: search for interactions and search for potential matches. We didn't cover them in this session. We can cover them as part of the questions. Again, they're covered in our documentation. They are also popular use cases. But the intent here was to cover specific use cases for how do I search my objects, and obviously relations are part of this whole picture. So, talking about performance: again, the platform is API led. It's all API, so performance equals API latency in this case. And how do I understand how to get the best latency, and what does and doesn't affect my latency? Is it data volume or some other factors? So the first slide is about data factors.
(00:49:11):
Data is a major factor for API latency on the Reltio platform across the board. That has to do with the architecture; I provided one example at the very beginning. In certain cases you go first through elastic search storage and then primary storage. In other cases you go directly to the primary storage. But it's not just that. For example, you have an object which only has simple attributes and a couple of crosswalks. That would be a simple reiteration internally. Then you have an object which has tens of crosswalks, and some of them are reference crosswalks. Say you have five references to five other attributes. When you do a search on this object, or you simply fetch this object, or you save this object, instead of one [inaudible 00:49:57] it'll be multiples. It's more than five definitely, it's like about maybe an end of seven, I don't know, internally 20 or 50 [inaudible 00:50:05], depending on what those crosswalks are and how relations are built.
(00:50:08):
But you'll have to go and find all these other objects this original object has relations to. And obviously all these are additional [inaudible 00:50:16], additional deliberation specifically in this case, and it impacts the latency of [inaudible 00:50:22] a result and [inaudible 00:50:22]. We have all these benchmarks and factors documented in our documentation. I'll not go through them all, but just as an example: every 10 filter conditions will increase latency by 1% on average for your search request. Each 10 additional crosswalks may increase latency by up to 40% on average. So if you have an object with 10 crosswalks versus 30 crosswalks, the latency may differ by two times. Each five extra results in the search request may increase latency by 40%. Obviously, if you do a search with a filter condition and you find five objects, versus a broader search where you have to return half of the tenant data as a result, it'll have different latency.
(00:51:16):
Five additional lookups may increase the latency of a search request by 40%. That is because lookups are resolved by a separate system called RDM, the reference data management system, and you obviously need to interact with that system, do additional read calls, and so on. Every single additional reference attribute will increase latency by up to 80%. So comparing objects with no reference attributes versus objects with two reference attributes, on average the latency will easily be two times different, if not more. And you can keep that in mind, for example, when you model your data and you want to optimize your search. Say you have object A and object B and a relationship between them, but you don't need to search on this relationship; you just need it to be present, maybe to search with the graph, but you don't need a UI search. Then you don't always need to introduce a reference attribute that represents this relationship, because every reference attribute will increase latency on the [inaudible 00:52:26]. Next slide.
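The quoted figures can be turned into a quick back-of-the-envelope estimate. The sketch below assumes the increases compound multiplicatively, which is an assumption on my part and not something stated in the session. The per-factor percentages are the "up to" values quoted above, so the results are rough upper bounds rather than predictions.

```python
def latency_factor(extra_crosswalks: int = 0, extra_reference_attributes: int = 0) -> float:
    """Rough upper-bound latency multiplier from the quoted per-factor increases."""
    crosswalk_steps = extra_crosswalks / 10  # up to +40% per 10 additional crosswalks
    # Up to +80% per additional reference attribute; compounding is an assumption.
    return (1.40 ** crosswalk_steps) * (1.80 ** extra_reference_attributes)


# 10 versus 30 crosswalks: 20 additional crosswalks.
print(round(latency_factor(extra_crosswalks=20), 2))            # 1.96
# No reference attributes versus two reference attributes.
print(round(latency_factor(extra_reference_attributes=2), 2))   # 3.24
```

Under this reading, 20 additional crosswalks give about 1.96x, in line with the "two times" mentioned for 10 versus 30 crosswalks, and two reference attributes give about 3.24x, consistent with "two times different, if not more".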
(00:52:30):
Some statistics collected some time ago, for potential matches, type-ahead search, simple search results, GET relationships for facets, and it's all in milliseconds here. The only purpose of showing this slide is that you see different sizes of tenants color-coded differently here. And by looking at this picture you will understand that there is no direct relationship between the size of the tenant and the latency of the searches. For example, in some cases, on May 28 specifically, a tenant of just 0.3 million profiles, which is a small tenant, was performing slower than a tenant with 800 million profiles for type-ahead search.
(00:53:29):
That is because the data factors were different there. So data volumes do not impact the API latency on average. That's not a factor. Data model is a factor. Data volume is not a factor. Data volume is a factor for things like export, obviously. It is different when you export a hundred million profiles versus just one million profiles. But for individual API calls, data volume is not a factor. I'll stop right there and we can spend some time on questions. I see a lot. Thank you.
Chris Detzel (00:54:05):
We're going to get kind of through some of these questions. Unfortunately we're not going to be able to get through all of them. I will ask as many as possible. But what I'm going to do is create a thread on the Reltio Community and post every one of these questions there and the ones we don't get to today, we're going to add those to the ask me anything coming up in a few weeks. So what I'll do is I'll email everyone the thread and then I'll actually post your questions that are asked here on the community and we'll email everybody there. So with that said, let's get started. Can we do relevance based weighted search with weight scoring and API responses?
Dmitry Blinov (00:54:56):
So I think we switched the questions. What was it, the one I was [inaudible 00:55:00].
Chris Detzel (00:55:00):
Yes. The question is, and this is at the very beginning, can we do relevance based weighted search with weight scoring and API responses?
Dmitry Blinov (00:55:07):
No, we don't have that today. No.
Chris Detzel (00:55:09):
No?
Dmitry Blinov (00:55:14):
Something to consider but we don't have weight search, no.
Chris Detzel (00:55:14):
What's the frequency of refresh between primary storage and elastic search?
Dmitry Blinov (00:55:19):
Yeah, it's real time pretty much, but there is a delay of up to 30 seconds. There is a pipeline; we stream the data from primary storage to elastic search. So you update your profile. First it's saved in the primary storage and then the update is streamed to the index. But the first time you publish the object, the delay between updating the primary storage and updating the index may be 30 seconds. After that it is real time. It continues in real time, yeah.
Chris Detzel (00:55:54):
Okay. If I select HCP UI, it should stick to HCP and not deselect it. Is that true or?
Dmitry Blinov (00:56:03):
Abhra, do you want to take this one?
Abhradeep Sengupta (00:56:07):
Sorry, can you come again please?
Chris Detzel (00:56:08):
Yeah, yeah. If I select HCP UI, it should stick to HCP and not deselect it. Is that correct?
Abhradeep Sengupta (00:56:19):
I need a little bit more context for that.
Dmitry Blinov (00:56:21):
So I think you mean when you select a specific type in the UI, right? And then you do a search. Yeah, it'll not deselect, I think. It's easy to check, but as far as I remember it does not deselect, so it'll stick to it. Yeah.
Chris Detzel (00:56:36):
In the UI advanced search, will we ever be able to compare one attribute to another attribute? So for instance, attribute one not equal to attribute two?
Dmitry Blinov (00:56:44):
Yes. Very popular request. We don't have that today. We discussed this in a [inaudible 00:56:50] roadmap. Right now it's not in the roadmap for the next two releases.
Chris Detzel (00:56:53):
Okay. For fuzzy search, do we need to set up a fuzzy match rule, or is it out of the box without a need for an L3 change?
Dmitry Blinov (00:57:04):
No need for L3 change. It's also out of the box pretty much. Yeah.
Chris Detzel (00:57:09):
Great. Will fuzzy search return Vyktor, V-Y-K-T-O-R when searching for Victor, V-I-C-T-O-R? Otherwise, it isn't the same as contained search, is that right?
Dmitry Blinov (00:57:26):
Yeah, you can also, well, it goes beyond that. So it will find more than just contains [inaudible 00:57:35], so we can-
Matt Gagan (00:57:37):
Dmitry, this is Matt Gagan with Reltio. I just commented in response to this one. I tested this one while we've been in the session and it did return Vyktor for Victor with those different spellings.
Dmitry Blinov (00:57:51):
Okay. Thank you, Matt.
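The Victor and Vyktor case can be checked with a small edit-distance sketch. The length-based allowance below is modeled on Elasticsearch's AUTO fuzziness setting (no edits for terms of up to 2 characters, one edit for 3 to 5 characters, two edits for longer terms). Whether the platform uses exactly these thresholds is an assumption; the session only said that the allowed fuzziness depends on the length of the search term.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def allowed_edits(term: str) -> int:
    """Length-based allowance modeled on Elasticsearch's AUTO fuzziness."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(term: str, candidate: str) -> bool:
    """True when candidate is within the edits allowed for term's length."""
    return edit_distance(term.lower(), candidate.lower()) <= allowed_edits(term)


print(fuzzy_match("Victor", "Vyktor"))  # True
```

Victor and Vyktor differ by two substitutions (i to y, c to k), and a six-character term is allowed two edits under these thresholds, so the pair matches. That also shows why fuzzy search differs from a contains search, since neither spelling contains the other.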
Chris Detzel (00:57:52):
Real time. So can you control the sequence in which entities are returned?
Dmitry Blinov (00:57:58):
Yeah, you have a sort parameter; you can select ascending or descending order, for example, yeah.
Chris Detzel (00:58:07):
Okay. Can I control the attributes and the results set? So example, I only need the e-ID name and email in this example.
Dmitry Blinov (00:58:16):
Yes, you have a select option to control that, yeah.
Chris Detzel (00:58:21):
Okay. How can we enter a long list of values, so like a non-reference value in a UI search?
Dmitry Blinov (00:58:29):
You use advanced search for that [inaudible 00:58:31] is the answer, I think, the best is to use advanced search where you can have a number but just you have a builder there. You can build pretty complex long [inaudible 00:58:42] there.
Chris Detzel (00:58:45):
So if the collection of samples could be provided, it would be helpful or another approach to access the API request. So one is my [inaudible 00:58:54] is we're going to get access to this PowerPoint. So you'll have some of that, but I think he's wanting more of a better collection.
Abhradeep Sengupta (00:59:04):
Yeah, so I'll try to share the Postman collection. So I set up the data in my tenant, but that has to be done wherever you will be executing on which tenant you'll be executing this request. I can share the request but the data has to be there.
Chris Detzel (00:59:19):
All right, so I've got this question and you know what, I'm going to have to stop there because time is up. But I promise to get all these questions on the community. Once I get these on the community, the ones we didn't answer today, starting with this really long one, I'll push in there and then we'll start asking those questions on the ask me anything. So that's all the time for today. Abhra and Dmitry, thank you so much. Matt, thanks for answering some questions. Thanks everyone for coming. We had a big crowd. There's lots of interest here and so we'll continue to have these calls, Dmitry. So thanks so much man for putting two more on this. So thank you everyone, really appreciate it and we'll see you soon.
Dmitry Blinov (01:00:10):
Thank you.
Chris Detzel (01:00:11):
All right, bye-bye.