Find the PPT here: Reltio Integration Hub Tasks Usage and Best Practices to optimize tasks
- Introduction to Reltio Integration Hub:
- Explains that Reltio Integration Hub is a product for building integrations to bring data into Reltio or consume data from Reltio into downstream systems.
- Highlights its key features, including over 1000 connectors and a low-code/no-code user experience.
- Core Concepts:
- Defines key terms like connectors, recipes, jobs, and tasks.
- Explains that tasks are units of measurement for compute resources consumed during job execution.
- Task Calculation and Importance:
- Demonstrates how tasks are calculated in different scenarios.
- Emphasizes the importance of task optimization for efficient resource utilization and cost management.
- Monitoring Task Usage:
- Shows how to use the Reltio Integration Hub dashboard to monitor task consumption, job executions, and recipe performance.
- Demonstrates how to drill down into specific dates and projects for detailed analysis.
- Best Practices for Recipe Optimization:
- Provides several key tips for building efficient recipes, including:
  a. Using batch/bulk triggers and actions
  b. Implementing trigger conditions
  c. Optimizing poll intervals
  d. Utilizing scripts for complex transformations
  e. Efficient variable declaration
- Real-world Examples:
- Walks through examples of optimizing recipes, showing before and after scenarios.
- Demonstrates how to handle complex data transformations and high-volume data processing.
- Handling Errors and Failures:
- Explains how to track and handle failed records in batch processes.
- Shows how to use logging effectively for troubleshooting and monitoring.
- Working with Platform Limitations:
- Discusses strategies for handling long-running jobs within platform time limits.
- Community Resources:
- Introduces the Reltio Community Library, where users can find and share pre-built recipes.
What the video accomplishes:
- Education: It provides a comprehensive overview of Reltio Integration Hub, its core concepts, and how to use it effectively.
- Optimization Guidance: It offers concrete strategies for optimizing recipes and reducing task consumption, which can lead to improved performance and cost savings.
- Problem-Solving: It addresses common challenges users might face, such as handling errors in batch processes or working with large volumes of data.
- Best Practices: It establishes a set of best practices for building efficient recipes, which can serve as a guide for both new and experienced users.
- Real-world Application: By using actual examples and demonstrations, it shows how these concepts and practices apply in real-world scenarios.
- Resource Awareness: It raises awareness about task consumption and its impact on resource utilization and costs.
- Community Engagement: By mentioning community resources, it encourages users to engage with the broader Reltio community for support and shared learning.
Overall, this video serves as a valuable resource for anyone working with Reltio Integration Hub, providing both theoretical knowledge and practical skills to build more efficient and effective data integrations.
Transcript found here:
Chris Detzel: All right, Gaurav, why don't we go ahead and get started. Thank you everyone for coming to another Reltio Community Show. I'm Chris Detzel, and today we have, I don't know if he's a special guest, because he's been on several times in the past, but Gaurav Gera. He's a principal product manager here at Reltio.
Chris Detzel: Gaurav, how are you? Good. Thanks. Nice to be here. Yeah, it's good to see you again. I'm excited about this one. So the topic is Reltio Integration Hub tasks and best practices to optimize task utilization, so hopefully that's what you're thinking about. Rules of the show: keep yourself on mute.
Chris Detzel: Questions should be asked in chat; I'll make sure to get to them. Or you can take yourself off mute; we do that from time to time. The show will be recorded and posted to the community, and I will follow up as I usually do. We have a few upcoming events. One is today's show. And as a matter of fact, on Wednesday we have another show coming up on patient centricity.
Chris Detzel: If you're a life science company, I'm sure you'll be very interested in this show; one of our partners will be bringing some thought leadership around that area. On July 9th, we have a show around unlocking the potential of Reltio Business Critical Edition: enhanced security and resilience.
Chris Detzel: And unfortunately, the July 11th show, Reltio Data Pipeline for Databricks, we're going to have to reschedule. I just left it on there; we are still going to do it, but I'm having to reschedule it for later in July, is my hope. We are still going to have the July 18th show on improved data discovery with Reltio Integration Hub.
Chris Detzel: So a lot of really good stuff here. We have a couple more that I just haven't put on this yet. We'll probably skip most of August and maybe look at late August into September to do those shows. And lastly, if you haven't signed up for the DataDriven24 conference, register now. As a matter of fact, right here: you can get $200 off if you use the code COMMUNITY, and that has to be capitalized.
Chris Detzel: So make sure you sign up for that, and I can provide the link in the chat. I'm going to stop sharing, Gaurav, and let you take over.
Gaurav Gera: Thanks, Chris. Okay. Hi, everyone. I'm Gaurav Gera, part of the product management team here at Reltio. Really excited to be sharing best practices on Integration Hub tasks.
Gaurav Gera: This has been a very common ask from all our customers as well as partners, and it's been a while coming; it's on me that I'm sharing it so late, but better late than never. There are a few topics that I'm going to cover, but feel free to stop me and ask questions in chat; Chris can interrupt me and ask those on your behalf.
Gaurav Gera: I'll be happy to answer them. Here's the agenda. I'll give you a brief introduction to Integration Hub; if you have not encountered this capability as part of Reltio, this is a chance for you to learn more about it. Then we are going to jump into the details of what the tasks are, how they're calculated, and why it's important to optimize those tasks, because it obviously links to money.
Gaurav Gera: And then, finally, how you can monitor the tasks; and if there are any best practices on building recipes, we'll cover those as well. Jumping right in: Reltio Integration Hub. This product has been out there for three years now, a little over three years.
Gaurav Gera: And the whole purpose of having this product is for you to build your integrations, either to bring data into Reltio or consume data from Reltio into downstream systems, in an easy, scalable, secure manner. To be able to do that, we provide over a thousand connectors; these could be platform connectors or community connectors, along with a low-code/no-code UX, or user experience.
Gaurav Gera: You use those to build these recipes, as we call them; these are typically the integration flows or data workflows for moving data from one system to another. It has all those rich capabilities of transforming data and validating data, and if you have logging and monitoring requirements, this is a platform that provides all those capabilities as well.
Gaurav Gera: And the whole purpose over here is to expand the reach of the trusted data that Reltio curates for you: to make sure that all your systems, whether analytical or operational, have the accurate, trusted data that you maintain right out of Reltio.
Gaurav Gera: And since it's a platform wherein citizen developers can build the integrations for you, it increases business agility, because now your data engineers don't have to work with IT teams; they can consume the data on their own. Just a correction over here: Reltio Integration Hub, of late, is not part of the base subscription; earlier it was.
Gaurav Gera: If you had a Reltio subscription before April 2024, Reltio Integration Hub would have been part of the platform itself, but since April 2024, Reltio Integration Hub is not part of the base subscription. We can talk about that if there are any specific questions. Yeah, I talked about the features briefly: you can build your integration flows or orchestration capabilities and incorporate things like validations, data transformations, and alerts as part of the orchestration. There are a bunch of prebuilt connectors available, and it provides low-code/no-code integration capabilities through that point-and-click user experience tool.
Gaurav Gera: And then the data at rest, as well as in motion, is completely encrypted, and the platform is certified for SOC and the other security compliance regulations out there. Now the core concept, and I'll come to the other concepts as well: the core artifact that any data engineer would build is something called a recipe.
Gaurav Gera: You may call it an integration flow or a data workflow; within the tool, we call them recipes. On the right-hand side, in the screenshot, you can see a sample recipe. These recipes have something called a trigger, which basically defines how you want to initiate that recipe: whether it's based on an event or on a schedule, and whether it processes bulk data or runs in batch mode. There are powerful capabilities available for building the business logic. You can connect to either on-prem systems or cloud applications, and there is also a connector SDK: for applications for which you do not find connectors within RIH, an SDK is available for you to build that capability on your own. And as I mentioned, there are multiple connectors already out there, so before you decide to build your own connector, you may as well go and look for the platform or even community-built connectors.
Gaurav Gera: These recipes can be prebuilt as well; we have started working on providing prebuilt recipes, so you will hear a lot more about these coming out. For example, our D&B Data Blocks integration is based on recipes that we now provide on top of Reltio Integration Hub. But there are some core concepts that you need to understand before we go into the depths of tasks and how you optimize or build these recipes.
Gaurav Gera: The first one is the connector. The connector is nothing but a simple piece of code that allows you to connect Reltio Integration Hub as an application to a source system or a downstream system. Think of Salesforce, think of Redshift, think of any database system, any cloud application, any on-prem application.
Gaurav Gera: You need to connect that system with RIH, so that becomes system-to-system connectivity. The connector provides things like how you want to authenticate Reltio against that source system, or that source system against Reltio: whether you're using OAuth-based authentication, client-based authentication, or some other basic authentication.
Gaurav Gera: So it provides how you can connect and how you provide your authentication. And lastly, it also provides what kinds of actions you can perform on those source systems. For example, in Reltio we have exposed a certain set of APIs. Reltio is built on hundreds of APIs, and not all of those are available through the connector.
Gaurav Gera: Only the most commonly used APIs, the ones you need to bring data into Reltio or consume data out of Reltio, are provided as part of the Reltio connector. All those APIs are exposed in the form of actions, whether it's get entity, search entities, post entities, get match results, things like that.
Gaurav Gera: So all these kinds of APIs are exposed as actions through the Reltio connector. I hope you understand what the connector actually does: it provides technical connectivity, it provides how you authenticate, and it provides that basic list of actions you can perform on top of those source or downstream systems.
Gaurav Gera: Then there is this piece called the recipe. As I explained on the earlier slide, a recipe is nothing but the workflow or integration flow that orchestrates the data movement from your source system to the downstream systems, Reltio being one of these; Reltio needs to be either a source or a downstream system if you're using Reltio Integration Hub.
Gaurav Gera: Then the other piece is the job. I'll come to tasks a bit later. The job is nothing but the actual run, the execution, of a recipe. Think of the recipe as the design-time artifact, which you are building to connect to, let's say, a Salesforce system or a HubSpot system.
Gaurav Gera: So once you have designed how you want to pick up data from Reltio, how you want to transform that data, what kind of validations you want to perform, and how you want to push that data into HubSpot, that becomes a recipe. The job is nothing but the runtime version of that recipe. Say this recipe gets triggered on an event basis:
Gaurav Gera: a new record gets created in Reltio, it triggers this recipe, and that recipe then moves that data from Reltio to HubSpot. That movement, that execution of the data transfer, happens through a job. Think of the job as the runtime version of the recipe; it performs the actual data exchange from one system to another.
Gaurav Gera: And then there's the concept of the task. Don't think of a task in terms of an activity or an action that is performed. A task over here is a unit of measurement for the compute that gets consumed when a job is executed. There are multiple steps that execute within a job, and for those steps, tasks get consumed.
Gaurav Gera: So a task is nothing but that unit of measure of the compute resource. Typically, when you buy Reltio Integration Hub, you get a certain entitlement of tasks that you are allowed to consume. If you were on a subscription plan earlier than April of 2024, you were allowed to use tasks equal to 25 percent of your consolidated profiles on a monthly basis. For example, if you have licensed 100,000 consolidated profiles with Reltio as the base package, then on a monthly basis you can consume up to 25,000 tasks. That's the entitlement, and you can then monitor how your recipes are executing and how many tasks they are consuming. So that's the basic definition of tasks and how they are used. Often these questions come up during the sales cycle, or customers, when they start working on recipes, identify that their task utilization is too high and don't understand what kind of calculations are being performed.
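For illustration, here is that entitlement arithmetic as a minimal Python sketch; the figures are the ones quoted in the talk, and the 25 percent rule applies to pre-April-2024 plans:

```python
# Pre-April-2024 plans: monthly task entitlement = 25% of licensed
# consolidated profiles, per the example given in the talk.
consolidated_profiles = 100_000
monthly_task_entitlement = int(consolidated_profiles * 0.25)
print(monthly_task_entitlement)  # 25000
```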
Gaurav Gera: So I'm going to answer those questions over here in much detail.
Gaurav Gera: Before I jump into what a task is and how tasks are calculated, are there any questions around these core concepts of Integration Hub?
Chris Detzel: There's one: where can we check how many tasks are executing?
Gaurav Gera: Yeah, I will come to that. Unfortunately, we don't have that kind of dashboard on a monthly basis, but I can share the dashboard, and it's part of the demo, showing how many tasks are getting consumed, how many recipes you have, and how many jobs are executing.
Gaurav Gera: There's a demo specifically on that. Perfect. Okay. As I said, a task is nothing but a unit of work, a unit of measurement of the compute resource being consumed in a recipe job execution. So not everything is a task. As you can see over here from this picture, a trigger, which is nothing but something that starts a recipe job, is not a task.
Gaurav Gera: If you're performing any kind of logic, the control statements within that logic, like the if conditions, the repeats, the stop statements, or even the handle errors, which is nothing but your exception handling, all those kinds of steps within the recipe are also not considered tasks.
Gaurav Gera: What is considered a task is the actual action that is performed, the actual compute that happens: any action in the recipe that you perform, for example calling an API, performing some sort of compute, say writing a piece of code in Python or JavaScript, or even declaring a variable, for that matter.
Gaurav Gera: All those actions that you perform within the recipe become tasks. So anytime you are defining the recipe and you click on an action in an app, keep in the back of your mind that this will consume one task.
Gaurav Gera: If you're calling another recipe from your main recipe, that also becomes a task. I'll come to the examples in the next slide, but hopefully this explains, at a high level, which steps within a recipe are considered tasks and which are not.
Gaurav Gera: So here's one simple example. This is a simple recipe, and what you're looking at is a job execution: a job has been completed for a particular recipe. This is a scheduled job, not an ideal recipe, not something that you would build, but think of it as a sample recipe, an example, so to speak, to understand how the tasks are calculated.
Gaurav Gera: So it has a trigger. This is a schedule trigger, which runs, let's say, at midnight on a daily basis. What it does is read some data from Reltio based on some ID, then replicate that data to HubSpot, then also update, for example, a crosswalk in Reltio, and then log a message in the Workato log, or rather the Reltio Integration Hub log.
Gaurav Gera: If this job is successfully executed and all four of these steps complete, there will be four tasks consumed: one, two, three, and four. And within the job execution log, you can see the task count given over here, as to how many tasks were consumed as part of the job execution.
Gaurav Gera: So this is a simple calculation, and this is for a successful job. If there is a job failure, for example if the job had failed at step four but steps two and three were successfully executed, then even though the job failed, there will still be two tasks consumed, because those two steps completed.
Gaurav Gera: There will be no task for step four and no task for step five. So in a failed job too, tasks are consumed based on the successfully executed steps. That was a simple example. Here is a different version, wherein I've introduced a simple if statement.
Gaurav Gera: Now here you can see that this recipe would consume either one task or four tasks, depending on the result of step number three, which is an if condition. If the if condition is false, the rest of the steps within the if block are not executed; only step two is executed, so that's just one task. But if the if condition was true and the job executed steps four, five, and six, then the tasks consumed would be four.
Gaurav Gera: So depending on how your job executes and how your data is processed within the recipe, the task consumption can vary. That's why it's important to design your recipes in a way that lets you estimate, based on your use cases, how many tasks will be consumed. Here's another example, and in this case it uses a for loop as well as an if statement. What this recipe does is get triggered on a scheduled basis and look up data in an Amazon SQS queue. It can process multiple messages, and within a loop it receives one message, parses that message, and checks if the event type from that message is ENTITY_CREATED.
Gaurav Gera: In that case, it will go and fetch the data for that organization, if it was an organization, and then push that data into Salesforce as an account. So it's a slightly more complex recipe, and the number of messages processed in one job execution determines the number of tasks consumed.
Gaurav Gera: So let's assume that in this case there were five messages in the queue. The recipe goes into the for loop and parses all five messages, and it finds that out of five, only one message was an ENTITY_CREATED event. So for that one message, it processed the two additional steps of fetching the details based on the ID and creating that account in Salesforce.
Gaurav Gera: That's why the total task consumption over here is five plus two, seven. In other words, the total task count is the number of messages multiplied by one for the parse step, plus two more tasks for each message where the step-four condition is true. That's how tasks get calculated, and you can see that in the job log as well.
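For illustration, here is that counting rule as a minimal Python sketch (not from the webinar; the event-type string and message shape are assumptions):

```python
# One parse task per message, plus two extra tasks (fetch by ID + create
# in Salesforce) for each message whose event type is the create event.
def loop_recipe_tasks(messages: list[dict]) -> int:
    creates = sum(1 for m in messages if m.get("type") == "ENTITY_CREATED")
    return len(messages) * 1 + creates * 2

# Five queued messages, one of them a create event: 5 + 2 = 7 tasks.
msgs = [{"type": "ENTITY_CHANGED"}] * 4 + [{"type": "ENTITY_CREATED"}]
print(loop_recipe_tasks(msgs))  # 7
```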
Gaurav Gera: The reason we are processing one message at a time over here is that, as you can see in the center section, every message has its own structure: it has attributes, a body, and then some other parameters.
Gaurav Gera: What we are interested in is only the body part; this is where the actual data resides within every message that gets generated in Reltio and pushed out to the queue. So this is the reason we read every message and then parse it. There's a better way to do that, wherein we can utilize the tasks more efficiently.
Gaurav Gera: But you can understand the downside over here. Let's say there were a thousand messages in the queue and this recipe was processing those thousand messages; then it would have consumed at least a thousand tasks, even if no actual processing was happening.
Gaurav Gera: So let's say all those events were entity-changed events: then this recipe would have parsed them all for nothing, all those thousand tasks would have been consumed just for parsing, and you would have run the entire recipe for nothing, yet still consumed a thousand tasks. A highly inefficient recipe, and we'll come to that during the demo as well.
Gaurav Gera: So here is a somewhat optimized version of that recipe. What I've done over here is, instead of doing the parsing as part of the loop itself, one message at a time, I put that logic in a Ruby script. So in step number two, the Ruby script does the parsing of the bodies of all the individual messages that we have.
Gaurav Gera: It generates a list with just the entities that were pushed to this queue. And then I've added one more step over here to separate out the create events from the other events, and I process only the created ones. In this case, let's assume that it was processing one message; then it would have consumed three tasks.
Gaurav Gera: But this also works in batch: if there were, let's say, a thousand messages, and all those thousand messages were change events, not create events, then at most only two tasks would have been consumed, against a thousand in the previous version of the recipe.
Gaurav Gera: So this improves the performance of the recipe as well as the utilization of the tasks.
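The demo implemented this step as a Ruby script inside the recipe; as a rough illustration of the same idea, here is a Python sketch (the `body` and `type` field names are assumptions, not Reltio's exact message schema):

```python
import json

def split_events(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    """Parse all queued message bodies in one scripted step and split
    create events from the rest, so the recipe spends one task on
    parsing instead of one task per message."""
    creates, others = [], []
    for msg in messages:
        event = json.loads(msg["body"])  # the payload lives in the message body
        (creates if event.get("type") == "ENTITY_CREATED" else others).append(event)
    return creates, others
```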
Gaurav Gera: So yeah, these are a couple of examples of how the tasks get calculated in different scenarios. The next thing I want to do is answer the question that was asked about how you monitor this. For that, I'm going to jump into Reltio Integration Hub. You can launch Reltio Integration Hub either from this chocolate bar, as we call it, or, if you have the console application, within the console you will see a link to launch Integration Hub.
Gaurav Gera: When you do that, you land on this page. There's something called a dashboard; I'll come to some of the other pieces, like recipes, later on, but this is the dashboard, which is where you will be able to monitor the recipes that are getting executed, how many jobs are being executed, and how many tasks are getting consumed.
Gaurav Gera: By default, it shows, let's say, the last seven days. Within the last seven days, there have been 807 tasks consumed, with 53 failed jobs and 59 successful jobs across nine recipes.
Gaurav Gera: I can change this to, let's say, the last 30 days as well, and this is the graph that I get, with 23,500 tasks consumed out of 155 successful jobs plus 53 failed jobs. But maybe I'm particularly interested in a specific day; for example, on June 23rd, there was a high number of failed jobs.
Gaurav Gera: So I can hover on this and see that there were 38 failed jobs, which is what I'm more interested in, and there were around 300 tasks used. When you click on this in the dashboard, you'll see the filtered-down list of all the recipes that were executed on that day, June 23rd.
Gaurav Gera: I can go to a previous day and see the list of recipes that were executed then. So this also gives you a drill-down view, at a recipe level, of the number of jobs, the failed jobs, and the tasks consumed in those 24 hours, so to speak.
Gaurav Gera: Not just that: you can also monitor task consumption based on your projects. Let's say you were using the RIH Salesforce integration project. In that case, you just select that project, and in the last 30 days, the only day with activity was May 27, when 18 tasks were consumed and eight jobs were executed.
Gaurav Gera: So this gives you a drill-down view, at a project level or a use-case level, of how your recipes are performing. And then you can apply other filters as well. I can filter on, let's say, HubSpot: how many recipes are there that are built with a HubSpot integration, and what's their task utilization?
Gaurav Gera: I see that on June 19 and 20, there were a few cases wherein the integration with HubSpot failed for whatever reason. You can then go and check the job details as to where these failed and what kind of improvements you can make on top of these recipes.
Gaurav Gera: So hopefully this gives you a fair idea of how these tasks can be monitored. One other piece that is very important for everybody to understand: if you go to the job logs as well, and I've shown these screenshots earlier, you can see the tasks consumed in the individual jobs too.
Chris Detzel: Hey, Gaurav, quickly: I assume tasks that ran successfully in the prod tenant are what will be counted, and not the test/dev tenant tasks?
Gaurav Gera: So the entitlements are for your base package, right? Dev, test, and production are all counted against the entitlements. And that is one of the reasons we are not able to show a dashboard that is consolidated for the entire base package.
Gaurav Gera: What you see over here is for your particular tenant only. So if you're in this particular tenant and the recipes are running in this tenant, then this will show the utilization in this tenant only. When it comes to the consolidation piece, this is feedback that we have received more recently.
Gaurav Gera: We are working with our subprocessor Workato on how we can efficiently show that kind of dashboard, one that shows an aggregate view of task utilization across the base package: dev, test, and production, or the other tenants that you might have within your landscape.
Chris Detzel: What's the best way to restart from a failure point?
Gaurav Gera: Good question. It's not from a failure point. When you restart a job, it will pick up from the trigger. For example, let's say this one: this one failed because the batch size was limited to 2,000. If I have to restart this job, it will pick up from the trigger itself and start the entire execution from the beginning, not just from step six.
Chris Detzel: And one more question when we get to it: would you recommend using RIH for daily batch jobs, or is it better to use some other mechanism with direct Reltio API calls?
Gaurav Gera: Yeah, I'll come to that. I have an example, and as you'll see, it also depends on what kind of use cases you have.
Gaurav Gera: But there are certain best practices that we follow, which is the whole purpose of this webinar. So I'll come to what those best practices are and how, depending on the use cases, you can define those recipes.
Chris Detzel: Great. I have another question, but let's wait to ask that as we dive a little bit deeper in there.
Gaurav Gera: Sure. Okay. The next piece is, now that you understand what a task is and how to calculate tasks: why is it that we need to optimize our recipes? Why are we telling you to do that? First of all, it's important that your recipes run in an efficient manner.
Gaurav Gera: You want the kind of throughput that you expect from your recipes. If you have optimized your recipes, even if task optimization was your goal, you will have improved the performance and throughput of those recipes as well, because you can think of tasks as a way for you to improve the performance of the recipe.
Gaurav Gera: That's the goal. The task count is just the yardstick that gives you that motivation, right?
Gaurav Gera: Second is efficient utilization of tasks. As I mentioned, tasks are an entitlement that you buy from Reltio, so you're basically paying for tasks, even if it's on a $0 basis. That's why it's important that you consider task usage optimization.
Gaurav Gera: And lastly, since you're improving the performance and improving the task utilization, you are also saving costs, because the source systems that you integrate with would also be applying some sort of credits. If you're purchasing API credits from your source or downstream systems, you want to utilize those credits efficiently as well, because with that efficient utilization you'll be able to save costs.
Gaurav Gera: So this is the motivation for optimizing these recipes, and given the different use cases that you might have, you can define your recipes to efficiently utilize tasks as well as get the performance that you want. We're going to go through the same set of recipes that we went through earlier, during the calculation of the tasks.
Gaurav Gera: So here's that same recipe once again, with a higher message count. It reads the message from SQS, parses that message, and checks what kind of event it was, whether it's an entity create or change, things like that. It then gets the data from Reltio using the Reltio ID and pushes that data to Salesforce.
Gaurav Gera: It's not a very efficient recipe. For example, say you are processing 10,000 create events; I'm just talking about create events. If you're loading, for example, 10,000 records into Reltio, I'm sure there will be many more events that get generated: if you are loading name and address, there will be relationship create events.
Gaurav Gera: If you already have data within Reltio, then there will be some consolidation that happens, so there will be additional entity merge events that might get triggered. But let's keep it simple: we are creating 10,000 records in Reltio, and it creates, let's say, even just 10,000 events in the queue.
Gaurav Gera: In such cases, this recipe would consume 30,000 tasks, based on the calculation that we just saw on the previous slide. And it has the potential to waste a lot of tasks in case those events are not create events. So it's a very inefficient recipe. Even if there are no messages in the queue, this recipe still gets executed, because it has a certain polling frequency, five minutes being the minimum, and you can increase that frequency.
Gaurav Gera: So let's assume you have put in a polling interval of 30 minutes. Every 30 minutes, it will go and check for new messages. If there are no messages, this recipe will still execute; nothing will happen, but there will still be a job getting created.
Gaurav Gera: No tasks will get consumed anyway, because step two will not get executed, but you'll still see a lot of blank jobs getting generated. And step four is a highly inefficient way of calling the API, and steps five and six will consume a lot of API credits as well as bring down the performance of this recipe.
Gaurav Gera: So if you have to optimize this recipe even a bit, the first thing I would do is put an object filter in step one, in the trigger, which checks that the job should start only if there are messages in the queue. Otherwise, just ignore it; don't start the job.
Gaurav Gera: And since I'm processing multiple records, what I've changed from the previous slide is this: there, I was doing "for each item in the messages," so every message was processed one at a time; now I'm processing a batch of messages. In this case, the recipe will read 2,000 messages from SQS, push them into this batch step, and then run in a batch size of whatever you have defined; let's assume 200.
Gaurav Gera: So you pick up those 200 messages and parse them in bulk, because, as I explained earlier, I've put in this Ruby script that has the logic of looping through every message and finding the event and the data.
Gaurav Gera: It splits those into different events, like create and change, and then processes them. The other thing I've changed from the previous recipe is that there I was creating one account at a time; here, I'm creating accounts in batches, and the batch definition depends on the application that you are connecting to.
Gaurav Gera: In this case, Salesforce allows at most 200 accounts to be created in a batch, so that's why, here in step two, I had to use a batch size of 200. So for a queue of 10,000 events, there will be five jobs that get executed, because it reads 2,000 messages at a time.
Gaurav Gera: Each will consume 30 tasks, which is much more efficient than the previous one. Earlier it was 30,000; we brought it down by a factor of a thousand. No blank jobs will get created, and the data will be processed directly from the queue. That's the other piece we changed: earlier, we were not processing the data from the queue itself.
Gaurav Gera: One of the best practices that we recommend from Reltio is to always process the data from the queue itself instead of doing an additional getEntityById call to Reltio. So here we're processing the data directly from the queue, and then working in a batch manner.
Gaurav Gera: Salesforce allows 200 records, but if you're connecting to HubSpot, they might have a different batch size. If you're loading data into Reltio, we have a batch size of 2,000, so you can load 2,000 records in a single API call, in a single action. The batch size will differ based on the application that you are connecting to.
Gaurav Gera: So this is another version of the same recipe, with some task improvements. And then there is a highly optimized recipe, tuned for performance as well as tasks. For the same number of events, this one would consume just 12 tasks per job, so quite a bit better: from 30 per job down to 12 per job.
Gaurav Gera: One more task is saved over here because I'm doing the parse and the split together, having combined steps three and four into one step. So in this JavaScript itself, the parsing happens and the split happens, and then I check for those events and process them in batches of 200.
Gaurav Gera: So if you had, let's say, 10,000 events, 2,000 are processed in one job. There will be one task consumed for parsing those 2,000, then about ten tasks within this loop to create accounts in batches of 200, which is why around 12 tasks get consumed. The rest is the same: you are processing the data directly from the queue as well as utilizing the batch APIs.
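As a rough sketch of that batching idea (the 200 and 2,000 limits are the ones Gaurav quotes for Salesforce and Reltio; the helper itself is illustrative, not the recipe's actual implementation):

```python
def batches(records: list, batch_size: int):
    """Yield records in target-sized batches so each bulk action
    consumes one task instead of one task per record."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

accounts = [{"id": n} for n in range(2_000)]
print(len(list(batches(accounts, 200))))    # 10 bulk calls for Salesforce
print(len(list(batches(accounts, 2_000))))  # 1 bulk call for Reltio
```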
Gaurav Gera: You are processing the data directly from the queue as well as utilizing the batch APIs, right? So in summary these are some of the best practices that you should follow based on what I just mentioned, right? So always. Use batch or bulk triggers, like every connector that you connect to.
Gaurav Gera: So whether it's SQS, whether it's HubSpot, Salesforce, or any other, source system, always use the batch or the bulk triggers to make sure that, you are efficiently utilizing the number of [00:38:00] events from that. So this is step number one, step number two, always use a a trigger condition, right?
Gaurav Gera: So if you are reading data in batches. You don't want to start those recipe jobs in case you get data, which is not supposed to be processed by the recipe, right? So only if you have the data, for example, to be processed by the recipe use these trigger conditions, right?
Gaurav Gera: And these are defined when you create the recipe, define a trigger. So for example, in this case, I'm processing the SQS. Messages only if the messages are available. Otherwise the recipe will not start. The other piece of advice over here is. If possible, or if your use cases permit you, increase the poll interval, right?
Gaurav Gera: If you're using, for example, the SQS queue or Azure PubSub, Azure queues or Azure topics, I should say Google PubSub, these allow you for [00:39:00] a minimum pole frequency of five minutes, right? And with a maximum of, let's say one hour in days or in weeks as well. So if possible, make sure that if you're using SQS, you have an efficient way of defining the pole frequencies.
Gaurav Gera: It may happen that not Every use case is a real time use case or not. Every use case is a near real time use case. The data transfer can wait for an hour, for a day. So in such cases, you can define your poll frequencies, right? So this will also bring down the number of tasks that are getting consumed, as well as make sure that you are efficiently utilizing API calls, as well as you have a performance integration.
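As a quick back-of-the-envelope illustration of why the poll interval matters (the intervals here are examples, not recommendations):

```python
# Number of polling jobs started per day at different poll intervals.
# Without a trigger condition, each empty poll still starts a blank job.
for minutes in (5, 30, 60):
    print(f"{minutes}-minute poll -> {24 * 60 // minutes} jobs/day")
# 5-minute poll -> 288 jobs/day
# 30-minute poll -> 48 jobs/day
# 60-minute poll -> 24 jobs/day
```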
Gaurav Gera: Similar to triggers, always use batch and bulk actions. Do not create, for example, one record at a time in Reltio or in Salesforce or in HubSpot; that's a highly inefficient way of working, especially when you are looping through a list of values.
Gaurav Gera: Always use the batch mode and create those records in batches. Most of the connectors provide batch actions for you to utilize, and you can save tasks by a factor of the batch size itself. If you have a batch size of 2,000 and you process one record at a time, you will consume 2,000 tasks; but if you send all 2,000 in one batch, you consume just one task, bringing down the task consumption.
Gaurav Gera: Lastly, before I stop for questions: use scripts. Within RIH, we provide multiple ways of writing scripts. You can use JavaScript, which is what I use; there's Python, there's Ruby, whatever your preference. You can use those scripts for complex transformations and for bulk parsing, as you saw in the earlier case wherein we were parsing the data from the queue; we used the Ruby script as well as JavaScript.
Gaurav Gera: In this case, there's a complex transformation that I'm performing, and I'll come to this use case as well: the transformation here is normalizing email addresses, or, if you have additional use cases, normalizing phone numbers (a sketch of this kind of normalization appears after this segment). Utilizing these JavaScript or Python scripts improves the performance as well as your task utilization. If you have a knack for programming, it's highly recommended to use these scripts.
Gaurav Gera: And this is obvious, but we have seen customers and partners building recipes in which they have declared variables in multiple steps, which is a big no; it unnecessarily wastes tasks. The better way is to declare all the variables in just one step, which then uses just one task for all your declarations. But if you declare 10 variables in 10 separate steps, you have consumed 10 tasks unnecessarily.
Gaurav Gera: Small things, but they help when you look at the larger picture. Okay, I'll stop over here before I talk about some example recipes.
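As promised above, here is a minimal Python sketch of the kind of normalization script Gaurav mentions; the exact rules used in the demo weren't shown, so these are illustrative assumptions:

```python
import re

def normalize_email(email: str) -> str:
    # Illustrative rule: trim whitespace and lowercase.
    return email.strip().lower()

def normalize_phone(phone: str) -> str:
    # Illustrative rule: keep digits only; real logic might
    # also validate length or add a country code.
    return re.sub(r"\D", "", phone)

print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
print(normalize_phone("(555) 123-4567"))           # 5551234567
```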
Chris Detzel: Yeah, a couple of questions. And by the way, I love the tips and tricks, Gaurav; we'll definitely maybe write a blog about that so people can see it. Quickly: I might have missed this, but is there a way to see the number of tasks per step in the recipe, to help better identify how to optimize? Maybe you talked about that too.
Gaurav Gera: Okay, here's an example, the same recipe that we have been talking about: reading the data from a queue, processing it, and pushing that data into Reltio.
Gaurav Gera: This recipe consumed nine tasks overall. At every step, if you go in, you can look at what the input was, what the output was, and things like that. In this case, you can already see these steps were successfully executed, and if I go over here, it tells me that this was a batch size of 200, and 1,483 was the total size.
Gaurav Gera: So this step would have executed 1,483 divided by 200 times, so eight times, and consumed eight tasks, plus one task for the other step. Every step typically consumes one task per execution, not more than that; it's the number of times you execute that step that determines how many tasks are consumed within the recipe.
Gaurav Gera: So in this case, since this step four executed eight times, we know that this was eight plus one, nine. Make sense?
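The arithmetic behind that count, for reference:

```python
import math

records, batch_size = 1_483, 200
batch_runs = math.ceil(records / batch_size)  # 8 executions of the batch step
total_tasks = batch_runs + 1                  # plus one task for the other step
print(batch_runs, total_tasks)                # 8 9
```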
Chris Detzel: I think so. So what's going to happen if one or a few messages fail to load into, say, Salesforce with the batch size of 200? How will we know which ones need to be reprocessed?
Gaurav Gera: Very good question. Can I take a segue into the recipe that I was going to show as an example? Because I have that exact piece answered in the other recipe.
Chris Detzel: Before you do, can you answer this one question? Just quickly: are users charged by the number of jobs and tasks, and not by the time of their execution?
Gaurav Gera: They are charged based on the tasks. Our entitlements, the licenses that we provide for Integration Hub, are based solely on the tasks that you consume out of your entitlements. As for the entitlements you get, I would prefer that you talk to your account executive or your CSM to know exactly how many tasks you're entitled to, because it depends on what version of the licensing you are on.
Gaurav Gera: Talk to your account executive or CSM to get the exact number of tasks that you can consume on a monthly basis. Any other questions before I jump to that? Okay, so what I'm going to do is take a very common example wherein we are loading data from S3, or from any cloud storage.
Gaurav Gera: Here's a sample file, which is nothing but data for contact persons. It has first name, last name, and the rest of the attributes, along with address. The problem with this file is that if you look at column E, called Programmatic Business Orders, it has multiple addresses in it, and the same is the case with mobile numbers, direct numbers, and personal phone numbers.
Gaurav Gera: So this becomes a highly complex file to process, because I don't have one value for each mobile number or direct number. I have to process and transform the data in that one column into a list of values, and then use that list of values as part of the nested value when the record gets created.
Gaurav Gera: So it's a complex transformation, and we have seen this in a lot of processing. The other piece is that this was an extremely large file: it was one GB of data that we had to process, and this was just one file; we had several similar files, each needing such complex transformations.
Gaurav Gera: So what we did was use something called the Workato SQL transformation, or rather the Workato file storage, I should say; it's the same connector. In this recipe, we download that entire file into storage, and then we do some calculations so that we can process the data in batches.
Gaurav Gera: Let me jump to the recipe's execution itself. What we do over here is declare certain variables: the offset; the count, which is basically the batch size I want to define; and the total records in the file, which gets calculated in step six. Then, in step seven, I get that there were, in this case, 25,000 records, though the actual recipe that we executed had more than 1.5 million records, along with that complex transformation we had to do.
Gaurav Gera: We then define the number of pages based on the batch size, which was 1,000 in this case, and we run this recipe in a loop with the batch size, doing some pagination and calling this function, "function load individuals."
Gaurav Gera: The reason we did this is that there are certain guardrails in the platform. One of the guardrails is that if your recipe runs for more than 90 minutes, it will be stopped. So to avoid that 90-minute timeout, we had this long-running recipe, which divides the huge volume of 1.5 million records into smaller sizes and then calls subsequent recipes, which is what this call function does.
Gaurav Gera: So the called function is also a recipe, invoked from within this loop, and we have a higher concurrency over here, so, for example, five recipes will get started at a time. This kind of recipe, if you're building or using one, will run for hours without being stopped by that 90-minute guardrail.
Gaurav Gera: And then the individual recipe, the call function, processes that batch of 1,000 records and pushes that data into Reltio; because here we are reading the data from a file, we were creating those records in a batch size of 1,000.
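A rough Python sketch of the paging logic the parent recipe performs before handing work to the child recipe (names such as `load_individuals` and the page layout are placeholders, not the demo's actual step names):

```python
import math

def plan_pages(total_records: int, batch_size: int = 1_000):
    """Split a large file into (offset, count) pages so each child
    recipe run stays well under the 90-minute platform guardrail."""
    pages = math.ceil(total_records / batch_size)
    return [(p * batch_size, batch_size) for p in range(pages)]

# The 25,000-record sample file -> 25 pages of 1,000 records each.
for offset, count in plan_pages(25_000):
    pass  # hand each page to the child recipe, e.g. load_individuals(offset, count)
```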
Gaurav Gera: We were creating those records in a batch size of 1000. Appreciate it. So second, and what I've done is I've launched the. The the job, which was executed for this particular step. And this is where I will show how the transformation was done. So here there was a question on there as a batch and just certain records that get failed.
Gaurav Gera: So this is where we handle that. So we create this empty list. It calls a response list and we will fill in this response list with the results of the batch basically, right? What's happening over here is now we are processing those records in a batch of 1000. We [00:50:00] create the, those records in Realtio, right?
Gaurav Gera: And you can make this even more efficient. I'm not saying that this is the best way of defining these recipes, right? And once this definition creation is done, we add the response to to this list that we created, right? So there were 992 records that get added and I'll show the response and then we normalize that data, right?
Gaurav Gera: So in this case, we normalized, there was nothing in this case, hopefully there was something, nothing, right? But this JavaScript would would normalize that data for the email, the mobile number, the direct number. And then finally what we would do is log those messages. So in this case what I've done over here is if you see the response list, that was earlier blank.
Gaurav Gera: This has those. values wherein I've noted what was the source, the, what was the crosswalk value? If it was successful, then what was the URI? And if it successful was failed, then what [00:51:00] was the, so this is how you can track. The if the records were successfully created within a batch or if it was a failure, and what was the reason for that failure along with the specific record that got failed.
Gaurav Gera: So hopefully that, that kind of answers your question, but I can quickly show that simple transformation logic if that was of interest which was Yeah. This is where we are adding the response of that batch creation, and we capture the source crosswalk, the URI if it was a crosswalk value, what was, whether it was successful or a failure, and the error message if it was if at all it was that, right?
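As a rough illustration of that bookkeeping, here is a Python sketch (field names are assumptions, not the demo's exact schema; it also assumes the bulk response preserves request order):

```python
def track_batch_results(batch: list[dict], responses: list[dict]) -> list[dict]:
    """Pair each record's crosswalk with the outcome of the bulk create,
    so failed records can be identified and reprocessed later."""
    results = []
    for record, resp in zip(batch, responses):
        results.append({
            "source": record.get("source"),
            "crosswalk": record.get("crosswalkValue"),
            "successful": resp.get("successful", False),
            "uri": resp.get("uri"),            # present when the create succeeded
            "error": resp.get("errorMessage"), # present when it failed
        })
    return results
```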
Gaurav Gera: The other piece that I want to show quickly is that we are adding this to the logs as well. This was also something that we did recently: sending to the Workato logs. When you do that, you'll be able to see those in one place, in the same log. So here's that same logic that we saw: this is getting created based on the logs that you are writing over there.
Gaurav Gera: I don't have an example with an error handy, but the error messages that get logged can also be seen in this consolidated view. So hopefully you got a fair idea of what that recipe does. And if you're interested in looking at these recipes, they are already available in our community library.
Gaurav Gera: You can go to the community library, search for Reltio, and you'll get the list of those recipes. Here's that high-volume streaming SQS recipe, the optimized version that I showed. The recipe I just showed, which uses the Workato file storage, is also available. You can just go over here, click on this, and you'll get this button called "Use this recipe."
Gaurav Gera: If you click on it, the recipe will be available in your projects as well, and then you can make it your own and do whatever you want with these recipes. Thanks.
Chris Detzel: Great. Just two more questions, and then it'll be time to go. The batch size limit is 2,000 in one of your demos; why are we using a thousand?
Chris Detzel: Wondering what the logic is behind that decision?
Gaurav Gera: To be on the safer side. Sometimes it might happen that a batch size of 2,000 is okay, but if you have a large volume of data, the API might still time out; the API also has its limits. Even if the number of records is low, if the size of those records is too high, it still might get timed out.
Gaurav Gera: So just to be on the safer side, I used 1,000, because I knew my record size might get larger, since I was creating those nested attributes out of the mobile number and the other multi-value attributes that I had.
Chris Detzel: Does writing to the Workato log get counted as a task?
Gaurav Gera: Yes.
Chris Detzel: Okay. And what's the API limit you were just talking about?
Gaurav Gera: It depends on the source system. I think for Reltio it's 180 seconds or something like that. I might have to check and get back, but if I'm not wrong, it's 180 seconds.
Chris Detzel: Okay, that's all we have time for, because we hit the hour exactly. So thank you everyone very much. I know it looks like some more questions are coming in; if you could post those at community.reltio.com, I'll make sure that Gaurav answers any of the questions you have, as two or three came in. And please take the survey at the end.
Chris Detzel: I'm looking at putting together a whole list of topics you'd like covered for the rest of the year's shows. This PowerPoint and the entire show, which is recorded, will be pushed out to the community; I'll make sure to get everyone the information by tomorrow or the next day, for sure.
Chris Detzel: So thank you, Gaurav. Thank you everyone for spending time with us again, and until next time, which, you know, could be Wednesday if you want to come to the patient centricity show. So thanks.