It's fair to say that the Reltio Professional Services (PS) Team loves Reltio's Integration Hub (RIH).
Since Reltio partnered with Workato, we have developed (through customer engagements, or just for fun) upwards of 25 different RIH recipes. Many of these can be used with minimal changes to suit a customer's requirements, and they are used by our PS and SC teams, partners, and customers as part of their implementations and beyond.
We have developed recipes for:
- Exporting & Importing Data
- ETL
- Reference Data Management (RDM)
- Unmerging
- Custom Reports which provide information about
Throughout the development cycle, we have tuned these recipes with their task consumption in mind, to ensure they are designed optimally for the customer's functional and non-functional requirements. So I'm here to share the best practices we follow, so you can benefit from them too!
First, let's understand the major factors of task usage. There are many reasons why task consumption for a given recipe can rise, but it boils down to three main factors that broadly impact the task consumption of any recipe:

1. The number of jobs. E.g. say you work in the sales org of a business. For every new customer opportunity, you run a job that performs a particular action, consuming about 15 tasks each time the job runs. If about ten thousand customer opportunities were processed each day, those ten thousand jobs would be a multiplier for the number of tasks consumed by each job. This becomes task-intensive.

2. The number of events per job. E.g. you are reading a CSV file from your S3 bucket (other buckets are available) and then processing a couple of actions in HubSpot (other equally lovely CRM solutions are compatible with RIH too). Whether each event is processed individually or as a batch can have a multiplier effect on the total task count for a job. Additionally, keep in mind that data synchronisation or scheduled jobs typically process more than one event per trigger event. This can also become task-intensive.

3. The efficiency of processing events. The way you choose to implement processing logic affects how efficiently events are processed and the overall number of tasks consumed in every job. Using the right tools can strike a balance between development complexity and task optimisation. There are many tools in your arsenal here; PS often use Python, JavaScript or Ruby, which adds a little complexity when building the recipe but improves task usage massively.
Let's talk about task optimisation. From the section above, we understand how tasks are consumed and the different factors affecting overall task consumption for each job. There are a couple of approaches you can use to automate the identification of recipes in need of optimisation within RIH, so you don't have to do so reactively. By using the approaches below to assess high-consumption recipes proactively, you can flag certain recipes as potential candidates for optimisation. Once you've identified those recipes, you can apply the optimisation strategies we'll discuss to bring overall task consumption down to acceptable levels.
Which ones need some fine-tuning? There are two ways to identify your high-consumption recipes: the predictive approach and the reactive approach. First, we will look at the predictive approach, which is typically used when you don't have a lot of historical data to reference.
The Predictive Approach
In this example provided by Workato, you're deploying new recipes into your production environment, transitioning from a previous batch process to near real time (NRT). Consequently, it would take several months in production before trends indicating high overall task consumption became apparent. The predictive approach instead starts extrapolating data based on the past month's usage within one or two months of starting the recipe. This provides a prediction of annual task usage, helping you decide whether to consider that recipe for optimisation.
This recipe uses a RecipeOps connector to list all the active recipes in the workspace. It then uses a simple JavaScript step to extrapolate new data based on the historical data of those recipes, which is subsequently output to a CSV file. Here's an example of what you might see when you output the data:
As you can see, there are several recipes here with huge counts of Estimated Annual Tasks; these would be the ones to look at optimising first.
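As a rough sketch of what that extrapolation step does (the actual recipe uses a JavaScript step inside RIH; this Python version, with illustrative recipe names and a hypothetical one-million-task annual threshold, is just to show the logic):

```python
import csv
import io

def estimate_annual_tasks(recipes, flag_threshold=1_000_000):
    """Extrapolate annual task usage from the past month's task counts.

    `flag_threshold` is an illustrative cut-off, not an RIH limit.
    """
    rows = []
    for recipe in recipes:
        estimated_annual = recipe["tasks_last_month"] * 12
        rows.append({
            "recipe_name": recipe["name"],
            "tasks_last_month": recipe["tasks_last_month"],
            "estimated_annual_tasks": estimated_annual,
            "optimisation_candidate": estimated_annual >= flag_threshold,
        })
    # Sort so the heaviest consumers appear first.
    rows.sort(key=lambda r: r["estimated_annual_tasks"], reverse=True)
    return rows

def to_csv(rows):
    """Render the report as CSV, as the recipe does."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical recipes and monthly counts, standing in for RecipeOps output.
recipes = [
    {"name": "Opportunity NRT Sync", "tasks_last_month": 450_000},
    {"name": "Nightly RDM Export", "tasks_last_month": 12_000},
]
report = estimate_annual_tasks(recipes)
print(to_csv(report))
```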
The Reactive Approach
The reactive approach to identifying recipes for optimisation can be used once you have a decent amount of history for your recipes. To implement it, follow these steps:
- Focus on the lifetime task consumption metric to identify which recipes to address first.
- Use meaningful periods, such as three to six months, when comparing usage across recipes.
- Factor seasonality into your calculation: are you likely to see more customer opportunities at specific times of the year, for example?
- Consider any upward trends in task consumption, to identify whether any automation is showing periodic spikes or gradual growth in task usage.
- Prioritise your list. Once you have identified the various factors contributing to overall task consumption for different recipes, you can assess those recipes against business needs and determine the most appropriate optimisation strategies. Next, categorise them by business criticality. At a high level, every recipe serves either real-time or batch/deferred processing needs. Determine whether any automation can be changed from (near) real-time processing to deferred processing and batching. For example, it may be possible to convert a recipe with a five-minute polling interval to an hourly or daily sync job without affecting productivity.
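A minimal sketch of that triage decision, assuming hypothetical recipe metadata fields such as `time_critical` and `polling_interval_minutes` (RIH does not expose these names; they stand in for information you would gather during assessment):

```python
def triage(recipe):
    """Suggest a deferral strategy for a high-consumption recipe.

    Field names and the 5-minute threshold are illustrative assumptions.
    """
    if not recipe["time_critical"] and recipe["polling_interval_minutes"] <= 5:
        # A 5-minute poll runs 288 jobs/day; an hourly sync runs 24.
        return "convert to hourly or daily batch sync"
    if recipe["time_critical"]:
        return "keep near real-time; optimise triggers and batching instead"
    return "review trigger conditions and batch sizes"

suggestion = triage({"name": "Contact Sync", "time_critical": False,
                     "polling_interval_minutes": 5})
print(suggestion)  # convert to hourly or daily batch sync
```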
How to optimise
Once you have data and an understanding of the different factors contributing to overall task consumption for different recipes, you can use this information to choose an optimisation strategy. Let’s take a closer look at the three categories of optimisation strategies.
Trigger Strategies: Manage how automations are triggered to control the number of jobs for a given recipe.
| Trigger | Optimisation Strategy | Description | When to use | Efficacy | Complexity |
|---|---|---|---|---|---|
| Trigger Conditions | Use trigger conditions to filter only the events that the recipe should process. | Trigger conditions are additional rules that determine which events should be selected for processing; for example, process only closed-won opportunities. Trigger conditions eliminate extraneous jobs: no task is consumed if conditions are not met. | Need to process only certain records that fulfil predefined criteria based on event data. | Low - Moderate | Low |
| Batch/Bulk Triggers | Use batch triggers, when available, to process multiple events in a single job. | Batch triggers can process hundreds to thousands of records together in a single job. Larger batch sizes cut down the number of jobs needed to process the same number of records, and hence the number of tasks used. Combine batch triggers with batch actions in the recipes when possible for substantial performance gains. | Data sync use cases that process a high volume of information, for improved throughput and task savings, when the source system supports batch/bulk actions. | High | Low |
| Polling Frequency | Set the polling interval to a higher value. Triggers also support setting a custom interval value. | Most RIH triggers are poll-based. The default polling interval is every 5 minutes. Lower polling intervals create more jobs, resulting in higher task consumption, especially where additional pre-processing logic such as data enrichment is required. | The source system does not have frequent new or updated records, and/or use cases are not time-critical. | Moderate - High | Low |
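To see why polling frequency and batch size matter so much, here is a back-of-the-envelope sketch. The record volumes and tasks-per-job figures are illustrative assumptions, not RIH billing rules:

```python
def daily_jobs(polling_interval_minutes):
    """How many polls (and hence potential jobs) run per day."""
    return (24 * 60) // polling_interval_minutes

def daily_tasks(records_per_day, batch_size, tasks_per_job):
    """Tasks per day when each job handles one batch of records."""
    jobs = -(-records_per_day // batch_size)  # ceiling division
    return jobs * tasks_per_job

# Default 5-minute polling vs an hourly sync:
print(daily_jobs(5))    # 288 polls per day
print(daily_jobs(60))   # 24 polls per day

# 10,000 records/day at an assumed 5 tasks per job:
print(daily_tasks(10_000, 1, 5))     # one record per job: 50000 tasks
print(daily_tasks(10_000, 500, 5))   # 500-record batches: 100 tasks
```

The last two lines show the "reduce task usage by a factor of X" effect described in the batch strategies below.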
Batch Strategies: Implement batching techniques to optimise the processing of multiple events in a single job.
| Strategy | Optimisation Strategy | Description | When to use | Efficacy | Complexity |
|---|---|---|---|---|---|
| Batch/Bulk Actions | Use batch/bulk actions, when available, to process multiple events in a single step. | Batch actions can process hundreds to thousands of records together in a single action, and custom batch sizes can lead to significant task savings when repeat actions are needed to manage target system limitations. Instead of processing 1 record at a time (1 task each), you can process batches of X records (1 task per batch), reducing task usage by a factor of X. | Data synchronisation use cases that process a high volume of information, for improved throughput and task savings, when the target system supports batch/bulk actions. | High | Low |
| Filtering Events | Use action-specific filtering criteria, such as WHERE conditions or query parameters in APIs, to limit the number of events returned from search/list actions. | Recipes often require only a subset of data based on specific business logic or dynamic request data in the trigger event. Similar to filtering events in triggers using conditions, applying filtering logic in search or list actions limits the results returned by the business applications to only the required records, eliminating any additional filtering steps in the recipes. | Need to filter a batch of events/list input from search/GET actions based on specific criteria before using them in the recipe. | Moderate - High | Low |
| Delta Sync | Process only the changed records instead of the full data set. | Delta synchronisation processes only data that has changed (new or updated records) since the last successful job execution, instead of processing the complete record set every time. Use identifiers in the source data, such as a last-modified date field, to read only the changed records. This approach ensures improved data consistency, efficient task consumption, and reduced load on source and target business systems. | Data synchronisation use cases where the source system supports changed-data identification. Refer to database design best practices for improving performance and reducing load on all systems. | Moderate - High | Low |
| ELT Pipeline | Use the ELT pattern instead of ETL for complex data transformations. | ELT (Extract, Load, and Transform) is an alternative approach to ETL that loads data into the data warehousing platform first and then applies the transformation logic. The ELT pattern improves efficiency by taking advantage of purpose-built data warehousing platforms for high-volume data processing, while the recipe provides the orchestration and automation logic. | High-volume data processing requirements on modern data warehousing solutions such as Snowflake. | High | High |
| Streaming | Use streaming actions to automate large file transfers. | File streaming is the concept of reading and writing a file in smaller parts (chunks) in sequence. A typical example is transferring records from a shared file system (SFTP) to a file hosting platform for analysis (Amazon S3). When transferring a file from a source app to a destination app, Workato splits the file into smaller chunks and downloads them; these chunks are then uploaded to the destination app in separate requests. | File management/analytics use cases where both source and target systems support streaming actions. | High | Low |
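The delta sync row above can be sketched in a few lines. The `last_modified` field and the watermark handling are assumptions about the source data, not a specific RIH connector:

```python
from datetime import datetime, timezone

def delta_records(records, last_run):
    """Return only records changed since the last successful job run."""
    return [r for r in records if r["last_modified"] > last_run]

# Watermark from the previous successful run (illustrative).
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": "a", "last_modified": datetime(2023, 12, 30, tzinfo=timezone.utc)},
    {"id": "b", "last_modified": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]
changed = delta_records(records, last_run)
print([r["id"] for r in changed])  # ['b']
```

After a successful run, the watermark would be advanced to the job's start time so the next run picks up only subsequent changes.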
Recipe Efficiency Strategies: Use alternative methods to implement business logic in recipes more effectively.
| Strategy | Optimisation Strategy | Description | When to use | Efficacy | Complexity |
|---|---|---|---|---|---|
| Formula Mode | Use inline formulas for data transformations instead of repeat actions over lists. | Most business applications accept and return complex data structures, broadly classifiable as JSON objects or arrays. Formulas let users work with and format data easily; formulas in RIH are whitelisted Ruby methods. Mapping data in formula mode lets you handle complex data structures and counts as just one task per step. | Recipe logic requires data validation, cleansing, extraction, type casting, and/or transformations. | Moderate - High | Medium |
| List Processing | Define the target list schema and use dynamic list mode for implicit looping over existing lists. | RIH supports implicit looping over existing lists in dynamic list input mode: Workato automatically loops over the input list and applies all configured data pill transformation logic to each element. Use either the Mapper or Accumulate List actions in dynamic input mode. For Mapper actions, you can define the target list schema using Common Data Models from the tools menu; for the Accumulate List action, edit the schema directly in the recipe to define list item fields as an array. | Need to apply data pill level transformations to all the elements in an existing list before using it in other actions. | Moderate - High | Low |
| Collections | Use Collections in RIH for complex data processing. | Collections in RIH use an in-memory database that lets you work with data from multiple sources. You can create lists and query them using standard SQL syntax, with common SQL keywords like WHERE, GROUP BY, and JOIN, to manipulate data from tables into your desired format. | Complex data processing requirements such as merging/enriching data from different sources, applying transformations to a batch of events, aggregating data for a summary view, etc. | Moderate - High | Medium |
| Custom Code | Use custom code for implementing complex business logic and data processing. | You can perform compute-intensive processing more efficiently with a few lines of code. Examples include flattening hierarchical data structures, filtering lists, transforming object maps, or running nested loops. Write and execute custom code in recipes using popular languages such as Ruby, JavaScript, and Python; each execution of such custom code counts as 1 task. | Need for complex processing such as hierarchical data structures, high/nested cardinality, custom parsing logic, semi-structured or unstructured data processing, filtering lists, etc. | Moderate - High | Medium - High |
| Variable Grouping | Use a single Variables by RIH step to create multiple variables. | Multiple steps to create variables can quickly add to overall task usage when combined with frequent recipe execution or high-volume data processing. | Recipes with several related variables that can be grouped into a single step. | Low | Low |
| Variable Declarations | Declare and define variables close to where they are first used. | Recipe steps such as declaring and defining variables count towards task usage. Often, multiple variables are declared upfront for use inside a conditional block that is never executed. For frequently executed recipes, such task consumption adds up quickly; defining variables inside the conditional block avoids unused variables inflating the overall task count. | Recipe logic with variables used only inside a conditional block. | Low | Low |
| Response Caching | Use cached responses when implementing API GET requests. | RIH can cache API responses for GET request endpoints. You can specify the time in seconds (up to 3600) that the API response is stored in the cache before being refreshed or deleted. RIH will use a cached response for any request matching the cache key in the client request, avoiding triggering the API endpoint recipe and consuming just 1 task. | Need to implement an API endpoint for frequently accessed data, or multiple concurrent requesters accessing shared data. | Moderate - High | Low |
| Custom SDK | Use the RIH connector SDK to implement complex automation logic. | RIH's custom SDK connector allows you to write business logic in Ruby while providing a consistent low-code/no-code approach for using it in recipes. The custom SDK connector also supports advanced capabilities that can help optimise task counts for commonly used recipe logic; for example, a multi-step action that gets asynchronous, long-running Google BigQuery results in a single step counts as 1 task. | Need for complex automation logic such as asynchronous processing with polling, or special Ruby methods supported by the connector SDK. | Moderate - High | High |
| Reusable Recipes | Optimise a recipe and reuse it as a regular function. | RIH allows you to call recipes from inside other recipes, so that regular functions can be used in multiple recipes. | When you have a complex recipe that could be modularised, with existing functions incorporated to improve development efficiency. | Moderate - High | Low |
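As an example of the kind of custom-code step described in the table above, here is a hypothetical Python snippet that flattens a hierarchical record into dot-notation keys in a single pass (one task), instead of using nested repeat steps:

```python
def flatten(obj, prefix=""):
    """Flatten a nested dict into dot-notation keys, e.g. address.geo.lat."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path prefix along.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# Illustrative record, standing in for a trigger event's data structure.
record = {"name": "Acme", "address": {"city": "London", "geo": {"lat": 51.5}}}
print(flatten(record))
```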
Please note that the strategies above are adapted from Workato's best practices on the same topic.
Monitoring and Tracking
You've identified, assessed, and optimised your high-consumption recipes, but imagine having a dashboard where you can monitor task consumption trends month by month. You could see which recipes are experiencing periodic spikes, and use that dashboard to determine why those spikes are happening and how to fix them. Well, you can!
Monitoring Task Consumption
Task consumption can be tracked in several areas within RIH. You can monitor task consumption for a specific job, recipe, or at the account level. You can also include task consumption metrics in your regular administrative reporting and automate task monitoring and management using RecipeOps.

Monitoring at the Job Level
Here, you can select a job to view the total number of tasks used. If a recipe calls other recipe functions, it will not display the total number of tasks across all child jobs; each "child job" has its own task usage details.

Monitoring at the Recipe Level
Recipe task usage can be found under the Settings tab of the recipe. Alternatively, you can filter specific recipes and time windows from the Dashboard.

Monitoring at the Tenant Level
Monthly tenant usage is available in the Dashboard. It provides customisable views of task consumption, from hourly to monthly and overall.

Monitoring at the Account Level
You should always have visibility of your task counts across all of your tenants: Dev, Test and Prod. This is another way to flag the need to optimise your recipes. You can achieve it in a number of ways:
- Work closely with your Customer Success Manager, who will have a regular cadence with you and share your usage against all your entitlements.
- Monitor proactively by leveraging RecipeOps and other connectors to develop a recipe that monitors your usage.
In Conclusion
We have looked at task usage, at identifying which recipes to optimise, and at how to prioritise that work. We have looked at three families of optimisation strategies: trigger strategies (managing how automations are triggered to control the number of jobs for a given recipe), batch strategies (batching techniques to optimise the processing of multiple events in a single job), and recipe efficiency strategies (alternative methods to implement business logic in recipes more effectively). Lastly, we looked at how to monitor this good work and track changes in consumption at the job, recipe and account levels.
Reltio PS have implemented many of these enhancements in their own recipes to ensure they are optimised. We have seen some great successes with these approaches, and we know you will too!
Paul Lawrence Technical Delivery Manager
Father, Analytical thinker, and Builder of Lego.
Paul is a Project Manager & Senior Account Manager with 16 years of experience working with global brands in FMCG, Pharma, and Manufacturing, managing high-value and enterprise accounts to drive renewals and growth of existing contract value while delivering exceptional customer adoption and best practice.
Special thanks to Moin Ur Raza, Diparnab Dey, Kevin King, Aditi Verma and Zaid Jawad for taking the time to review this blog and for providing feedback!