Reltio Connect

  • 1.  Automating the Reltio data load process

    Reltio Partner
    Posted 07-20-2022 10:19

    Hi,
    I am currently exploring options on how to automate the Reltio data load process. 

    Current approach –

    We are using the data loader option in the Reltio UI to manually load CSV files stored in an S3 bucket.

    Future approach –

    We are planning to remove the manual step of loading the CSV files one by one and instead automate loading the records from those CSV files in the S3 bucket into Reltio.

    Can someone please let me know a few options on how to achieve this?

    Let me know if you need additional input.



    ------------------------------
    SOURAV BANERJEE
    Deloitte Consulting India Private Limited
    KOLKATA
    ------------------------------


  • 2.  RE: Automating the Reltio data load process

    Reltio Employee
    Posted 07-21-2022 02:25

    Hi Sourav,

    If Reltio Integration Hub (RIH) is enabled for your tenant, you can create a recipe there which reads the CSV files from your S3 bucket and loads the records into Reltio in bulk. You can even schedule the recipe to run at your required time intervals. If RIH is not enabled already, you can get in touch with your CSM to have it enabled in the system.

    Here is the documentation for RIH for your reference:
    https://docs.reltio.com/integrationhub/reltiointegrationhub.html

    Also, there is a course in Reltio Academy named "Reltio Integration Hub Foundation Certification" which has a video tutorial for your exact use case. 

    Another option could be to schedule the data loader job that you are currently running manually. Here is the documentation for how you can achieve that. 

    https://docs.reltio.com/dataloader/schedulejob.html



    ------------------------------
    Diparnab Dey
    ------------------------------



  • 3.  RE: Automating the Reltio data load process

    Reltio Employee
    Posted 07-21-2022 02:25
    Hi Sourav,

    You can schedule the job that you are currently running manually using the data loader. This way, the system will look for a new file placed in the S3 bucket based on your defined time interval and then run the job. You can refer to the documentation below for more information:

    https://docs.reltio.com/dataloader/schedulejob.html

    Another option would be to use Reltio Integration Hub (RIH) to create a recipe which reads the file from the S3 bucket whenever a new CSV file is placed there and loads the records in bulk. If RIH is currently not enabled in your tenant, you can get in touch with your CSM to have it enabled. Here is the documentation for the Reltio Integration Hub for your reference:

    https://docs.reltio.com/integrationhub/reltiointegrationhub.html

    In the Reltio Academy course Reltio Integration Hub Fundamentals Certification, there is a video example of the exact use case you are looking for.

    ------------------------------
    Diparnab Dey
    Technical Consultant
    Reltio
    Kolkata, West Bengal
    ------------------------------



  • 4.  RE: Automating the Reltio data load process

    Posted 07-21-2022 13:27
    Hi Sourav,

    You can explore Diparnab's suggestion to use Reltio Integration Hub, but it may require additional contracts/licensing with Workato depending on how your agreement with Reltio was defined; Reltio Integration Hub uses the Workato framework behind the scenes. We tried that approach at our company but could not proceed further because of the licensing terms I mentioned above. Instead, we use Informatica Cloud (IICS) to load data into Reltio using the Reltio APIs. At a high level, this is how we do it:

    1. Load Accounts into staging tables using IICS (we use Amazon Redshift to stage the data coming from Excel files or 3rd-party vendors)
    2. Validate data integrity and ensure the Account profiles have proper information populated before loading into Reltio (ensure Account completeness, such as address info)
    3. Use IICS to load data into Reltio via the REST and bulk-load APIs (a minimal API sketch follows this list)
    4. Let the magic happen in Reltio (further data enrichment, profile completeness, matching/merging, etc.)
    5. Extract data from Reltio and load it into the Amazon Redshift tables using IICS, for further consumption by downstream applications and dashboards
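    For anyone following along, here is a minimal sketch of step 3 (pushing records to Reltio over REST) in Python. The environment URL, tenant ID, OAuth flow, entity type, and attribute names are placeholders; adjust them to your tenant's configuration and data model.

```python
import requests

# Placeholder values - replace with your environment, tenant ID, and credentials.
RELTIO_ENV = "https://your-environment.reltio.com"
TENANT_ID = "yourTenantId"
AUTH_URL = "https://auth.reltio.com/oauth/token"          # token endpoint; use the OAuth flow your tenant supports
CLIENT_CREDENTIALS = ("your_client_id", "your_client_secret")

def get_access_token():
    # Obtain an OAuth2 token; grant type shown is illustrative.
    resp = requests.post(
        AUTH_URL,
        data={"grant_type": "client_credentials"},
        auth=CLIENT_CREDENTIALS,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def post_accounts(records):
    # Bulk-create/update entities; the entities endpoint accepts a JSON array.
    token = get_access_token()
    payload = [
        {
            "type": "configuration/entityTypes/Account",   # hypothetical entity type
            "attributes": {
                # Attribute names depend on your tenant's L3 configuration.
                "Name": [{"value": r["name"]}],
                "Address": [{"value": {"AddressLine1": [{"value": r["address"]}]}}],
            },
        }
        for r in records
    ]
    resp = requests.post(
        f"{RELTIO_ENV}/reltio/api/{TENANT_ID}/entities",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()

# Example: post one staged account row.
# print(post_accounts([{"name": "Acme Corp", "address": "1 Main St"}]))
```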

    Hope this helps.

    regards,
    Bhaskar


    ------------------------------
    Bhaskar Nunna
    ------------------------------



  • 5.  RE: Automating the Reltio data load process

    Reltio Partner
    Posted 07-22-2022 03:47
    Hi Diparnab/Bhaskar,

    Thanks for all your suggestions.

    We are looking into the option of RIH.

    Regarding the scheduling option in the Reltio data loader, I have checked, but it seems it will not suffice our requirement. Please find the details of the use case below.

    Currently we are splitting a large file and storing the split files in S3. For example, if we are loading a file having 10 million Individual records, we first split it into 10 files with 1 million records in each (the naming convention is something like a11, a12, a13…). In this case we want to load these split files one after the other, but in the scheduler we can only schedule a job definition where the file name is static.
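    In case it helps, a simplified illustration of the splitting step (not our production script; the chunk size and file-name prefix are examples):

```python
import csv

def split_csv(source_path, rows_per_file=1_000_000, prefix="a1"):
    """Split a large CSV into fixed-size chunks, repeating the header in each chunk."""
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        part, out, writer, rows = 0, None, None, 0
        for row in reader:
            if out is None or rows == rows_per_file:
                if out:
                    out.close()
                part += 1
                out = open(f"{prefix}{part}.csv", "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                rows = 0
            writer.writerow(row)
            rows += 1
        if out:
            out.close()

# Example: split_csv("individuals.csv")  # produces a11.csv, a12.csv, ... for upload to S3
```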

    Please let me know your thoughts on this, and in case I am missing anything.

    Thanks,
    Sourav

    ------------------------------
    SOURAV BANERJEE
    Deloitte Consulting India Private Limited
    KOLKATA
    ------------------------------



  • 6.  RE: Automating the Reltio data load process

    Reltio Employee
    Posted 07-22-2022 05:41
    Hi Sourav,

    Can you please confirm whether the bulk data load process we are talking about here needs to be performed on a schedule (e.g. once a day) or in an ad-hoc manner, i.e. the data load needs to be executed as soon as the files are placed in the S3 bucket? Both can be achieved.

    • If the data load needs to be executed on a schedule, then while defining the job definition, instead of providing a static file path you provide the S3 bucket directory path where the split files will be placed, and then define your schedule. This way, when the job runs, it will load all the new files placed in the specified location. If there is no new file available, the job will fail with a "no new file found" error. So with this approach you will need to design the process such that the split files are uploaded to the S3 bucket before your defined schedule.

      You can even utilize a property called file mask if you want to load only the files that contain a specific prefix. For example, if your S3 bucket contains files like Individual_1.csv, Individual_2.csv, Individual_3.csv and Company_1.csv, Company_2.csv, and you have defined the file mask property as "Individual", the job will only load the new files having the prefix "Individual" and ignore the rest.

    • If your data load process is ad-hoc, then you can achieve it in two ways:

      1. As Bhaskar suggested, utilize an ETL tool (e.g. IICS) or integration tool (e.g. MuleSoft) to read the files from the S3 bucket and use the console data loader APIs to load the files sequentially. In this approach, your ETL/integration layer decides which files need to be executed.

      2. You can write a custom utility or a Lambda function which is triggered whenever new files are placed in your S3 bucket and triggers the job that you previously defined as described in the scheduled approach (a rough Lambda sketch follows this list). Here is the swagger documentation for the data loader APIs for your reference:
      https://developer.reltio.com/private/swagger.htm?module=Data%20Ingestion
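      To make option 2 a bit more concrete, here is a rough sketch of an AWS Lambda handler (Python) wired to the bucket's ObjectCreated notification. The auth URL, job-trigger endpoint, and payload shape are placeholders only; take the exact paths and request schema from the Data Ingestion swagger documentation linked above.

```python
import os
import requests  # not in the default Lambda runtime; package it or use a layer

# Placeholders - set these as Lambda environment variables.
AUTH_URL = os.environ["RELTIO_AUTH_URL"]                       # OAuth token endpoint
CLIENT_CREDENTIALS = (os.environ["RELTIO_CLIENT_ID"],
                      os.environ["RELTIO_CLIENT_SECRET"])
DATALOADER_JOB_URL = os.environ["DATALOADER_JOB_URL"]          # job-trigger endpoint from the swagger docs

def _get_token():
    # Grant type shown is illustrative; use the flow your tenant supports.
    resp = requests.post(AUTH_URL, data={"grant_type": "client_credentials"},
                         auth=CLIENT_CREDENTIALS)
    resp.raise_for_status()
    return resp.json()["access_token"]

def lambda_handler(event, context):
    # Fired by the S3 ObjectCreated notification for each new split file.
    token = _get_token()
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if not key.endswith(".csv"):
            continue  # ignore non-CSV objects
        # Trigger the previously defined data loader job for this file.
        # Payload is illustrative; use the schema from the Data Ingestion swagger.
        resp = requests.post(
            DATALOADER_JOB_URL,
            json={"fileLocation": f"s3://{bucket}/{key}"},
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
    return {"status": "triggered"}
```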


      I would also recommend taking a look at the "Loading Data into Reltio" course in the Reltio Academy, as it covers all these topics in detail and lets you download the relevant Postman collections for the APIs as well.

      Please let me know in case you need any further information. 


    ------------------------------
    Diparnab Dey
    Technical Consultant
    Reltio
    Kolkata, West Bengal
    ------------------------------



  • 7.  RE: Automating the Reltio data load process

    Reltio Partner
    Posted 07-22-2022 09:33
    Adding on and baking in some lessons learned... We were faced with having to load 14 large IDL files, followed by incremental updates each day. We originally looked at the Reltio Console data loader and determined it was too limited for various reasons, even if we used its API programmatically. We briefly looked at RIH but decided that we had previous experience and artifacts within the team to do the following instead.

    We built an ingestion architecture in AWS based on a series of steps that began with Glue DataBrew to extract the data and form tables that mimic the information architecture in Reltio (a main file, a phone file, an email file, an address file, etc.). That is followed by a Python program that stitches data from those "grouped files" into CSV rows, each row representing a full entity to be posted. This is followed by a JSON generator, followed by the ROCS data loader utility, all sequenced and controlled by Airflow (AWS MWAA). All worked pretty well.

    But we then had trouble managing incremental updates to the nested attributes (Address, Phone, Email, Identifiers). So we swapped out Glue DataBrew for AWS RDS. Now RDS absorbs the IDL, is updated each day by all the incremental updates, and maintains a current image of each nested attribute in full; the RDS database contains the entire data set in its current form, including each nest in its entirety. The rest of the pipeline then detects any updated data in RDS, refactors the entire entity into JSON form, and posts it into Reltio. In this way we essentially overwrite the entire nest of a record each day even if only one part of it is updated, and we circumvent the hassle of figuring out how to update a single item within a nest. Yes, there are techniques in the Reltio API to index into a nest and surgically apply an update, but that requires forming a key with restrictive requirements around MatchFieldURI based on how you want your survivorship to work, so we found that too limiting an approach. We're quite happy now with the current approach.
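    For illustration only, a minimal sketch of the "rebuild the whole nest" idea described above: reconstruct every nested attribute from the current RDS image and post the complete entity, rather than patching a single nested value. The entity type and attribute names here are hypothetical, not our actual model.

```python
# Build one full entity JSON from the "grouped" rows (main, phone, email, address)
# so that every nested attribute is posted in its entirety on each update.
def build_entity(main_row, phone_rows, email_rows, address_rows):
    return {
        "type": "configuration/entityTypes/Individual",  # hypothetical entity type
        "attributes": {
            "FirstName": [{"value": main_row["first_name"]}],
            "LastName": [{"value": main_row["last_name"]}],
            # Each nest is rebuilt in full from the current RDS image,
            # so a change to one phone number still re-posts every phone.
            "Phone": [
                {"value": {"Number": [{"value": p["number"]}],
                           "Type": [{"value": p["type"]}]}}
                for p in phone_rows
            ],
            "Email": [{"value": {"Email": [{"value": e["email"]}]}} for e in email_rows],
            "Address": [
                {"value": {"AddressLine1": [{"value": a["line1"]}],
                           "City": [{"value": a["city"]}]}}
                for a in address_rows
            ],
        },
    }

# The resulting entities are then posted to the Reltio entities endpoint
# (see the API sketch earlier in this thread).
```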

    Were we to do it again, however, I would definitely look at using RIH, for all the obvious reasons and points mentioned by others.

    ------------------------------
    Curt Pearlman
    PwC
    Agoura Hills CA
    ------------------------------