Reltio Connect

  • 1.  Automating the Reltio data load process

    Reltio Partner
    Posted 07-20-2022 10:19

    Hi,
    I am currently exploring options on how to automate the Reltio data load process. 

    Current approach –

    We are using the data loader option in the Reltio UI to manually load CSV files stored in an S3 bucket.

    Future approach –

    We are planning to remove the manual step of loading the CSV files one by one and instead automate loading the records from those CSV files in the S3 bucket into Reltio.

    Can someone please let me know a few options on how to achieve this?

    Let me know if you need additional input.



    ------------------------------
    SOURAV BANERJEE
    Deloitte Consulting India Private Limited
    KOLKATA
    ------------------------------


  • 2.  RE: Automating the Reltio data load process

    Reltio Employee
    Posted 07-21-2022 02:25

    Hi Sourav,

    If Reltio Integration Hub (RIH) is enabled for your tenant, you can create a recipe there which reads the CSV files from your S3 bucket and loads the records into Reltio in bulk. You can even schedule the recipe to run at your required time intervals. If RIH is not enabled already, you can get in touch with your CSM to have it enabled in the system.

    Here is the documentation for RIH for your reference:
    https://docs.reltio.com/integrationhub/reltiointegrationhub.html

    Also, there is a course in Reltio Academy named "Reltio Integration Hub Foundation Certification" which has a video tutorial for your exact use case. 

    Another option could be to schedule the data loader job that you are currently running manually. Here is the documentation for how you can achieve that. 

    https://docs.reltio.com/dataloader/schedulejob.html



    ------------------------------
    Diparnab Dey
    ------------------------------



  • 3.  RE: Automating the Reltio data load process

    Reltio Employee
    Posted 07-21-2022 02:25
    Hi Sourav,

    You can schedule the job that you are currently running manually using the data loader. This way, the system will look for a new file placed in the S3 bucket based on your defined time interval and then run the job. You can refer to the documentation below for more information:

    https://docs.reltio.com/dataloader/schedulejob.html

    Another option would be to use Reltio Integration Hub (RIH) to create a recipe which reads the file from the S3 bucket whenever a new CSV file is placed there and loads the records in bulk. If RIH is currently not enabled in your tenant, you can get in touch with your CSM to have it enabled. Here is the documentation for the Reltio Integration Hub for your reference:

    https://docs.reltio.com/integrationhub/reltiointegrationhub.html

    In the Reltio Academy course Reltio Integration Hub Fundamentals Certification, there is a video example of the exact use case you are looking for.

    ------------------------------
    Diparnab Dey
    Technical Consultant
    Reltio
    Kolkata, West Bengal
    ------------------------------



  • 4.  RE: Automating the Reltio data load process

    Posted 07-21-2022 13:27
    Hi Sourav,

    You can explore Diparnab's suggestion to use Reltio Integration Hub, but it may require additional contracts/licensing with Workato depending on how your agreement with Reltio was defined; Reltio Integration Hub uses the Workato framework behind the scenes. We tried that approach at our company but could not proceed further because of the licensing terms I mentioned above. Instead, we use Informatica Cloud (IICS) to load data into Reltio using the Reltio APIs. At a high level, this is how we do it:

    1. Load Accounts into staging tables using IICS (we use Amazon Redshift to stage the data coming from Excel files or 3rd-party vendors)
    2. Validate data integrity and ensure the Account profiles have proper information populated before loading into Reltio (ensure Account completeness, such as address info)
    3. Use IICS to load data into Reltio via the REST and bulk-load APIs (a minimal API sketch follows this list)
    4. Let the magic happen in Reltio (further data enrichment, profile completeness, matching/merging, etc.)
    5. Extract data from Reltio and load it into the Amazon Redshift tables using IICS, for further consumption by downstream applications and dashboards
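    For anyone following along, here is a minimal sketch of step 3 (pushing records to Reltio over REST) in Python. The environment URL, tenant ID, OAuth flow, entity type, and attribute names are placeholders; adjust them to your tenant's configuration and data model.

```python
import requests

# Placeholder values - replace with your environment, tenant ID, and credentials.
RELTIO_ENV = "https://your-environment.reltio.com"
TENANT_ID = "yourTenantId"
AUTH_URL = "https://auth.reltio.com/oauth/token"          # token endpoint; use the OAuth flow your tenant supports
CLIENT_CREDENTIALS = ("your_client_id", "your_client_secret")

def get_access_token():
    # Obtain an OAuth2 token; grant type shown is illustrative.
    resp = requests.post(
        AUTH_URL,
        data={"grant_type": "client_credentials"},
        auth=CLIENT_CREDENTIALS,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def post_accounts(records):
    # Bulk-create/update entities; the entities endpoint accepts a JSON array.
    token = get_access_token()
    payload = [
        {
            "type": "configuration/entityTypes/Account",   # hypothetical entity type
            "attributes": {
                # Attribute names depend on your tenant's L3 configuration.
                "Name": [{"value": r["name"]}],
                "Address": [{"value": {"AddressLine1": [{"value": r["address"]}]}}],
            },
        }
        for r in records
    ]
    resp = requests.post(
        f"{RELTIO_ENV}/reltio/api/{TENANT_ID}/entities",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()

# Example: post one staged account row.
# print(post_accounts([{"name": "Acme Corp", "address": "1 Main St"}]))
```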

    Hope this helps.

    regards,
    Bhaskar


    ------------------------------
    Bhaskar Nunna
    ------------------------------



  • 5.  RE: Automating the Reltio data load process

    Reltio Partner
    Posted 07-22-2022 03:47
    Hi Diparnab/Bhaskar,

    Thanks for all your suggestions.

    We are looking into the option of RIH.

    Regarding the scheduling option in the Reltio data loader, I have checked, but it seems it will not suffice our requirement. Please find the details of the use case below.

    Currently we are splitting a large file and storing the split files in S3. For example, if we are loading a file having 10 million Individual records, we first split it into 10 files with 1 million records in each (the naming convention is something like a11, a12, a13…). In this case we want to load these split files one after the other, but in the scheduler we can only schedule a job definition where the file name is static.
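    In case it helps, a simplified illustration of the splitting step (not our production script; the chunk size and file-name prefix are examples):

```python
import csv

def split_csv(source_path, rows_per_file=1_000_000, prefix="a1"):
    """Split a large CSV into fixed-size chunks, repeating the header in each chunk."""
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        part, out, writer, rows = 0, None, None, 0
        for row in reader:
            if out is None or rows == rows_per_file:
                if out:
                    out.close()
                part += 1
                out = open(f"{prefix}{part}.csv", "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                rows = 0
            writer.writerow(row)
            rows += 1
        if out:
            out.close()

# Example: split_csv("individuals.csv")  # produces a11.csv, a12.csv, ... for upload to S3
```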

    Please let me know your thoughts on this, and in case I am missing anything.

    Thanks,
    Sourav

    ------------------------------
    SOURAV BANERJEE
    Deloitte Consulting India Private Limited
    KOLKATA
    ------------------------------



  • 6.  RE: Automating the Reltio data load process

    Reltio Employee
    Posted 07-22-2022 05:41
    Hi Sourav,

    Can you please confirm whether the bulk data load process we are talking about here needs to be performed on a schedule (e.g. once a day) or in an ad-hoc manner, i.e. the data load needs to be executed as soon as the files are placed in the S3 bucket? Both can be achieved.

    • If the data load needs to be executed on a schedule, then while defining the job definition, instead of providing a static file path you provide the S3 bucket directory path where the split files will be placed, and then define your schedule. This way, when the job runs, it will load all the new files placed in the specified location. If there is no new file available, the job will fail with a "no new file found" error. So with this approach you will need to design the process such that the split files are uploaded to the S3 bucket before your defined schedule.

      You can even utilize a property called file mask if you want to load only the files that contain a specific prefix. For example, if your S3 bucket contains files like Individual_1.csv, Individual_2.csv, Individual_3.csv and Company_1.csv, Company_2.csv, and you have defined the file mask property as "Individual", the job will only load the new files having the prefix "Individual" and ignore the rest.

    • If your data load process is ad-hoc, then you can achieve it in two ways:

      1. As Bhaskar suggested, utilize an ETL tool (e.g. IICS) or integration tool (e.g. MuleSoft) to read the files from the S3 bucket and use the console data loader APIs to load the files sequentially. In this approach, your ETL/integration layer decides which files need to be executed.

      2. You can write a custom utility or a Lambda function which is triggered whenever new files are placed in your S3 bucket and triggers the job that you previously defined as described in the scheduled approach (a rough Lambda sketch follows this list). Here is the swagger documentation for the data loader APIs for your reference:
      https://developer.reltio.com/private/swagger.htm?module=Data%20Ingestion
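      To make option 2 a bit more concrete, here is a rough sketch of an AWS Lambda handler (Python) wired to the bucket's ObjectCreated notification. The auth URL, job-trigger endpoint, and payload shape are placeholders only; take the exact paths and request schema from the Data Ingestion swagger documentation linked above.

```python
import os
import requests  # not in the default Lambda runtime; package it or use a layer

# Placeholders - set these as Lambda environment variables.
AUTH_URL = os.environ["RELTIO_AUTH_URL"]                       # OAuth token endpoint
CLIENT_CREDENTIALS = (os.environ["RELTIO_CLIENT_ID"],
                      os.environ["RELTIO_CLIENT_SECRET"])
DATALOADER_JOB_URL = os.environ["DATALOADER_JOB_URL"]          # job-trigger endpoint from the swagger docs

def _get_token():
    # Grant type shown is illustrative; use the flow your tenant supports.
    resp = requests.post(AUTH_URL, data={"grant_type": "client_credentials"},
                         auth=CLIENT_CREDENTIALS)
    resp.raise_for_status()
    return resp.json()["access_token"]

def lambda_handler(event, context):
    # Fired by the S3 ObjectCreated notification for each new split file.
    token = _get_token()
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if not key.endswith(".csv"):
            continue  # ignore non-CSV objects
        # Trigger the previously defined data loader job for this file.
        # Payload is illustrative; use the schema from the Data Ingestion swagger.
        resp = requests.post(
            DATALOADER_JOB_URL,
            json={"fileLocation": f"s3://{bucket}/{key}"},
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
    return {"status": "triggered"}
```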


      I would also recommend taking a look at the "Loading Data into Reltio" course in the Reltio Academy, as it covers all these topics in detail and lets you download the relevant Postman collections for the APIs as well.

      Please let me know in case you need any further information. 


    ------------------------------
    Diparnab Dey
    Technical Consultant
    Reltio
    Kolkata, West Bengal
    ------------------------------



  • 7.  RE: Automating the Reltio data load process

    Reltio Partner
    Posted 07-22-2022 09:33
    Adding on and baking in some lessons learned... We were faced with having to load 14 large IDL files, followed by incremental updates each day. We originally looked at the Reltio Console data loader and determined it was too limited for various reasons, even if we used its API programmatically. We briefly looked at RIH but decided that we had previous experience and artifacts within the team to do the following instead.

    We built an ingestion architecture in AWS based on a series of steps that began with Glue DataBrew to extract the data and form tables that mimic the information architecture in Reltio (a main file, a phone file, an email file, an address file, etc.). That is followed by a Python program that stitches data from those "grouped files" into CSV rows, each row representing a full entity to be posted. This is followed by a JSON generator, followed by the ROCS data loader utility, all sequenced and controlled by Airflow (AWS MWAA). All worked pretty well.

    But we then had trouble managing incremental updates to the nested attributes (Address, Phone, Email, Identifiers). So we swapped out Glue DataBrew for AWS RDS. Now RDS absorbs the IDL, is updated each day by all the incremental updates, and maintains a current image of each nested attribute in full; the RDS database contains the entire data set in its current form, including each nest in its entirety. The rest of the pipeline then detects any updated data in RDS, refactors the entire entity into JSON form, and posts it into Reltio. In this way we essentially overwrite the entire nest of a record each day even if only one part of it is updated, and we circumvent the hassle of figuring out how to update a single item within a nest. Yes, there are techniques in the Reltio API to index into a nest and surgically apply an update, but that requires forming a key with restrictive requirements around MatchFieldURI based on how you want your survivorship to work, so we found that too limiting an approach. We're quite happy now with the current approach.
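    For illustration only, a minimal sketch of the "rebuild the whole nest" idea described above: reconstruct every nested attribute from the current RDS image and post the complete entity, rather than patching a single nested value. The entity type and attribute names here are hypothetical, not our actual model.

```python
# Build one full entity JSON from the "grouped" rows (main, phone, email, address)
# so that every nested attribute is posted in its entirety on each update.
def build_entity(main_row, phone_rows, email_rows, address_rows):
    return {
        "type": "configuration/entityTypes/Individual",  # hypothetical entity type
        "attributes": {
            "FirstName": [{"value": main_row["first_name"]}],
            "LastName": [{"value": main_row["last_name"]}],
            # Each nest is rebuilt in full from the current RDS image,
            # so a change to one phone number still re-posts every phone.
            "Phone": [
                {"value": {"Number": [{"value": p["number"]}],
                           "Type": [{"value": p["type"]}]}}
                for p in phone_rows
            ],
            "Email": [{"value": {"Email": [{"value": e["email"]}]}} for e in email_rows],
            "Address": [
                {"value": {"AddressLine1": [{"value": a["line1"]}],
                           "City": [{"value": a["city"]}]}}
                for a in address_rows
            ],
        },
    }

# The resulting entities are then posted to the Reltio entities endpoint
# (see the API sketch earlier in this thread).
```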

    Were we to do it again, however, I would definitely look at using RIH, for all the obvious reasons and points mentioned by others.

    ------------------------------
    Curt Pearlman
    PwC
    Agoura Hills CA
    ------------------------------