Reltio Connect

  • 1.  Data Load for millions of records

    Posted 08-23-2023 11:15

    Hi Team,

    I am trying to do an initial bulk load of a million records into Reltio using RIH. In the recipe, I have set the batch size to 2000 and the concurrency to 4. Even so, when processing a hundred thousand records, the recipe either gets stuck or, if it does complete, takes close to 1 hour. Is there anything I am missing?



    ------------------------------
    Sharmistha Roy
    EY GDS
    BENGALURU
    ------------------------------


  • 2.  RE: Data Load for millions of records

    Reltio Employee
    Posted 08-24-2023 09:35

    Hi Sharmistha

    A 2,000-record batch size is really big. Have you considered trying a smaller batch size? I usually experiment in the region of 100-500 records per batch to find the best fit between batch size and processing speed. Have you tried some other sizes to see the difference?



    ------------------------------
    Guy Vorster
    Principal Solution Consultant
    ------------------------------
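
To make the "try a few sizes" suggestion concrete, here is a minimal sketch of how a sample of records could be split into batches and timed for several candidate sizes. It is an illustration only, not an RIH feature: `chunk`, `compareBatchSizes`, and the `loadBatch` callback are hypothetical names, and `loadBatch` stands in for whatever step actually posts one batch to Reltio. Batches are sent one at a time here so that the effect of batch size is not mixed with the effect of concurrency.

```javascript
// Split an array into consecutive batches of at most `size` records.
function chunk(records, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}

// Time a sample load for each candidate batch size.
// ASSUMPTION: `loadBatch` is a hypothetical async function supplied by the
// caller; it is not part of any Reltio or RIH API.
async function compareBatchSizes(sample, sizes, loadBatch) {
  const results = [];
  for (const size of sizes) {
    const batches = chunk(sample, size);
    const started = Date.now();
    for (const batch of batches) {
      await loadBatch(batch); // sequential on purpose: isolates batch size
    }
    const elapsedMs = Date.now() - started;
    results.push({ size, batchCount: batches.length, elapsedMs });
  }
  return results;
}
```

For scale: at 500 records per batch, a million records is 2,000 batches, and at 100 per batch it is 10,000, so the comparison is worth running on a representative sample before committing to the full load.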



  • 3.  RE: Data Load for millions of records

    Posted 08-24-2023 09:42

    But is a batch size of 500 the right approach for loading millions of records?

    Also, when errors occur during the process, I cannot see them under the failed records. They appear under the total records with the successful status set to false. How can I export the errored records?

    Regards,

    Sharmistha



    ------------------------------
    Sharmistha Roy
    EY GDS
    BENGALURU
    ------------------------------



  • 4.  RE: Data Load for millions of records

    Reltio Employee
    Posted 08-24-2023 09:52

    There will be a sweet spot for batch size, and it depends on how wide your records are and whether you have any LCAs in place. I have personally found that you need to try a few different sizes to figure out the optimal one for your setup.

    If you want to do something specifically with the error records, you need another step after the Reltio POST operation to inspect the returned batch and check for error records. To keep the task count down, you may be better off passing that returned batch to a Ruby or JavaScript function that returns only the records with an error status. I would be interested to know if anyone has created such a function to trim the output down to just the errors. I will ask around internally to find out.



    ------------------------------
    Guy Vorster
    Principal Solution Consultant
    ------------------------------
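
As a starting point for the kind of function described above, here is a minimal sketch in JavaScript. It assumes the batch returned by the POST step is an array with one result per submitted record, each carrying a boolean `successful` flag (as mentioned earlier in this thread) plus optional `index` and `errors` fields. Those two field names, and the function name `filterFailedRecords`, are assumptions for illustration; check them against the actual output of your recipe step.

```javascript
// Return only the failed results from a batch response, optionally paired
// with the source record that was submitted at the same position.
// ASSUMPTION: each result looks like { index, successful, errors }; these
// field names are not confirmed by the thread apart from `successful`.
function filterFailedRecords(batchResponse, sourceRecords = []) {
  return batchResponse
    .map((result, position) => ({ result, position }))
    .filter(({ result }) => result.successful === false)
    .map(({ result, position }) => {
      // Fall back to the array position if the result carries no index.
      const index = Number.isInteger(result.index) ? result.index : position;
      const source = sourceRecords[index];
      return {
        index,
        errors: result.errors || null,
        source: source === undefined ? null : source,
      };
    });
}
```

Passing the submitted batch as the second argument lets the output carry the original record next to its error, which makes it easier to export the failures for correction and reload.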



  • 5.  RE: Data Load for millions of records

    Posted 08-24-2023 10:02

    Thanks a bunch. I will try the suggestions, and I definitely look forward to a function that returns only the errored records.



    ------------------------------
    Sharmistha Roy
    EY GDS
    BENGALURU
    ------------------------------