The Reltio Master Data Management (MDM) Hub is updated daily through scheduled batch data loads. These loads update existing records and create new ones. A critical risk emerges when one batch load fails and the next scheduled load succeeds: if the failed file is then reprocessed blindly, stale values from the failed job can overwrite newer changes already present in the Hub, producing data discrepancies. The objective of this proposal is to define and evaluate robust solutions that preserve data consistency and integrity, even in the event of partial or failed data loads.
Challenge
When a nightly batch load fails and the next scheduled job runs without addressing the failed data, several issues can arise:
- The subsequent load may contain changes to the same records, potentially leading to overwrites with outdated information.
- The Reltio MDM platform records crosswalk history in the order data is received, which complicates chronological reconstruction of updates using supplied timestamps alone.
Therefore, a strategy is needed to handle failed loads and prevent loss of data accuracy, without disrupting the integrity of newer updates already present in the Hub.
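The overwrite scenario described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: a plain Python dictionary stands in for the Hub, the record fields (`id`, `email`, `updated_at`) and both function names are hypothetical, and nothing here calls a Reltio API. It contrasts a blind replay of the failed file with a replay that compares source-supplied timestamps before writing.

```python
from datetime import datetime

def apply_blind(hub, batch):
    """Arrival-order last-write-wins: each incoming row replaces the stored one."""
    for rec in batch:
        hub[rec["id"]] = rec

def apply_guarded(hub, batch):
    """Skip rows whose source timestamp is older than the stored row's timestamp.

    Returns the ids of the rows that were skipped as stale.
    """
    skipped = []
    for rec in batch:
        current = hub.get(rec["id"])
        if current is not None and rec["updated_at"] < current["updated_at"]:
            skipped.append(rec["id"])
            continue
        hub[rec["id"]] = rec
    return skipped

# Hypothetical data: the same record changed on two consecutive days.
monday = [{"id": "C-1", "email": "old@example.com", "updated_at": datetime(2024, 1, 1)}]
tuesday = [{"id": "C-1", "email": "new@example.com", "updated_at": datetime(2024, 1, 2)}]

# Monday's load failed, Tuesday's succeeded, then Monday's file is replayed.
blind_hub = {}
apply_blind(blind_hub, tuesday)
apply_blind(blind_hub, monday)       # stale Monday value overwrites Tuesday's

guarded_hub = {}
apply_guarded(guarded_hub, tuesday)
skipped = apply_guarded(guarded_hub, monday)  # Monday's row is rejected as stale
```

In the blind case the Hub ends up holding Monday's value, while the guarded replay keeps Tuesday's value and reports the stale row. The guard relies on a trustworthy source timestamp travelling with each record, since receipt order alone cannot distinguish a late replay from a genuine new update.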