In this article I want to explore the topic of integrating and synchronizing data to and from an MDM system, and the areas of debate that can and should arise when designing your integration scheme. Not surprisingly, the concept of a golden record sits at the center of this discussion. So let’s start with a definition of a golden record, perhaps the most fundamental element within MDM.
What is a Golden Record?
Generally, professionals in the MDM space define a golden record as a record that, at all times, holds perfect information, or at least the most accurate information available anywhere in the ecosystem of applications integrated with MDM. This rests on the presumption that we have architected business processes and physical integrations such that if any system gains information more accurate than what MDM presently holds, the MDM record is updated immediately, thereby preserving the claim that MDM holds “the most accurate information available anywhere in the ecosystem of applications integrated with MDM.”
On that note, I’d like to point out that physical integrations are only one component of keeping the information in MDM accurate; one might say they are the transport mechanism. Equally, perhaps more, important are the business processes across the ecosystem that work to keep the data accurate. For example, rather than relying solely on third-party data sources that arrive quarterly, or system extracts that arrive weekly or even daily, your system design might employ business strategies, supported by technology, that take advantage of customer interactions. These interactions not only occur more frequently than the updates just mentioned, they engage the entity directly and therefore yield first-party information. Here we’re talking about website visits, conversations with a call-center agent, or a chatbot. Imagine these interactions occurring several times per day, each able to update critical data elements (CDEs), or simply reaffirm their present accuracy, by bringing in-the-moment affirmations into MDM with a current timestamp.
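To make the idea concrete, here is a minimal sketch of how a first-party interaction might update a CDE, or simply reaffirm it with a fresh timestamp. This is illustrative only; `CdeValue` and `apply_interaction` are hypothetical names, not part of any real MDM product, and a real implementation would route the change through survivorship rules rather than writing directly.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CdeValue:
    """A critical data element's current golden value and its last affirmation."""
    value: str
    last_affirmed: datetime

def apply_interaction(record: dict, cde: str, observed: str) -> None:
    """Apply a first-party observation from a customer interaction.

    If the observed value matches the current golden value, we only refresh
    the timestamp: an in-the-moment affirmation. If it differs, we record
    the new value (in a real system, survivorship rules would decide this).
    """
    now = datetime.now(timezone.utc)
    current = record.get(cde)
    if current is None or current.value != observed:
        record[cde] = CdeValue(value=observed, last_affirmed=now)
    else:
        current.last_affirmed = now
```

Even when the value is unchanged, the refreshed timestamp is itself valuable metadata: it tells downstream consumers how recently the value was confirmed by the entity itself.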
Sources and Targets
All systems that contribute information to MDM are, by definition, upstream from MDM and considered sources. All systems that receive information from MDM are, by definition, downstream and referred to as targets. A system can be both an upstream system (i.e. a contributor to MDM) and a downstream system (i.e. a target of MDM). In such a case, practitioners often speak of a bidirectional integration, but to be clear, the term “bidirectional integration” is conceptual; in reality you are building two completely separate physical integrations, each with its own behavior and its own use of technology.
Integration design should support business requirements. For example, during interviews, users might indicate a need for updates from System A to MDM to occur within three seconds, but go on to state that updates in the reverse direction, from MDM to System A, need only occur once a day at 1am. In fact, they may have a business reason for wanting that flow of data only once per day, such as coordination required with other scheduled systems.
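One way to see that a “bidirectional integration” is really two separate builds is to write the two sides down as separate configurations. The sketch below is illustrative only; `IntegrationConfig` and its fields are hypothetical names chosen to mirror the System A example above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntegrationConfig:
    name: str
    direction: str          # "inbound" (source -> MDM) or "outbound" (MDM -> target)
    pattern: str            # e.g. "event-driven" or "scheduled-batch"
    max_latency_seconds: Optional[float] = None   # for event-driven flows
    schedule: Optional[str] = None                # for scheduled flows

# A "bidirectional integration" with System A is, physically, two artifacts,
# each with its own pattern and service level:
system_a_inbound = IntegrationConfig(
    name="SystemA->MDM", direction="inbound",
    pattern="event-driven", max_latency_seconds=3.0)
system_a_outbound = IntegrationConfig(
    name="MDM->SystemA", direction="outbound",
    pattern="scheduled-batch", schedule="daily@01:00")
```

The asymmetry is the point: the two directions can use different technologies, different latencies, and different failure-handling, which is why they should be designed, reviewed, and monitored as separate integrations.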
Therefore, no single integration pattern or technology fits every user requirement. Similarly, not every requirement, which frankly I prefer to position more accurately as a “business request”, can be accommodated in practice, owing to system limitations that are often present. Perhaps System A doesn’t support an outbound, event-driven model and another approach has to be employed, increasing the latency from the aforementioned three seconds to something significantly longer. A discussion with the business users ensues and a compromise is reached. Long story short, integration design should be experience-driven and, conversely, any implications of an agreed-upon integration scheme should be clearly explained and discussed with the users. This will become even more evident later in this article.
Now let’s examine synchronicity and the circumstances that create out-of-sync outcomes. Suppose we built an integration bus that pushed source-system information into MDM in real time (for argument’s sake, say 100 milliseconds), and another integration that pushed MDM information to target systems, also in real time. Given this premise, could these three systems (a source, the MDM system, and a target) ever be out of sync with each other by more than the nominal 100 milliseconds? Based on what we’ve agreed so far, one might say no, but in the real world there are precisely three reasons why this can and does happen.
Case #1: The integration from MDM to a target system experiences a failure, so the target is out of sync with MDM. Here the source and MDM are in sync, but the target holds stale information until the failure is repaired and the missed updates are brought forward.
Case #2: The integration from the source system to MDM experiences a failure. If the survivorship rules would have regarded the update from the source as the new truth, the failure causes that truth to be held hostage by the source system, leaving MDM and all targets out of sync with that source until the failure is repaired and the missed updates are brought forward into MDM and propagated to the targets.
Case #3: The information provided by the source is rejected by the MDM survivorship strategy yet there is no provision within the integration scheme to coerce MDM’s prevailing truth back into the source system. Thus the source system appears out of sync with MDM.
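The three cases can be summarized in a small decision sketch. This is purely illustrative; `SyncIssue` and `classify` are hypothetical names, and real monitoring would inspect queues, logs, and timestamps rather than boolean flags:

```python
from enum import Enum, auto

class SyncIssue(Enum):
    TARGET_STALE = auto()      # Case 1: the MDM -> target link failed
    MDM_STALE = auto()         # Case 2: the source -> MDM link failed
    SOURCE_DIVERGENT = auto()  # Case 3: survivorship rejected the source's value

def classify(source_to_mdm_ok: bool, mdm_to_target_ok: bool,
             survivorship_accepted: bool) -> list:
    """Classify which of the three out-of-sync cases currently apply."""
    issues = []
    if not mdm_to_target_ok:
        issues.append(SyncIssue.TARGET_STALE)
    if not source_to_mdm_ok:
        issues.append(SyncIssue.MDM_STALE)
    # Case 3 only arises when the update actually reached MDM but lost
    # to the prevailing golden values under the survivorship rules.
    if source_to_mdm_ok and not survivorship_accepted:
        issues.append(SyncIssue.SOURCE_DIVERGENT)
    return issues
```

Note the structural difference: cases #1 and #2 are transport failures that replay can repair, while case #3 is a deliberate outcome of the survivorship rules, which is why it demands a governance decision rather than a retry.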
This last case, #3, is particularly interesting and served as the genesis of this article, because it asks the system owner and, more broadly, the data governance council to consider what should happen when a source system provides information that MDM rejects as “non-golden”, and what the resulting experience of that system’s users should be. As we explore this, let’s remember that the survivorship rules held within MDM are presumably agreed upon by the organization and are designed to generate enterprise-level truth. The MDM system is then just a proxy for that enterprise truth. Thus if the information from the source is rejected by MDM, it’s really being rejected by the rules for enterprise truth. The quarrel, then, is not really between the system and MDM; it’s between the system and the data quality rules established by the enterprise.
To coerce or not coerce, that is the question
Continuing with case #3, the question is: if MDM rejects an update from a source that is also serving as a target, should MDM send a remediating event back to that system, coercing the offending record back to the golden values presently held within MDM? Suppose this were implemented. If you were a user of that system, your experience would be that you press “Save” on a record and, a few seconds later, the values you thought you saved (at least for the CDEs within the record) revert to their previous values. And each time you try to fix them and re-save the record, the same thing happens. Crazy, huh? Well, maybe not.
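A minimal sketch of the coercion path might look like the following. All names here are hypothetical; in particular, `survives` stands in for the survivorship engine, which in a real system is a rules component, not a boolean parameter:

```python
def process_source_update(mdm: dict, cde: str, proposed: str,
                          survives: bool, coerce: bool):
    """Handle one CDE update from a source under a coercion policy.

    `survives`: whether the survivorship rules accept the proposed value.
    `coerce`:   the governance decision, i.e. whether rejected sources are
                forced back to the golden value.
    Returns the remediating event to send back to the source, or None.
    """
    if survives:
        mdm[cde] = proposed       # new enterprise truth; propagate to targets
        return None
    if coerce:
        # Push the prevailing golden value back, overwriting the source's edit.
        return {"cde": cde, "golden_value": mdm[cde], "action": "revert"}
    return None                   # source keeps its locally held value
```

Under `coerce=True`, every rejected save produces a revert event, which is exactly the “values snap back” experience described above; under `coerce=False`, the source is simply allowed to diverge from the golden record.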
As you ponder this question, you realize there is no absolute right or wrong behavior here. Instead, the debate brings to light a more fundamental question: what role and behavior does the data governance council wish MDM to have in its enterprise? Recall from the first paragraph of this article that one model for MDM holds that it contains the most accurate set of CDE values available across the ecosystem, and that, as such, harmony of data across the ecosystem results from all systems unconditionally absorbing the values that appear within MDM whenever MDM synthesizes an updated set of values. If the data governance council subscribes to this model, then the “crazy” behavior described above isn’t so crazy after all; it is precisely what should happen! After all, if the values had been accepted by MDM as the new truth and synced to the other systems, no one would object to those systems being updated. The difference here is that the very system providing an updated opinion is being told that its opinion is unacceptable and, moreover, is having that opinion overwritten in its own system. Naturally, proper training is warranted to ensure users understand why this behavior occurs.
But there’s another way this discussion could go. The council could decide that a source should be able to maintain values for CDEs that make sense for that source but don’t necessarily align with enterprise values. In this case, MDM’s role is quite different; one might say the information in MDM is perfect according to an enterprise definition of data quality standards, AND that, by decree, the information in one or more source systems is allowably imperfect. That is, imperfect from an enterprise perspective, while considered perfect and true within the context of the siloed source system.
Hopefully this article spurs thought and helps to influence future conversations you may have with system designers and data governance professionals.