Reltio Connect

Evolution of the Data Quality Space - Show

By Chris Detzel posted 05-24-2022 09:57

In this Reltio Community Show, Chris Detzel, Director of Customer Community and Engagement, will interview Michael Burke, Senior Director of AI/ML at Reltio, with a series of questions about data quality and MDM. In our conversation with Michael, we discuss his team's recent research on data quality.

We discuss the relationship between traditional data quality systems and the current trend of leveraging artificial intelligence. We will also dig into the advantages of having both a data quality platform and MDM as part of the same system.

The Evolution of Data Quality Space and Importance of Data Quality and Observability in Machine Learning

Data quality is essential to ensure that data is reliable for efficient decision-making. Data quality practices have changed significantly in recent years, moving from reactive to proactive, continuous, and predictive approaches. The importance of data quality is increasing as companies gather more data to make better-informed decisions. In this article, we explore the evolution of the data quality space and the importance of data quality and observability in machine learning.

Michael Burke's Background

Michael Burke is the Senior Director of AI and ML at Reltio and has been working in the data space for about a decade. He moved from solving data problems for specific customers to SaaS-based products, and then to the MDM space. Burke started learning about machine learning about a decade ago, which led him to work on social data sets and to understand how to make sense of information.

Definition of Data Quality and Its Importance

Data quality is a measure of the reliability, completeness, and accuracy of data. It is crucial for efficient decision-making and affects business outcomes, regulatory compliance, and operational efficiency.
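Dimensions such as completeness and accuracy become easier to manage once they are expressed as numbers. The following Python sketch, which is an illustration and not Reltio's implementation, scores a batch of records on two example dimensions: completeness (the share of records with a field populated) and validity (the share of populated values that pass a rule). The function names, the sample records, and the deliberately naive email rule are all hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

# A record is modeled as a mapping of field names to (possibly missing) string values.
Record = Mapping[str, Optional[str]]


def completeness(records: Sequence[Record], field: str) -> float:
    """Fraction of records in which `field` is present and non-empty."""
    if not records:
        return 1.0  # treat an empty batch as trivially complete
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def validity(records: Sequence[Record], field: str, is_valid: Callable[[str], bool]) -> float:
    """Fraction of populated values of `field` that pass the rule `is_valid`."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 1.0  # nothing populated, so nothing can be invalid
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical sample batch.
customers = [
    {"name": "Acme Corp", "email": "info@acme.example"},
    {"name": "Globex", "email": ""},
    {"name": "Initech", "email": "not-an-email"},
    {"name": None, "email": "sales@initech.example"},
]

print(completeness(customers, "name"))   # 0.75
print(completeness(customers, "email"))  # 0.75
# Hypothetical, deliberately naive rule: an email must contain "@".
print(validity(customers, "email", lambda v: "@" in v))
```

Tracking these scores per source and over time is what turns data quality from a one-off audit into something that can be reported and improved.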

The Evolution of Data Quality

The evolution of data quality can be broken down into four stages:

Stage 1: Reactive

The Reactive stage involves discovering data quality issues after the fact, and it typically involves manual processes. In this stage, there is no formal data quality management process, and the focus is on finding and correcting data quality problems as they arise.


Stage 2: Proactive

The Proactive stage involves identifying and fixing data quality issues before they become a problem. The focus is on prevention, and there is a formal data quality management process in place.
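In the proactive stage, checks run before data is accepted rather than after it has spread. As a minimal sketch, assuming records arrive as Python dicts and that the hypothetical rules below reflect your own standards (they are not a Reltio API), an ingestion gate can quarantine failing records with the reasons attached instead of letting them reach downstream systems.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rule set: field name -> (predicate on the raw value, error message).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

RULES: Dict[str, Tuple[Callable[[Optional[str]], bool], str]] = {
    "name": (lambda v: bool(v and v.strip()), "name is required"),
    "email": (lambda v: v is None or bool(EMAIL_RE.match(v)), "email is malformed"),
    "country": (
        lambda v: v is None or (len(v) == 2 and v.isalpha() and v.isupper()),
        "country must be a two-letter uppercase code",
    ),
}


@dataclass
class IngestResult:
    accepted: List[dict] = field(default_factory=list)
    # Each quarantined entry keeps the record together with every rule it failed.
    quarantined: List[Tuple[dict, List[str]]] = field(default_factory=list)


def ingest(records: List[dict]) -> IngestResult:
    """Validate each record before it is loaded; only clean records are accepted."""
    result = IngestResult()
    for rec in records:
        errors = [msg for name, (check, msg) in RULES.items() if not check(rec.get(name))]
        if errors:
            result.quarantined.append((rec, errors))
        else:
            result.accepted.append(rec)
    return result


# Hypothetical incoming batch.
batch = [
    {"name": "Acme Corp", "email": "info@acme.example", "country": "US"},
    {"name": "", "email": "info@globex.example"},
    {"name": "Initech", "email": "not-an-email", "country": "usa"},
]
result = ingest(batch)
for rec, errors in result.quarantined:
    print(rec, errors)
```

Keeping the failure reasons next to each quarantined record gives data source owners something concrete to fix, rather than a silent drop.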


Stage 3: Continuous

The Continuous stage involves monitoring and measuring data quality in real-time. In this stage, there are automated processes that ensure data quality is continuously being monitored and improved.
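Continuous monitoring can be sketched as a check evaluated on every incoming record, with the pass rate tracked over a sliding window. The class name, window size, and threshold below are illustrative assumptions. The monitor only raises an alert once the window is full, so that a handful of early records cannot trigger a false alarm.

```python
from collections import deque
from typing import Callable, Deque, Optional


class RollingQualityMonitor:
    """Tracks the pass rate of a quality check over the most recent `window` records."""

    def __init__(self, check: Callable[[dict], bool], window: int = 100, threshold: float = 0.95):
        self.check = check
        self.threshold = threshold
        # A deque with maxlen drops the oldest result automatically as new ones arrive.
        self.results: Deque[bool] = deque(maxlen=window)

    def pass_rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 1.0

    def observe(self, record: dict) -> Optional[str]:
        """Evaluate one incoming record; return an alert message if quality has degraded."""
        self.results.append(bool(self.check(record)))
        rate = self.pass_rate()
        # Only alert on a full window so a few early records cannot trigger a false alarm.
        if len(self.results) == self.results.maxlen and rate < self.threshold:
            return f"ALERT: pass rate {rate:.0%} is below {self.threshold:.0%}"
        return None


# Hypothetical stream: the last two records arrive without an email address.
monitor = RollingQualityMonitor(lambda r: bool(r.get("email")), window=4, threshold=0.75)
stream = ["a@x.example", "b@x.example", "c@x.example", "d@x.example", "", ""]
for email in stream:
    alert = monitor.observe({"email": email})
    if alert:
        print(alert)  # ALERT: pass rate 50% is below 75%
```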


Stage 4: Predictive

The Predictive stage involves predicting data quality issues before they occur. In this stage, there are advanced analytics and machine learning algorithms that can predict data quality problems and provide insights to prevent them from happening.
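One simple way to move from monitoring to prediction is to extrapolate the trend of a quality metric. The sketch below fits an ordinary least-squares line to a hypothetical daily completeness score and estimates how many days remain before the line crosses a threshold. Production predictive systems would likely use richer models; a straight-line fit is assumed here purely for illustration, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys against x = 0, 1, ..., n-1; returns (slope, intercept)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def days_until_breach(history: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate days after the last observation until the trend line falls below `threshold`.

    Returns None when the trend is flat or improving, i.e. no breach is predicted.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations to fit a trend")
    slope, intercept = fit_line(history)
    if slope >= 0:
        return None
    # Solve intercept + slope * x = threshold for x, then measure from the last day.
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - (len(history) - 1))


# Hypothetical daily completeness scores for one attribute.
scores = [0.99, 0.98, 0.97, 0.96, 0.95]
print(days_until_breach(scores, threshold=0.90))  # approximately 5.0
```

A forecast like this lets a team act on a declining source before the threshold is actually breached, which is the distinction between the predictive and continuous stages.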


The Importance of Data Quality and Observability in Machine Learning

Data scientists work hard to create machine learning models that can perform well. However, it is not enough to rely solely on the model's performance without looking at the back end. Misclassifying users or sending users down the wrong pipeline of recommendations can cause significant issues. As a result, data scientists need to spend time maintaining and tracking variance to prevent downstream problems.
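Tracking variance can start with a plain statistical comparison between the data a model was trained on and the data it sees now. This sketch, with hypothetical names and limits, flags a numeric feature as drifted when the mean of the current window shifts by more than a set number of standard errors, or when its variance changes by more than a set ratio. It is a simple stand-in for dedicated drift metrics, not a description of Reltio's approach.

```python
import statistics
from typing import NamedTuple, Sequence


class DriftReport(NamedTuple):
    z: float          # mean shift, in standard errors of the current window's mean
    var_ratio: float  # current variance divided by baseline variance
    drifted: bool


def drift_report(
    baseline: Sequence[float],
    current: Sequence[float],
    z_limit: float = 3.0,
    var_ratio_limit: float = 2.0,
) -> DriftReport:
    """Compare a current window of a numeric feature (at least 2 values) against its baseline."""
    mu, sigma = statistics.mean(baseline), statistics.stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; cannot scale the comparison")
    cur_mu, cur_sigma = statistics.mean(current), statistics.stdev(current)
    z = (cur_mu - mu) / (sigma / len(current) ** 0.5)
    # Values far from 1 mean the spread of the feature has changed.
    var_ratio = cur_sigma ** 2 / sigma ** 2
    drifted = abs(z) > z_limit or not (1 / var_ratio_limit <= var_ratio <= var_ratio_limit)
    return DriftReport(z, var_ratio, drifted)


# Hypothetical feature values at training time, and a later window shifted upward.
baseline = [10, 12, 11, 9, 10, 11, 12, 9]
shifted = [x + 5 for x in baseline]
print(drift_report(baseline, baseline).drifted)  # False
print(drift_report(baseline, shifted).drifted)   # True
```

Running a check like this on every scoring window is one way to catch problems such as misclassified users before they surface as bad recommendations.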

The increasing volatility of data sets means that observability is essential. It is no longer enough to perform data quality checks or maintain a feature store; the entire pipeline must be monitored. Chris asks Michael how Reltio is looking at AI and ML for its data quality platform. Michael promises to share more about this in a few weeks but mentions that Reltio is taking a proactive stance on identifying and remediating issues within the platform.

The benefits of having both master data management (MDM) and data quality in the same system are significant. MDM is the source of truth, so it should be actively managed and monitored to protect it. The traditional approach of analyzing quality once, ingesting the data, and then forgetting about it is no longer viable. Given how variable data has become, proactive monitoring and real-time data quality are essential.

Real-time data quality allows for concise and accountable management of data, as well as better communication with data source owners. It is also essential to communicate to stakeholders how data is improving over time. Data quality is a gateway to telling the story of how investment in data quality led to better business outcomes.

One common pitfall is finding correlative trends in data and jumping to conclusions without proper analysis or consideration of other factors that may be influencing those trends. This can lead to inaccurate or incomplete insights that harm business decisions. To avoid this, it is important to have a strong understanding of the data being used, the context in which it was collected, and any potential biases or limitations. Additionally, investing in data quality through measures such as data cleansing, validation, and standardization helps ensure that the data being used is accurate and reliable, leading to better insights and, ultimately, better business outcomes.



Some of the interview questions:

  • Share a little bit about your background and how you came to work in machine learning.
  • What is data quality and why is it useful?
  • How does traditional DQ differ from current trends?
  • What are the benefits of having both MDM and data quality in the same system?
  • What are some of the areas that today's technology struggles to solve?
  • Does having more data mean I can generally answer harder questions?
  • What are some of the common pitfalls data stewards run into when they are starting to improve data quality?
  • Where do you see data quality heading?
  • Where do AI and ML fit within the DQ space?

#dataquality
#MDM
#communitywebinar

