Data Profiler: Data Drift Model Monitoring Tool

@faun ・ May 07, 2023

https://www.capitalone.com/tech/open-source/dataprofiler-mon...


The article explains why machine learning models should be monitored for data drift, a shift between the data a model was trained on and the data it sees in production, so that the model keeps performing well. It presents a framework for detecting data drift that involves four stages:

data retrieval,
data modeling,
test statistics calculation,
hypothesis testing.
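The article does not say which statistic its framework uses in the last two stages. A minimal sketch, assuming a single numeric feature and the two-sample Kolmogorov-Smirnov (KS) test (our choice for illustration, not necessarily the article's), maps the four stages onto code like this. `retrieve_batches` is a hypothetical stand-in for loading real training and production data.

```python
import math
from bisect import bisect_right


def retrieve_batches(shift=0):
    """Stage 1, data retrieval. Hypothetical stand-in: a real pipeline would
    load a training batch and a production batch from storage."""
    reference = list(range(100))
    current = [x + shift for x in range(100)]
    return reference, current


def ecdf(sample):
    """Stage 2, data modeling: represent a batch by its empirical CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(a, b):
    """Stage 3, test statistic: two-sample KS D, the largest vertical gap
    between the two empirical CDFs. Checking every sample point suffices,
    because the step functions only jump there."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in a + b)


def ks_p_value(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution, using the
    common small-sample correction to the scaling factor."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series converges too slowly here; the true value is ~1
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, 101))
    return min(max(total, 0.0), 1.0)


def detect_drift(reference, current, alpha=0.05):
    """Stage 4, hypothesis testing. H0: both batches come from the same
    distribution; flag drift when the p-value falls below alpha."""
    d = ks_statistic(reference, current)
    p = ks_p_value(d, len(reference), len(current))
    return {"ks_statistic": d, "p_value": p, "drift": p < alpha}


reference, current = retrieve_batches(shift=50)
print(detect_drift(reference, current))
```

In practice a library routine such as `scipy.stats.ks_2samp` would replace the hand-written stages 3 and 4; they are spelled out here only to show what each stage computes.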

Feature drift is detected with the Kubeflow Data Profiler component, which builds on Data Profiler, an open-source Python library that automates data analysis, monitoring, and sensitive data detection.
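The article does not describe Data Profiler's internal profile format. As a rough illustration of why per-batch profiles can be combined without revisiting raw data, which the pipeline relies on when it merges profile objects, the sketch below defines a hypothetical `NumericProfile` (not the DataProfiler API) that keeps only a count, mean, sum of squared deviations, and min/max. It uses Welford's one-pass update and the standard pairwise (Chan et al.) formula for merging variances.

```python
import math
from dataclasses import dataclass


@dataclass
class NumericProfile:
    """Hypothetical per-column summary for illustration only (not the
    DataProfiler API). Keeps a few statistics instead of the raw data."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0              # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @classmethod
    def from_values(cls, values):
        profile = cls()
        for x in values:
            profile.update(x)
        return profile

    def update(self, x):
        """Welford's one-pass update for a single new value."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other):
        """Combine two profiles with the pairwise formula, without touching
        either batch's raw values."""
        n = self.count + other.count
        if n == 0:
            return NumericProfile()
        delta = other.mean - self.mean
        return NumericProfile(
            count=n,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self):
        """Sample variance (n - 1 denominator)."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Profile two batches separately, then merge them into one profile.
batch_a = NumericProfile.from_values([1, 2, 3, 4])
batch_b = NumericProfile.from_values([5, 6, 7, 8, 9])
combined = batch_a.merge(batch_b)
print(combined.count, combined.mean, combined.variance)
```

Merging gives the same count, mean, and variance as profiling both batches together, which is what lets a pipeline profile data batch by batch.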

Using this component, a pipeline retrieves batches of training and test samples, profiles the data, merges the resulting profile objects, and computes dissimilarity metrics between the training and test profiles.
The pipeline returns a difference report containing key-value pairs for several data drift measures.
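The article does not list the exact keys in the difference report. To make "key-value pairs for several data drift measures" concrete, here is a minimal sketch that compares a training batch with a test batch of one numeric feature. The report keys and the use of the population stability index (PSI) are our assumptions, not the component's actual output format.

```python
import math
from statistics import mean, stdev


def histogram(values, lo, width, bins):
    """Fraction of values in each of `bins` equal-width bins starting at lo."""
    counts = [0] * bins
    for x in values:
        idx = min(int((x - lo) / width), bins - 1)  # the maximum lands in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, eps=1e-6):
    """Population stability index; eps keeps empty bins out of log(0)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def difference_report(train, test, bins=10):
    """Hypothetical difference report: key-value pairs of drift measures.
    Each batch needs at least two values (stdev requires it)."""
    lo, hi = min(train + test), max(train + test)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    return {
        "mean_difference": mean(test) - mean(train),
        "stdev_difference": stdev(test) - stdev(train),
        "psi": psi(histogram(train, lo, width, bins),
                   histogram(test, lo, width, bins)),
    }


train_batch = list(range(100))
test_batch = [x + 50 for x in range(100)]
print(difference_report(train_batch, test_batch))
```

Here the shifted test batch has the same spread as the training batch, so the standard-deviation difference is zero while the mean difference and PSI flag the shift. An often-quoted rule of thumb reads a PSI above 0.25 as a significant change.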
The article emphasizes the importance of evaluating data drift and provides actionable steps for detecting and monitoring it using the Kubeflow Data Profiler component.