What Is Concept Drift And Why Does It Go Undetected?

What is Drift?

Data Drift

  • Data drift is a change in the distribution of the independent variables, that is, the model’s input features.
  • Data drift is also known as feature drift or covariate shift.
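
One way to check a single numeric feature for data drift is to compare its training-time values with recent production values. The article does not prescribe a test; the minimal sketch below swaps in the two-sample Kolmogorov-Smirnov statistic as one common choice. The function names, the sample values, and the `alpha` default are assumptions made for illustration, and the critical value is the usual large-sample approximation, so treat small-sample results with caution.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples."""
    ref, cur = sorted(reference), sorted(current)
    # The largest gap between two step functions occurs at an observed value.
    return max(
        abs(bisect_right(ref, v) / len(ref) - bisect_right(cur, v) / len(cur))
        for v in ref + cur
    )


def feature_has_drifted(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Hypothetical feature values: training data versus shifted production data.
training_values = [i / 100 for i in range(100)]
production_values = [x + 0.5 for x in training_values]
print(feature_has_drifted(training_values, production_values))
```

A per-feature test like this only looks at the inputs, so it can flag data drift but says nothing about whether the relationship between inputs and target has changed.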

Concept Drift

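  • Concept drift occurs when the relationship between the input variables and the target variable changes over time. As a result, a model based on historical data is no longer valid.
  • The model’s assumptions, learned from historical data, must be revised using current data.
  • This causes problems because, as time goes on, the predictions become less accurate.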

Understanding Concept Drift

Types of concept drift (Image by Author)
Visual Representation of Sudden Drift | Data Source: J.P. Morgan Research | Image by Author

Gradual Drift

Recurring Drift

Visual representation of Weekly Sales | Image By Author

Dealing with Concept Drift

Causes of drift

  • The data distribution changes because of external events.
  • The input data shifts, for example when customer preferences change during a pandemic or when a product launches in a new market.
  • The data has integrity problems.
  • The data was collected incorrectly, or there was an issue with the data source.
  • Sometimes the data itself is correct, but poor data engineering introduces drift.

How to detect drifts?

  • Adaptive Windowing (ADWIN)
  • Drift Detection Method (DDM)
  • Early Drift Detection Method (EDDM)
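
To make one of these detectors concrete, here is a simplified sketch of the Drift Detection Method. DDM watches the running error rate p of a classifier and its standard deviation s = sqrt(p(1 - p) / n), remembers the point where p + s was lowest, and signals a warning or a drift once p + s rises a set number of standard deviations above that minimum. The class name, the parameter names, and the default levels of 2 and 3 are choices made for this sketch, not the API of any particular library.

```python
import math


class DDM:
    """Simplified Drift Detection Method over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0              # samples seen since the last reset
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the best point so far
        self.s_min = math.inf   # standard deviation at the best point so far

    def update(self, error):
        """Feed 1 for a wrong prediction, 0 for a correct one; return the state."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start fresh statistics for the new concept
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Hypothetical error stream: 10% errors, then a sudden jump to 50% errors.
errors = [1 if i % 10 == 0 else 0 for i in range(300)]
errors += [1 if i % 2 == 0 else 0 for i in range(200)]
detector = DDM()
states = [detector.update(e) for e in errors]
print(states.index("drift"))
```

A typical use is to start collecting fresh training data at the first warning and to retrain or replace the model when drift is signalled.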

How to prevent drifts?

  • Retrain the model regularly, or whenever its performance falls below a set threshold.
  • Train the model online, so that its weights are updated automatically as new data arrives. Updates can run daily, monthly, or whenever new data is received. This approach suits incremental concept drift or an unstable model.
  • Drop features. Build several models one at a time, and if some features turn out not to help, remove them and compare the resulting models with A/B testing.
  • Handle data-quality issues such as missing values, outliers, and label encoding before they reach the model.
  • Missing values and outliers are common in collected data. Missing values shrink the data available for analysis, which weakens the statistical power of the study and, eventually, the reliability of its results.
  • Maintain a static model as a baseline for comparison. Concept drift and gradual model degradation can be hard to spot in isolation, so a static baseline makes changes in model accuracy visible. After each intervention, use the baseline to assess whether the updated model really performs better.
  • Continuously monitor machine learning models (inputs, outputs, and data) while keeping an eye on the ML pipeline. This is where the Censius AI Observability Platform comes in. With Censius, you can:
  • Track for prediction, data, and concept drift
  • Receive real-time alerts for monitoring violations
  • Check for data integrity across the pipeline
  • Get access to Censius.
  • With Censius’ intuitive user interface, you can easily add/update models, projects, datasets, and much more.
  • You can submit a model log using the REST API or the Python SDK. You can create API keys from the settings page after logging in.
  • With Censius, you can set up different types of monitors on specific features of a model. Censius will then monitor these features continuously and alert you when violations occur.
  • For concept drift, you can track prediction data and alert users to statistical changes compared to actual outputs.
  • Censius provides a host of monitors across various data categories that can be used to monitor data and model health, providing a broad metric view of the model and its performance. Send your queries to hello@censius.ai.
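
The online-training and static-baseline items in the list above can be illustrated together. The sketch below is a toy setup, not a production recipe: a one-weight linear model is updated with a stochastic gradient step after every observation, while a frozen copy of the original model serves as the baseline. The synthetic stream, the learning rate, and all function names are assumptions chosen for the example.

```python
def make_stream(n_before=200, n_after=200):
    """Synthetic data: y = 2x before a sudden concept change, y = -x after it."""
    stream = []
    for i in range(n_before + n_after):
        x = ((i % 10) + 1) / 10          # inputs cycle through 0.1 .. 1.0
        true_weight = 2.0 if i < n_before else -1.0
        stream.append((x, true_weight * x))
    return stream


def online_vs_static(stream, initial_weight, learning_rate=0.1):
    """Return per-step absolute errors for a frozen model and an online model."""
    static_w = initial_weight   # baseline: never updated
    online_w = initial_weight   # updated after every observation
    static_errors, online_errors = [], []
    for x, y in stream:
        static_errors.append(abs(y - static_w * x))
        online_errors.append(abs(y - online_w * x))
        # One gradient step on the squared error of the online model.
        online_w += learning_rate * (y - online_w * x) * x
    return static_errors, online_errors


def mean(values):
    return sum(values) / len(values)


static_errors, online_errors = online_vs_static(make_stream(), initial_weight=2.0)
print(mean(static_errors[-50:]), mean(online_errors[-50:]))
```

Both models are accurate before the change. Afterwards the gap between the two error curves shows how much the online updates recover, which is the comparison a static baseline is meant to enable.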

Model Monitoring Best Practices

  • Train machine learning models on large, representative data sets. A broader training set exposes the algorithm to more of the variation it will meet in production, which tends to improve accuracy and reduce the chance of drift-related degradation.
  • Update machine learning models regularly. For example, most machine learning methods that use weights or coefficients, such as regression algorithms and neural networks, can benefit from periodic updates.
  • Develop new models to address concept drift that occurs frequently or unexpectedly. As behavior evolves, models trained on historical data become less trustworthy.
  • In some domains, such as time series problems, the data may be expected to change over time. In these types of problems, it is common to prepare the data in such a way as to remove the systematic changes to the data over time, such as trends and seasonality, by differencing.
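
Differencing, mentioned in the last item above, can be sketched in a few lines. Subtracting the value from one season earlier removes a repeating pattern, and subtracting the previous value removes a linear trend. The `difference` helper and the synthetic daily sales series below are hypothetical and exist only to show the mechanics.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical daily sales: a linear trend plus a repeating weekly pattern.
weekly_pattern = [5, 3, 0, -2, -4, 1, 6]
sales = [100 + 2 * day + weekly_pattern[day % 7] for day in range(28)]

# Lag-7 differencing cancels the weekly pattern; the trend leaves a constant 14.
deseasonalised = difference(sales, lag=7)

# Lag-1 differencing of that result removes the remaining constant as well.
detrended = difference(deseasonalised, lag=1)
print(deseasonalised[:3], detrended[:3])
```

Note that each differencing pass shortens the series by `lag` values, and forecasts made on the differenced scale have to be added back to recover the original units.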

Conclusion

Software Developer and Technical Writer.

Harshil Patel
