Analyzing Machine Learning Models with Yellowbrick

Anscombe’s quartet demonstrates a very significant idea: we need to visualize data before analyzing it. The quartet consists of four constructed datasets, each containing eleven data points.

Although all these datasets have essentially the same descriptive statistics, including the mean, variance, correlation, and regression line, they have very different distributions when graphed.

This classic example reiterates that looking at data is as important as performing numerical computations on it. Statistical tests are important and often necessary for analyzing datasets, but so is visual analysis.

Visualization thus plays a critical role throughout the analytical process and is, frankly, a must-have for any effective analysis, for model selection, and for evaluation. This article discusses Yellowbrick, a diagnostic platform that allows data scientists to visualize the entire model selection process, steering us toward better, more explainable models while avoiding pitfalls and traps along the way.

Model Selection Process

More often than not, discussions of machine learning focus primarily on the models used for inference. Practitioners have their favorites when it comes to model selection, a preference built over time through experience and knowledge, but what actually happens under the hood is often not given enough importance.

What is important to note is that model selection isn’t just about picking the “right” or “wrong” algorithm; it’s a much deeper, iterative process that involves the following steps:

  1. Selecting and/or engineering the smallest and most predictive feature set.
  2. Choosing a set of algorithms from a model family.
  3. Tuning the algorithm hyperparameters to optimize performance.

All of the above points together constitute the Model Selection Triple, which was first discussed in a 2015 SIGMOD¹ paper by Kumar et al.

The Yellowbrick library is a diagnostic visualization platform for machine learning that allows data scientists to steer the model selection process and assists in diagnosing problems throughout the machine learning workflow. In short, it tries to find the model described by a triple composed of features, an algorithm, and hyperparameters that best fits the data.


Yellowbrick is an open-source Python project that extends the scikit-learn API with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create interactive data explorations.

It extends the scikit-learn API with a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of the scikit-learn pipeline process, providing visuals throughout the transformation of high-dimensional data.


Yellowbrick isn’t a replacement for other data visualization libraries but helps to achieve the following:

  • Model Visualization
  • Data visualization for machine learning
  • Visual Diagnostics
  • Visual Steering


Yellowbrick can be installed through either pip or conda. For detailed instructions, refer to the documentation.


The Yellowbrick API should feel familiar if you already know the scikit-learn interface.

The primary interface is a Visualizer – an object that learns from data to produce a visualization. In order to use visualizers, import the visualizer, instantiate it, call the visualizer’s fit() method, and then, in order to render the visualization, call the visualizer’s poof() method, which does the magic!


Yellowbrick hosts several datasets wrangled from the UCI Machine Learning Repository. We’ll be working with the ‘occupancy’ dataset, an experimental dataset used for binary classification wherein the idea is to predict room occupancy given variables such as temperature, humidity, light, and CO2. You can download the dataset here.

The code for this article can be accessed from the associated GitHub repository or viewed on Binder.

Importing the necessary libraries

Loading the dataset

Specifying the feature and target column
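The three setup steps above might look like the following sketch. Since the downloaded CSV isn't bundled here, the snippet first writes a tiny occupancy-style sample to disk so it is self-contained; the column names and values are illustrative and may differ slightly from your copy of the UCI file:

```python
import pandas as pd

# Write a tiny occupancy-style sample so the snippet is self-contained;
# in practice you would read the downloaded occupancy CSV directly.
sample = """temperature,relative humidity,light,CO2,occupancy
23.18,27.27,426.0,721.25,1
23.15,27.27,429.5,714.0,1
20.56,19.05,0.0,440.5,0
20.39,19.0,0.0,438.0,0
"""
with open("occupancy.csv", "w") as f:
    f.write(sample)

# Loading the dataset
data = pd.read_csv("occupancy.csv")

# Specifying the feature columns and the target column
features = ["temperature", "relative humidity", "light", "CO2"]
target = "occupancy"

X = data[features]
y = data[target]
```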

Feature Analysis with Yellowbrick

Feature engineering requires an understanding of the relationships between features; the feature analysis visualizers in Yellowbrick display data in feature space so that important features can be detected.

The visualizers focus on aggregation, optimization, and other techniques to give overviews of the data. Yellowbrick currently implements a number of feature analysis visualizers, including Rank Features, RadViz, and Parallel Coordinates.

Let’s go through some of them to see how they’re implemented.

Rank Features

The Rank Features visualizers rank single features or pairs of features to detect covariance. Ranking can be 1D or 2D depending on the number of features utilized for ranking.

Rank 1D

Rank 1D utilizes a ranking algorithm that takes into account only a single feature at a time. By default, the Shapiro-Wilk algorithm is used to assess the normality of the distribution of instances with respect to the feature:

Rank 2D

Rank 2D, on the other hand, performs pairwise feature analysis as a heatmap. The default ranking algorithm is covariance, but we can also use the Pearson score:


RadViz

RadViz is a multivariate data visualization algorithm that plots each feature dimension uniformly around the circumference of a circle and then plots data points on the interior of the circle. This allows many dimensions to easily fit on a circle, greatly expanding the dimensionality of the visualization:

Parallel Coordinates

Parallel coordinates is a visualization technique used to plot individual data elements across many dimensions. Each of the dimensions corresponds to a vertical axis, and each data element is displayed as a series of connected points along the dimensions/axes.

This technique is useful when we need to detect clusters of instances that have similar classes, and to note features that have high variance or different distributions. Points that tend to cluster will appear closer together:

The groups of similar instances are called ‘braids’, and when there are distinct braids of different classes, it suggests there’s enough separability that a classification algorithm might be able to discern between each class.

Model Evaluation Visualizers

Model evaluation signifies how well the values predicted by the model match the actual labeled ones. Yellowbrick has visualizers for classification, regression, and clustering algorithms. Let’s see a select few.

Evaluating Classifiers

Classification models try to assign one or more categories to the dependent variable. The sklearn.metrics module implements functions to measure classification performance.

Yellowbrick implements several classifier evaluation visualizers, including the Classification Report and the Confusion Matrix.

Let’s implement a few of them on our data:

Split the dataset into training and testing sets:
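A typical split might look like this; in the article's notebook X and y come from the occupancy dataset, whereas here synthetic stand-in data is used so the snippet is self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in for the occupancy features and labels
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```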

Classification Report

The classification report visualizer displays the precision, recall, and F1 scores for the model:

Let’s visualize classification reports for two models to decide which is better.

  • Classification report using Gaussian NB
  • Classification report using Logistic Regression

Visual classification reports are used to compare classification models and to select models that are “redder”, i.e. models that have stronger classification metrics or are more balanced.

Confusion Matrix

The ConfusionMatrix visualizer shows how each of the predicted classes compares to the actual classes. Let’s check out the confusion matrix for the logistic regression model:

Evaluating Regressors

Regression models try to predict a target in a continuous space. The sklearn.metrics module implements functions to measure regression performance.

Yellowbrick implements several regressor evaluation visualizers, including the Residuals Plot, the Prediction Error Plot, and the Alpha Selection visualizer.

For implementing the regressor visualizer, let’s quickly import a regression dataset. We’ll use the concrete dataset, which contains 1030 instances and 9 attributes. Eight of the attributes are explanatory variables, including the age of the concrete and the materials used to create it, while the target variable strength is a measure of the concrete’s compressive strength (MPa). Download the dataset here.

Loading the dataset

Saving feature names as a list and target variable as a string:
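The two steps above might look like the following sketch. A tiny concrete-style sample is written to disk first so the snippet is self-contained; the column names and rows are illustrative and may differ from your copy of the dataset:

```python
import pandas as pd

# Tiny concrete-style sample so the snippet is self-contained; in practice
# you would read the downloaded concrete dataset (1030 instances, 9 attributes).
sample = """cement,slag,ash,water,splast,coarse,fine,age,strength
540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.30
"""
with open("concrete.csv", "w") as f:
    f.write(sample)

# Loading the dataset
df = pd.read_csv("concrete.csv")

# Feature names as a list, target variable as a string
feature_names = ["cement", "slag", "ash", "water", "splast", "coarse", "fine", "age"]
target_name = "strength"

X = df[feature_names]
y = df[target_name]
```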

Residuals Plot

A residual is the difference between the target and the predicted value, i.e. the error of the prediction. The ResidualsPlot visualizer shows the residuals on the vertical axis and the predicted values on the horizontal axis, allowing you to detect regions within the target that may be susceptible to more or less error. It also visualizes the train and test data in different colors.

If the points are well dispersed around the horizontal zero line, linear regression is appropriate for the data; otherwise, a non-linear model may work better. The example above shows residuals that are fairly uniformly dispersed.

Prediction Error Plot

The PredictionError visualizer plots prediction errors as a scatterplot of predicted versus actual values. We can then visualize the line of best fit and compare it to the 45° identity line.

Alpha Selection Visualizer

The AlphaSelection visualizer demonstrates how different alpha values influence model selection during the regularization of linear models. A higher alpha value denotes a less complex model, decreasing the error due to variance (overfitting).

However, alphas that are too high increase the error due to bias (underfit). Therefore, it’s important to choose an optimal alpha so that the error is minimized in both directions.

We can experiment with Lasso, Ridge, and ElasticNet and see which has an optimum alpha value.

Hyperparameter Tuning

Tuning a model is as important as selecting one. Besides the alpha selection shown above, Yellowbrick offers other visualizers for hyperparameter tuning, including:

Silhouette Visualizer

The Silhouette Visualizer displays the silhouette coefficient for each sample on a per-cluster basis, visualizing which clusters are dense and which are not.

Apart from this, other visualizers can be used for tuning, such as the widely used Elbow method, but for the sake of demonstration we’ll stick with this one.


The code and the datasets used in this article are available on my GitHub Repository.


The Yellowbrick library allows data scientists to steer the model selection process. The fact that it extends the scikit-learn API lowers the learning curve considerably. This can help in understanding a large variety of algorithms and methods and in monitoring model performance in real-world applications.


References

  1. Model Selection Management Systems: The Next Frontier of Advanced Analytics
  2. Visual diagnostics for more effective machine learning
  3. Learning machine learning with Yellowbrick
  4. Yellowbrick documentation


Our team has been at the forefront of Artificial Intelligence and Machine Learning research for more than 15 years and we're using our collective intelligence to help others learn, understand and grow using these new technologies in ethical and sustainable ways.
