Linear Regression is often the introductory chapter of Machine Learning, and Gradient Descent is probably the first optimization technique anyone learns. Most of the time, the instructor uses a contour plot to explain the path of the Gradient Descent optimization algorithm. I used to wonder how to create those contour plots. Today I will show how to visualize Gradient Descent using a contour plot in Python.

A contour plot is like a 3D surface plot, where the 3rd dimension (Z) gets plotted as constant slices (contours) on a 2-dimensional surface. The left plot in the picture below shows a 3D plot and the right one is the contour plot of the same 3D plot. You can see how the 3rd dimension (Y here) has been converted to contours of colors (and lines). The important part is that the value of Y is always the same along a contour line, for all the values of X1 & X2.

Contour Plot using Python:
Before jumping into gradient descent, let's understand how to actually draw a contour plot using Python. Here we will be using Python's most popular data visualization library, matplotlib.

Data Preparation:
I will create two vectors (numpy arrays) using np.linspace():

```python
import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(-10.0, 10.0, 100)
x2 = np.linspace(-10.0, 10.0, 100)
```
If we simply make a scatter plot using x1 and x2, it will look like the following:

```python
plt.scatter(x1, x2)
plt.show()
```

Now, in order to create a contour plot, we will use np.meshgrid(). Let's look at what it does on a small example:

```python
a = np.array((1, 2, 3))
a1, a2 = np.meshgrid(a, a)
```

If you look at a1 and a2, you will see they are both 3x3 matrices: a1 has repeated rows and a2 has repeated columns. To understand this visually, look at the 3D plot in the first picture; we have now created the bottom plane of that 3D plot, a mesh/grid.

```python
a1
Out[11]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

a2
Out[12]:
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
```

Once the mesh/grid values have been created, we can create the data for the 3rd (virtual) dimension. Here I am just using an ellipse-like radial function, whose contours are circles. Y will also be a 100x100 matrix.

\( y = \sqrt{x_1^2 + x_2^2} \)

```python
X1, X2 = np.meshgrid(x1, x2)
Y = np.sqrt(np.square(X1) + np.square(X2))
```

Before even creating a proper contour plot, if we just plot the values of X1 & X2 and choose the color scale according to the values of Y, we can easily visualize the graph as follows:

```python
cm = plt.cm.get_cmap('viridis')
plt.scatter(X1, X2, c=Y, cmap=cm)
plt.show()
```

plt.contour() and plt.contourf():
We will use matplotlib's plt.contour() function to draw the contour lines, and plt.clabel() to label them:

```python
cp = plt.contour(X1, X2, Y)
plt.clabel(cp, inline=1, fontsize=10)
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
```

You can see the scatter plot and the contour plot look kind of the same. However, we get much more control when creating the contour plot than with the scatter plot.

Fill Contour Plot:
The plt.contourf() function fills the regions between the contour lines with color:

```python
cp = plt.contour(X1, X2, Y, colors='black', linestyles='dashed', linewidths=1)
plt.clabel(cp, inline=1, fontsize=10)
cp = plt.contourf(X1, X2, Y)
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
```

Choose custom levels:
We will look at one more important feature of the plotting library. We can define the levels where we want to draw the contour lines using the levels (4th) parameter of both plt.contour() and plt.contourf():

```python
levels = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0, 14.0]
cp = plt.contour(X1, X2, Y, levels, colors='black', linestyles='dashed', linewidths=1)
plt.clabel(cp, inline=1, fontsize=10)
cp = plt.contourf(X1, X2, Y, levels)
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
```
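As an aside, the left panel of the first figure (the 3D surface these contours come from) can be reproduced with a minimal sketch like the following, assuming matplotlib 3.2+ for the projection='3d' subplot API:

```python
fig = plt.figure()
ax = fig.add_subplot(projection='3d')   # requires matplotlib >= 3.2
ax.plot_surface(X1, X2, Y, cmap='viridis')
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_zlabel('Y')
plt.show()
```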
Gradient Descent Algorithm:
Now let's use the same technique to visualize Gradient Descent. We will fit a linear regression on the Advertising dataset, using TV and radio as the features and sales as the target, and standardize the features first:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv')
y = data['sales']
X = np.column_stack((data['TV'], data['radio']))

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

Calculate Gradient and MSE:
We will use the following function to calculate the mse and the derivative of the cost w.r.t. w:

```python
def gradient_descent(W, x, y):
    y_hat = x.dot(W).flatten()
    error = (y - y_hat)
    mse = (1.0 / len(x)) * np.sum(np.square(error))
    gradient = -(1.0 / len(x)) * error.dot(x)
    return gradient, mse
```
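For reference, here is the math this function implements:

\[
\text{MSE}(w) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i^\top w\right)^2,
\qquad
\frac{\partial\,\text{MSE}}{\partial w} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - x_i^\top w\right)x_i
\]

Note that the code drops the constant factor of 2 from the derivative; this is harmless, since it only rescales the effective learning rate.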
Next, we choose a starting point for w, along with a learning rate alpha and a convergence tolerance. We also create two lists, one for storing all the intermediate w values and another for the corresponding mse values:

```python
w = np.array((-40, -40))
alpha = .1
tolerance = 1e-3

old_w = []
errors = []
```

Gradient Descent Loop:
Below is the loop for Gradient Descent, where we update w based on the learning rate. We also capture the w and mse values every 10 iterations.

```python
# Perform Gradient Descent
iterations = 1
for i in range(200):
    gradient, error = gradient_descent(w, X_scaled, y)
    new_w = w - alpha * gradient

    # Print and record the error every 10 iterations
    if iterations % 10 == 0:
        print("Iteration: %d - Error: %.4f" % (iterations, error))
        old_w.append(new_w)
        errors.append(error)

    # Stopping Condition
    if np.sum(abs(new_w - w)) < tolerance:
        print('Gradient Descent has converged')
        break

    iterations += 1
    w = new_w

print('w =', w)
```

That's all. You can see that w converges to the following values:

```python
w
Out[19]: array([3.91359776, 2.77964408])
```

Note: You can refer to my other tutorial on gradient descent, where I have explained the math and the program step by step.
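As a quick sanity check (my addition, not part of the original tutorial), we can compare against scikit-learn's closed-form least-squares fit on the same scaled features. Since our gradient descent did not model an intercept term, we disable it here too:

```python
from sklearn.linear_model import LinearRegression

# No intercept, to match the gradient descent model above
lr = LinearRegression(fit_intercept=False).fit(X_scaled, y)
print(lr.coef_)  # should closely match the converged w
```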
Before we start writing the code for the contour plot, we need to take care of a few things. First, convert the list of intermediate weights (old_w) to a numpy array. Then I am adding 5 additional levels manually, just to make the contour plot look better; you can skip them. Finally, convert the errors list to a numpy array, sort it, and save it as the levels variable. We need to sort the level values from small to large, since that is the order the contour() function expects.

```python
all_ws = np.array(old_w)

# Just for visualization
errors.append(600)
errors.append(500)
errors.append(400)
errors.append(300)
errors.append(225)

levels = np.sort(np.array(errors))
```

Draw the Contour plot:
It's always helpful to see the result before going through the code. Here is the plot of our gradient descent algorithm that we will be creating next.

Prepare Axis (w0, w1):
As we have done earlier, we need to create the grid for the plot axes. Our w converged near (3.91, 2.78), so we span each axis from -5 times to +5 times the converged value, and pre-allocate a 100x100 matrix for the mse values:

```python
w0 = np.linspace(-w[0] * 5, w[0] * 5, 100)
w1 = np.linspace(-w[1] * 5, w[1] * 5, 100)
mse_vals = np.zeros(shape=(w0.size, w1.size))
```

Last time we used an ellipse-like formula to create the 3rd dimension; here we need to manually calculate the mse for each combination of w0 and w1.

Note: There is a shortcut available for the code below; however, I wanted to keep it this way since it is easy to see what is going on. A sketch of the shortcut follows after the loop.

Prepare the 3rd Dimension:
We will loop through each value of w0 and w1 and compute the mse at that point. This time we are not using meshgrid, but the concept is the same.

```python
for i, value1 in enumerate(w0):
    for j, value2 in enumerate(w1):
        w_temp = np.array((value1, value2))
        # Row index j corresponds to w1 (the y-axis):
        # plt.contour() expects Z[row, col] == Z[y-index, x-index]
        mse_vals[j, i] = gradient_descent(w_temp, X_scaled, y)[1]
```
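The shortcut mentioned above could look something like this: a minimal sketch using numpy broadcasting (the names W0, W1, and W_grid are mine, not from the original tutorial). It produces the same mse_vals matrix as the double loop:

```python
# Vectorized alternative to the double loop above
W0, W1 = np.meshgrid(w0, w1)           # default 'xy' indexing: [row, col] = [w1, w0]
W_grid = np.stack((W0, W1), axis=-1)   # shape (100, 100, 2)
preds = W_grid @ X_scaled.T            # shape (100, 100, n_samples)
mse_vals = np.mean((np.asarray(y) - preds) ** 2, axis=-1)
```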
Final Plot:
We have w0, w1 and mse_vals (the 3rd dimension), so now it is pretty easy to create the contour plot as we saw earlier. We draw the filled contours, the axes through the origin, and then an arrow for each recorded step of gradient descent:

```python
plt.contourf(w0, w1, mse_vals, levels, alpha=.7)
plt.axhline(0, color='black', alpha=.5, dashes=[2, 4], linewidth=1)
plt.axvline(0, color='black', alpha=0.5, dashes=[2, 4], linewidth=1)

# Draw an arrow between each pair of consecutive recorded weights
for i in range(len(old_w) - 1):
    plt.annotate('', xy=all_ws[i + 1, :], xytext=all_ws[i, :],
                 arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 1},
                 va='center', ha='center')

CS = plt.contour(w0, w1, mse_vals, levels, linewidths=1, colors='black')
plt.clabel(CS, inline=1, fontsize=8)
plt.title("Contour Plot of Gradient Descent")
plt.xlabel("w0")
plt.ylabel("w1")
plt.show()
```

Conclusion:
Notice how the mse values shrink along the arrows as gradient descent moves toward the minimum. A contour plot is very useful for visualizing complex structure in an easy way. Later we will use this same methodology for Ridge and Lasso regression. I hope this "How to visualize Gradient Descent using Contour plot in Python" tutorial helps you build much more complex visualizations.