Andrew Ng Machine Learning L1W3L06 - Gradient Descent for Logistic Regression

Goals

In this lab, you will:

update gradient descent for logistic regression.
explore gradient descent on a familiar data set

import copy, math
import numpy as np
%matplotlib widget
import matplotlib.pyplot as plt
from lab_utils_common import  dlc, plot_data, plt_tumor_data, sigmoid, compute_cost_logistic
from plt_quad_logistic import plt_quad_logistic, plt_prob
plt.style.use('./deeplearning.mplstyle')

Data set

Let's start with the same two-feature data set used in the decision boundary lab.

X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

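As before, we'll use a helper function to plot this data. The data points with label $y = 1$ are shown as red crosses, while the data points with label $y = 0$ are shown as blue circles.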

fig,ax = plt.subplots(1,1,figsize=(4,4))
plot_data(X_train, y_train, ax)

ax.axis([0, 4, 0, 3.5])
ax.set_ylabel('$x_1$', fontsize=12)
ax.set_xlabel('$x_0$', fontsize=12)
plt.show()

Gradient Descent for Logistic Regression

Recall that the gradient descent algorithm uses the gradient calculation:
$\begin{align*} &\text{repeat until convergence:} \; \lbrace \\ & \; \; \;w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1} \; & \text{for j := 0..n-1} \\ & \; \; \; \; \;b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\ &\rbrace \end{align*}$

where each iteration performs simultaneous updates on $w_j$ for all $j$, and where
$\begin{align*} \frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{2} \\ \frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3} \end{align*}$

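$m$ is the number of training examples in the data set
$f_{\mathbf{w},b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target
For a logistic regression model:
$z = \mathbf{w} \cdot \mathbf{x} + b$
$f_{\mathbf{w},b}(x) = g(z)$
where $g(z)$ is the sigmoid function:
$g(z) = \frac{1}{1+e^{-z}}$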
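The model definition above can be sketched in a few lines of numpy. This is a minimal illustration, not the lab's own helper: the lab imports sigmoid from lab_utils_common, and the names sigmoid_sketch, predict_proba_sketch, X_demo, w_demo and b_demo are made up here.

```python
import numpy as np

def sigmoid_sketch(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba_sketch(X, w, b):
    """Model output f_wb(x) = g(w . x + b) for every row of X."""
    return sigmoid_sketch(X @ w + b)

# Hypothetical inputs for illustration only.
X_demo = np.array([[0.5, 1.5], [3.0, 0.5]])
w_demo = np.zeros(2)
b_demo = 0.0

# With all-zero parameters z = 0 for every example, so each output is g(0) = 0.5.
print(predict_proba_sketch(X_demo, w_demo, b_demo))
```

With zero parameters the model is maximally uncertain about every example, which is why gradient descent is often started from that point.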

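Gradient Descent Implementation

The gradient descent implementation has two components:

The loop implementing equation (1) above. This is gradient_descent below, and it is generally provided to you in the optional and practice labs.
The calculation of the current gradient, equations (2) and (3) above. This is compute_gradient_logistic below. You will be asked to implement it in this week's practice lab.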

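Calculating the Gradient, Code Description

Implements equations (2) and (3) above for all $w_j$ and $b$.
There are many ways to implement this. One approach is outlined below:

initialize variables to accumulate dj_dw and dj_db

for each example

calculate the error for that example, $g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b) - y^{(i)}$
for each input value $x_{j}^{(i)}$ in this example,
- multiply the error by the input $x_{j}^{(i)}$, and add it to the corresponding element of dj_dw (equation 2 above)
add the error to dj_db (equation 3 above)
divide dj_db and dj_dw by the total number of examples (m)
Note: in numpy, $\mathbf{x}^{(i)}$ is X[i,:] or X[i], and $x_{j}^{(i)}$ is X[i,j]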

def compute_gradient_logistic(X, y, w, b):
    """
    Computes the gradient for logistic regression

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar)      : model parameter
    Returns
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar)      : The gradient of the cost w.r.t. the parameter b.
    """
    m,n = X.shape
    dj_dw = np.zeros((n,))                           #(n,)
    dj_db = 0.

    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i],w) + b)          #(n,)(n,)=scalar
        err_i  = f_wb_i  - y[i]                       #scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]      #scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m                                   #(n,)
    dj_db = dj_db/m                                   #scalar

    return dj_db, dj_dw
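The two nested loops follow equations (2) and (3) term by term. As a rough sketch of one of the "many ways" mentioned above, the same sums can also be written with matrix operations. The function name compute_gradient_logistic_vectorized and the *_demo variables are assumptions made for this illustration and are not part of the lab.

```python
import numpy as np

def compute_gradient_logistic_vectorized(X, y, w, b):
    """Vectorized form of equations (2) and (3); returns (dj_db, dj_dw)."""
    m = X.shape[0]
    f_wb = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # (m,) predictions g(w . x + b)
    err = f_wb - y                               # (m,) per-example errors
    dj_dw = X.T @ err / m                        # (n,) sum over i of err_i * x_j^(i), divided by m
    dj_db = err.sum() / m                        # scalar: mean error
    return dj_db, dj_dw

# Same inputs as the check cell in this lab.
X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
w_demo = np.array([2., 3.])
b_demo = 1.

dj_db_demo, dj_dw_demo = compute_gradient_logistic_vectorized(X_demo, y_demo, w_demo, b_demo)
print(f"dj_db: {dj_db_demo}")
print(f"dj_dw: {dj_dw_demo.tolist()}")
```

Up to floating-point rounding, the printed values should agree with the expected output given for the loop version in this lab. The return order (dj_db first) is kept the same as compute_gradient_logistic so the two are interchangeable.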

Check the implementation of the gradient function using the cell below.

X_tmp = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_tmp = np.array([0, 0, 0, 1, 1, 1])
w_tmp = np.array([2.,3.])
b_tmp = 1.
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic(X_tmp, y_tmp, w_tmp, b_tmp)
print(f"dj_db: {dj_db_tmp}" )
print(f"dj_dw: {dj_dw_tmp.tolist()}" )

Expected output

dj_db: 0.49861806546328574
dj_dw: [0.498333393278696, 0.49883942983996693]
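Besides comparing against the expected output, a gradient implementation can be sanity-checked numerically with central finite differences of the cost. The sketch below assumes the standard logistic (cross-entropy) cost; the lab's own compute_cost_logistic is imported from lab_utils_common and is not shown here, so cost_sketch is a stand-in, and all function names in this block are hypothetical.

```python
import numpy as np

def sigmoid_sketch(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_sketch(X, y, w, b):
    """Assumed cost: mean cross-entropy loss over the m examples."""
    f = sigmoid_sketch(X @ w + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

def analytic_gradient(X, y, w, b):
    """Equations (2) and (3) in vectorized form; returns (dj_db, dj_dw)."""
    err = sigmoid_sketch(X @ w + b) - y
    return err.mean(), X.T @ err / X.shape[0]

def numerical_gradient(X, y, w, b, eps=1e-6):
    """Central differences: (J(p + eps) - J(p - eps)) / (2 * eps) per parameter."""
    dj_dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost_sketch(X, y, w + step, b) - cost_sketch(X, y, w - step, b)) / (2 * eps)
    dj_db = (cost_sketch(X, y, w, b + eps) - cost_sketch(X, y, w, b - eps)) / (2 * eps)
    return dj_db, dj_dw

X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
w_demo = np.array([2., 3.])
b_demo = 1.

db_a, dw_a = analytic_gradient(X_demo, y_demo, w_demo, b_demo)
db_n, dw_n = numerical_gradient(X_demo, y_demo, w_demo, b_demo)
print("max abs difference in dj_dw:", np.max(np.abs(dw_a - dw_n)))
print("abs difference in dj_db:", abs(db_a - db_n))
```

The differences should be tiny relative to the gradient values themselves. A large gap usually points to a sign error or a missing division by m.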

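Gradient Descent Code

The code implementing equation (1) above is shown below. Take a moment to locate and compare the functions in the routine to the equations above.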

def gradient_descent(X, y, w_in, b_in, alpha, num_iters):
    """
    Performs batch gradient descent

    Args:
      X (ndarray (m,n))  : Data, m examples with n features
      y (ndarray (m,))   : target values
      w_in (ndarray (n,)): Initial values of model parameters
      b_in (scalar)      : Initial values of model parameter
      alpha (float)      : Learning rate
      num_iters (scalar) : number of iterations to run gradient descent

    Returns:
      w (ndarray (n,))   : Updated values of parameters
      b (scalar)         : Updated value of parameter
      J_history (list)   : Cost J recorded at each iteration
    """
    # An array to store cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in

    for i in range(num_iters):
        # Calculate the gradient and update the parameters
        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion
            J_history.append( compute_cost_logistic(X, y, w, b) )

        # Print cost at 10 evenly spaced intervals, or at every iteration if num_iters < 10
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}   ")

    return w, b, J_history         #return final w,b and J history for graphing

Let's run gradient descent on our data set.

w_tmp  = np.zeros_like(X_train[0])
b_tmp  = 0.
alph = 0.1
iters = 10000

w_out, b_out, _ = gradient_descent(X_train, y_train, w_tmp, b_tmp, alph, iters)
print(f"\nupdated parameters: w:{w_out}, b:{b_out}")
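The run above discards the cost history (the third return value). One way to confirm that the learning rate is reasonable is to keep that history and check that the cost never increases. The sketch below is a compact, self-contained variant written for this purpose. It assumes the standard cross-entropy cost, and gradient_descent_vectorized, cost_sketch and the *_demo names are hypothetical, not the lab's routines.

```python
import numpy as np

def sigmoid_sketch(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_sketch(X, y, w, b):
    """Assumed cost: mean cross-entropy loss."""
    f = sigmoid_sketch(X @ w + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

def gradient_descent_vectorized(X, y, w_in, b_in, alpha, num_iters):
    """Batch gradient descent per equation (1), recording the cost after each update."""
    w, b = np.array(w_in, dtype=float), float(b_in)
    history = []
    for _ in range(num_iters):
        err = sigmoid_sketch(X @ w + b) - y
        # Both gradients are computed from the same err, so the update is simultaneous.
        w = w - alpha * (X.T @ err) / X.shape[0]
        b = b - alpha * err.mean()
        history.append(cost_sketch(X, y, w, b))
    return w, b, np.array(history)

X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

w_fit, b_fit, J_hist = gradient_descent_vectorized(X_demo, y_demo, np.zeros(2), 0.0, 0.1, 10000)

# Allow a tiny tolerance for floating-point noise when comparing successive costs.
cost_never_increases = bool(np.all(np.diff(J_hist) <= 1e-12))
train_accuracy = np.mean((sigmoid_sketch(X_demo @ w_fit + b_fit) >= 0.5) == y_demo)
print("first cost:", J_hist[0], "last cost:", J_hist[-1])
print("cost never increases:", cost_never_increases)
print("training accuracy:", train_accuracy)
```

If the cost rises between iterations, the usual first thing to try is a smaller alpha.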

Let's plot the results of gradient descent:

fig,ax = plt.subplots(1,1,figsize=(5,4))
# plot the probability 
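plt_prob(ax, w_out, b_out)

# Plot the original data
ax.set_ylabel(r'$x_1$')
ax.set_xlabel(r'$x_0$')
ax.axis([0, 4, 0, 3.5])
plot_data(X_train,y_train,ax)

# Plot the decision boundary: w[0]*x0 + w[1]*x1 + b = 0
x0 = -b_out/w_out[0]   # intercept on the x0 axis (where x1 = 0)
x1 = -b_out/w_out[1]   # intercept on the x1 axis (where x0 = 0)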
ax.plot([0,x0],[x1,0], c=dlc["dlblue"], lw=1)
plt.show()

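In the plot above:

the shading reflects the probability that y=1 (the result before the decision boundary is applied)
the decision boundary is the line at which the probability = 0.5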
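To make the link between the 0.5 probability level and the boundary line concrete: $g(z) = 0.5$ exactly when $z = 0$, so the boundary is the line $w_0 x_0 + w_1 x_1 + b = 0$. The sketch below uses illustrative parameters chosen by hand (an assumption, not the values fitted by gradient descent above) and hypothetical *_demo names.

```python
import numpy as np

# Illustrative parameters only; NOT the result of the gradient descent run.
w_demo = np.array([1.0, 1.0])
b_demo = -3.0

X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])

z = X_demo @ w_demo + b_demo
prob = 1.0 / (1.0 + np.exp(-z))          # shaded quantity: probability that y = 1
y_hat = (prob >= 0.5).astype(int)        # thresholding at 0.5 is the same as testing z >= 0

# Axis intercepts of the line w0*x0 + w1*x1 + b = 0.
x0_intercept = -b_demo / w_demo[0]       # point on the x0 axis (x1 = 0)
x1_intercept = -b_demo / w_demo[1]       # point on the x1 axis (x0 = 0)

print(y_hat)                             # [0 0 0 1 1 1]
print(x0_intercept, x1_intercept)        # 3.0 3.0
```

Joining the two intercept points, (x0_intercept, 0) and (0, x1_intercept), draws the boundary line on the plot.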

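Another Data set

Let's return to a one-variable data set. With just two parameters, $w$ and $b$, it is possible to plot the cost function using a contour plot to get a better idea of what gradient descent is up to.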

x_train = np.array([0., 1, 2, 3, 4, 5])
y_train = np.array([0,  0, 0, 1, 1, 1])

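As before, we'll use a helper function to plot this data. The data points with label $y = 1$ are shown as red crosses, while the data points with label $y = 0$ are shown as black circles.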

fig,ax = plt.subplots(1,1,figsize=(4,3))
plt_tumor_data(x_train, y_train, ax)
plt.show()

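In the plot below, try:

changing $w$ and $b$ by clicking within the contour plot on the upper right.
- changes may take a second or two
- note the changing value of cost on the upper left plot.
- note that the cost is accumulated from the loss on each example (vertical dotted lines)
- clicking on the contour plot resets the model for a new run
to reset the plot, rerun the cell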

w_range = np.array([-1, 7])
b_range = np.array([1, -14])
quad = plt_quad_logistic( x_train, y_train, w_range, b_range )
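A contour plot of the cost needs the cost evaluated on a grid of $(w, b)$ pairs. How plt_quad_logistic does this internally is not shown in this lab, so the following is only a sketch under assumptions: it uses the standard cross-entropy cost, written in the algebraically equivalent form $\log(1 + e^{z}) - y z$ for numerical stability, and the names cost_grid_sketch, w_vals and b_vals are hypothetical.

```python
import numpy as np

def cost_grid_sketch(x, y, w_vals, b_vals):
    """Cost J(w, b) for every pair in w_vals x b_vals; returns shape (len(w_vals), len(b_vals))."""
    W = np.asarray(w_vals, dtype=float)[:, None, None]   # (n_w, 1, 1)
    B = np.asarray(b_vals, dtype=float)[None, :, None]   # (1, n_b, 1)
    z = W * x + B                                         # broadcasts to (n_w, n_b, m)
    # Per-example loss: log(1 + e^z) - y*z equals -y*log(g(z)) - (1-y)*log(1-g(z)).
    loss = np.logaddexp(0.0, z) - y * z
    return loss.mean(axis=-1)                             # average over the m examples

x_demo = np.array([0., 1, 2, 3, 4, 5])
y_demo = np.array([0, 0, 0, 1, 1, 1])

# Same ranges as w_range and b_range above; the grid resolution is an arbitrary choice.
w_vals = np.linspace(-1, 7, 81)
b_vals = np.linspace(-14, 1, 151)

J = cost_grid_sketch(x_demo, y_demo, w_vals, b_vals)
i_w, i_b = np.unravel_index(np.argmin(J), J.shape)
print("grid shape:", J.shape)
print("lowest cost on the grid:", J[i_w, i_b], "at w =", w_vals[i_w], "b =", b_vals[i_b])
```

An array like J could then be passed to matplotlib, for example ax.contour(b_vals, w_vals, J), to draw contours with $b$ on the horizontal axis and $w$ on the vertical axis.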

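Congratulations!

You have:

examined the formulas and implementation of calculating the gradient for logistic regression
utilized those routines in
- exploring a single variable data set
- exploring a two-variable data set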