Exercise 4: Neural Networks Learning

Two posts in one day, because both exercises cover neural networks and the material is closely related.

Looking back at Week 4's assignment: for the final multi-class prediction with a neural network, Ng provided a pre-trained Θ. This week's main task is to learn how to train a neural network ourselves and obtain that Θ.

Forward Propagation: Cost Function

Reference formula:

$$ J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big( \Theta_{j,i}^{(l)} \big)^2 $$

Here $K$ is the number of classes, and $y^{(i)}$ is the label of the $i$-th training example, encoded as a one-hot vector. For example, with $K = 10$, a label of 5 corresponds to $y^{(i)} = (0,0,0,0,1,0,0,0,0,0)^T$.

So each component $y_k^{(i)} \in \{0, 1\}$. Forward propagation means computing layer by layer from left to right; the cost function is the logistic-regression cost summed over all $K$ output units and averaged over the $m$ examples.
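As a cross-check, forward propagation and the unregularized cost can be sketched in Python/NumPy for a 3-layer network. The function names here are my own, not part of the assignment:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_cost(Theta1, Theta2, X, Y):
    """Forward-propagate through input -> hidden -> output and compute the
    unregularized cross-entropy cost, vectorized over all m examples.
    X has no bias column; Y is one-hot with shape (m, K)."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])      # add bias unit
    a2 = sigmoid(a1 @ Theta1.T)
    a2 = np.hstack([np.ones((m, 1)), a2])     # add bias unit
    a3 = sigmoid(a2 @ Theta2.T)               # h_Theta(x), shape (m, K)
    J = -np.sum(Y * np.log(a3) + (1 - Y) * np.log(1 - a3)) / m
    return J, a3
```

With all-zero weights every output is $0.5$, so the cost reduces to $K \log 2$ regardless of the labels, which makes a handy sanity check.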

Backpropagation: Error Terms and Gradients

To find $\min_\Theta J(\Theta)$ with gradient descent, we need to compute the following two quantities:

  • $J(Θ)$
  • $ \frac {∂} {∂ Θ_{i, j}^{(l)}} J(Θ)$

The first, $J(\Theta)$, is obtained by forward propagation; the second, $\frac {\partial} {\partial \Theta_{i, j}^{(l)}} J(\Theta)$, is obtained by backpropagation, which proceeds as follows.

Let $\delta_j^{(l)}$ denote the error of unit $j$ in layer $l$. For the output layer (layer 4 in this network),

  • $\delta_j^{(4)} = a_j^{(4)} - y_j$

The derivations of $\delta^{(3)}$ and $\delta^{(2)}$ are more involved, so here is the formula directly: $\delta^{(l)} = (\Theta^{(l)})^T \delta^{(l+1)} .* g'(z^{(l)})$, where $g'$ is the derivative of the sigmoid. There is no $\delta^{(1)}$, since the input layer carries no error.

Moreover, ignoring regularization, $ \frac {\partial} {\partial \Theta_{i, j}^{(l)}} J(\Theta) = a_j^{(l)}\delta_i^{(l+1)} $; the proof of this is also rather tedious.

Gradient Checking

When implementing $ \frac {\partial} {\partial \Theta_{i, j}^{(l)}} J(\Theta) $, bugs can easily creep into the code, so during testing we can check the computed gradient. The idea is:

Taking a very small $\epsilon$, we can approximate the partial derivative numerically as $\frac {\partial} {\partial \theta} J(\theta) \approx \frac {J(\theta + \epsilon) - J(\theta - \epsilon)} {2\epsilon}$; comparing this approximation against the backpropagation result tells us whether our gradient code is correct.
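The two-sided finite-difference check described above can be sketched in Python/NumPy like this (a minimal version; the course's own `checkNNGradients` does the same thing over a small random network):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Approximate dJ/dtheta component-by-component with the two-sided
    difference (J(theta + e) - J(theta - e)) / (2 * eps)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[i] = eps
        grad.flat[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad
```

For a correct backpropagation implementation, the analytic gradient and this approximation should agree to many decimal places; since the check is very slow (two full cost evaluations per parameter), it is only run on a tiny network during debugging and then switched off.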

Assignment 1: Implement Forward Propagation

Since there are $K$ classes, the cost must be summed over all $K$ outputs for every training example:

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be an "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================

% Forward propagation: add the bias column, then compute each layer's activations
X = [ones(m, 1) X];
a1 = X;
a2 = sigmoid(X * Theta1');
a2 = [ones(size(a2, 1), 1) a2];
a3 = sigmoid(a2 * Theta2');

% Accumulate the logistic cost over all m examples and K output units
for i = 1:m
    yi = zeros(num_labels, 1);
    yi(y(i), 1) = 1;               % one-hot encode the label
    a3i = a3(i, :)';
    J = J + sum(-yi .* log(a3i) - (1 - yi) .* log(1 - a3i));
end

J = 1/m * J;

% Regularization: skip the first column of each Theta (the bias weights)
rTheta1 = Theta1(:, 2:size(Theta1, 2));
rTheta2 = Theta2(:, 2:size(Theta2, 2));

J = J + lambda/(2*m) * (sum(sum(rTheta1 .^ 2)) + sum(sum(rTheta2 .^ 2)));

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end

Assignment 2: Derivative of the Sigmoid

This is just differentiating the sigmoid function: $g'(z) = g(z)(1 - g(z))$.
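A Python/NumPy sketch of the same `sigmoidGradient` computation (function name is mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    """g'(z) = g(z) * (1 - g(z)), applied element-wise to z."""
    g = sigmoid(z)
    return g * (1 - g)
```

At $z = 0$ the sigmoid is $0.5$, so the gradient peaks at $0.25$ and decays toward $0$ for large $|z|$.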

Assignment 3: Backpropagation

It is written in the same function as forward propagation, because what it is essentially computing are the partial derivatives.

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be an "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================

% Forward propagation, as in Assignment 1
X = [ones(m, 1) X];
a1 = X;
a2 = sigmoid(X * Theta1');
a2 = [ones(size(a2, 1), 1) a2];
a3 = sigmoid(a2 * Theta2');

bdelta_2 = 0;
bdelta_1 = 0;

for i = 1:m
    yi = zeros(num_labels, 1);
    yi(y(i), 1) = 1;               % one-hot encode the label
    a3i = a3(i, :)';
    a2i = a2(i, :)';
    a1i = a1(i, :)';

    % Accumulate the cost (same as Assignment 1)
    J = J + sum(-yi .* log(a3i) - (1 - yi) .* log(1 - a3i));

    % Backpropagate the error terms
    delta_3 = (a3i - yi);
    delta_2 = Theta2' * delta_3;
    delta_2 = delta_2(2:size(delta_2, 1));          % drop the bias unit's error
    delta_2 = delta_2 .* sigmoidGradient(Theta1 * a1i);

    bdelta_2 = bdelta_2 + delta_3 * (a2i)';
    bdelta_1 = bdelta_1 + delta_2 * (a1i)';
end

J = 1/m * J;

% Regularization: skip the first column of each Theta (the bias weights)
rTheta1 = Theta1(:, 2:size(Theta1, 2));
rTheta2 = Theta2(:, 2:size(Theta2, 2));

J = J + lambda/(2*m) * (sum(sum(rTheta1 .^ 2)) + sum(sum(rTheta2 .^ 2)));

Theta1_grad = 1/m * bdelta_1 + [zeros(size(Theta1, 1), 1) lambda/m * rTheta1];
Theta2_grad = 1/m * bdelta_2 + [zeros(size(Theta2, 1), 1) lambda/m * rTheta2];

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
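For reference, the per-example accumulation loop in the Octave function above can be sketched in Python/NumPy as follows (my own function name; labels are assumed already one-hot encoded):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(Theta1, Theta2, X, Y, lam):
    """Accumulate the error terms delta example by example and return the
    regularized gradients for a 3-layer network.
    X has no bias column; Y is one-hot with shape (m, K)."""
    m = X.shape[0]
    D1 = np.zeros_like(Theta1)
    D2 = np.zeros_like(Theta2)
    for i in range(m):
        a1 = np.concatenate(([1.0], X[i]))           # add bias unit
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))    # add bias unit
        a3 = sigmoid(Theta2 @ a2)
        d3 = a3 - Y[i]                               # output-layer error
        d2 = (Theta2.T @ d3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))
        D2 += np.outer(d3, a2)
        D1 += np.outer(d2, a1)
    # Regularize everything except the bias column
    T1_grad = D1 / m
    T1_grad[:, 1:] += lam / m * Theta1[:, 1:]
    T2_grad = D2 / m
    T2_grad[:, 1:] += lam / m * Theta2[:, 1:]
    return T1_grad, T2_grad
```

Note how `[1:]` drops the bias unit's error before multiplying by the sigmoid gradient, exactly as `delta_2(2:size(delta_2,1))` does in the Octave code.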

Summary

Machine learning may not be hard to understand conceptually, but writing the code demands real care: handling the bias units correctly and checking that matrix sizes match before every multiplication are both essential.
