Backpropagation
1. Backpropagation
- A method of computing gradients by repeatedly applying the chain rule across the entire network.
2. Sigmoid derivative
$$\mathrm{sigmoid}(x)=\frac{1}{1+e^{-x}}$$

$$\begin{aligned}
\frac{\partial}{\partial x}\,\mathrm{sigmoid}(x)
&=\frac{\partial}{\partial x}\left(\frac{1}{1+e^{-x}}\right)
=\frac{-\frac{\partial}{\partial x}\left(1+e^{-x}\right)}{\left(1+e^{-x}\right)^{2}}
=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}}\\
&=\frac{\left(1+e^{-x}\right)-1}{\left(1+e^{-x}\right)^{2}}
=\frac{1}{1+e^{-x}}-\frac{1}{\left(1+e^{-x}\right)^{2}}
=\frac{1}{1+e^{-x}}\left(1-\frac{1}{1+e^{-x}}\right)\\
&=\mathrm{sigmoid}(x)\,\bigl(1-\mathrm{sigmoid}(x)\bigr)
\end{aligned}$$
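A quick way to sanity-check this result is to compare it against a numerical derivative. The snippet below is my own sketch (not part of the lecture code; the names sigmoid and sigmoid_prime are mine), using a central finite difference in NumPy.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # analytic result derived above: sigmoid(x) * (1 - sigmoid(x))
    return sigmoid(x) * (1.0 - sigmoid(x))

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)  # central difference
print(np.max(np.abs(numeric - sigmoid_prime(x))))              # tiny, on the order of 1e-10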
3. Derivative of the Cost (Loss) function
$$a=\mathrm{sigmoid}(x)=\frac{1}{1+e^{-x}},\qquad
E=\mathrm{loss}(a,t)=-\bigl(t\log a+(1-t)\log(1-a)\bigr)$$

$$\begin{aligned}
\frac{\partial E}{\partial a}
&=\frac{\partial}{\partial a}\bigl(-t\log a\bigr)+\frac{\partial}{\partial a}\bigl(-(1-t)\log(1-a)\bigr)
=-\frac{t}{a}+\frac{1-t}{1-a}\\
&=\frac{-t(1-a)+a(1-t)}{a(1-a)}
=\frac{-t+at+a-at}{a(1-a)}
=\frac{a-t}{a(1-a)}
\end{aligned}$$
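The same finite-difference check works for the loss derivative. The sketch below is my own (the names loss and dloss_da are hypothetical); it verifies that ∂E/∂a = (a − t)/(a(1 − a)) at a single point.

import numpy as np

def loss(a, t):
    # binary cross-entropy for one prediction a and target t
    return -(t * np.log(a) + (1.0 - t) * np.log(1.0 - a))

def dloss_da(a, t):
    # analytic result derived above: (a - t) / (a * (1 - a))
    return (a - t) / (a * (1.0 - a))

a, t, eps = 0.3, 1.0, 1e-6
numeric = (loss(a + eps, t) - loss(a - eps, t)) / (2.0 * eps)
print(numeric, dloss_da(a, t))  # both are about -3.3333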
4. Sigmoid Backpropagation
1) Computation by the chain rule (here o = w·a0, l = o + b, a1 = sigmoid(l), and t is the target)
$$\begin{aligned}
(1)\;& \frac{\partial E}{\partial a_1}=\frac{a_1-t}{a_1(1-a_1)}\\
(2)\;& \frac{\partial E}{\partial l}=\frac{\partial E}{\partial a_1}\cdot\frac{\partial a_1}{\partial l}
=\frac{a_1-t}{a_1(1-a_1)}\cdot a_1(1-a_1)=a_1-t\\
(3)\;& \frac{\partial E}{\partial b}=\frac{\partial E}{\partial l}\cdot\frac{\partial l}{\partial b}=(a_1-t)\cdot 1=a_1-t\\
(4)\;& \frac{\partial E}{\partial o}=\frac{\partial E}{\partial l}\cdot\frac{\partial l}{\partial o}=(a_1-t)\cdot 1=a_1-t\\
(5)\;& \frac{\partial E}{\partial w}=\frac{\partial E}{\partial o}\cdot\frac{\partial o}{\partial w}=(a_1-t)\cdot a_0=(a_1-t)\,a_0^{\top}
\end{aligned}$$
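Written out in NumPy, the five steps above look like the sketch below. This is my own toy example (the shapes and variable names are assumptions, not the lecture's), and it follows the convention of the TensorFlow code further down, where a batch of inputs a0 is multiplied on the left of W, so step (5) appears as transpose(a0) times (a1 − t).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
a0 = rng.normal(size=(4, 3))            # batch of 4 inputs with 3 features (toy sizes)
W = rng.normal(size=(3, 2))
b = rng.normal(size=(2,))
t = rng.integers(0, 2, size=(4, 2)).astype(float)

o = a0 @ W                              # o = a0 W
l = o + b                               # l = o + b
a1 = sigmoid(l)                         # a1 = sigmoid(l)

dE_da1 = (a1 - t) / (a1 * (1.0 - a1))   # step (1)
dE_dl = dE_da1 * a1 * (1.0 - a1)        # step (2): simplifies to a1 - t
dE_db = dE_dl.sum(axis=0)               # step (3): summed over the batch
dE_do = dE_dl                           # step (4)
dE_dW = a0.T @ dE_do                    # step (5): transpose(a0) (a1 - t), same shape as W

print(dE_dW.shape, dE_db.shape)         # (3, 2) (2,)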
2) Sigmoid Backpropagation
Source code
I'm not sure why the matrix has to be transposed here... (the shape check after the code below suggests the reason)
import tensorflow as tf
import numpy as np

# Zoo dataset: 16 features per row, class label 0~6 in the last column
xy = np.loadtxt('Data/data-04-zoo.csv', delimiter=',', dtype=np.float32)
X_data = xy[:, 0:-1]
Y_data = xy[:, [-1]]

print("Shape of X data: ", X_data.shape)
print("Shape of Y data: ", Y_data.shape)
print("Y data unique values: ", np.unique(Y_data))

nb_classes = 7  # 0~6

X = tf.placeholder(tf.float32, [None, 16])
Y = tf.placeholder(tf.int32, [None, 1])

# One-hot encode the labels: [None, 1] -> [None, 1, 7] -> [None, 7]
target = tf.one_hot(Y, nb_classes)
print("Before reshape: ", target.shape)
target = tf.reshape(target, [-1, nb_classes])
print("After reshape: ", target.shape)
target = tf.cast(target, tf.float32)

W = tf.Variable(tf.random_normal([16, nb_classes]), name='weight')
b = tf.Variable(tf.random_normal([nb_classes]), name='bias')

def sigmoid_func(x):
    return 1. / (1. + tf.exp(-x))

def sigmoid_func_prime(x):
    # derivative derived above: sigmoid(x) * (1 - sigmoid(x))
    return sigmoid_func(x) * (1. - sigmoid_func(x))

# Forward pass
layer1 = tf.matmul(X, W) + b
y_hat = sigmoid_func(layer1)

# Cross-entropy loss: E = -(t*log(a) + (1-t)*log(1-a))
loss_i = -(target * tf.log(y_hat) + (1 - target) * tf.log(1 - y_hat))
loss = tf.reduce_sum(loss_i)

# Backward pass (chain rule)
d_loss = (y_hat - target) / (y_hat * (1. - y_hat) + 1e-7)  # dE/da = (a - t) / (a(1 - a))
d_sigmoid = sigmoid_func_prime(layer1)                     # da/dl = a(1 - a)
d_layer = d_loss * d_sigmoid                               # dE/dl = a - t
d_b = d_layer
d_W = tf.matmul(tf.transpose(X), d_layer)                  # dE/dW, same shape as W

# Manual gradient-descent updates
learning_rate = 0.01
train_step = [
    tf.assign(W, W - learning_rate * d_W),
    tf.assign(b, b - learning_rate * tf.reduce_sum(d_b, axis=0)),  # sum over the batch
]

prediction = tf.argmax(y_hat, 1)
acct_mat = tf.equal(tf.argmax(y_hat, 1), tf.argmax(target, 1))
acct_res = tf.reduce_mean(tf.cast(acct_mat, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(500):
        sess.run(train_step, feed_dict={X: X_data, Y: Y_data})
        if step % 10 == 0:
            step_loss, acc = sess.run([loss, acct_res], feed_dict={X: X_data, Y: Y_data})
            print("Step: {:5} \t Loss: {:10.5f}\t Acc: {:.2%}".format(step, step_loss, acc))

    pred = sess.run(prediction, feed_dict={X: X_data})
    for p, y in zip(pred, Y_data):
        msg = "[{}]\tPrediction: {:d} \tY: {:d}"
        print(msg.format(p == int(y[0]), p, int(y[0])))
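On the transpose question above: the shapes make the reason visible. X has shape [N, 16] and d_layer has shape [N, nb_classes], while d_W must have the same shape as W, i.e. [16, nb_classes]. Multiplying transpose(X) by d_layer produces exactly that shape and at the same time sums each sample's gradient contribution over the batch. A small NumPy sketch of this (sizes and names are my own, roughly matching the zoo example):

import numpy as np

N, n_in, n_out = 101, 16, 7            # batch size and layer sizes
X = np.random.rand(N, n_in)
d_layer = np.random.rand(N, n_out)     # stands in for (a1 - t), one row per sample

d_W = X.T @ d_layer                    # shape (16, 7): the same shape as W
# this equals the sum of per-sample outer products over the batch
d_W_check = sum(np.outer(X[i], d_layer[i]) for i in range(N))
print(np.allclose(d_W, d_W_check))     # True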