Lab10-1. ReLU: Better non-linearity
eunguru
2018. 5. 18. 18:05
ReLU: Better non-linearity
1. XOR problem - 9 hidden layers: poor result
1) Source code
1 input layer, 9 hidden layers, and 1 output layer
sigmoid used as the activation function in every layer
(...)
with tf.name_scope('Layer1') as scope:
    W1 = tf.Variable(tf.random_uniform([2, 5], -1.0, 1.0), name='weight1')
    b1 = tf.Variable(tf.zeros([5]), name='bias1')
    L1 = tf.sigmoid(tf.matmul(X, W1) + b1)

with tf.name_scope('Layer2') as scope:
    W2 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight2')
    b2 = tf.Variable(tf.zeros([5]), name='bias2')
    L2 = tf.sigmoid(tf.matmul(L1, W2) + b2)

with tf.name_scope('Layer3') as scope:
    W3 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight3')
    b3 = tf.Variable(tf.zeros([5]), name='bias3')
    L3 = tf.sigmoid(tf.matmul(L2, W3) + b3)

with tf.name_scope('Layer4') as scope:
    W4 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight4')
    b4 = tf.Variable(tf.zeros([5]), name='bias4')
    L4 = tf.sigmoid(tf.matmul(L3, W4) + b4)

with tf.name_scope('Layer5') as scope:
    W5 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight5')
    b5 = tf.Variable(tf.zeros([5]), name='bias5')
    L5 = tf.sigmoid(tf.matmul(L4, W5) + b5)

with tf.name_scope('Layer6') as scope:
    W6 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight6')
    b6 = tf.Variable(tf.zeros([5]), name='bias6')
    L6 = tf.sigmoid(tf.matmul(L5, W6) + b6)

with tf.name_scope('Layer7') as scope:
    W7 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight7')
    b7 = tf.Variable(tf.zeros([5]), name='bias7')
    L7 = tf.sigmoid(tf.matmul(L6, W7) + b7)

with tf.name_scope('Layer8') as scope:
    W8 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight8')
    b8 = tf.Variable(tf.zeros([5]), name='bias8')
    L8 = tf.sigmoid(tf.matmul(L7, W8) + b8)

with tf.name_scope('Layer9') as scope:
    W9 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight9')
    b9 = tf.Variable(tf.zeros([5]), name='bias9')
    L9 = tf.sigmoid(tf.matmul(L8, W9) + b9)

with tf.name_scope('Layer10') as scope:
    W10 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight10')
    b10 = tf.Variable(tf.zeros([5]), name='bias10')
    L10 = tf.sigmoid(tf.matmul(L9, W10) + b10)

with tf.name_scope('Hypothesis') as scope:
    W11 = tf.Variable(tf.random_uniform([5, 1], -1.0, 1.0), name='weight11')
    b11 = tf.Variable(tf.zeros([1]), name='bias11')
    hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)
(...)
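The (...) markers elide the rest of the script. For reference, here is a minimal sketch of the surrounding setup commonly used for this XOR lab (data, placeholders, cost, optimizer, and accuracy); the variable names, learning rate, and step count are assumptions, not the original elided code:

import numpy as np
import tensorflow as tf

# XOR training data (assumed)
x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_data = np.array([[0], [1], [1], [0]], dtype=np.float32)

X = tf.placeholder(tf.float32, [None, 2], name='x-input')
Y = tf.placeholder(tf.float32, [None, 1], name='y-input')

# ... Layer1 ~ Layer10 and hypothesis as defined above ...

with tf.name_scope('Cost') as scope:
    # binary cross-entropy cost
    cost = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) * tf.log(1 - hypothesis))
with tf.name_scope('Train') as scope:
    train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# cast the hypothesis to 0/1 predictions and measure accuracy against the labels
predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200000):
        sess.run(train, feed_dict={X: x_data, Y: y_data})
        if step % 2000 == 0:
            # the lab output below also prints some weight arrays at each logging step (omitted here)
            print("Step:", step, "Cost:", sess.run(cost, feed_dict={X: x_data, Y: y_data}))
    h, p, a = sess.run([hypothesis, predicted, accuracy], feed_dict={X: x_data, Y: y_data})
    print("\nHypothesis:", h, "\nCorrect:", p, "\nAccuracy:", a)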
Graph
2) Results
The cost does not decrease, and the accuracy (0.5) comes out worse than with the 2-layer NN.
(...)
Step: 196000  Cost: 0.6931472
[array([[ 0.15111303, -0.7233987 , -0.08999562,  0.4218123 , -0.38752127], (...)
Step: 198000  Cost: 0.6931472
[array([[ 0.15111303, -0.7233987 , -0.08999562,  0.4218123 , -0.38752127], (...)

Hypothesis:
[[0.49999994]
 [0.49999994]
 [0.50000006]
 [0.50000006]]
Correct:
[[0.]
 [0.]
 [1.]
 [1.]]
Accuracy: 0.5
Cost / Accuracy graph
3) Cause of the problem: backpropagation
Because sigmoid was used as the activation function, the vanishing gradient problem occurs during backpropagation.
The sigmoid output is a value between 0 and 1; for inputs far below 0 the output is close to 0.
Backpropagation computes gradients with the chain rule, so when values close to 0 are multiplied together
layer after layer, the derivative keeps getting smaller and smaller (see the numerical sketch after the list below).
What "vanishing gradient" means
- the gradient (slope) vanishes
- the network becomes hard to train
- the input ends up having almost no influence on the output
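To see the effect numerically, here is a minimal NumPy sketch (illustration only, not part of the lab code): the derivative of sigmoid, sigma(x)*(1 - sigma(x)), is at most 0.25, so the chain-rule product through 9 sigmoid layers shrinks roughly exponentially with depth (the weight factors in the chain rule are ignored here for simplicity).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # maximum value is 0.25, reached at x = 0

# Best case: every pre-activation is exactly 0, one factor of 0.25 per layer
print(0.25 ** 9)                    # ~3.8e-06

# More typical case: pre-activations around 2.0 give an even smaller factor
print(sigmoid_grad(2.0) ** 9)       # ~1.5e-09

Even in the best case the gradient reaching the early layers is on the order of 10^-6, which matches the cost staying stuck at 0.6931472 in the run above.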
2. Solution: use the ReLU activation function
1) The ReLU function: max(0, x)
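For comparison, a minimal NumPy sketch of ReLU and its derivative (illustration only): for positive inputs the derivative is 1, so the chain-rule factors no longer shrink toward 0 the way repeated sigmoid derivatives do.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)           # ReLU: max(0, x)

def relu_grad(x):
    return (x > 0).astype(np.float32)   # derivative: 1 for x > 0, 0 otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))   # [0. 0. 0. 1. 1.]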
2) Source code
The ReLU activation function is used for the hidden layers (the last layer must output a value between 0 and 1, so it still uses the sigmoid function).
(...)
with tf.name_scope('Layer1') as scope:
    W1 = tf.Variable(tf.random_uniform([2, 5], -1.0, 1.0), name='weight1')
    b1 = tf.Variable(tf.zeros([5]), name='bias1')
    L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
(...)
with tf.name_scope('Layer10') as scope:
    W10 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight10')
    b10 = tf.Variable(tf.zeros([5]), name='bias10')
    L10 = tf.nn.relu(tf.matmul(L9, W10) + b10)

with tf.name_scope('Hypothesis') as scope:
    W11 = tf.Variable(tf.random_uniform([5, 1], -1.0, 1.0), name='weight11')
    b11 = tf.Variable(tf.zeros([1]), name='bias11')
    hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)
(...)
Graph
3) Results
The cost comes out low and the accuracy is high (1.0).
(...)
Step: 196000  Cost: 0.00051748654
[array([[-0.36031628, -0.7233987 , -0.08999562,  0.7240177 , -0.73926085], (...)
Step: 198000  Cost: 0.0005120867
[array([[-0.3604096 , -0.7233987 , -0.08999562,  0.7241369 , -0.73939353], (...)

Hypothesis:
[[0.00101205]
 [0.9999987 ]
 [0.9999994 ]
 [0.00101205]]
Correct:
[[0.]
 [1.]
 [1.]
 [0.]]
Accuracy: 1.0
4) Comparison of the cost/accuracy graphs when using sigmoid vs. ReLU
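The cost/accuracy curves come from TensorBoard. Here is a minimal sketch of the kind of scalar-summary logging that produces them, building on the training setup sketched in section 1 (the summary names and log directory are assumptions, not values from the post):

# Scalar summaries for cost and accuracy (assumed names)
cost_summ = tf.summary.scalar('cost', cost)
acc_summ = tf.summary.scalar('accuracy', accuracy)
merged = tf.summary.merge_all()

writer = tf.summary.FileWriter('./logs/xor_relu')   # assumed log directory
writer.add_graph(sess.graph)

# inside the training loop, once per logging step:
summary = sess.run(merged, feed_dict={X: x_data, Y: y_data})
writer.add_summary(summary, global_step=step)

Writing the sigmoid run and the ReLU run to separate log directories and starting tensorboard --logdir=./logs lets the two cost/accuracy curves be compared side by side.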
3. Non-linear activation functions
- sigmoid
- tanh
- ReLU
- Leaky ReLU
- Maxout
- ELU
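For reference, minimal NumPy definitions of these functions (a sketch for illustration; the Maxout form shown takes the maximum over two affine pieces, and the alpha values are common defaults rather than values from the lecture):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                        # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)                # max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)     # small slope instead of 0 for x < 0

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def maxout(x, W1, b1, W2, b2):
    # element-wise max of two linear units: max(xW1 + b1, xW2 + b2)
    return np.maximum(x @ W1 + b1, x @ W2 + b2)

In TensorFlow 1.x most of these are available directly, e.g. tf.sigmoid, tf.tanh, tf.nn.relu, tf.nn.elu, and (in later 1.x releases) tf.nn.leaky_relu.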