Lab10. ReLU: Better non-linearity

1. XOR problem with 9 hidden layers: poor result

1) Source code

  • 1 input layer, 9 hidden layers, and 1 output layer

  • Every layer uses the sigmoid activation function (the setup elided with (...) around the layers is sketched after the listing)

    (...)
    with tf.name_scope('Layer1') as scope:
        # First layer: 2 XOR inputs -> 5 units, sigmoid activation
        W1 = tf.Variable(tf.random_uniform([2, 5], -1.0, 1.0), name='weight1')
        b1 = tf.Variable(tf.zeros([5]), name='bias1')
        L1 = tf.sigmoid(tf.matmul(X, W1) + b1)
    with tf.name_scope('Layer2') as scope:
        W2 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight2')
        b2 = tf.Variable(tf.zeros([5]), name='bias2')
        L2 = tf.sigmoid(tf.matmul(L1, W2) + b2)
    with tf.name_scope('Layer3') as scope:
        W3 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight3')
        b3 = tf.Variable(tf.zeros([5]), name='bias3')
        L3 = tf.sigmoid(tf.matmul(L2, W3) + b3)
    with tf.name_scope('Layer4') as scope:
        W4 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight4')
        b4 = tf.Variable(tf.zeros([5]), name='bias4')
        L4 = tf.sigmoid(tf.matmul(L3, W4) + b4)
    with tf.name_scope('Layer5') as scope:
        W5 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight5')
        b5 = tf.Variable(tf.zeros([5]), name='bias5')
        L5 = tf.sigmoid(tf.matmul(L4, W5) + b5)
    with tf.name_scope('Layer6') as scope:
        W6 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight6')
        b6 = tf.Variable(tf.zeros([5]), name='bias6')
        L6 = tf.sigmoid(tf.matmul(L5, W6) + b6)
    with tf.name_scope('Layer7') as scope:
        W7 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight7')
        b7 = tf.Variable(tf.zeros([5]), name='bias7')
        L7 = tf.sigmoid(tf.matmul(L6, W7) + b7)
    with tf.name_scope('Layer8') as scope:
        W8 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight8')
        b8 = tf.Variable(tf.zeros([5]), name='bias8')
        L8 = tf.sigmoid(tf.matmul(L7, W8) + b8)
    with tf.name_scope('Layer9') as scope:
        W9 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight9')
        b9 = tf.Variable(tf.zeros([5]), name='bias9')
        L9 = tf.sigmoid(tf.matmul(L8, W9) + b9)
    with tf.name_scope('Layer10') as scope:
        W10 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight10')
        b10 = tf.Variable(tf.zeros([5]), name='bias10')
        L10 = tf.sigmoid(tf.matmul(L9, W10) + b10)
    with tf.name_scope('Hypothesis') as scope:
        # Output layer: 5 units -> 1 sigmoid output (probability of the XOR label)
        W11 = tf.Variable(tf.random_uniform([5, 1], -1.0, 1.0), name='weight11')
        b11 = tf.Variable(tf.zeros([1]), name='bias11')
        hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)
    (...)
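
  • For reference, the parts elided with (...) are the usual XOR-lab plumbing: data, placeholders, cost, optimizer, and the training loop. The sketch below is an assumption of that surrounding code, not the author's exact elided lines; the learning rate, step count, and the loop-based layer construction are all illustrative.

    import numpy as np
    import tensorflow as tf

    # XOR data
    x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
    y_data = np.array([[0], [1], [1], [0]], dtype=np.float32)

    X = tf.placeholder(tf.float32, [None, 2], name='x-input')
    Y = tf.placeholder(tf.float32, [None, 1], name='y-input')

    # The 10 sigmoid layers from the listing above, built in a loop
    L, in_dim = X, 2
    for i in range(10):
        W = tf.Variable(tf.random_uniform([in_dim, 5], -1.0, 1.0))
        b = tf.Variable(tf.zeros([5]))
        L = tf.sigmoid(tf.matmul(L, W) + b)
        in_dim = 5
    W_out = tf.Variable(tf.random_uniform([5, 1], -1.0, 1.0))
    b_out = tf.Variable(tf.zeros([1]))
    hypothesis = tf.sigmoid(tf.matmul(L, W_out) + b_out)

    # Binary cross-entropy cost and plain gradient descent (learning rate assumed)
    cost = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) * tf.log(1 - hypothesis))
    train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

    # Accuracy: threshold the sigmoid output at 0.5
    predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(200001):
            sess.run(train, feed_dict={X: x_data, Y: y_data})
            if step % 2000 == 0:
                print('Step:', step, '\tCost:', sess.run(cost, feed_dict={X: x_data, Y: y_data}))
        h, p, a = sess.run([hypothesis, predicted, accuracy], feed_dict={X: x_data, Y: y_data})
        print('Hypothesis: ', h, '\nCorrect: ', p, '\nAccuracy: ', a)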
    

     

  • Graph


2) Results

  • The cost never drops: it gets stuck at 0.6931 ≈ ln 2, which is exactly the cross-entropy of predicting 0.5 for every example, and the accuracy comes out worse than the 2-layer NN from Lab09-1

    (...)
    Step:  196000 	Cost:  0.6931472 [array([[ 0.15111303, -0.7233987 , -0.08999562,  0.4218123 , -0.38752127],
    (...)
    Step:  198000 	Cost:  0.6931472 [array([[ 0.15111303, -0.7233987 , -0.08999562,  0.4218123 , -0.38752127],
    (...)
    
    Hypothesis:  [[0.49999994]
     [0.49999994]
     [0.50000006]
     [0.50000006]] 
    Correct:  [[0.]
     [0.]
     [1.]
     [1.]] 
    Accuracy:  0.5
    

     

  • Cost / Accuracy graph


 

3) Cause of the problem: backpropagation breaks down

  • Because sigmoid is used as the activation function, the vanishing gradient problem appears during backpropagation


    • Sigmoid outputs a value between 0 and 1; for inputs below 0 the output is close to 0, and its derivative, sigmoid(x) * (1 - sigmoid(x)), is never larger than 0.25

    • Backpropagation computes the gradient with the chain rule, so when values close to 0 keep getting multiplied together, the derivative shrinks toward 0 layer by layer (see the sketch after this list)

  • What "vanishing gradient" means

    • The gradient (slope) vanishes as it propagates backward
    • The network becomes very hard to train
    • The inputs end up having almost no influence on the output
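
  • A quick way to see the effect numerically: the sigmoid derivative peaks at 0.25 (at x = 0), so backpropagating through 10 sigmoid layers multiplies the upstream gradient by at most 0.25 per layer from the activations alone (ignoring the weight terms). A small numpy sketch, separate from the lab code:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)        # peaks at 0.25 when x = 0

    # Best case for sigmoid: every pre-activation is exactly 0,
    # so each layer contributes the maximum factor 0.25.
    g = 1.0
    for layer in range(10):
        g *= sigmoid_grad(0.0)
    print(g)                        # 0.25**10 ~ 9.5e-7 (the gradient has effectively vanished)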

     

2. The fix: use the ReLU activation function

1) The ReLU function: max(0, x)
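
  • The key difference from sigmoid: for any positive input the output is the input itself and the local gradient is exactly 1, so chaining many layers does not shrink the gradient (for negative inputs both the output and the gradient are 0). A minimal numpy sketch of the same function that tf.nn.relu computes:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)            # max(0, x), element-wise

    def relu_grad(x):
        return (x > 0).astype(np.float64)    # 1 where x > 0, else 0

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(x))        # [0.  0.  0.  0.5 2. ]
    print(relu_grad(x))   # [0. 0. 0. 1. 1.]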


2) Source code

  • The hidden layers use the ReLU activation function (only the last layer keeps sigmoid, because the output must be a value between 0 and 1)

    (...)
    with tf.name_scope('Layer1') as scope:
        # Hidden layers now use ReLU instead of sigmoid
        W1 = tf.Variable(tf.random_uniform([2, 5], -1.0, 1.0), name='weight1')
        b1 = tf.Variable(tf.zeros([5]), name='bias1')
        L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
    (...)
    with tf.name_scope('Layer10') as scope:
        W10 = tf.Variable(tf.random_uniform([5, 5], -1.0, 1.0), name='weight10')
        b10 = tf.Variable(tf.zeros([5]), name='bias10')
        L10 = tf.nn.relu(tf.matmul(L9, W10) + b10)
    with tf.name_scope('Hypothesis') as scope:
        # The output layer keeps sigmoid so the prediction stays between 0 and 1
        W11 = tf.Variable(tf.random_uniform([5, 1], -1.0, 1.0), name='weight11')
        b11 = tf.Variable(tf.zeros([1]), name='bias11')
        hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)
    (...)
    

 

  • Graph


 

3) Results

  • The cost drops to a very small value and the accuracy reaches 1.0

    (...)
    Step:  196000 	Cost:  0.00051748654 [array([[-0.36031628, -0.7233987 , -0.08999562,  0.7240177 , -0.73926085],
    (...)
    Step:  198000 	Cost:  0.0005120867 [array([[-0.3604096 , -0.7233987 , -0.08999562,  0.7241369 , -0.73939353],
    (...)
    Hypothesis:  [[0.00101205]
     [0.9999987 ]
     [0.9999994 ]
     [0.00101205]] 
    Correct:  [[0.]
     [1.]
     [1.]
     [0.]] 
    Accuracy:  1.0
    

     

4) Cost/accuracy graph comparison: sigmoid vs. ReLU


 

3. Non-linear activation functions

  • sigmoid
  • tanh
  • ReLU
  • Leaky ReLU
  • Maxout
  • ELU
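
  • For reference, a small numpy sketch of how each of these is usually defined (the alpha values are common defaults, not fixed by the lecture). TensorFlow 1.x ships tf.sigmoid, tf.tanh, tf.nn.relu, and tf.nn.elu, and newer 1.x versions also include tf.nn.leaky_relu; maxout is normally written by hand as an element-wise max over several linear units.

    import numpy as np

    def sigmoid(x):                 # squashes to (0, 1); saturates for large |x|
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):                    # squashes to (-1, 1); zero-centered but still saturates
        return np.tanh(x)

    def relu(x):                    # max(0, x); gradient is 1 for x > 0
        return np.maximum(0.0, x)

    def leaky_relu(x, alpha=0.01):  # small slope for x < 0 so units never go fully dead
        return np.where(x > 0, x, alpha * x)

    def elu(x, alpha=1.0):          # smooth negative side: alpha * (exp(x) - 1) for x <= 0
        # np.minimum keeps the unused positive branch from overflowing in exp()
        return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

    def maxout(x, W1, b1, W2, b2):  # element-wise max of two linear units (a learned activation)
        return np.maximum(x @ W1 + b1, x @ W2 + b2)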

