Lab06. Softmax Classification

2018. 4. 18. 11:11


Softmax Classification

1. Multinomial Classification

In a multi-class classification problem, the class with the highest predicted probability is returned as the classification result

1) Hypothesis: Softmax function

The probability of each class is computed with the softmax function
The ideal output is 1.0 for the correct class and 0.0 for all other classes
The softmax outputs always sum to 1: normalization
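
As an illustration of the normalization property, here is a minimal pure-Python sketch of softmax. The input scores are hypothetical values chosen for the example, and subtracting the maximum is a common numerical-stability trick rather than something stated in these notes.

```python
import math

def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical raw scores for three classes
probs = softmax(scores)
print(probs)       # the largest score receives the largest probability
print(sum(probs))  # equals 1 up to floating-point rounding
```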

2) Cost: Cross entropy

Negative log-likelihood (NLL)

Cross entropy is used as the cost function; the cost is also called the loss

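Entropy: the number of bits needed to encode information optimally
- It is not simply proportional to the number of possible outcomes, though; it depends on the probability distribution over the labels
Cross entropy: never smaller than the entropy; it is the number of bits needed when the encoding is built from an estimated (possibly wrong) distribution instead of the true one
- Softmax combined with the NLL loss is called the cross entropy loss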
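
The relationship between entropy and cross entropy can be checked numerically. The sketch below uses base-2 logarithms so the results are in bits; the distributions `p` and `q` are hypothetical values chosen for the example.

```python
import math

def entropy(p):
    """Entropy of distribution p in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Bits needed to encode samples from p with a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]        # hypothetical true distribution
q = [1 / 3, 1 / 3, 1 / 3]    # hypothetical (wrong) estimate

print(entropy(p))            # 1.5 bits, less than log2(3) for three equally likely outcomes
print(cross_entropy(p, q))   # larger than entropy(p) because q differs from p
print(cross_entropy(p, p))   # equals entropy(p) when the estimate is exact
```

The cost in the example code uses the natural logarithm instead, which only rescales the values by a constant factor.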

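The Y values (labels) used in cross entropy are one-hot encoded
- One-hot encoding: a vector with 1 at the index of the correct class and 0 everywhere else; applying the same rule to softmax output (largest value becomes 1, all others 0) gives the predicted class in the same form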
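
A minimal sketch of one-hot vectors in both roles: encoding a label index, and turning softmax output into a hard prediction. The helper names `one_hot` and `argmax` and the probability values are hypothetical, introduced only for this example.

```python
def one_hot(index, num_classes):
    """Return a list with 1 at position index and 0 elsewhere."""
    return [1 if i == index else 0 for i in range(num_classes)]

def argmax(values):
    """Return the position of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

label = one_hot(2, 3)             # class index 2 out of 3 classes
probs = [0.1, 0.7, 0.2]           # hypothetical softmax output
pred = one_hot(argmax(probs), 3)  # largest value becomes 1, the rest 0

print(label)  # [0, 0, 1]
print(pred)   # [0, 1, 0]
```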

3) Optimizer

Training minimizes the cost function, using a gradient descent optimizer
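
To make the optimizer step concrete, here is a hedged pure-Python sketch of gradient descent on a single sample. It relies on the standard result that the gradient of the cross entropy with respect to each softmax input is (probability - label). The sample is the first row of x_data and y_data from the example code in section 2; the zero initialization, the helper names, and the layout of `W` (one row per class, the transpose of the `[4, nb_classes]` shape used there) are assumptions made for the sketch.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    # W holds one weight row per class, so logits[k] = W[k] . x + b[k]
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def cross_entropy(probs, label):
    # Natural-log cross entropy against a one-hot label
    return -sum(l * math.log(p) for l, p in zip(label, probs) if l > 0)

def gradient_step(W, b, x, label, lr):
    """Update W and b in place with one gradient descent step."""
    probs = predict(W, b, x)
    for k in range(len(W)):
        err = probs[k] - label[k]  # gradient of the cost w.r.t. logit k
        for j in range(len(x)):
            W[k][j] -= lr * err * x[j]
        b[k] -= lr * err

x = [1.0, 2.0, 1.0, 1.0]           # first training sample
label = [0, 0, 1]                  # its one-hot label
W = [[0.0] * 4 for _ in range(3)]  # assumed zero initialization
b = [0.0] * 3

cost_before = cross_entropy(predict(W, b, x), label)
for _ in range(10):
    gradient_step(W, b, x, label, lr=0.1)
cost_after = cross_entropy(predict(W, b, x), label)
print(cost_before, cost_after)     # the cost decreases after the updates
```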

$\begin{align*} H(X) &= Y = WX \\ \hat{Y}_i &= S(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{K} e^{y_j}} \qquad y_i = w_i^T x = w_{i1}x_1 + w_{i2}x_2 + w_{i3}x_3 + \dots \\ D(S, L) &= -\sum_i L_i \log(S_i) \\ &= -\sum_i L_i \log(\hat{Y}_i) \\ &= \sum_i L_i \cdot (-\log(\hat{Y}_i)) \\ \mathcal{L} &= \frac{1}{N}\sum_i D(S(WX_i + b), L_i) \end{align*}$

Here $K$ is the number of classes and $N$ is the number of training samples.
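
As a worked example of $D(S, L)$ with a one-hot label, only the term for the correct class survives. The probabilities below are hypothetical values chosen for illustration, and $\log$ is the natural logarithm:

$\begin{align*} S = (0.7,\ 0.2,\ 0.1),\quad L = (1,\ 0,\ 0): \qquad D(S, L) &= -(1 \cdot \log 0.7 + 0 \cdot \log 0.2 + 0 \cdot \log 0.1) = -\log 0.7 \approx 0.357 \\ S = (0.7,\ 0.2,\ 0.1),\quad L = (0,\ 0,\ 1): \qquad D(S, L) &= -\log 0.1 \approx 2.303 \end{align*}$

A confident correct prediction gives a small cost, while assigning low probability to the correct class gives a large one.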

2. Example Code

1) Softmax Classification

Source code

Softmax is computed with tf.nn.softmax

# Written against the TensorFlow 1.x API (tf.placeholder, tf.Session)
import tensorflow as tf

x_data = [[1, 2, 1, 1], [2, 1, 3, 2], [3, 1, 3, 4], [4, 1, 5, 5],
          [1, 7, 5, 5], [1, 2, 5, 6], [1, 6, 6, 6], [1, 7, 7, 7]]

# one-hot encoding
y_data = [[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

X = tf.placeholder("float", [None, 4])
Y = tf.placeholder("float", [None, 3])
nb_classes = 3

W = tf.Variable(tf.random_normal([4, nb_classes]), name='weight')
b = tf.Variable(tf.random_normal([nb_classes]), name='bias')

hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

cost = tf.reduce_mean(-tf.reduce_sum(Y*tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(2001):
        sess.run(optimizer, feed_dict={X: x_data, Y: y_data})

        if step % 20 == 0:
            print("step: ", step, "\tcost: ", sess.run(cost, feed_dict={X: x_data, Y: y_data}))

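    # test: predict the class of a single sample
    a = sess.run(hypothesis, feed_dict={X: [[1, 11, 7, 9]]})
    print("\na: ", a, "argmax(a):", sess.run(tf.argmax(a, 1)))

    # test: predict several samples at once
    # (named all_probs to avoid shadowing the built-in all())
    all_probs = sess.run(hypothesis, feed_dict={X: [[1, 11, 7, 9],
                                                    [1, 3, 4, 3],
                                                    [1, 1, 0, 1]]})
    print("\nall: ", all_probs, "argmax(all):", sess.run(tf.argmax(all_probs, 1)))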

Result

(...)
step:  1900 	cost:  0.171269
step:  1920 	cost:  0.16998039
step:  1940 	cost:  0.16871029
step:  1960 	cost:  0.1674581
step:  1980 	cost:  0.16622359
step:  2000 	cost:  0.16500616

a:  [[2.3063349e-02 9.7692788e-01 8.7165672e-06]] argmax(a): [1]

all:  [[2.3063349e-02 9.7692788e-01 8.7165672e-06]
 [7.2510439e-01 2.5552630e-01 1.9369263e-02]
 [1.8163204e-08 3.8551138e-04 9.9961448e-01]] argmax(all): [1 0 2]
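
The reported argmax values can be reproduced from the printed probabilities without TensorFlow. This short sketch copies the three output rows above and picks the index of the largest value in each; the variable names are hypothetical.

```python
# Probabilities copied from the printed result above
all_probs = [
    [2.3063349e-02, 9.7692788e-01, 8.7165672e-06],
    [7.2510439e-01, 2.5552630e-01, 1.9369263e-02],
    [1.8163204e-08, 3.8551138e-04, 9.9961448e-01],
]

# Index of the largest probability in each row is the predicted class
predicted = [row.index(max(row)) for row in all_probs]
row_sums = [sum(row) for row in all_probs]

print(predicted)  # [1, 0, 2]
print(row_sums)   # each close to 1, as expected from softmax
```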

2) Softmax_Zoo_Classification