Softmax Classification

1. Multinomial Classification

In a multi-class classification problem, the class with the highest membership probability is taken as the classification result.

1) Hypothesis: Softmax function

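The probability of each class is computed with the softmax function.
Ideally the probability of the correct class is 1.0 and all the others are 0.0.
The softmax outputs sum to 1, i.e. they are normalized.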
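The normalization property can be checked with a minimal sketch in plain Python. The three scores are hypothetical values chosen only for illustration; subtracting the maximum before exponentiating is a common safeguard against overflow and does not change the result.

```python
import math

def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Shifting by the maximum leaves the result unchanged but keeps exp() small.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]            # hypothetical scores for three classes
probs = softmax(scores)

print(probs)                        # approximately [0.659, 0.242, 0.099]
print(sum(probs))                   # 1 up to floating-point rounding
print(probs.index(max(probs)))      # index of the most probable class: 0
```

The class with the largest score always receives the largest probability, so taking the index of the maximum gives the classification result.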

2) Cost: Cross entropy

Negative log-likelihood(NLL)

Cross entropy is used as the cost function; the cost is also called the loss.

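Entropy: the average number of bits needed to encode information optimally
- It is not simply proportional to the number of distinct labels; it depends on the probability distribution over the labels
Cross entropy: the average number of bits needed when the encoding is built from an estimated (possibly wrong) distribution instead of the true one; it is never smaller than the entropy
- Applying the NLL loss to the softmax output is what is called the cross entropy loss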
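The two definitions can be compared numerically. The sketch below uses two hypothetical distributions over four labels: both have the same number of labels, but their entropies differ, and encoding the skewed distribution with a code built for the uniform one costs more bits than its entropy.

```python
import math

def entropy(p):
    """Average number of bits needed to encode events drawn from p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average bits when events follow p but the code is built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]      # hypothetical label distribution
skewed = [0.5, 0.25, 0.125, 0.125]      # hypothetical label distribution

print(entropy(uniform))                 # 2.0
print(entropy(skewed))                  # 1.75
print(cross_entropy(skewed, uniform))   # 2.0
```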

The Y values (labels) used in the cross entropy are one-hot encoded
- One-hot encoding: represent a class index as a vector with 1 at that class's position and 0 everywhere else
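With a one-hot label, only the term for the true class survives in the cross entropy sum, so the cost reduces to the negative log of the probability assigned to the true class. A minimal sketch, with hypothetical softmax outputs named `good` and `bad`:

```python
import math

def one_hot(index, depth):
    """Return a list of length depth with 1.0 at index and 0.0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(depth)]

def cross_entropy(probs, label):
    """D(S, L) = -sum_i L_i * log(S_i), using the natural logarithm."""
    return -sum(l * math.log(s) for s, l in zip(probs, label) if l > 0)

label = one_hot(1, 3)                 # class 1 of 3 -> [0.0, 1.0, 0.0]
good = [0.1, 0.8, 0.1]                # hypothetical output, mostly right
bad = [0.8, 0.1, 0.1]                 # hypothetical output, mostly wrong

print(label)
print(cross_entropy(good, label))     # -log(0.8), about 0.223
print(cross_entropy(bad, label))      # -log(0.1), about 2.303
```

A confident wrong prediction is penalized much more heavily than a confident right one, which is the behavior wanted from a classification cost.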

3) Optimizer

Training minimizes the cost function, using a gradient descent optimizer.

$\begin{align*} H(X) &= Y = WX + b \\ \hat{Y}_i &= S(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{K} e^{y_j}} \qquad y_i = w_i^T x + b_i = w_{i1}x_1 + w_{i2}x_2 + w_{i3}x_3 + \dots + b_i \\ D(S, L) &= -\sum_i L_i \log(S_i) \\ &= -\sum_i L_i \log(\hat{Y}_i) \\ &= \sum_i L_i \cdot (-\log(\hat{Y}_i)) \\ \mathcal{L} &= \frac{1}{N}\sum_{n=1}^{N} D(S(WX_n + b), L_n) \end{align*}$
Here K is the number of classes, N is the number of training samples, and L is the one-hot label.
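To make the hypothesis, cost, and optimizer concrete, here is a minimal sketch of batch gradient descent in plain Python. It uses the toy data from the first example in section 2. It is not the TensorFlow implementation: the zero initialization, the learning rate of 0.01 (smaller than the 0.1 used in the TensorFlow code), and all function names are assumptions made for this sketch. It relies on the standard result that the gradient of the mean cross entropy with respect to the pre-softmax scores is (S - L) / N.

```python
import math

x_data = [[1, 2, 1, 1], [2, 1, 3, 2], [3, 1, 3, 4], [4, 1, 5, 5],
          [1, 7, 5, 5], [1, 2, 5, 6], [1, 6, 6, 6], [1, 7, 7, 7]]
y_data = [[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 0],
          [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

n_features, n_classes = 4, 3
W = [[0.0] * n_classes for _ in range(n_features)]   # zero init (assumption)
b = [0.0] * n_classes

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x):
    """S(Wx + b) for a single sample x."""
    scores = [sum(x[f] * W[f][k] for f in range(n_features)) + b[k]
              for k in range(n_classes)]
    return softmax(scores)

def mean_loss():
    """(1/N) * sum over samples of D(S(W x_n + b), L_n)."""
    total = 0.0
    for x, y in zip(x_data, y_data):
        s = predict(x)
        total -= sum(yk * math.log(sk) for yk, sk in zip(y, s) if yk > 0)
    return total / len(x_data)

def gradient_step(lr):
    """One batch update of W and b using d(loss)/d(scores) = (S - L) / N."""
    n = len(x_data)
    grad_W = [[0.0] * n_classes for _ in range(n_features)]
    grad_b = [0.0] * n_classes
    for x, y in zip(x_data, y_data):
        s = predict(x)
        for k in range(n_classes):
            delta = (s[k] - y[k]) / n
            grad_b[k] += delta
            for f in range(n_features):
                grad_W[f][k] += x[f] * delta
    for k in range(n_classes):
        b[k] -= lr * grad_b[k]
        for f in range(n_features):
            W[f][k] -= lr * grad_W[f][k]

initial_loss = mean_loss()            # log(3): all classes equally likely
for step in range(2000):
    gradient_step(lr=0.01)
final_loss = mean_loss()

print("initial loss:", initial_loss)
print("final loss:  ", final_loss)
```

With zero weights every class gets probability 1/3, so the starting cost is log(3); each update then moves W and b against the gradient and the cost goes down.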

2. Example Code

1) Softmax Classification

Source code

softmax is computed with tf.nn.softmax (the code uses the TensorFlow 1.x graph API: tf.placeholder, tf.Session)

import tensorflow as tf

x_data = [[1, 2, 1, 1], [2, 1, 3, 2], [3, 1, 3, 4], [4, 1, 5, 5],
          [1, 7, 5, 5], [1, 2, 5, 6], [1, 6, 6, 6], [1, 7, 7, 7]]

# one-hot encoding
y_data = [[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

nb_classes = 3

X = tf.placeholder("float", [None, 4])
Y = tf.placeholder("float", [None, nb_classes])

W = tf.Variable(tf.random_normal([4, nb_classes]), name='weight')
b = tf.Variable(tf.random_normal([nb_classes]), name='bias')

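# softmax turns the scores XW + b into class probabilities
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

# cross entropy per sample (sum over classes), averaged over the batch
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)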

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(2001):
        sess.run(optimizer, feed_dict={X: x_data, Y: y_data})

        if step % 20 == 0:
            print("step: ", step, "\tcost: ", sess.run(cost, feed_dict={X: x_data, Y: y_data}))

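    # test: predict the class of a new input
    a = sess.run(hypothesis, feed_dict={X: [[1, 11, 7, 9]]})
    print("\na: ", a, "argmax(a):", sess.run(tf.argmax(a, 1)))

    # several inputs at once; each row is one sample
    # (named all_probs so the Python built-in all() is not shadowed)
    all_probs = sess.run(hypothesis, feed_dict={X: [[1, 11, 7, 9],
                                                    [1, 3, 4, 3],
                                                    [1, 1, 0, 1]]})
    print("\nall: ", all_probs, "argmax(all):", sess.run(tf.argmax(all_probs, 1)))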

Result

(...)
step:  1900 	cost:  0.171269
step:  1920 	cost:  0.16998039
step:  1940 	cost:  0.16871029
step:  1960 	cost:  0.1674581
step:  1980 	cost:  0.16622359
step:  2000 	cost:  0.16500616

a:  [[2.3063349e-02 9.7692788e-01 8.7165672e-06]] argmax(a): [1]

all:  [[2.3063349e-02 9.7692788e-01 8.7165672e-06]
 [7.2510439e-01 2.5552630e-01 1.9369263e-02]
 [1.8163204e-08 3.8551138e-04 9.9961448e-01]] argmax(all): [1 0 2]
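The printed result can be read as follows: each row is a probability distribution over the three classes, and argmax picks the column with the largest value. A small check in plain Python, with the numbers copied from the output above:

```python
# Probabilities copied from the printed result above.
all_probs = [[2.3063349e-02, 9.7692788e-01, 8.7165672e-06],
             [7.2510439e-01, 2.5552630e-01, 1.9369263e-02],
             [1.8163204e-08, 3.8551138e-04, 9.9961448e-01]]

for row in all_probs:
    predicted = row.index(max(row))      # same role as tf.argmax(..., 1)
    print(predicted, round(sum(row), 6))
# prints:
# 1 1.0
# 0 1.0
# 2 1.0
```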

2) Softmax_Zoo_Classification

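Source code

One-hot encoding uses tf.one_hot. Because Y has shape (?, 1), tf.one_hot returns shape (?, 1, 7); without a following tf.reshape to (?, 7) an error occurs.
The cost is computed with tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y_one_hot). Take care with the arguments: logits must be the values before softmax, not hypothesis.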
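The extra axis can be reproduced without TensorFlow. The sketch below is a plain-Python stand-in, not the tf.one_hot implementation: the helpers `one_hot` and `shape` are hypothetical and operate on nested lists, but they show the same rank change from (N, 1) to (N, 1, 7) and the flattening back to (N, 7).

```python
def one_hot(labels, depth):
    """One-hot encode a nested list of class indices: rank N in, rank N+1 out."""
    if isinstance(labels, list):
        return [one_hot(item, depth) for item in labels]
    return [1.0 if k == labels else 0.0 for k in range(depth)]

def shape(nested):
    """Shape of a regular nested list, e.g. [[0], [3]] -> (2, 1)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

nb_classes = 7
y = [[0], [3], [6]]                       # shape (3, 1), like the label column
encoded = one_hot(y, nb_classes)          # shape (3, 1, 7): one extra axis
flattened = [row[0] for row in encoded]   # shape (3, 7), the role of tf.reshape

print(shape(y), shape(encoded), shape(flattened))
# prints: (3, 1) (3, 1, 7) (3, 7)
```

The logits have shape (N, 7), so the labels must be brought to the same shape before the two are combined in the cost.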

import tensorflow as tf
import numpy as np

xy = np.loadtxt('Data/data-04-zoo.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

# classes: 0~6
nb_classes = 7

X = tf.placeholder(tf.float32, [None, 16])
Y = tf.placeholder(tf.int32, [None, 1])

# one-hot encoding: a rank-N input produces a rank-(N+1) output
Y_one_hot = tf.one_hot(Y, nb_classes)
# print("one_hot", Y_one_hot)
# reshape: (?, 1, 7) -> (?, 7)
Y_one_hot = tf.reshape(Y_one_hot, [-1, nb_classes])
# print("reshape", Y_one_hot)

W = tf.Variable(tf.random_normal([16, nb_classes]), name='weight')
b = tf.Variable(tf.random_normal([nb_classes]), name='bias')

logits = tf.matmul(X, W) + b
hypothesis = tf.nn.softmax(logits)

# manual form of the same cost:
# cost = tf.reduce_mean(-tf.reduce_sum(Y_one_hot * tf.log(hypothesis), axis=1))
# cross entropy computed directly from the logits (pre-softmax values)
cost_i = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y_one_hot)
cost = tf.reduce_mean(cost_i)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

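# predicted class = index of the largest probability
prediction = tf.argmax(hypothesis, 1)
# accuracy = fraction of samples whose prediction matches the true class
correct_prediction = tf.equal(prediction, tf.argmax(Y_one_hot, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))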

# Launch graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(2000):
        sess.run(optimizer, feed_dict={X: x_data, Y: y_data})

        if step % 100 == 0:
            loss, acc = sess.run([cost, accuracy], feed_dict={X: x_data, Y: y_data})

            print("Step: {:5} \tLoss: {:.3f} \tAcc: {:.2%}".format(step, loss, acc))

    pred = sess.run(prediction, feed_dict={X: x_data})
    for p, y in zip(pred, y_data.flatten()):
        print("[{}] \tprediction: {} \tTRUE \tY: {}".format(p == int(y), p, int(y)))

Result

(...)
Step:  1500 	Loss: 0.076 	Acc: 100.00%
Step:  1600 	Loss: 0.072 	Acc: 100.00%
Step:  1700 	Loss: 0.067 	Acc: 100.00%
Step:  1800 	Loss: 0.064 	Acc: 100.00%
Step:  1900 	Loss: 0.060 	Acc: 100.00%
[True] 	prediction: 0 	TRUE 	Y: 0
[True] 	prediction: 0 	TRUE 	Y: 0
[True] 	prediction: 3 	TRUE 	Y: 3
[True] 	prediction: 0 	TRUE 	Y: 0
[True] 	prediction: 0 	TRUE 	Y: 0
(...)
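The cost in this example is computed from logits rather than from hypothesis. One reason a function may prefer the pre-softmax values is numerical stability: the log of the softmax can be written as the score minus a log-sum-exp term, which avoids exponentiating large numbers. The sketch below illustrates that identity in plain Python; it is an assumption-labeled illustration with hypothetical function names, not a description of TensorFlow's internals.

```python
import math

def cross_entropy_from_logits(logits, label):
    """-sum_i L_i * log(softmax(logits)_i), without forming the softmax."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log(softmax(z)_i) = z_i - log_sum_exp
    return -sum(l * (z - log_sum_exp) for z, l in zip(logits, label))

def cross_entropy_from_probs(logits, label):
    """Same quantity via an explicit softmax followed by log."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return -sum(l * math.log(e / total) for e, l in zip(exps, label) if l > 0)

label = [0.0, 1.0, 0.0]                   # one-hot label for class 1

small = [2.0, 1.0, 0.1]                   # hypothetical logits
print(cross_entropy_from_logits(small, label))
print(cross_entropy_from_probs(small, label))    # same value up to rounding

large = [1000.0, 0.0, -1000.0]            # math.exp(1000.0) would overflow
print(cross_entropy_from_logits(large, label))   # 1000.0
```

For the large logits the explicit-softmax version raises OverflowError in plain Python, while the log-sum-exp form still returns a finite cost.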
