Lecture 10~12

<Lec 10 ์ •๋ฆฌ>

Sigmoid function ์„ network ์—์„œ๋Š” Activation function ์ด๋ผ๊ณ  ๋งŽ์ด ๋ถ€๋ฅธ๋‹ค.

- Layer๊ฐ€ ์—ฌ๋Ÿฌ ๊ฐœ ์žˆ์„ ๋•Œ ์ฒ˜์Œ ๋“ค์–ด๊ฐ€๋Š” ๋ถ€๋ถ„์€ Input layer ์ถœ๋ ฅ ๋ถ€๋ถ„์€ output layer, ๊ทธ๋ฆฌ๊ณ  ๊ฐ€์šด๋ฐ๋Š” Hidden Layer๋ผ๊ณ  ํ•œ๋‹ค.

- Layer๊ฐ€ ์—ฌ๋Ÿฌ ๊ฐœ ์žˆ์„๋•Œ ์„ ํ–‰ํ•˜๋Š” ์•ž lyaer์˜ output๊ณผ ๊ทธ ๋ฐ”๋กœ ๋’ค์˜ input์ด ์ผ์น˜ํ•ด์•ผํ•œ๋‹ค.

Backpropagation (covered in Lec 9-2)

- ๊ฒฐ๊ณผ ๊ฐ’์— ๋ฏธ์นœ ์˜ํ–ฅ์„ ์•Œ๊ธฐ ์œ„ํ•ด ๊ฐ๊ฐ์˜ ๊ฐ’์„ ๋ฏธ๋ถ„. ์—ฌ๊ธฐ์„œ Sigmoid ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ : ๋ฏธ๋ถ„์„ ์‚ฌ์šฉํ•ด ๊ฒฐ๊ณผ์— ๋ฏธ์น˜๋Š” ์ •๋„๋ฅผ ํŒŒ์•…ํ•˜๊ณ  ์ถœ๋ ฅ์„ ์กฐ์ •. ์ด ๋•Œ Sigmoid ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด 0~1 ์‚ฌ์ด์˜ ๊ฐ’์ด ์ถœ๋ ฅ์ด ๋˜๋ฏ€๋กœ ์ œ๋Œ€๋กœ ๋œ ๊ฐ’์„ ๊ณ„์‚ฐํ•˜์ง€ ๋ชปํ•จ. ์ด๋ฅผ Vanishing Gradient ์ด๋ผ ํ•จ. - ์ตœ์ข… ๋‹จ ๊ทผ์ฒ˜์— ์žˆ๋Š” ๊ฒฝ์‚ฌ๋‚˜ ๊ธฐ์šธ๊ธฐ๋Š” ๋‚˜ํƒ€๋‚˜์ง€๋งŒ, ์•ž์œผ๋กœ ๊ฐˆ ์ˆ˜๋ก ๊ฒฝ์‚ฌ๋„๊ฐ€ ์‚ฌ๋ผ์ง.

Vanishing Gradient

=> Layer์—์„œ Sigmoid ํ•จ์ˆ˜ ๋Œ€์‹  ReLU ๋ผ๋Š” ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ํ•˜์ง€๋งŒ ๋งˆ์ง€๋ง‰ ๋‹จ์—์„  sigmoid ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. 0~1 ์‚ฌ์ด ๊ฐ’์„ ์ถœ๋ ฅํ•ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ.

Weight ์ดˆ๊ธฐํ™” ๋ชจ๋“  weight ์„ 0์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•˜๋ฉด ๋ชจ๋“  ๊ธฐ์šธ๊ธฐ๊ฐ€ 0์ด ๋˜๋ฏ€๋กœ ๋ชจ๋“  gradient๊ฐ€ ์‚ฌ๋ผ์ง„๋‹ค. โ†’ ๊ทธ๋Ÿฌ๋ฏ€๋กœ ๋ชจ๋“  ๊ฐ’์„ 0์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•˜๋ฉด ์•ˆ๋จ. => RBM ์„ ์ด์šฉํ•ด ์ดˆ๊ธฐํ™” ํ•ด์•ผ ํ•œ๋‹ค.

RBM : ๊ฐ€์ง€๊ณ  ์žˆ๋˜ x ๊ฐ’๊ณผ ์ƒ์„ฑ๋œ x' ์˜ ๊ฐ’์„ ๋น„๊ตํ•จ. ์ด ์ฐจ์ด๊ฐ€ ์ตœ์†Œ๊ฐ€ ๋˜๋„๋ก weight ์„ ์กฐ์ •.

- ์ธต์ด ์—ฌ๋Ÿฌ๊ฐœ ์กด์žฌํ•˜๋Š” ๊ฒฝ์šฐ ๊ฐ layer ๋ฅผ ๋ฐ˜๋ณตํ•˜์—ฌ weight๋ฅผ ์ดˆ๊ธฐํ™” ์‹œํ‚จ๋‹ค.

Drop out - ๋žœ๋คํ•˜๊ฒŒ ์–ด๋–ค ๋‰ด๋Ÿฐ๋“ค์„ ๋Š์–ด๋‚ธ ํ›„ ํ›ˆ๋ จํ•œ๋‹ค. (๋‰ด๋Ÿฐ๋“ค ์‚ญ์ œ) - ๋งˆ์ง€๋ง‰์— ๋Š์–ด๋ƒˆ๋˜ ๋‰ด๋Ÿฐ๋“ค์„ ๋™์› ํ•ด ์˜ˆ์ธกํ•œ๋‹ค. โ†’ overfitting ํ•ด๊ฒฐ ๊ฐ€๋Šฅ. ๋” ์ข‹์€ ์„ฑ๋Šฅ.

Ensemble (์•™์ƒ๋ธ”)

- ๋…๋ฆฝ์ ์œผ๋กœ ํ•™์Šต์‹œ์ผœ ๋งŒ๋“  ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ํ•ฉ์นœ ๋ชจ๋ธ์ด๋‹ค.

- ๋” ์ข‹์€ ์„ฑ๋Šฅ์œผ๋กœ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค. (๋Œ€๋žต 4~5% ํ–ฅ์ƒ ๊ฐ€๋Šฅ.)

๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ

- Fast forward : ์—ฌ๋Ÿฌ๊ฐœ์˜ Layer ๊ฐ€ ์žˆ์„ ๋•Œ ๋ช‡ ๋‹จ์„ ๊ฑด๋„ˆ ๋›ฐ๋„๋ก ํ•˜๋Š” ๊ตฌ์กฐ.

- Split & Merge : ํ•˜๋‚˜์˜ ๋ชจ๋ธ๋กœ ํ›ˆ๋ จ์‹œํ‚ค๋‹ค ์—ฌ๋Ÿฌ๊ฐœ๋กœ ํ•ฉ์น˜๋Š” ๊ตฌ์กฐ.

- Recurrent network : ์˜†์œผ๋กœ๋„ Layer๊ฐ€ ์ฆ๊ฐ€ํ•˜๋ฉด์„œ ์žฌ๊ท€์ ์œผ๋กœ.

<Lec 11 ์ •๋ฆฌ>

Convolutional Neural Networks (CNN)

- ๋ถ€๋ถ„์„ ๋‚˜๋ˆ„์–ด ์ฝ์€ ํ›„ ์ „์ฒด๋ฅผ ํ•ฉ์น˜๋Š” ๊ธฐ๋ฒ•.

<์˜ˆ์‹œ>

1) 32*32*3 ์ด๋ผ๋Š” image๋ฅผ ์ž…๋ ฅ.

2) 5*5*3 filter๋ฅผ ๋ณธ๋‹ค. (ํ•„๋“œ์˜ ํฌ๊ธฐ๋Š” ์ž„์˜๋กœ ์ž…๋ ฅ ๊ฐ€๋Šฅ.)

3) ์œ„์˜ ํ•„ํ„ฐ๋Š” ๊ถ๊ทน์ ์œผ๋กœ ํ•˜๋‚˜์˜ ๊ฐ’์„ ์˜๋ฏธํ•จ. = ํ•œ ์ ๋งŒ ๋ฝ‘์•„๋‚ด ์ถœ๋ ฅํ•œ๋‹ค.

4) ์œ„์˜ ์ถœ๋ ฅ ๊ฐ’์„ weight์œผ๋กœ ์ง€์ •ํ•ด ์ „์ฒด ๊ทธ๋ฆผ์„ ํ›‘๋Š”๋‹ค. (๋ช‡ ์นธ์”ฉ ์›€์ง์ผ์ง€๋ฅผ 'stride'๋ผ ํ•˜๋Š”๋ฐ, ์ด ๊ฐ’์€ ์ž„์˜๋กœ ์„ค์ •)

5) ์ „์ฒด ๊ทธ๋ฆผ์ด ๋ช‡๊ฐœ์˜ ๊ฐ’์„ ๋ชจ์•˜๋Š”์ง€๋ฅผ ๊ณ„์‚ฐ. (ex : 7*7 input ์—์„œ 3*3 filter๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด 5*5 output ์ด ๋‚˜์˜จ๋‹ค.)

=> Output size = (N - F) / stride + 1

(์—ฌ๊ธฐ์„œ N์€ input image์˜ ํฌ๊ธฐ, F ๋Š” Filter ์˜ size)

์œ„์˜ ์˜ˆ์‹œ์™€ ๊ฐ™์€ ๊ฒฝ์šฐ image๊ฐ€ ์ ์  ์ž‘์•„์ง€๋Š”๋ฐ, ๊ทธ๋ ‡๊ฒŒ ๋˜๋ฉด ์ •๋ณด๋ฅผ ์žƒ์–ด๋ฒ„๋ฆฐ๋‹ค.

โ†’ Padding ์ด๋ผ๋Š” ๊ฐœ๋… ์‚ฌ์šฉ.

Padding : ๊ทธ๋ฆผ์ด ๋„ˆ๋ฌด ์ž‘์•„์ง€๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€, ๋ชจ์„œ๋ฆฌ ๋ถ€๋ถ„์„ ๋„คํŠธ์›Œํฌ์— ์•Œ๋ ค์คŒ. ๊ฒฐ๊ณผ์ ์œผ๋กœ ์ž…๋ ฅ์˜ ์ด๋ฏธ์ง€์™€ ์ถœ๋ ฅ์˜ ์ด๋ฏธ์ง€ ์‚ฌ์ด์ฆˆ๋ฅผ ๊ฐ™๊ฒŒ ๋งŒ๋“ค์–ด์ค€๋‹ค.

Activation maps

- ๊นŠ์ด๊ฐ€ ํ•„ํ„ฐ์˜ ๊ฐœ์ˆ˜์ธ ์ถœ๋ ฅ์„ ๊ฐ€์ง. (์—ฌ๋Ÿฌ๊ฐœ์˜ filter๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Œ)

- ๊ฐ’์ด ( a, b, c) ํ˜•ํƒœ๋กœ ๋‚˜์˜ค๋Š”๋ฐ ์—ฌ๊ธฐ์„œ a,b ๋Š” filter์˜ ์‚ฌ์ด์ฆˆ, c๋Š” ๊ฐœ์ˆ˜๋ฅผ ์˜๋ฏธํ•จ.

- ์ด activation maps์— convolution์„ ์—ฌ๋Ÿฌ๋ฒˆ ์ ์šฉํ•˜๋ฉฐ ๋ฐ˜๋ณตํ•œ๋‹ค.

Pooling layer (sampling)

- ์ด๋ฏธ์ง€์—์„œ filter ์ฒ˜๋ฆฌ ํ•ด Convolution Layer๋ฅผ ๋งŒ๋“ค์–ด๋ƒ„. ์—ฌ๊ธฐ์„œ ํ•œ layer๋งŒ ๋ฝ‘์•„๋‚ธ๋‹ค.

- ์ด๋ฏธ์ง€๋ฅผ resize ํ•จ (์ž‘๊ฒŒ ๋งŒ๋“ค๊ธฐ) = ์ด๋ฅผ Pooling ์ด๋ผ ํ•จ.

- ์œ„์˜ ๊ฐ’๋“ค์„ ๋‹ค์‹œ ์Œ“๋Š”๋‹ค. Sampling ํ•œ ๊ฒƒ๋“ค์„ ๋ชจ์œผ๋Š” ํ˜•ํƒœ

โ†’ Max Pooling : ํ”ฝ์…€ ๋ชจ์Œ์—์„œ ๊ฐ€์žฅ ํฐ ๊ฐ’์„ ๊ณ ๋ฅธ๋‹ค.

Convolution โ†’ ReLU โ†’Convolution โ†’ReLU โ†’Pooling โ†’...๊ณ„์†๋ฐ˜๋ณต

๋งˆ์ง€๋ง‰์—์„œ Pooling ์ž‘์—…์„ ํ•œ ํ›„ ์›ํ•˜๋Š” ์ถœ๋ ฅ๊ฐ’์— ๋งž๋„๋ก ์กฐ์ •.

<Lec 12 ์ •๋ฆฌ>

RNN

Sequence data : ํ˜„์žฌ์˜ state ๊ฐ€ ๊ทธ ๋‹ค์Œ state ์— ์˜ํ–ฅ์„ ๋ฏธ์นœ๋‹ค.

์•ž์˜ ๊ฒฐ๊ณผ ๊ฐ’์ด ๊ทธ ๋‹ค์Œ ๊ณ„์‚ฐ์— ์˜ํ–ฅ์„ ๋ฏธ์นœ๋‹ค.
ht ๊ฐ€ new state, xt ๊ฐ€ input vector ๋ผ๊ณ  ํ•˜๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์‹์œผ๋กœ ํ‘œํ˜„.

$$h_t=f_W(h_{t-1},x_t)$$

์œ„์˜ ์‹์— wx๋ฅผ ๋„ฃ์œผ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ˜•ํƒœ๊ฐ€ ๋œ๋‹ค.

$$h_t = \tanh(W_{hh}h_{t-1}+W_{xh}x_t), \quad y_t=W_{hy}\cdot h_t$$
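A single step of this update in NumPy, with assumed small dimensions and random weights (only the two formulas above come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 3, 4   # assumed sizes

W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
W_hy = rng.normal(size=(output_dim, hidden_dim))

def rnn_step(h_prev, x_t):
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t);  y_t = W_hy h_t
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_dim)              # initial state
x = np.array([1.0, 0.0, 0.0, 0.0])    # a one-hot input vector
h, y = rnn_step(h, x)
```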

์˜ˆ์‹œ)

'hello'๋ฅผ ์ถœ๋ ฅํ•˜๋Š” ๊ฒฝ์šฐ

1 ) input layer ์— ๊ฐ๊ฐ ์ž๋ฆฌ์— ํ•ด๋‹น ํ•˜๋Š” ๊ฐ’์„ 1๋กœ ์„ค์ •. ๊ฐ ์•ŒํŒŒ๋ฒณ์— ๋งž๋Š” input vector ๋ฅผ ์„ค์ •ํ•œ๋‹ค.

2 ) Hidden layer 1 ์—์„œ input 'h'๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๊ฐ’์„ ์ถœ๋ ฅํ•œ๋‹ค.

3 ) Hidden layer 2 ์—์„œ Hidden layer 1 ๊ณผ 'e'๋ฅผ input ์œผ๋กœ ๊ฐ’์„ ์ถœ๋ ฅํ•œ๋‹ค.

4 ) ๊ทธ ๋’ค์—๋„ ์ˆœ์ฐจ์ ์œผ๋กœ ํ•ด๋‹น ๊ฐ’์˜ ํ™•๋ฅ ๋“ค์„ ์ถœ๋ ฅ๊ฐ’์œผ๋กœ ๊ฐ–๋Š”๋‹ค.

RNN ํ™œ์šฉ ์‚ฌ๋ก€ ) Language Modeling (์—ฐ๊ด€ ๊ฒ€์ƒ‰์–ด), Sppech Recognition (์Œ์„ฑ ์ธ์‹), Machine Translation (๋ฒˆ์—ญ๊ธฐ),

Conversation Modeling / Question Answering (chatbots, etc.), Image/Video Captioning

There are one-to-one / one-to-many / many-to-one / many-to-many configurations.

Multi-Layer RNN : ์—ฌ๋Ÿฌ๊ฐœ์˜ layer๋ฅผ ๋‘์–ด ๋” ๋ณต์žกํ•œ ํ•™์Šต์ด ๊ฐ€๋Šฅํ•œ RNN.

ํ•„๊ธฐ ์ •๋ฆฌ

์ถœ์ฒ˜

Last updated