Lecture 7~9

<Lec 7 ์ •๋ฆฌ>

Learning rate, ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์„ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ๋ฒ•, overfitting ๋ฐฉ์ง€ ๋ฐฉ๋ฒ•

- Gradient descent์—์„œ ์ตœ์†Œ๊ฐ’ ๊ตฌํ•  ๋•Œ learning rate๋ผ๋Š” ฮฑ๊ฐ’์„ ์ž„์˜๋กœ ์ง€์ •ํ•จ.์ ๋‹นํ•œ learning rate ๊ฐ’์„ ์ง€์ •ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค.

Learning rate ๊ฐ’์ด ๋„ˆ๋ฌด ํฐ ๊ฒฝ์šฐ : step ์ด ๋„ˆ๋ฌด ์ปค์ ธ ๊ทธ๋ž˜ํ”„์˜ ๋ฐ–์œผ๋กœ ํŠ•๊ฒจ ๋‚˜๊ฐˆ ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ 'overshooting' ์ด๋ผ ํ•จ.

Learning rate๊ฐ’์ด ๋„ˆ๋ฌด ์ž‘์€ ๊ฒฝ์šฐ: ๊ฒฝ์‚ฌ๋ฉด์„ ๋„ˆ๋ฌด ์กฐ๊ธˆ์”ฉ ์ด๋™ํ•ด ๋ฐ”๋‹ฅ๊นŒ์ง€ ๋‚ด๋ ค๊ฐ€์ง€ ๋ชปํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค.

=> ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด cost ํ•จ์ˆ˜๋ฅผ ์ถœ๋ ฅํ•ด๋ณด๊ณ  ํ™•์ธ. ๊ธฐ์šธ๊ธฐ๊ฐ€ ๊ฑฐ์˜ ๋ณ€ํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด learning rate์„ ์กฐ๊ธˆ ์˜ฌ๋ ค์„œ ํ™•์ธ.

- Observe the cost function (start with an initial value of about 0.01 and check).

- Check it goes down at a reasonable rate (confirm it is heading toward the minimum at a sensible pace; if not, adjust).
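
As a rough illustration of monitoring the cost while tuning the learning rate, here is a minimal NumPy sketch (the toy data and the simple one-weight linear model are assumptions made for illustration):

```python
import numpy as np

# Toy data (illustrative assumption): y = 2x
x = np.array([1., 2., 3., 4.])
y = np.array([2., 4., 6., 8.])

w = 0.0
learning_rate = 0.01  # start around 0.01 and adjust after watching the cost

for step in range(100):
    hypothesis = w * x
    cost = np.mean((hypothesis - y) ** 2)         # squared-error cost
    gradient = np.mean(2 * (hypothesis - y) * x)  # d(cost)/dw
    w -= learning_rate * gradient                 # gradient descent update
    if step % 10 == 0:
        # Cost exploding -> overshooting (rate too large);
        # cost barely moving -> rate too small.
        print(step, cost, w)
```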

Data(X) preprocessing for gradient descent

- Normalized data: when the data values differ greatly in scale, normalize them so that the values always fall within some fixed range.

- Zero-centered data: shift the data so that its center (mean) is at 0.

Standardization

ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ ์ด์šฉํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์‹์œผ๋กœ ํ‘œ์ค€ํ™”ํ•œ๋‹ค.

X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()

ํŒŒ์ด์ฌ ์ด์šฉ ์‹œ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์‹์œผ๋กœ ํ‘œํ˜„ ๊ฐ€๋Šฅ.

Overfitting

- ํ•™์Šต ๋ฐ์ดํ„ฐ์— ๋”ฑ ๋งž๋Š” ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด ๋ƒˆ๋Š”๋ฐ, ์‹ค์ œ ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•ด ํ•™์Šต ํ•ด ๋ณด๋ฉด ์ž˜ ๋งž์ง€ ์•Š๋Š” ๊ฒฝ์šฐ.

์ค„์ด๋Š” ๋ฒ• - training data ์˜ ์–‘์„ ๋Š˜๋ฆฐ๋‹ค. - Feature ์˜ ๊ฐฏ์ˆ˜๋ฅผ ์ค„์ธ๋‹ค. (์ค‘๋ณต ๋œ ๊ฒƒ์ด ์žˆ์„ ๊ฒฝ์šฐ ์ค„์ด๊ธฐ) - Regularization ์ผ๋ฐ˜ํ™” : ์šฐ๋ฆฌ๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” weight ์„ ๋„ˆ๋ฌด ํฐ ๊ฐ’์„ ๊ฐ€์ง€์ง€ ์•Š๋„๋ก ํ•จ.

๋ชจ๋ธ์ด ์–ผ๋งˆ๋‚˜ ์„ฑ๊ณต์ ์œผ๋กœ ์˜ˆ์ธก ํ•  ์ˆ˜ ์žˆ์„ ์ง€ ํ‰๊ฐ€ : Training set ๊ณผ Test set ์„ ๋‚˜๋ˆ„์–ด ๊ฒฐ๊ณผ ๋น„๊ตํ•œ๋‹ค.Original Set : Training + Validation + Testing ์œผ๋กœ ๊ตฌ์„ฑ ๋˜์–ด ์žˆ์Œ.Training, Validation ์˜ ๊ณผ์ •์„ ๊ฑฐ์นœ ํ›„ Testing ๋‹จ๊ณ„๋กœ ๊ฐ€์•ผํ•จ.

Online learning: the results of previous training are kept in the model, and additional training is done on top of the existing data as new data arrives.

Accuracy ์ •ํ™•๋„ - ์‹ค์ œ ๊ฒฐ๊ณผ ๊ฐ’. ๋ชจ๋ธ์ด ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๊ฐ’๊ณผ ์‹ค์ œ ๊ฒฐ๊ณผ ๊ฐ’์ด ์–ผ๋งˆ๋‚˜ ์ผ์น˜ํ•˜๋Š”์ง€? - ์ด๋ฏธ์ง€ ์ธ์‹์˜ ์ •ํ™•๋„๋Š” ๋Œ€๋ถ€๋ถ„ 95% ~ 99%

<Lec 8 ์ •๋ฆฌ>

์–ด๋– ํ•œ input x ์— weight ์˜ ๊ณฑ์˜ ํ•ฉ + bias = โˆ‘ xยทw + b

์ด ๊ฐ’๋“ค์„ Activation Functions ์„ ํ†ตํ•ด output์œผ๋กœ ์‹ ํ˜ธ๋ฅผ ๋ณด๋‚ธ๋‹ค.

  • Perceptrons
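
A minimal sketch of one such unit, a weighted sum plus bias followed by a sigmoid activation (NumPy; the input, weight, and bias values are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return sigmoid(np.dot(x, w) + b)  # sum(x*w) + b -> activation -> output

x = np.array([1.0, 0.0])   # example input
w = np.array([5.0, 5.0])   # example weights
b = -8.0                   # example bias
print(unit(x, w, b))
```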

Backpropagation (Hinton)

- ๋„คํŠธ์›Œํฌ๊ฐ€ ์ฃผ์–ด์ง„ ์ž…๋ ฅ (Training set)์„ ๊ฐ€์ง€๊ณ  ์ถœ๋ ฅ์„ ๋งŒ๋“ค์–ด ๋‚ธ๋‹ค. ์ด ๋•Œ w,b ์กฐ์ ˆ. ์ถœ๋ ฅ์—์„œ error๋ฅผ ์ฐพ์•„ ๋ฐ˜๋Œ€๋กœ ์ „๋‹ฌ์‹œํ‚จ๋‹ค.

- ๋” ๋ณต์žกํ•œ ํ˜•ํƒœ์˜ ์˜ˆ์ธก์ด ๊ฐ€๋Šฅํ•ด์ง.

Convolutional Neural Networks

- ๊ณ ์–‘์ด๊ฐ€ ๊ทธ๋ฆผ์„ ๋ณด๊ฒŒ ํ•œ ๋’ค ๊ทธ๋ฆผ์˜ ํ˜•ํƒœ์— ๋”ฐ๋ผ ํ™œ์„ฑํ™” ๋˜๋Š” ๋‰ด๋Ÿฐ์„ ๋ฐœ๊ฒฌ -> ์šฐ๋ฆฌ์˜ ์‹ ๊ฒฝ๋ง ์„ธํฌ๋Š” ์ผ๋ถ€๋ถ„์„ ๋‹ด๋‹นํ•˜๊ณ , ์ด ์„ธํฌ๋“ค์ด ๋‚˜์ค‘์— ์กฐํ•ฉ ๋˜๋Š” ๊ฒƒ์ด๋ผ ์ƒ๊ฐ.

- ๋ถ€๋ถ„๋ถ€๋ถ„์„ ์ž˜๋ผ ์ธ์‹ ํ›„ ๋‚˜์ค‘์— ํ•ฉ์น˜๋Š” ๋ฐฉ์‹.

- 90% ์ด์ƒ์˜ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์คŒ. ๋ฏธ๊ตญ์—์„œ๋Š” ์ด๋ฅผ ํ†ตํ•ด ์ฑ…์„ ์ž๋™์œผ๋กœ ์ฝ๋Š” ๊ธฐ๋Šฅ์„ ๋งŒ๋“ค์–ด ์‚ฌ์šฉํ•จ.

๋ฌธ์ œ์ 

- Backpropagation algorithm์ด ์•ž์— ์žˆ๋Š” ์—๋Ÿฌ๋ฅผ ๋’ค๋กœ ๋ณด๋‚ผ ๋•Œ ์˜๋ฏธ๊ฐ€ ๊ฐˆ ์ˆ˜๋ก ์•ฝํ•ด์ ธ ๊ฑฐ์˜ ์ „๋‹ฌ ๋˜์ง€ ์•Š๊ณ  ํ•™์Šต์„ ์‹œํ‚ฌ ์ˆ˜ ์—†๊ฒŒ ๋จ. ๋งŽ์ด ํ•  ์ˆ˜๋ก ์„ฑ๋Šฅ ์ €ํ•˜.

- ๋‹ค๋ฅธ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์˜ ๋“ฑ์žฅ.

CIFAR (Canadian Institute For Advanced Research): continued to pursue research on neural networks.

Breakthrough ๋…ผ๋ฌธ ๋ฐœํ‘œ. ์ดˆ๊ธฐ๊ฐ’์„ ์ž˜ ์„ ํƒํ•œ๋‹ค๋ฉด ํ•™์Šต ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์คŒ -> Deep netwrok, Deep Learning ์œผ๋กœ ์ด๋ฆ„์„ ๋ฐ”๊ฟˆ.

Deep API Learning: given an input to the system in natural language, the system automatically lists which APIs to use and in what order. Programs are not generated directly yet, but accuracy has risen to about 65%.

์‚ฌ์šฉ๋˜๋Š” ์‚ฌ๋ก€ ์˜ˆ์‹œ ) ๊ตฌ๊ธ€ ๊ฒ€์ƒ‰(๊ด€์‹ฌ ํ”ผ๋“œ), ๋„ทํ”Œ๋ฆญ์Šค, ์•„๋งˆ์กด, ์•ŒํŒŒ๊ณ  ๋“ฑ๋“ฑ

<Lec 9 ์ •๋ฆฌ>

XOR ๋ฌธ์ œ ํ•ด๊ฒฐ๋ฒ• - And, Or ์€ linear line์œผ๋กœ ๊ตฌ๋ถ„์ด ๊ฐ€๋Šฅํ•˜์ง€๋งŒ Xor ์€ ์„ ํ˜•์œผ๋กœ ๊ตฌ๋ถ„ํ•˜๋Š” ์„ ์„ ์ฐพ์„ ์ˆ˜ ์—†๋‹ค.

Neural Net: use multiple connected units (a network of them).

wยทx + b ๋ฅผ ์—ฌ๋Ÿฌ๊ฐœ ์‚ฌ์šฉํ•˜์—ฌ y1, y2 ๊ฐ’์„ ๋‚ธ ๋‹ค์Œ ๊ทธ ๋‘˜์„ sigmoid ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ y hat์˜ ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•œ๋‹ค.

xor ๊ณ„์‚ฐ ์˜ˆ์‹œ ex

์ฒซ๋ฒˆ ์งธ ์œ ๋‹›์€ w=[5,5] , b= -8 ๋กœ ์ฃผ์–ด์ง€๊ณ  ๋‘๋ฒˆ์จฐ ์œ ๋‹›์€ w=[-7, -7], b = 3 ์ด๋ผ ํ•œ ํ›„ ๋‘ ๊ฒฐ๊ณผ ๊ฐ’ y1, y2๋ฅผ ๋„ฃ์–ด w = [-11, -11], b = 6 ์ธ ์œ ๋‹›์œผ๋กœ ๊ณ„์‚ฐํ•ด y hat ์˜ ๊ฐ’์„ ๊ตฌํ•œ๋‹ค.

์˜ˆ์‹œ ์„ค๋ช…

NN: in the forward propagation picture, the first two units are combined into one layer; each operation is carried out and written as a single expression.

K(X) = sigmoid(X·W1 + b1)
Ŷ = H(X) = sigmoid(K(X)·W2 + b2)
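
Packing the three units from the XOR example above into this matrix form, a minimal NumPy sketch of the forward pass (the weights are exactly the ones given in the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Columns of W1 are the first two units: [5, 5] (b = -8) and [-7, -7] (b = 3)
W1 = np.array([[5.0, -7.0],
               [5.0, -7.0]])
b1 = np.array([-8.0, 3.0])
# The final unit: w = [-11, -11], b = 6
W2 = np.array([[-11.0],
               [-11.0]])
b2 = np.array([6.0])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

K = sigmoid(X @ W1 + b1)        # K(X) = sigmoid(X·W1 + b1)
Y_hat = sigmoid(K @ W2 + b2)    # ŷ = H(X) = sigmoid(K(X)·W2 + b2)
print(np.round(Y_hat).ravel())  # -> [0. 1. 1. 0.], matching XOR
```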

์ˆœ๊ฐ„๋ณ€ํ™”์œจ = ๋ฏธ๋ถ„ ๊ฐ’.

Partial derivative: treat everything other than the variable being differentiated as a constant.

Back propagation (chain rule)

f = wx + b, where g = wx and f = g + b

- > ์ด๋ฅผ ๊ฐ๊ฐ ํŽธ๋ฏธ๋ถ„ ํ•œ๋‹ค.

1) Forward (plug in the given values): obtain values from the training data.

2) Backward (check against the actual values): compare the predicted value with the actual value to find the error, then propagate it back and adjust, as in the sketch below.
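
A minimal sketch of the chain rule applied to f = wx + b with g = wx (the numeric values for w, x, b are made-up illustrations):

```python
# Forward pass: compute f = w*x + b through the intermediate g = w*x
w, x, b = -2.0, 5.0, 3.0
g = w * x      # g = w*x
f = g + b      # f = g + b

# Backward pass (chain rule): start from df/df = 1 and move backwards
df_df = 1.0
df_dg = 1.0 * df_df   # f = g + b  ->  ∂f/∂g = 1
df_db = 1.0 * df_df   # f = g + b  ->  ∂f/∂b = 1
df_dw = x * df_dg     # g = w*x    ->  ∂g/∂w = x
df_dx = w * df_dg     # g = w*x    ->  ∂g/∂x = w

print(f, df_dw, df_dx, df_db)  # -> -7.0 5.0 -2.0 1.0
```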

ํ•„๊ธฐ ์ •๋ฆฌ

์ถœ์ฒ˜

Last updated