Lectures 1~3

<Lec 1 Summary>

What is Machine Learning?

- ์ผ์ข…์˜ ์†Œํ”„ํŠธ์›จ์–ด ํ”„๋กœ๊ทธ๋žจ. ํ”„๋กœ๊ทธ๋žจ ์ž์ฒด๊ฐ€ data๋ฅผ ๋ณด๊ณ  ํ•™์Šตํ•ด์„œ ๋ฐฐ์›Œ์„œ ์‘์šฉํ•˜๋Š” ๋Šฅ๋ ฅ์„ ๊ฐ–๋Š” ํ”„๋กœ๊ทธ๋žจ.

ML์€ ํฌ๊ฒŒ ๋‘๊ฐ€์ง€๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Œ.

  • Supervised Learning : learns from data whose answers are already given, i.e. it is trained on a training set.

  • Unsupervised Learning : learns from data without given labels. ex) word clustering, grouping Google News articles, etc.

Supervised Learning

Examples include image labeling, email spam filtering, and exam-score prediction. Pre-existing data is required.

This pre-existing data is called the 'Training Dataset'.

Supervised learning can likewise be divided into three types.

  • Regression : produces a result within a continuous range. ex) a program that predicts an exam score from hours studied, outputting a value between 0 and 100

  • Binary classification : classifies a result into one of two categories. ex) pass/non-pass

  • Multi-label classification : classifies a result into one of several labels. ex) grades A/B/C/D/F

<Lec 2 Summary>

The Linear Regression method

  1. Hypothesize that the relationship takes the form H(x) = Wx + b.

  2. ๊ฐ€์„ค์ด ๋‚˜ํƒ€๋‚ด๋Š” ์„  ์ค‘ ๊ฐ€์žฅ ์ข‹์€ ๊ฐ’์ด ๋ฌด์—‡์ธ์ง€๋ฅผ ์ฐพ๋Š”๋‹ค. -> Data ๊ฐ’๊ณผ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๊ฒƒ์ด ์ข‹์€ ๊ฐ’.

Example training data:

X   Y
1   1
2   2
3   3

Linear regression is the model that finds a linear line fitting the given data.

Cost Function (Loss Function)

๊ฐ€์„ค์„ ์„ธ์šด ์„ ๊ณผ Data์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ธก์ •ํ•˜๋Š” ๋ฒ•. ๊ฐ€์„ค๊ณผ ์‹ค์ œ๊ฐ€ ์–ผ๋งˆ๋‚˜ ๋‹ค๋ฅธ์ง€ ๊ณ„์‚ฐ

cost(W, b) = (1/m) * Σ ( H(x_i) - y_i )^2    (m = number of training examples)

Training in Linear Regression means finding the W and b that make this value, computed by substituting H(x), as small as possible.

minimize cost(W,b)
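
As a concrete check, here is a minimal Python sketch of the hypothesis and this cost function (the toy data and formulas come from the notes above; the helper names `hypothesis` and `cost` are my own):

```python
# Hypothesis H(x) = Wx + b and the mean-squared-error cost, in plain Python.

x_data = [1, 2, 3]
y_data = [1, 2, 3]

def hypothesis(x, W, b):
    # H(x) = Wx + b
    return W * x + b

def cost(W, b):
    # cost(W, b) = (1/m) * sum((H(x_i) - y_i)^2)
    m = len(x_data)
    return sum((hypothesis(x, W, b) - y) ** 2
               for x, y in zip(x_data, y_data)) / m

print(cost(1.0, 0.0))  # 0.0 -- the line y = x fits this data perfectly
print(cost(0.5, 0.0))  # larger cost: a worse-fitting line
```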

<Lec 3 Summary>

Finding the W that minimizes the cost function cost(W).

Simplify the hypothesis to H(x) = Wx, then plug the data in for W = 0, 1, 2, ... and look for the minimum.

Example) Given the following training data:

X   Y
1   1
2   2
3   3

When W = 1, cost(W) = 0

When W = 0, cost(W) = 4.67

When W = 2, cost(W) = 4.67

...

Plot cost(W) as a graph and find the W at which the cost function is minimized.
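
A quick way to reproduce the values above is to sweep W directly; this short sketch (plain Python, nothing assumed beyond the cost formula) prints the costs quoted:

```python
# Sweep W with the simplified hypothesis H(x) = Wx (b = 0).

x_data = [1, 2, 3]
y_data = [1, 2, 3]

def cost(W):
    m = len(x_data)
    return sum((W * x - y) ** 2 for x, y in zip(x_data, y_data)) / m

for W in [0, 1, 2]:
    print(f"W = {W}: cost(W) = {cost(W):.2f}")
# W = 0: cost(W) = 4.67
# W = 1: cost(W) = 0.00
# W = 2: cost(W) = 4.67
```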

Gradient descent algorithm ( = an algorithm that walks down a slope)

=> ๋‹ค์Œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•ด cost(W,b) ๋ฅผ ๊ฐ€์žฅ ์ตœ์†Œํ™”ํ•˜๋Š” W์™€ b์˜ ๊ฐ’์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ž‘๋™๋ฒ•:

ํ•จ์ˆ˜์—์„œ ๊ฒฝ์‚ฌ๋„(๊ธฐ์šธ๊ธฐ)๋ฅผ ๊ตฌํ•ด ๋”ฐ๋ผ ์›€์ง์ด๋‹ค ์ตœ์ข…์ ์œผ๋กœ 0์— ๋„์ฐฉ. -> ํ•ญ์ƒ ์ตœ์ €์ ์— ๋„์ฐฉํ•œ๋‹ค๋Š” ๊ฒƒ์ด ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์žฅ์ .

  1. It can start from any point.

  2. It changes W a little at a time, reducing the cost at each step.

  3. It recomputes the gradient at each step and repeats until the minimum is reached.

๊ฒฝ์‚ฌ๋„ ๊ตฌํ•˜๋Š” ๋ฒ•

๋‹ค์Œ ์ˆ˜์‹์œผ๋กœ W ๊ฐ’์„ ๋ณ€ํ™”์‹œํ‚ค๋ฉด์„œ ์ตœ์ €์  ์ฐพ์Œ.

Convex function : a function whose graph is bowl-shaped. When the cost function is convex, gradient descent arrives at the same point, the global minimum, no matter where it starts.

When designing a cost function, you must make sure its shape is a convex function!
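
Since the squared-error cost above is convex in W, descent should land on the same minimum from any starting point. A small check (same toy data and gradient as in the previous sketch):

```python
# Convexity check: run gradient descent from several starting points.

x_data = [1, 2, 3]
y_data = [1, 2, 3]
m = len(x_data)

def gradient(W):
    return (2 / m) * sum((W * x - y) * x for x, y in zip(x_data, y_data))

for start in [-10.0, 0.0, 5.0, 20.0]:
    W = start
    for _ in range(200):
        W -= 0.1 * gradient(W)
    print(f"start = {start:6.1f} -> W = {W:.4f}")
# every run ends at W = 1.0000, the global minimum
```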

Source: ๋ชจ๋‘๋ฅผ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ๊ฐ•์ขŒ ์‹œ์ฆŒ 1 - YouTube
