
[cs231n] Variational Autoencoders (VAE)

윰갱 2024. 7. 9. 22:42

 

In the earlier PixelCNN case, the probability model was a tractable function that could be computed directly,

whereas in a VAE (Variational Autoencoder) the probability model is defined by an intractable function.

The goal is therefore to derive a lower bound on the likelihood, which gives us a form we can actually compute.

 

Before getting into VAEs,

let's look at the two stages of an autoencoder: the encoder and the decoder.

 

 

 

An autoencoder is a method for learning a lower-dimensional feature $z$ from input data $x$.

$z$ has fewer dimensions than $x$ because it is meant to keep only the 'key information' in the input.

In other words, we want the encoder to discard the parts of the input data that amount to noise.

 

To summarize:

Autoencoder: an unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data
Encoder: the step that takes input data $x$ and extracts its feature vector $z$

 

 

 

That is, the point of an autoencoder is

to reconstruct the input data ($\hat{x}$) using only the feature $z$ obtained from the input data $x$.

This is exactly the decoder's role.

 

Decoder: the step that reconstructs the input data from the feature vector $z$

 

Typically the encoder and decoder have the same structure, with the decoder running the process in reverse.

 

 

Typically, before training has done its work, the reconstructed data looks like the figure at the upper right.

Why is it so blurry? That is to be expected.

A lower-dimensional feature $z$ was extracted from the original input, and that limited information then has to be mapped back up to the higher dimension.

So an autoencoder

is trained to reduce the loss between the reconstruction $\hat{x}$ and the original input $x$ (for example an L2 loss, $\lVert x - \hat{x} \rVert^2$).

Equivalently, training is the process of finding a feature $z$ that represents the input data better.
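The training loop described above can be sketched in plain Python. This is only an illustration under loud assumptions: the encoder and decoder are single linear layers (2-D input, 1-D feature $z$), the loss is a mean L2 reconstruction loss, and `train_linear_autoencoder`, the toy `data`, the initial weights and the step size are all hypothetical choices, not anything taken from the lecture.

```python
def train_linear_autoencoder(data, steps=300, lr=0.05):
    """Gradient descent on the mean L2 reconstruction loss of a
    2-D -> 1-D -> 2-D linear autoencoder (illustrative toy only)."""
    enc = [0.1, 0.1]   # encoder weights: z = enc . x
    dec = [0.1, 0.1]   # decoder weights: x_hat = dec * z
    history = []       # mean reconstruction loss at each step
    n = len(data)
    for _ in range(steps):
        grad_enc, grad_dec, loss = [0.0, 0.0], [0.0, 0.0], 0.0
        for x in data:
            z = enc[0] * x[0] + enc[1] * x[1]            # encode x to one number
            err = [dec[k] * z - x[k] for k in range(2)]  # x_hat - x
            loss += err[0] ** 2 + err[1] ** 2            # squared L2 error
            dloss_dz = 2.0 * (err[0] * dec[0] + err[1] * dec[1])
            for k in range(2):
                grad_dec[k] += 2.0 * err[k] * z
                grad_enc[k] += dloss_dz * x[k]
        history.append(loss / n)
        enc = [w - lr * g / n for w, g in zip(enc, grad_enc)]
        dec = [w - lr * g / n for w, g in zip(dec, grad_dec)]
    return enc, dec, history

# Toy inputs that lie on the line x2 = 2 * x1, so a single feature suffices.
data = [(t / 10.0, 2.0 * t / 10.0) for t in range(-10, 11)]
enc, dec, history = train_linear_autoencoder(data)
```

Comparing `history[0]` with `history[-1]` shows the reconstruction loss shrinking as training proceeds. A real autoencoder would use deeper, nonlinear networks and a library with automatic differentiation.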

 

 

The decoder was only needed in order to learn the feature vector $z$,

so it is not used once training is finished.

That is, at test time only the encoder is used.

 

 

 

The trained encoder is used to initialize a supervised model.

 

An autoencoder has the advantage of being able to learn a good, general feature representation from unlabeled data.

The feature representation learned this way

can therefore serve as the initial weights of a supervised learning model that has only a small amount of data.
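A minimal sketch of that initialization step, assuming the encoder is a single linear layer stored as a list of weight rows. The names `init_classifier_from_encoder` and `predict_scores` are hypothetical helpers made up for this illustration; in practice this would be done with a deep learning framework.

```python
import copy
import random

def init_classifier_from_encoder(encoder_weights, num_classes, seed=0):
    """Build classifier parameters: a copy of the pretrained encoder
    plus a freshly initialized classification head."""
    rng = random.Random(seed)
    feature_dim = len(encoder_weights)   # one weight row per feature in z
    head = [[rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
            for _ in range(num_classes)]  # small random weights, trained from scratch
    return {"encoder": copy.deepcopy(encoder_weights), "head": head}

def predict_scores(params, x):
    """Class scores = head @ (encoder @ x); both layers are linear for brevity."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in params["encoder"]]
    return [sum(w * zi for w, zi in zip(row, z)) for row in params["head"]]

# Hypothetical pretrained encoder mapping 3-D inputs to 2-D features.
pretrained_encoder = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
params = init_classifier_from_encoder(pretrained_encoder, num_classes=3)
scores = predict_scores(params, [1.0, 2.0, 3.0])
```

The classifier starts from features that were learned without labels, and the small labeled set is then used to train the head (and optionally to fine-tune the encoder).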

 

 

 

An autoencoder reconstructs data well and learns features that are good for initializing a supervised model.

Could such a feature vector $z$ also be used to generate new images?

That question is the motivation behind the VAE (Variational Autoencoder).

 

 

 


Unlike a plain autoencoder (AE),
a VAE is a model that generates an image $x$ from a latent variable $z$.
(This is similar to the role of the decoder in an autoencoder.)

 

 

If we are generating faces ($x$), for example,

the information held in $z$ could be how much the face is smiling, the position of the eyebrows, the orientation of the head, and so on.

 

Here we want to estimate the true parameters $\theta$ of this generative model.

 

 

 

 

The prior $p(z)$ is usually chosen to be a Gaussian distribution.
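The generative process (draw $z$ from the Gaussian prior, then draw $x$ given $z$) can be sketched as below. Everything here is an assumption for illustration: `toy_decoder` is a hypothetical stand-in for a trained decoder network, and $p(x \mid z)$ is assumed to be a Gaussian around the decoder output with a fixed `noise_std`.

```python
import random

def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder: 2-D latent -> 3-D 'image'."""
    return [z[0] + z[1], z[0] - z[1], 0.5 * z[0]]

def sample_image(decoder, latent_dim, noise_std=0.1, rng=None):
    """Sample z from the prior, then x from p(x | z)."""
    rng = rng if rng is not None else random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]       # z ~ N(0, I)
    mean = decoder(z)                                          # mean of p(x | z)
    x = [m + noise_std * rng.gauss(0.0, 1.0) for m in mean]    # x ~ N(mean, noise_std^2 I)
    return z, x

z, x = sample_image(toy_decoder, latent_dim=2, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the draw reproducible; each new draw of $z$ yields a different generated $x$.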

 

 

 

How can we train this model?

 

- We reuse the strategy used to train FVBNs (Fully Visible Belief Networks) as generative models: maximum likelihood.

- That is, what we need to find are the model parameters that maximize the likelihood of the training data.

 

 

 

We want to maximize $p_{\theta}(x)$ here, so what is the problem?

Unlike the earlier PixelCNN, this density has an intractable form.

 

 

 

 

Why do we call it intractable? Because it cannot be computed.

To obtain $p_{\theta}(x)$ we would need $p_{\theta}(x \mid z)$ for every possible $z$,

and we cannot go through every $z$.
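Written out in the lecture's notation (taking $p_{\theta}(z)$ as the prior and $p_{\theta}(x \mid z)$ as the decoder), the data likelihood is a marginal over every possible latent value, and the posterior inherits the same problem through Bayes' rule:

$$
p_{\theta}(x) = \int p_{\theta}(z)\, p_{\theta}(x \mid z)\, dz,
\qquad
p_{\theta}(z \mid x) = \frac{p_{\theta}(x \mid z)\, p_{\theta}(z)}{p_{\theta}(x)}
$$

The prior and the decoder output are each easy to evaluate for a single $z$. It is the integral over all $z$ that has no closed form when the decoder is a neural network.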

 

 

 

 

So we train with the help of an additional encoder network.

Instead of the intractable true posterior $p_{\theta}(z \mid x)$, we use an approximation $q_{\phi}(z \mid x)$ produced by the encoder, which has its own parameters $\phi$.

Unlike the true posterior, $q_{\phi}(z \mid x)$ can be computed.

This approximation lets us derive a tractable lower bound on the data likelihood.
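The lower bound can be sketched as the following derivation, assuming the lecture's notation: $q_{\phi}(z \mid x_i)$ is the encoder with parameters $\phi$, and every expectation is taken over $z \sim q_{\phi}(z \mid x_i)$.

$$
\begin{aligned}
\log p_{\theta}(x_i)
&= \mathbb{E}_{z}\big[\log p_{\theta}(x_i)\big] \\
&= \mathbb{E}_{z}\left[\log \frac{p_{\theta}(x_i \mid z)\, p_{\theta}(z)}{p_{\theta}(z \mid x_i)}\right] \\
&= \mathbb{E}_{z}\big[\log p_{\theta}(x_i \mid z)\big]
 - D_{KL}\big(q_{\phi}(z \mid x_i)\,\|\,p_{\theta}(z)\big)
 + D_{KL}\big(q_{\phi}(z \mid x_i)\,\|\,p_{\theta}(z \mid x_i)\big)
\end{aligned}
$$

The first line holds because $\log p_{\theta}(x_i)$ does not depend on $z$, the second applies Bayes' rule, and the third multiplies and divides by $q_{\phi}(z \mid x_i)$ inside the logarithm before splitting it into three expectations.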

 

 

 

  • Decoder network term (first term):

    [reconstruction]: original input being reconstructed
    The expected log-likelihood of $p_{\theta}(x_{i} \mid z)$ generating $x_{i}$ from a $z$ sampled from the encoder $q_{\phi}(z \mid x_{i})$ ($\phi$: the encoder's parameters)

 

  • KL term (second term):

    The KL divergence between the approximate posterior $q_{\phi}(z \mid x_{i})$ and the prior $p_{\theta}(z)$
    It measures how close the distribution of the approximate posterior is to a normal distribution
    (assuming here that the prior is a normal distribution)

 

  • KL term (last KL term):

    $p_{\theta}(z \mid x_{i})$ is intractable, so the value of this term is hard to compute.
    (Intractable: applying Bayes' rule, $p_{\theta}(z \mid x_{i}) = p_{\theta}(x_{i} \mid z)\, p_{\theta}(z) / p_{\theta}(x_{i})$ contains $p_{\theta}(x_{i})$, which cannot be computed.)
    By the properties of KL divergence, however, this third term (the last KL term) is always greater than or equal to 0, so the first two terms together form a lower bound on $\log p_{\theta}(x_{i})$.
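The three terms can be checked numerically on a toy model where everything is known in closed form. This sketch assumes a 1-D linear-Gaussian model ($z \sim N(0, 1)$, $x \mid z \sim N(z, 1)$) rather than the lecture's neural networks, and all function names are hypothetical. In this model the exact marginal is $p(x) = N(0, 2)$ and the true posterior is $N(x/2, 1/2)$.

```python
import math

def gaussian_kl_to_standard_normal(mean, var):
    """KL( N(mean, var) || N(0, 1) ) for scalars, in closed form."""
    return 0.5 * (mean ** 2 + var - 1.0 - math.log(var))

def elbo(x, q_mean, q_var):
    """Lower bound for the toy model z ~ N(0, 1), x | z ~ N(z, 1),
    with approximate posterior q(z | x) = N(q_mean, q_var)."""
    # First term: E_q[log p(x | z)], closed form for this toy model.
    reconstruction = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - q_mean) ** 2 + q_var)
    # Second term: KL between the approximate posterior and the prior.
    kl_prior = gaussian_kl_to_standard_normal(q_mean, q_var)
    return reconstruction - kl_prior

def log_marginal(x):
    """Exact log p(x) for the toy model, where p(x) = N(0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4.0

x = 1.3
exact = log_marginal(x)
loose = elbo(x, q_mean=0.0, q_var=1.0)       # q equal to the prior: a poor posterior guess
tight = elbo(x, q_mean=x / 2.0, q_var=0.5)   # q equal to the true posterior
gap = exact - loose                          # the third KL term, which is never negative
```

With the poor guess the bound sits strictly below `exact`, and `gap` is exactly the third KL term. With the true posterior that term vanishes and the bound matches `exact` up to rounding. In a real VAE the first term has no closed form and is estimated by sampling $z$ from the encoder.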

 

 

 

 


Reference blog links

https://taeyoung96.github.io/cs231n/CS231n_13/ (CS231n Lecture 13 Summary)

https://doubleyoo.tistory.com/7 ([cs231n] Lec13 - Generative models)

https://deepinsight.tistory.com/121 ([cs231n for Everyone] Lecture 13 - Part 2. VAE(Variational AutoEncoer))

 
