
[Paper Review] NeRF : Representing Scenes as Neural Radiance Fields for View Synthesis (ECCV2020)

윰갱 2024. 4. 4. 21:39

Until now, my understanding of the NeRF model came only from browsing lots of blog posts and YouTube videos,

but after reading the paper closely I feel I understand it much more deeply. Let me write it up myself and make it completely my own!

 

This post is organized as follows.

 

0. [Abstract] NeRF in Brief

1. [Background] Explicit Representation  vs  Implicit Representation

2. Neural Radiance Field Scene Representation

2.1 Overview

2.2  MLP Network

3. Volume Rendering with Radiance Field

3.1 [Equation 1] The expected color of the input ray.

3.2 [Equation 2] Approximation of Equation 1 (Stratified Sampling)

4. Optimizing a Neural Radiance Field

4.1 Positional Encoding

4.2  Hierarchical Volume Sampling

 


 

0. [Abstract] NeRF in Brief

 

 

NeRF is a method that, from a set of input images,

optimizes a 5D neural radiance field representation of a scene.

Then, using volume rendering techniques, it can synthesize the scene from any viewpoint.

 

Looking at the figure above,

(1) photos of a drum set taken from various positions and directions are fed in as input images,

(2) NeRF is trained on them, and

(3) an image of the drums from a new viewpoint is generated.

 


1. [Background] Explicit Representation  vs  Implicit Representation

 

 

Why has NeRF attracted so much attention?

Because NeRF uses an 'Implicit Representation': the scene is represented by an MLP, i.e. a neural network.

 

Traditionally, 3D space has been defined using point clouds, voxels, or meshes.

In these methods the 3D object is actually defined in space, so they are called 'Explicit Representations'.

NeRF, in contrast, considers an object to be rendered in 3D if an image of it can be generated from every viewing direction.

 

NeRF addresses the problems of existing explicit methods, which

require a fixed grid or structure, or demand far too much memory.

(For something like a 3D map, representing it with 3D points or a mesh instead of NeRF would need an enormous amount of memory.)

Also, since the scene can be generated even when the direction changes by a very small amount, smooth representations are possible.

 

+ I did not fully understand the Related Work covered in Section 2 of the paper. To be added later.

 


2. Neural Radiance Field Scene Representation

 

2.1 Overview

 

An overview of the NeRF model is as follows.

(a)

Cast a camera ray through the scene, then sample along it to generate a sampled set of 3D points.

For these points, obtain the position (x,y,z) and the camera's viewing direction (θ,ϕ).

(Model input: 5D coordinate (x,y,z,θ,ϕ))

(b)

Feed this information into the neural network (MLP),

and obtain as output an RGB color and a volume density σ.

(Model output: (RGB, σ))

(c)

Use Volume Rendering to accumulate the RGB and density values into a 2D image.

(d)

Because this whole process is differentiable,

the model is optimized with Gradient Descent, reducing the loss between the predicted values and the Ground Truth.
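
To make step (a) concrete, here is a minimal sketch of how the 5D inputs could be built for one ray. It uses the ray form r(t) = o + t·d from the paper. The function names and the spherical-angle convention (θ measured from the z-axis, ϕ in the xy-plane) are my own assumptions for illustration, not something the paper or this post fixes.

```python
import math

def direction_from_angles(theta, phi):
    """Unit view direction from spherical angles (convention assumed here)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def point_on_ray(origin, direction, t):
    """r(t) = o + t * d for a camera ray with origin o and direction d."""
    return tuple(o + t * d for o, d in zip(origin, direction))

def five_d_inputs(origin, theta, phi, t_values):
    """One (x, y, z, theta, phi) tuple per sampled depth t."""
    d = direction_from_angles(theta, phi)
    return [point_on_ray(origin, d, t) + (theta, phi) for t in t_values]

# Three sample depths along a ray that leaves the origin along the x-axis.
inputs = five_d_inputs((0.0, 0.0, 0.0), math.pi / 2, 0.0, [2.0, 4.0, 6.0])
```

Every tuple on the same ray shares the same (θ,ϕ); only the position changes with t.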

 

 

2.2  MLP Network

 

The structure of the MLP used above is as follows.

(* The numbers 60 and 24 in the input are covered later -> 4.1 Positional Encoding)

 

1.

First, the 3D coordinate x is fed in as input and passed through 8 fully-connected layers.

(with ReLU activations and 256 channels per layer)

The output is σ and a 256-dimensional feature vector.

2.

The 256-dimensional feature vector from above is concatenated with the view direction and used as input again,

passing through 1 fully-connected layer to obtain the view-dependent RGB color.

(with a ReLU activation and 128 channels)
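
The two stages above can be sketched as a forward pass in plain Python. This is only an illustration with random, untrained weights. The class name `TinyNeRF`, the separate heads for σ and the feature vector, and the final sigmoid layer that produces RGB are my assumptions about details the post does not spell out, and the skip connection in the paper's architecture figure is left out.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully-connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

class TinyNeRF:
    def __init__(self, pos_dim=60, dir_dim=24, width=256, seed=0):
        rng = random.Random(seed)
        dims = [pos_dim] + [width] * 8
        # Stage 1: 8 fully-connected ReLU layers that see only the position.
        self.trunk = [make_layer(dims[i], dims[i + 1], rng) for i in range(8)]
        self.sigma_head = make_layer(width, 1, rng)        # assumed separate head
        self.feature_head = make_layer(width, width, rng)  # assumed separate head
        # Stage 2: feature vector + direction -> 128-channel layer (for width=256) -> RGB.
        self.color_hidden = make_layer(width + dir_dim, width // 2, rng)
        self.color_out = make_layer(width // 2, 3, rng)

    def forward(self, encoded_x, encoded_d):
        h = encoded_x
        for layer in self.trunk:
            h = relu(linear(layer, h))
        sigma = relu(linear(self.sigma_head, h))[0]  # density depends on x only
        feature = linear(self.feature_head, h)
        hidden = relu(linear(self.color_hidden, feature + encoded_d))
        rgb = sigmoid(linear(self.color_out, hidden))  # color depends on x and d
        return rgb, sigma

# A smaller width keeps this pure-Python demo fast.
model = TinyNeRF(width=64)
x = [0.1] * 60
rgb_a, sigma_a = model.forward(x, [0.2] * 24)
rgb_b, sigma_b = model.forward(x, [-0.3] * 24)
```

Because the direction is only concatenated after σ has been computed, `sigma_a` and `sigma_b` are identical by construction, while the two colors are free to differ.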

 

 

 

What the MLP above tells us is that

computing the volume density σ requires only the position x,

while computing the view-dependent RGB color requires both the position x and the direction d.

This is because the density σ is the same regardless of viewing direction, whereas the RGB color differs.

 

The paper emphasizes the view direction because, as in the figure above,

even for a fixed position, the emitted radiance actually differs depending on the view it is seen from.

 


 

3. Volume Rendering with Radiance Field

As mentioned in the Overview, once the MLP outputs RGBσ, the final image can be produced from it.

The process that does this is Volume Rendering.

 

In other words,

so far, for a single ray, we have used the MLP to obtain RGBσ for several sampled points.

Now we just need to combine the RGBσ of these points to get the expected color that the ray represents.

Generating the final image amounts to computing this expected color while moving along each ray.

 

 

3.1 [Equation 1] The expected color of the input ray.

 

 

ํ•˜๋‚˜์˜ ๊ด‘์„  ray์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์‹์œผ๋กœ ํ‘œํ˜„ํ•œ๋‹ค.

 

 

The expected color along that ray is given by the following equation.

The integral of the product of the terms below is taken from the near bound to the far bound.

 

(1) T(t): the exponential of the negative accumulated density from the near bound up to the current point (the accumulated transmittance)

- If the accumulated density is small -> T(t) is large -> little has blocked the ray so far, so the current position t matters

- If the accumulated density is large -> T(t) is small -> something more important lay among the points in front of the current position t

(2) σ(r(t)): volume density

(3) c(r(t),d): RGB color

- Since the color is view-dependent, we can see that not only the position but also the viewing direction d is given as input.
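
For reference, the ray and the expected color described by terms (1) to (3) can be written out as follows. This is my transcription in the paper's notation, with t_n and t_f the near and far bounds, o the ray origin, and d the viewing direction.

```latex
\mathbf{r}(t) = \mathbf{o} + t\,\mathbf{d}

C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
```

The minus sign inside the exponential is what makes T(t) shrink as more density accumulates in front of t.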

 

3.2 [Equation 2] Approximation of Equation 1 (Stratified Sampling)

 

However, the expression in 3.1 is continuous, so

it cannot be used as-is in NeRF, which evaluates values at individually sampled points.

Therefore, the paper modifies the expression into a discrete form.

 

 

First, the position t along the ray

is defined by Stratified Sampling: the range is divided into N evenly spaced bins and one sample is drawn uniformly at random from each bin.
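
A minimal sketch of this sampling step, assuming a near bound `t_near` and a far bound `t_far` (the function and parameter names are mine, chosen for illustration): bin i covers [t_near + i·w, t_near + (i+1)·w] with w = (t_far − t_near)/N, and one depth is drawn uniformly inside each bin.

```python
import random

def stratified_samples(t_near, t_far, n_bins, rng=random):
    """Draw one uniform sample from each of n_bins equal-width bins in [t_near, t_far]."""
    width = (t_far - t_near) / n_bins
    return [t_near + (i + rng.random()) * width for i in range(n_bins)]

# Eight depths between 2.0 and 6.0, one per bin of width 0.5.
ts = stratified_samples(2.0, 6.0, 8, random.Random(0))
```

Since the positions are re-drawn every time, the MLP ends up being evaluated at continuous positions over the course of training rather than at a fixed grid of depths.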

 

 

The expected color along the ray is likewise rewritten as a discrete sum over these samples.
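
As a rough sketch of that discrete sum, the paper's estimate is Ĉ(r) = Σ T_i (1 − exp(−σ_i δ_i)) c_i with T_i = exp(−Σ_{j<i} σ_j δ_j), where δ_i is the distance between adjacent samples. The function below is my own illustration of it. Using the far bound to close the last interval is an assumption, not something the post states.

```python
import math

def render_ray(t_values, sigmas, colors, t_far):
    """Return (rgb, weights) for one ray from per-sample densities and colors."""
    rgb = [0.0, 0.0, 0.0]
    weights = []
    transmittance = 1.0  # T_1 = exp(0): nothing has blocked the ray yet
    for i, (sigma, color) in enumerate(zip(sigmas, colors)):
        # delta_i: distance to the next sample (to t_far for the last one, an assumption)
        next_t = t_values[i + 1] if i + 1 < len(t_values) else t_far
        delta = next_t - t_values[i]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # T_i * (1 - exp(-sigma_i * delta_i))
        weights.append(weight)
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= math.exp(-sigma * delta)  # update T for the next sample
    return rgb, weights

# Empty space, then a nearly opaque red sample, then a blue sample hidden behind it.
rgb, weights = render_ray(
    [2.0, 3.0, 4.0],
    [0.0, 1000.0, 1000.0],
    [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    t_far=5.0,
)
```

In this toy ray the red sample receives essentially all of the weight: the empty sample in front has zero opacity, and the blue sample behind is fully occluded because T has dropped to zero.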

 


4. Optimizing a Neural Radiance Field

Unfortunately, the MLP and Volume Rendering presented above were not enough on their own to reach SOTA.

So, to further improve performance, the paper uses additional techniques:

(1) Positional Encoding and (2) Hierarchical Volume Sampling.

 

4.1 Positional Encoding

 

Neural networks tend to learn lower frequency functions,

so they have trouble representing high-frequency variation.

To solve this, the input is mapped to a higher dimensional space using high frequency functions

before being passed to the network,

which gives a better fit to the data. This process is 'Positional Encoding'.

 

In short,

the original input was a 5D coordinate, but we increase its dimensionality before feeding it to the model.

 

 

The encoding shown in the equation above is used.

Now that we have covered Positional Encoding, the MLP inputs mentioned earlier can be explained in detail.
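
For reference, the encoding function from the paper, applied separately to each coordinate p of the input, is (my transcription):

```latex
\gamma(p) = \left( \sin(2^{0}\pi p),\ \cos(2^{0}\pi p),\ \ldots,\ \sin(2^{L-1}\pi p),\ \cos(2^{L-1}\pi p) \right)
```

Each scalar therefore becomes 2L values, which is where the input sizes of the MLP come from.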

 

 

1. Position γ(x)

The original position x is 3D, but with L = 10 it is mapped to γ(x), which has 60 dimensions.

Formula) 3 x (10 x 2) = 60

: the x2 comes from having both a sin and a cos term for each frequency.

2. Direction γ(d)

The original direction d is also 3D, but with L = 4 it is mapped to γ(d), which has 24 dimensions.

(d is 3D because, in practice, the viewing direction is expressed as a 3D Cartesian unit vector rather than as (θ,ϕ).)

Formula) 3 x (4 x 2) = 24
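
The dimension counts above can be checked with a small sketch of the encoding. The function name and the ordering of the output (all frequencies of one coordinate, then the next coordinate) are my choices for illustration. Other implementations may order the terms differently.

```python
import math

def positional_encoding(values, num_freqs):
    """Apply gamma to each coordinate: sin and cos at frequencies 2^0*pi ... 2^(L-1)*pi."""
    encoded = []
    for p in values:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * p
            encoded.append(math.sin(angle))
            encoded.append(math.cos(angle))
    return encoded

gamma_x = positional_encoding([0.5, -0.25, 0.1], num_freqs=10)  # position, L = 10
gamma_d = positional_encoding([0.0, 0.0, 1.0], num_freqs=4)     # direction, L = 4
```

Here `len(gamma_x)` is 3 x (10 x 2) = 60 and `len(gamma_d)` is 3 x (4 x 2) = 24, matching the MLP input sizes from 2.2.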

 

As the results above show, Positional Encoding

maps the continuous input coordinates into a higher dimensional space, allowing the MLP to approximate higher frequency functions more easily.

 

 

4.2  Hierarchical Volume Sampling

This method was proposed to address the inefficiency of the basic Volume Rendering approach.

It reduces sampling in regions that contribute almost nothing to the rendered image,

and improves rendering efficiency by allocating samples in proportion to their expected effect on the final rendering.

Put simply, look more closely where the important information is, and only roughly everywhere else!

 

This method uses two networks, "coarse" and "fine".

1. Draw Nc values of t by stratified sampling and use them to train the Coarse Network.

2. Draw Nf additional values of t according to the distribution obtained from the Coarse Network (its normalized rendering weights along the ray).

3. Train the Fine Network using all Nc + Nf samples.
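
Step 2 can be sketched with inverse transform sampling: treat the coarse weights as a piecewise-constant PDF over depth bins, build its CDF, and map uniform random numbers through the inverse. The function below is a minimal illustration under that assumption. The names `bin_edges` and `weights` are hypothetical, and this is not the paper's actual code.

```python
import bisect
import random

def sample_from_weights(bin_edges, weights, n_samples, rng=random):
    """Draw n_samples depths from the piecewise-constant PDF defined by weights over bins."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    # Cumulative distribution at each bin edge: cdf[0] = 0 and cdf[-1] = 1.
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    cdf[-1] = 1.0
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        # Find the bin whose CDF interval contains u, then interpolate inside it.
        i = min(bisect.bisect_right(cdf, u) - 1, len(weights) - 1)
        span = cdf[i + 1] - cdf[i]
        frac = (u - cdf[i]) / span if span > 0.0 else 0.0
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)

# All of the coarse weight sits in the third bin, so every fine sample lands in [2, 3].
fine_ts = sample_from_weights([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 0.0], 16,
                              random.Random(0))
```

Bins with zero weight never receive samples, which is the "look more closely where it matters" behavior described above.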

 

> The content above was organized with reference to https://www.youtube.com/watch?v=zkeh7Tt9tYQ&t=1900s.

 

 

Finally, the loss is the sum of the squared errors of the coarse and fine renderings, and can be expressed as follows.
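
For reference, the loss in the paper's notation (my transcription), where R is the set of rays in a batch, C(r) is the ground truth color, and Ĉ_c(r) and Ĉ_f(r) are the colors rendered by the coarse and fine networks:

```latex
\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \left[ \left\lVert \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \right\rVert_2^2 + \left\lVert \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \right\rVert_2^2 \right]
```

The coarse term is kept in the loss even though the final image comes from the fine network, because the coarse network's weights decide where the fine samples are placed.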

 

 


 

 

 

(2024.04.04)

I think I understand about 80% of it, but I will probably need to look at the code to explain it properly.

 

+ Another blog worth referring to

https://jaeyeol816.github.io/neural_representation/nerf-nerf-basic-theory/