
[Paper Review] 3D Gaussian Splatting for Real-Time Radiance Field Rendering (SIGGRAPH 2023)

윰갱 2024. 5. 3. 21:20

If you are studying 3DGS for the first time, I recommend going through xoft's blog and YouTube lectures first.

They cover the whole algorithm in an accessible way, so it is easy to follow :)

 

๋ณธ ๊ธ€์€ ๋…ผ๋ฌธ์„ ์ˆœ์„œ๋Œ€๋กœ ์ฝ๊ณ  ์‹ถ์€ ๋ถ„์—๊ฒŒ ๋„์›€์ด ๋  ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒํ•ฉ๋‹ˆ๋‹ค. (xoft๋‹˜์˜ ๊ธ€์„ ๋งŽ์ด ์ฐธ๊ณ ํ•˜์˜€์Šต๋‹ˆ๋‹ค.)

๋ถ€์กฑํ•œ ์ง€์‹์œผ๋กœ ์ž‘์„ฑํ•œ ๊ธ€์ด๊ธฐ ๋•Œ๋ฌธ์— ์ž˜๋ชป๋œ ๋ถ€๋ถ„์ด ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋งˆ์Œ๊ป ์ง€์ ํ•ด์ฃผ์„ธ์š”!


 

1. Introduction

MLP-based NeRF models render too slowly, which has limited their practical use.

With 3DGS, proposed in this paper,

(1) training stays as fast as the fastest previous methods, (2) quality is maintained, and (3) rendering speed improves dramatically.

(real-time, high-quality radiance field rendering)

-

3D scene์„ ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ•์—๋Š” ํฌ๊ฒŒ 2๊ฐ€์ง€๊ฐ€ ์žˆ๋‹ค.

์ฒซ๋ฒˆ์งธ๋Š” mesh์™€ point์™€ ๊ฐ™์€ explicitํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค.

์ด๋Š” GPU/CUDA-based rasterization์ด ๋น ๋ฅด๋‹ค๋Š” ์žฅ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค.

๋‘๋ฒˆ์งธ๋Š” MLP๋ฅผ ์‚ฌ์šฉํ•œ NeRF ๊ธฐ๋ฐ˜์˜ implicitํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค.

์ด๋Š” continuousํ•˜๊ฒŒ scene์„ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค.

 

It helps to think of 3DGS as a new method that combines the strengths of both.

That is, rendering is fast because rasterization is fast,

and because the scene is made of continuous 3D Gaussians, the representation stays differentiable.

 


 

3. Overview

 

 

(Initialization)

3D Gaussians are initialized from the point cloud produced by SfM (Structure from Motion).

Each 3D Gaussian is defined by (1) a point (its mean), (2) a covariance matrix, (3) an opacity $\alpha$, and (4) a color represented with spherical harmonics (SH).

(Projection)

์นด๋ฉ”๋ผ pose๊ฐ’์„ ์ด์šฉํ•ด์„œ 3D Gaussian์„ 2D Gaussian์œผ๋กœ projectionํ•œ๋‹ค.

์ฆ‰, Image Plane ์œ„๋กœ ์œ„์น˜์‹œํ‚จ๋‹ค.

(Differentiable Tile Rasterizer)

The image is split into tiles and rasterized tile by tile.

This step produces the rendered image.

The rendered image is then compared with the GT image to compute the loss, and the gradient of the loss is backpropagated.

(Adaptive Density Control)

loss๊ฐ’์„ ์ค„์ด๋Š” ๋ฐฉํ–ฅ์œผ๋กœ Gaussian์„ ์—…๋ฐ์ดํŠธํ•œ๋‹ค.

 


4. Differentiable 3D Gaussian Splatting

(1) SfM points without normals

Earlier work used 2D points, modeling each point as a small planar circle with a normal.

Following the same idea here runs into a problem:

SfM points are very sparse, so estimating normals from them is very difficult.

To get around this, the method still starts from SfM points but represents them as 3D Gaussians, which need no normals.

(This answers the question "Why use 3D Gaussians?")

(Here, each SfM point becomes the mean of a 3D Gaussian.)

 

(2) 3D Gaussian

3D ๊ฐ€์šฐ์‹œ์•ˆ์˜ ์žฅ์ ์œผ๋กœ๋Š” ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€๊ฐ€ ์žˆ๋‹ค.

์šฐ์„  ๋ฏธ๋ถ„ ๊ฐ€๋Šฅ(differentiable) ํ•˜๊ณ ,

2D splat์œผ๋กœ ์‰ฝ๊ฒŒ projection ํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— rendering์„ ์œ„ํ•œ α-blending ์ด ๋น ๋ฅด๋‹ค.

(3DGS์—์„œ α-blending ์ดํ•ดํ•˜๊ธฐ)

 

๋ณธ ์‹คํ—˜์—์„œ๋Š” 3D covariance matrix $\sum$์„ ์‚ฌ์šฉํ•ด ๊ฐ€์šฐ์‹œ์•ˆ๋“ค์„ ์•„๋ž˜์™€ ๊ฐ™์€ ์‹์œผ๋กœ ์ •์˜ํ–ˆ๋‹ค.

$$ G(x) = e^{-1/2(x)^{T}\sum^{-1}(x)} $$

๊ฐ๊ฐ์˜ ๊ฐ€์šฐ์‹œ์•ˆ์€ blending process์—์„œ ํˆฌ๋ช…๋„ ๊ฐ’ $\alpha$ ์™€ ๊ณฑํ•ด์ง„๋‹ค.

 

(3) Covariance Matrix $\Sigma$

$$\Sigma = RSS^{T}R^{T}$$

  • $R$ is a rotation matrix and $S$ is a scale matrix.
    • Put simply, they describe how the Gaussian is rotated and how large it is along each axis.

(Answer to "Why multiply by the transposes?")

  • A covariance matrix is only meaningful if it is positive semi-definite. Optimizing $\Sigma$ directly with gradient descent can easily break this, whereas $RSS^{T}R^{T} = (RS)(RS)^{T}$ is positive semi-definite by construction.
(Here, the covariance describes the spread of the 3D Gaussian, i.e., its variance along each direction.)

 

(4) 3D->2D projection

To render, each 3D Gaussian has to be converted into a 2D Gaussian.

The covariance is transformed as follows:

$$ \Sigma' = JW\Sigma W^{T}J^{T} $$

  • $\Sigma$: covariance matrix in world coordinates
  • $W$: viewing transformation from world coordinates to camera coordinates
  • $J$: Jacobian of the affine approximation of the projective transformation from camera coordinates to image coordinates
  • In short, this maps the covariance from world coordinates (3D) to image coordinates (2D). Skipping the third row and column of $\Sigma'$ leaves the 2×2 covariance of the 2D Gaussian.
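A minimal sketch of this projection for a pinhole camera. It assumes focal lengths `fx`, `fy`, a world-to-camera rotation `W` (translation does not affect a covariance), and the Gaussian's mean `t` already in camera coordinates; all names are hypothetical. Using the 2×3 Jacobian of $u = f_x x/z,\ v = f_y y/z$ directly gives the same result as computing the 3×3 $\Sigma'$ and dropping its third row and column.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def project_covariance(cov_world, W, t, fx, fy):
    """Sigma' = J W Sigma W^T J^T, with J the Jacobian of the perspective projection at t."""
    x, y, z = t
    J = [[fx / z, 0.0, -fx * x / (z * z)],
         [0.0, fy / z, -fy * y / (z * z)]]   # 2x3: only the image-plane rows are kept
    JW = matmul(J, W)
    return matmul(matmul(JW, cov_world), transpose(JW))   # 2x2 covariance on the image


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cov = [[0.01, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 1.0]]
print(project_covariance(cov, identity, (0.0, 0.0, 2.0), 100.0, 100.0))  # [[25.0, 0.0], [0.0, 100.0]]
```

On the optical axis the depth variance does not reach the image; off-axis (e.g. a mean at $x = 1$) the $-f_x x/z^2$ term lets it leak into the 2D footprint.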

 

(5) Computing gradients

Because automatic differentiation can add significant overhead,

the gradients for all parameters are derived explicitly: (1) position, (2) covariance, (3) opacity, (4) color.

(see Appendix A)
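To illustrate what deriving a gradient by hand means, here is a minimal sketch for a single case: the derivative of a 2D Gaussian weight with respect to its mean, $\partial G/\partial\mu = G\,\Sigma^{-1}(p-\mu)$, checked against central finite differences. The helper names are hypothetical, and this is only one small piece, not the full set of derivations in Appendix A.

```python
import math


def inv2(c):
    """Inverse of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = c[0][0], c[0][1], c[1][1]
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


def weight(p, mu, cov):
    """Unnormalized 2D Gaussian value at pixel p."""
    ci = inv2(cov)
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    q = ci[0][0] * dx * dx + 2 * ci[0][1] * dx * dy + ci[1][1] * dy * dy
    return math.exp(-0.5 * q)


def grad_weight_wrt_mean(p, mu, cov):
    """Hand-derived dG/dmu = G * Sigma^{-1} (p - mu) for symmetric Sigma."""
    ci = inv2(cov)
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    g = weight(p, mu, cov)
    return [g * (ci[0][0] * dx + ci[0][1] * dy), g * (ci[1][0] * dx + ci[1][1] * dy)]


def numeric_grad(p, mu, cov, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    out = []
    for k in range(2):
        hi, lo = list(mu), list(mu)
        hi[k] += eps
        lo[k] -= eps
        out.append((weight(p, hi, cov) - weight(p, lo, cov)) / (2 * eps))
    return out


p, mu, cov2 = (1.0, 0.5), (0.2, -0.3), [[2.0, 0.3], [0.3, 1.0]]
print(grad_weight_wrt_mean(p, mu, cov2), numeric_grad(p, mu, cov2))
```

The analytic version costs one evaluation of $G$ instead of extra forward passes or a recorded autodiff graph, which is the overhead the paper avoids.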

 

 

 


5. Optimization with Adaptive Density Control of 3D Gaussians

์ตœ์ข…์ ์œผ๋กœ๋Š” optimization์„ ํ†ตํ•ด

radiance field๋ฅผ ์ž˜ ๋‚˜ํƒ€๋‚ด๋Š” 3D ๊ฐ€์šฐ์‹œ์•ˆ์„ ์–ป๊ธฐ ์œ„ํ•ด ์•„๋ž˜์˜ parameter๋“ค์„ ์—…๋ฐ์ดํŠธํ•œ๋‹ค.

  • $p$: positions
  • $\alpha$: opacity
  • $\Sigma$: covariance matrix
  • $c$: SH-based color

 

ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์„ ์—…๋ฐ์ดํŠธ ํ•˜๋Š” ๊ณผ์ •์€ Gaussian์˜ density๋ฅผ ์กฐ์ ˆํ•˜๋Š” ๊ฒƒ๊ณผ ๊ธด๋ฐ€ํžˆ ์—ฐ๊ด€๋˜์–ด ์žˆ๋‹ค. (5.2์—์„œ ์ž์„ธํžˆ)

 


5.1 Optimization

optimization์€ renderingํ•˜๊ณ  ์ด ๊ฒฐ๊ณผ๋ฌผ(resulting image)์„ GT ์ด๋ฏธ์ง€์™€ comparingํ•˜๋Š” ๊ณผ์ •์˜ ์—ฐ์†์ด๋‹ค.

์ด๋•Œ 3D์—์„œ 2D๋กœ projectionํ•˜๋Š” ๊ณผ์ •์—์„œ ์ž˜๋ชป positioned๋œ geometry๊ฐ€ ๋ฐœ์ƒ๋  ๊ฐ€๋Šฅ์„ฑ์ด ์กด์žฌํ•œ๋‹ค.

๋”ฐ๋ผ์„œ optimization์„ ํ†ตํ•ด,

(1) geometry๋ฅผ ์ƒ์„ฑํ•˜๊ณ  (2) ์ž˜๋ชป positioned๋œ ๋ถ€๋ถ„์— ๋Œ€ํ•ด์„œ๋Š” geometry๋ฅผ destory ํ˜น์€ move ํ•œ๋‹ค.

 

(์‹คํ—˜์—์„œ ์‚ฌ์šฉํ•œ detail)

  • SGD ์•Œ๊ณ ๋ฆฌ์ฆ˜
  • $\alpha$:  ์‹œ๊ทธ๋ชจ์ด๋“œ ํ™œ์„ฑํ™” ํ•จ์ˆ˜(sigmoid activation function) ์‚ฌ์šฉ
    •  0๊ณผ 1์˜ ๊ฐ’ ์‚ฌ์ด๋กœ ์ œํ•œํ•˜๊ณ , smooth gradient๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด
  • $S$(scale of the covariance): ์ง€์ˆ˜ ํ™œ์„ฑํ™” ํ•จ์ˆ˜(exponential activation function) ์‚ฌ์šฉ
  • covariance matrix์˜ ์ดˆ๊ธฐ๊ฐ’์€
    ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์„ธ ์ ๊นŒ์ง€์˜ ๊ฑฐ๋ฆฌ์˜ ํ‰๊ท ์„ ์ถ•์œผ๋กœ ํ•˜๋Š” ์ด๋ฐฉ์„ฑ ๊ฐ€์šฐ์‹œ์•ˆ(isotropic Gaussian)

Isotropic Gaussian: (ํŠน์ • ์†์„ฑ์ด) ๋ฐฉํ–ฅ์— ๋”ฐ๋ผ ๋ณ€ํ•˜์ง€ ์•Š๋Š” ๊ฐ€์šฐ์‹œ์•ˆ (๊ตฌ ๋ชจ์–‘)
<->
Anisotropic Gaussian: (ํŠน์ • ์†์„ฑ์ด) ๋ฐฉํ–ฅ์— ๋”ฐ๋ผ ๋ณ€ํ•˜๋Š” ๊ฐ€์šฐ์‹œ์•ˆ (ํƒ€์›์ฒด ๋ชจ์–‘)
  • $p$(positions): standard exponential decay scheduling ๊ธฐ๋ฒ• ์‚ฌ์šฉ (plenoxel์—์„œ์ฒ˜๋Ÿผ)
  • loss function์€ $L_{1}$๊ณผ $D-SSIM$์„ ํ•ฉ์นœ ์•„๋ž˜์˜ ์‹
    $$ L = (1-\lambda)L_{1} + \lambda L_{D-SSIM}$$
    : $\lambda$๋Š” 0.2 ์‚ฌ์šฉ
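A minimal sketch of three details from the list above, with hypothetical names: the isotropic initial scale from the mean distance to the three nearest points (brute force), a log-linear exponential decay for the position learning rate (the start and end values you pass in are your choice, not taken from this post), and the combined loss. For brevity the SSIM here is computed globally over the whole image rather than over local windows as real implementations do, and the D-SSIM term is taken as $1-\mathrm{SSIM}$ (an assumption).

```python
import math


def initial_isotropic_scale(points, i, k=3):
    """Mean distance from points[i] to its k nearest neighbours (brute force)."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return sum(dists[:k]) / k


def position_lr(step, lr_init, lr_final, max_steps):
    """Exponential decay: interpolate log(lr) linearly from lr_init to lr_final."""
    t = min(max(step / max_steps, 0.0), 1.0)
    return math.exp((1 - t) * math.log(lr_init) + t * math.log(lr_final))


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM over whole flattened images with values in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def total_loss(render, gt, lam=0.2):
    """L = (1 - lambda) * L1 + lambda * D-SSIM, with lambda = 0.2 as in the paper."""
    l1 = sum(abs(a - b) for a, b in zip(render, gt)) / len(render)
    return (1 - lam) * l1 + lam * (1 - global_ssim(render, gt))


pts = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (10, 10, 10)]
print(initial_isotropic_scale(pts, 0))                 # mean of 1, 2, 3 -> 2.0
print(total_loss([0.2, 0.4, 0.6], [0.2, 0.4, 0.6]))    # identical images -> 0.0
```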

5.2 Adaptive Control of Gaussians

Starting from the sparse SfM points, the method adaptively controls the number of Gaussians and their density per unit volume.

In other words, it goes from a sparse set of Gaussians to a denser set that better represents the scene!
It helps to read this as the process of finding the 3D Gaussians that represent the scene well.

 

์ดˆ๊ธฐ warm-up ํ›„์—๋Š”

100๋ฒˆ์˜ iteration๋งˆ๋‹ค ๊ฐ€์šฐ์‹œ์•ˆ์„ ์กฐ๋ฐ€ํ•˜๊ฒŒ ๋งŒ๋“ค๊ณ , ์ฃผ๋กœ threshold๋ณด๋‹ค ์ž‘์€ ํˆฌ๋ช…ํ•œ ๊ฐ€์šฐ์‹œ์•ˆ์„ ์ œ๊ฑฐํ•œ๋‹ค.

(Remove Gaussian)

๊ทธ๋ฆฌ๊ณ , ๋นˆ ์˜์—ญ(empty area)๋ฅผ ์ฑ„์šฐ๊ธฐ ์œ„ํ•œ ์ž‘์—…์„ ํ•œ๋‹ค.

์ด๋•Œ, under-reconstruction ๋ถ€๋ถ„๊ณผ over-reconstruction ๋ถ€๋ถ„์— ์ง‘์ค‘ํ•˜๋Š”๋ฐ

์ด ๋‘ ๊ฒฝ์šฐ ๋ชจ๋‘ view-space positional gradient๊ฐ€ ํฌ๋‹ค๋Š” ๊ฒƒ์„ ๊ด€์ฐฐ๋œ๋‹ค.

under๊ณผ over ๋ชจ๋‘ ์ž˜ reconstruction๋˜์ง€ ์•Š์•˜๊ธฐ ๋•Œ๋ฌธ์— optimization์€ ์ด๋ฅผ ๊ณ ์น˜๊ธฐ ์œ„ํ•ด ๊ฐ€์šฐ์‹œ์•ˆ์„ moveํ•˜๋ ค ํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•˜๋ฉด ์ข‹๋‹ค.

view-space positional gradient์— ๋Œ€ํ•œ ์„ค๋ช…

view-space๋Š” ์•„๋ž˜ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ ์นด๋ฉ”๋ผ ๊ธฐ์ค€์œผ๋กœ world๋ฅผ ๋ฐ”๋ผ๋ณธ ๊ณต๊ฐ„์„ ์˜๋ฏธํ•œ๋‹ค.
rendering๋œ ์ด๋ฏธ์ง€์™€ GT ์ด๋ฏธ์ง€ ์‚ฌ์ด์˜ loss๊ฐ’์„ ๊ณ„์‚ฐํ•˜๊ฒŒ ๋˜๋‹ˆ view-space๋ผ๋Š” ๊ฒƒ์€ ์ถฉ๋ถ„ํžˆ ์ดํ•ด๊ฐ€ ๊ฐ„๋‹ค.



positional gradient๋ž€ loss๋ฅผ position์— ๋Œ€ํ•ด ๋ฏธ๋ถ„ํ•œ ๊ฐ’์ด๋‹ค.
positional gradient๊ฐ€ ํฌ๋‹ค = loss๊ฐ€ ํฌ๋‹ค = ๊ฐ€์šฐ์‹œ์•ˆ์˜ position ์ˆ˜์ •์ด ํ•„์š”ํ•˜๋‹ค
์œ„์™€ ๊ฐ™์ด ํ•ด์„ํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, positional gradient๊ฐ€ ํฐ ๊ฒฝ์šฐ๋Š” ๊ฐ€์šฐ์‹œ์•ˆ์„ cloneํ•˜๊ฑฐ๋‚˜ splitํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

 

 

under-reconstruction์ธ์ง€ over-reconstruction์ธ์ง€์— ๋”ฐ๋ผ์„œ optimizeํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋‹ค๋ฅด๋‹ค.

์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ ์ด๋ฅผ ์ž์„ธํžˆ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋‹ค.

1. under-reconstruction๋œ ๊ฒฝ์šฐ

์šฐ๋ฆฌ๊ฐ€ ๋ชฉํ‘œ๋กœ ํ•˜๋Š” geometry(๊ฒ€์€์ƒ‰)๋ณด๋‹ค ๊ฐ€์šฐ์‹œ์•ˆ์ด ์ž‘์€ ๊ฒฝ์šฐ ๋ฐœ์ƒํ•œ๋‹ค.

์ด๋•Œ๋Š” ๊ฐ€์šฐ์‹œ์•ˆ์„ cloneํ•˜๊ฒŒ ๋˜๋Š”๋ฐ

(1) ๋™์ผํ•œ ์‚ฌ์ด์ฆˆ์˜ ๊ฐ€์šฐ์‹œ์•ˆ์„ ์ƒ์„ฑ(create)ํ•˜๊ณ  (2) positional gradient ๋ฐฉํ–ฅ์œผ๋กœ ์ƒ์„ฑ๋œ ๊ฐ€์šฐ์‹œ์•ˆ์„ ์ด๋™์‹œํ‚จ๋‹ค.

(Clone Gaussian)

total volume ์ฆ๊ฐ€, ๊ฐ€์šฐ์‹œ์•ˆ ์ˆ˜ ์ฆ๊ฐ€

 

2. over-reconstruction๋œ ๊ฒฝ์šฐ

์šฐ๋ฆฌ๊ฐ€ ๋ชฉํ‘œ๋กœ ํ•˜๋Š” geometry(๊ฒ€์€์ƒ‰)๋ณด๋‹ค ๊ฐ€์šฐ์‹œ์•ˆ์ด ํฐ ๊ฒฝ์šฐ ๋ฐœ์ƒํ•œ๋‹ค.

์ด๋•Œ๋Š” ๊ฐ€์šฐ์‹œ์•ˆ์„ splitํ•˜๊ฒŒ ๋˜๋Š”๋ฐ

(1) ๊ธฐ์กด์˜ ๊ฐ€์šฐ์‹œ์•ˆ์„ 2๊ฐœ์˜ ์ƒˆ๋กœ์šด ๊ฐ€์šฐ์‹œ์•ˆ์œผ๋กœ ๋Œ€์ฒด(replace)ํ•˜๊ณ , 

(2) ์ƒˆ๋กœ์šด ๊ฐ€์šฐ์‹œ์•ˆ์˜ ์œ„์น˜๋Š” ๊ธฐ์กด์˜ ๊ฐ€์šฐ์‹œ์•ˆ์„ ์ƒ˜ํ”Œ๋งํ•˜๊ธฐ ์œ„ํ•œ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜(PDF)๋กœ ์ดˆ๊ธฐํ™”๋œ๋‹ค.

(Split Gaussian)

total volume ์œ ์ง€, ๊ฐ€์šฐ์‹œ์•ˆ ์ˆ˜ ์ฆ๊ฐ€
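A minimal split sketch under stated assumptions: each Gaussian is a dict with hypothetical keys, its rotation is given directly as a 3×3 matrix `R`, and the children's scales are the parent's divided by φ = 1.6, the factor reported in the paper. Each child's position is drawn from the parent Gaussian: sample in its local frame with standard deviations equal to the parent's scales, then rotate into world space.

```python
import random


def split_gaussian(g, n_children=2, phi=1.6, rng=random):
    """Over-reconstruction: replace one large Gaussian by n_children smaller ones."""
    R = g["R"]
    children = []
    for _ in range(n_children):
        local = [rng.gauss(0.0, s) for s in g["scale"]]        # sample in the Gaussian's frame
        offset = [sum(R[r][k] * local[k] for k in range(3)) for r in range(3)]
        children.append({
            "position": [p + o for p, o in zip(g["position"], offset)],
            "scale": [s / phi for s in g["scale"]],
            "R": [row[:] for row in R],
            "opacity": g["opacity"],
        })
    return children


big = {"position": [0.0, 0.0, 0.0], "scale": [1.6, 0.8, 0.16],
       "R": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], "opacity": 0.7}
kids = split_gaussian(big, rng=random.Random(0))
print(len(kids), [round(s, 6) for s in kids[0]["scale"]])   # 2 [1.0, 0.5, 0.1]
```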

 

Both cases increase the number of Gaussians, so to keep that number in check,

the opacity $\alpha$ of the Gaussians is set to a value close to 0 every $N=3000$ iterations.

Optimization then increases $\alpha$ again wherever a Gaussian is actually needed,

so periodically lowering $\alpha$ lets the Gaussians that stay transparent be removed by the pruning step described above.

> In terms of quality, this removes floaters and redundant Gaussians.
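A minimal sketch of this bookkeeping, assuming opacities are stored as plain values in $[0, 1]$. The densification interval of 100 and the reset period $N = 3000$ come from the text above; the warm-up length, the pruning threshold `eps_alpha`, and the post-reset value `reset_to` are hypothetical defaults.

```python
def schedule(step, warmup=500, densify_every=100, reset_every=3000):
    """Which maintenance operations run after optimization step `step`."""
    ops = []
    if step > warmup and step % densify_every == 0:
        ops.append("densify_and_prune")
    if step > 0 and step % reset_every == 0:
        ops.append("reset_opacity")
    return ops


def prune(gaussians, eps_alpha=0.005):
    """Remove Gaussians that are essentially transparent (alpha below eps_alpha)."""
    return [g for g in gaussians if g["opacity"] >= eps_alpha]


def reset_opacity(gaussians, reset_to=0.01):
    """Set alpha close to zero; Gaussians the scene needs will regain opacity."""
    for g in gaussians:
        g["opacity"] = min(g["opacity"], reset_to)
    return gaussians


print(schedule(3000))     # ['densify_and_prune', 'reset_opacity']
gs = [{"opacity": 0.9}, {"opacity": 0.001}]
print(len(prune(gs)), reset_opacity(gs)[0]["opacity"])   # 1 0.01
```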

 


6. Fast Differentiable Rasterizer for Gaussians

๋ณธ ๋…ผ๋ฌธ์˜ ์™„์ „ ๋ฏธ๋ถ„ ๊ฐ€๋Šฅํ•œ rasterizer์€

memory consumption์ด ์ ๊ณ , pixel๋‹น ์ƒ์ˆ˜ ์˜ค๋ฒ„ํ—ค๋“œ๋งŒ ํ•„์š”ํ•˜์—ฌ

์ž„์˜ ๊ฐœ์ˆ˜์˜ ํ˜ผํ•ฉ๋œ ๊ฐ€์šฐ์‹œ์•ˆ์— ๋Œ€ํ•œ ํšจ์œจ์ ์ธ back propagation์ด ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

 

๊ทธ๋Ÿผ ์•„๋ž˜์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ฐธ๊ณ ํ•˜์—ฌ ์ˆœ์„œ๋Œ€๋กœ ์„ค๋ช…๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

1. Cull Gaussians

All 3D Gaussians outside the view frustum, the region the camera sees, are removed.

The criterion: a Gaussian is kept only if its 99% confidence interval intersects the view frustum.

In addition, Gaussians at extreme positions, with means close to the near plane and far outside the view frustum, are trivially rejected using a guard band, since computing their projected 2D covariance would be unstable.

(Explanation of the view frustum)

 

 

2. 3D Gaussian -> 2D Gaussian (Screen Space Gaussian)

Each 3D Gaussian is projected onto the image plane to obtain a 2D Gaussian.

This requires not only the 3D Gaussian's mean $M$ and covariance $S$ but also the camera pose $V$.

The projection gives the 2D Gaussian's mean $M'$ and covariance $S'$ on the image plane.

 

3. Create Tiles

The image is divided into small 16x16-pixel tiles.
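A minimal sketch of which tiles a 2D Gaussian touches. It assumes tiles numbered row by row (`tile_id = ty * tiles_x + tx`) and a screen footprint approximated by a square of half-width `radius` around the projected mean; the names are hypothetical.

```python
import math

TILE = 16


def tiles_touched(center, radius, width, height):
    """IDs of the 16x16 tiles overlapped by the square footprint of a 2D Gaussian."""
    tiles_x = math.ceil(width / TILE)
    tiles_y = math.ceil(height / TILE)
    x0 = max(0, math.floor((center[0] - radius) / TILE))
    x1 = min(tiles_x - 1, math.floor((center[0] + radius) / TILE))
    y0 = max(0, math.floor((center[1] - radius) / TILE))
    y1 = min(tiles_y - 1, math.floor((center[1] + radius) / TILE))
    return [ty * tiles_x + tx for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]


print(tiles_touched((20.0, 20.0), 5, 64, 64))   # [0, 1, 4, 5]
```

A footprint entirely off-screen yields an empty list, so such a Gaussian produces no instances in the next step.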

 

4.  ๊ฐ 2D ๊ฐ€์šฐ์‹œ์•ˆ์—๊ฒŒ key ํ• ๋‹น (Duplicate with Keys)

๊ฒน์น˜๋Š”(overlap) tile์˜ ๊ฐœ์ˆ˜์— ๋”ฐ๋ผ์„œ 2D Gaussian์„ instanceํ™” ํ•œ๋‹ค.

๊ทธ๋ฆฌ๊ณ  ๊ฐ๊ฐ์˜ instance(๊ฐ€์šฐ์‹œ์•ˆ)์— view space depth์™€ tile ID๋ฅผ ๊ฒฐํ•ฉํ•œ key๋ฅผ ํ• ๋‹นํ•œ๋‹ค.

์•ž์ชฝ 32bit๋Š” depth์ •๋ณด๊ฐ€ encoding๋˜๊ณ ,

๋’ท์ชฝ 32bit๋Š” overlap๋˜๋Š” tile์˜ index๊ฐ€ encoding๋œ๋‹ค.

 

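(Q. Why instantiate per overlapped tile? Because a single sort over all instances then gives every tile its own contiguous, depth-ordered list of Gaussians, so each tile can be rasterized independently.)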
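A minimal sketch of building one 64-bit key per (Gaussian, tile) instance, assuming positive depths stored as 32-bit floats. For positive IEEE-754 floats the raw bit pattern, read as an unsigned integer, grows with the value, so depth can be compared as an integer. The function names are hypothetical.

```python
import struct


def depth_bits(depth):
    """Reinterpret a positive float32 depth as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", depth))[0]


def make_key(tile_id, depth):
    """Tile ID in the upper 32 bits, depth in the lower 32 bits."""
    return (tile_id << 32) | depth_bits(depth)


def duplicate_with_keys(gaussian_tiles, depths):
    """One (key, gaussian_index) pair per overlapped tile of each Gaussian."""
    return [(make_key(t, depths[g]), g) for g, tiles in enumerate(gaussian_tiles) for t in tiles]


pairs = duplicate_with_keys([[0, 1], [1]], [3.0, 1.5])
print([(k >> 32, g) for k, g in pairs])   # [(0, 0), (1, 0), (1, 1)]
```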

 

5.  key์— ๋”ฐ๋ผ์„œ ๊ฐ€์šฐ์‹œ์•ˆ ์ •๋ ฌ (Sort by Keys)

์œ„์—์„œ ๋ถ€์—ฌํ•œ key๋กœ ๊ฐ€์šฐ์‹œ์•ˆ์„ sortingํ•œ๋‹ค.

์ด๋•Œ, tile๋งˆ๋‹ค ๋ชจ๋“  splat์— ๋Œ€ํ•ด depth ordering๋ฅผ ์ˆ˜ํ–‰ํ•œ๋‹ค.

์ฆ‰, ๊ฐ tile ์œ„์—๋Š” ์—ฌ๋Ÿฌ splat(2D ๊ฐ€์šฐ์‹œ์•ˆ)์ด ๊ฒน์ณ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฅผ ๊นŠ์ด ์ˆœ์„œ๋Œ€๋กœ ์ •๋ ฌํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

 

6. Build a Gaussian list per tile (Identify Tile Ranges)

After sorting, a list is produced for each tile by identifying the first and last depth-sorted entries that fall on that tile.

 

7. Read the Gaussian list (Get Tile Range)

The range r is loaded for each tile of the image.

 

8. Blend in Order

One thread block is launched for each tile.

Each block traverses its list front to back, accumulating color and $\alpha$.

(In other words, Gaussians closer to the camera are accumulated first.)

 

When the accumulated $\alpha$ of a pixel reaches the target saturation (close to 1), the thread for that pixel stops.

At that point the pixel is effectively opaque, so Gaussians farther back would contribute almost nothing and need not be processed.

 

Unlike previous work, the number of Gaussians that receive gradient updates is not limited.

This lets the method handle scenes with varying depth complexity and learn them accurately without scene-specific hyperparameter tuning.


7. Implementation, Results and Evaluation

PSNR(์˜์ƒ ํ™”์งˆ ์†์‹ค์–‘ ํ‰๊ฐ€์ง€ํ‘œ): ํด์ˆ˜๋ก ์ข‹์Œ
SSIM(์˜์ƒ ํ™”์งˆ ์†์‹ค์–‘ ํ‰๊ฐ€์ง€ํ‘œ): ํด์ˆ˜๋ก ์ข‹์Œ
LPIPS(์ด๋ฏธ์ง€ ๊ฐ„์˜ ์ธ๊ฐ„์˜ ์‹œ๊ฐ์ ์ธ ์œ ์‚ฌ์„ฑ์„ ์ธก์ •): ์ž‘์„์ˆ˜๋ก ์ข‹์Œ

Compared with Instant-NeRF, training speed is similar, but quality is better (PSNR, SSIM, LPIPS) and rendering is dramatically faster.

Compared with Mip-NeRF360, quality is comparable while training and rendering are much faster.

One drawback is high memory usage, because the scene is stored as a large set of points (Gaussians).

 

(Mip-NeRF360์€ A100 4์žฅ, ๋‚˜๋จธ์ง€๋Š” A6000์„ ์‚ฌ์šฉํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค.)

 

7.4 Limitations

1. scene์ด ์ž˜ ๊ด€์ฐฐ๋˜์ง€ ์•Š์€ ์ง€์—ญ์—์„œ๋Š” ์•„ํ‹ฐํŒฉํŠธ ๋ฐœ์ƒ

(Mip-NeRF์—์„œ๋„ ์œ ์‚ฌํ•œ ๋ฌธ์ œ๋ฅผ ๊ฒช๊ณ  ์žˆ๋‹ค.)

 

2. Popping artifacts: these occur when optimization creates large Gaussians

One cause is that Gaussians are rejected via the guard band in the rasterizer,

so a more principled culling approach is expected to alleviate it.

A second cause is the simple visibility algorithm, which can make Gaussians suddenly switch depth/blending order. The authors suggest this could be addressed with antialiasing and leave it to future work.

 

3. No regularization is applied to the optimization.

The authors expect that adding regularization would help with both unseen regions and popping artifacts.

 

4. ์ด์ „ ์—ฐ๊ตฌ์˜ point๊ธฐ๋ฐ˜์— ๋น„ํ•ด์„œ๋Š” compactํ• ์ง€๋ผ๋„, NeRF ๋ชจ๋ธ์— ๋น„ํ•˜๋ฉด memory consumption์ด ๋งค์šฐ ํฌ๋‹ค

large scene์— ๋Œ€ํ•ด์„œ๋Š” GPU ์‚ฌ์šฉ๋Ÿ‰์˜ ํ”ผํฌ๊ฐ€ 20G๋ฅผ ์ดˆ๊ณผํ•˜๊ธฐ๋„ ํ•œ๋‹ค.

๊ทธ๋Ÿฌ๋‚˜ ์ด ๋ฌธ์ œ๋Š”, instant-NeRF์™€ ๊ฐ™์ด optimization logic์˜ low-level implementation์œผ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋‹ค.

 

 


์ฐธ๊ณ  ๋ธ”๋กœ๊ทธ ๋งํฌ

https://clean-dragon.tistory.com/13

[Paper Review] 3D Gaussian Splatting (clean-dragon.tistory.com)

https://xoft.tistory.com/51

[Paper Review] 3D Gaussian Splatting (SIGGRAPH 2023): Improving Rendering Speed/Quality (xoft.tistory.com)