๐Ÿ•
AI Paper Study
  • AI Paper Study
  • Computer Vision
    • SRCNN(2015)
      • Introduction
      • CNN for SR
      • Experiment
      • ๊ตฌํ˜„ํ•ด๋ณด๊ธฐ
    • DnCNN(2016)
      • Introduction
      • Related Work
      • DnCNN Model
      • Experiment
      • ๊ตฌํ˜„ํ•ด๋ณด๊ธฐ
    • CycleGAN(2017)
      • Introduction
      • Formulation
      • Results
      • ๊ตฌํ˜„ํ•ด๋ณด๊ธฐ
  • Language Computation
    • Attention is All You Need(2017)
      • Introduction & Background
      • Model Architecture
      • Appendix - Positional Encoding ๊ฑฐ๋ฆฌ ์ฆ๋ช…
  • ML Statistics
    • VAE(2013)
      • Introduction
      • Problem Setting
      • Method
      • Variational Auto-Encoder
      • ๊ตฌํ˜„ํ•ด๋ณด๊ธฐ
      • Appendix - KL Divergence ์ ๋ถ„
  • ์ง๊ด€์  ์ดํ•ด
    • Seq2Seq
      • Ko-En Translation
Powered by GitBook
On this page
  • Formulation
  • Patch extraction and representation
  • Non-linear mapping
  • Reconstruction
  • Relationship with sparse-coding-based method
  • Training

Was this helpful?

  1. Computer Vision
  2. SRCNN(2015)

CNN for SR

Formulation

SRCNN์˜ ๊ตฌ์กฐ๋Š” ๊ธฐ์กด์˜ sparse coding based method์™€ ๊ฐ™์ด

  1. Patch extraction + representation

  2. Non-linear mapping

  3. Reconstruction

์˜ ๊ณผ์ •์„ ๊ฑฐ์นœ๋‹ค. low-res image๋Š” SRCNN์— ํˆฌ์ž…๋˜๊ธฐ ์ „ bicubic interpolation์„ ํ†ตํ•ด ํ‚ค์šฐ๋ ค๋Š” ์ด๋ฏธ์ง€์˜ ํฌ๊ธฐ์™€ ๋™์ผํ•˜๊ฒŒ ๋งž์ถ˜๋‹ค.

Patch extraction and representation

Let us compare the sparse-coding-based method with SRCNN.

Both extract patches from the low-res image and map each patch to a high-dimensional vector. Applying a CNN filter amounts to extracting a patch (a local region of the image), and representing that patch — applying a linear operation to produce another vector — is exactly what a convolution operation does.

| Sparse Coding Based | SRCNN |
| --- | --- |
| Patch extraction | CNN filter window |
| Patch representation | Convolution operation |

์ด ๊ณผ์ •์—์„œ์˜ ์—ฐ์‚ฐ ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

F_1(\mathbf{Y}) = \max(0,\, W_1 * \mathbf{Y} + B_1)

Y์— W1 ํ•„ํ„ฐ๋กœ convolution ์—ฐ์‚ฐํ•˜๊ณ  bias์ธ B1์„ ๋”ํ•œ๋‹ค. ๊ทธ ์ถ” max(0,x)์ธ ReLU๋ฅผ ํ™œ์„ฑํ™”ํ•จ์ˆ˜๋กœ ์ ์šฉํ•œ๋‹ค. W1์˜ ๊ตฌ์กฐ๋Š” c*f1*f1 ํฌ๊ธฐ์˜ ์ด๋ฏธ์ง€๋ฅผ n1 ์ฐจ์›์˜ vector๋กœ ๋Œ€ํ‘œํ•œ๋‹ค.

Non-linear mapping

Non-linear mapping์€ filter size 1*1์˜ convolution ์—ฐ์‚ฐ์œผ๋กœ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค.

F_2(\mathbf{Y}) = \max(0,\, W_2 * F_1(\mathbf{Y}) + B_2)

W2 maps an n1*f2*f2 tensor to an n2-dimensional vector. When f2 = 1, it is intuitively a non-linear map applied per pixel. The paper notes that this also generalizes to filter sizes of 3*3 or 5*5; in that case the non-linear map is applied to patches of the feature map rather than patches of the original image.
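The f2 = 1 case reduces to the same linear map applied at every pixel, which can be sketched as a single matrix multiply over the channel dimension (shapes here follow the paper's n1 = 64, n2 = 32; the input is random for illustration):

```python
import numpy as np

# With f2 = 1, the second layer is a per-pixel projection:
# each n1-dimensional pixel vector is mapped to n2 dimensions.
n1, n2 = 64, 32
F1 = np.random.rand(n1, 25, 25)        # stand-in for the first layer's output
W2 = np.random.randn(n2, n1) * 0.001   # illustrative random weights
B2 = np.zeros(n2)

# einsum applies the same (n2 x n1) linear map at every spatial location,
# exactly like a 1x1 convolution; ReLU follows.
F2 = np.maximum(0, np.einsum('on,nhw->ohw', W2, F1) + B2[:, None, None])
print(F2.shape)  # (32, 25, 25)
```

Because the same weights act on every pixel independently, this is equivalent to a fully connected layer shared across all spatial positions.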

Reconstruction

The final reconstruction step recovers the high-res image from the high-res feature map produced by the non-linear mapping. In earlier methods, the predicted high-res patches overlap (patches are sampled densely, so neighboring patches cover shared pixels), and the overlapping regions were averaged to compute the final result. This averaging can be viewed as applying a predefined convolution filter.

F(\mathbf{Y}) = W_3 * F_2(\mathbf{Y}) + B_3

W3์€ n2*f3*f3์˜ ํ…์„œ์—์„œ c ์ฐจ์›์˜ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค. B3๋Š” c์ฐจ์›์˜ ๋ฒกํ„ฐ์ด๋‹ค. ๋งŒ์•ฝ ์ด filter์˜ ๊ฐ’์ด average ์—ฐ์‚ฐ์œผ๋กœ ์ž‘๋™ํ•˜๋„๋ก ํ•™์Šต๋œ๋‹ค๋ฉด, ์ด์ „์˜ ๋ฐฉ๋ฒ•๊ณผ ์œ ์‚ฌํ•œ ๊ณผ์ •์„ ๊ฑฐ์น˜๊ฒŒ ๋˜๋Š” ๊ฒƒ์ด๋‹ค.

Relationship with sparse-coding-based method

๋…ผ๋ฌธ์—์„œ sparse coding based method์„ CNN์˜ ๊ด€์ ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค. dictionary์˜ ๊ฐœ์ˆ˜๊ฐ€ n1n_1n1โ€‹ ์ด๋ผ๋ฉด, f1ร—f1f_1 \times f_1f1โ€‹ร—f1โ€‹ ํฌ๊ธฐ์˜ ํŒจ์น˜๋ฅผ ์ถ”์ถœํ•ด ์„ ํ˜• ์—ฐ์‚ฐ(bias ํฌํ•จ)์„ ํ†ตํ•ด dictionary ๊ณต๊ฐ„์œผ๋กœ projectionํ•˜๋Š” ๊ฒƒ์ด ๋ฐ”๋กœ sparse coding ๋ฐฉ๋ฒ•์ด๋‹ค. ์ด๋Š” convolution ์—ฐ์‚ฐ์ด ํ•˜๋Š” ๊ณผ์ •๊ณผ ๋™์ผํ•˜๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

Second, in the non-linear mapping, if f2 = 1 (i.e., pixelwise), the convolution is equivalent to a fully connected network: it projects from the low-res dictionary of size n1 to the high-res dictionary of size n2.

๋งˆ์ง€๋ง‰ reconstruction ๊ณผ์ •์€, high-res patch์˜ overlap๋˜๋Š” ๋ถ€๋ถ„(์ด์ „ ๋…ผ๋ฌธ์„ ์‚ดํŽด๋ณด์•„์•ผ ํ• ๋“ฏ)์„ ํ‰๊ท ๋‚ด์–ด ์ตœ์ข… ์ด๋ฏธ์ง€์˜ ํ”ฝ์…€์„ ๊ฒฐ์ •ํ•œ๋‹ค. ์ด ๋˜ํ•œ convolution ์—ฐ์‚ฐ์œผ๋กœ ๋™๋“ฑํ•˜๊ฒŒ ๋Œ€์ฒด ๊ฐ€๋Šฅํ•œ ๋ถ€๋ถ„์ด๋‹ค.

๋”ฐ๋ผ์„œ sparse coding based method๋Š” CNN์˜ ํ•˜๋‚˜์˜ ์˜ˆ๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ƒ๊ฐ์€ SRCNN์˜ hyperparameter๋ฅผ ๊ฒฐ์ •ํ•˜๋Š”๋ฐ ๋„์›€์„ ์ค€๋‹ค. high-res dictionary๊ฐ€ ๋” sparseํ•  ๊ฒƒ์ด๋ผ ์˜ˆ์ธก๋˜๋ฏ€๋กœ **n2<n1n_2<n_1n2โ€‹<n1โ€‹ ๋กœ ์„ค์ •ํ•œ๋‹ค. ๊ฒฐ๊ณผ๋ฌผ์ด ๋”์šฑ ๋†’์€ ํ•ด์ƒ๋„๋ฅผ ๋ณด์ด๋ฏ€๋กœ f1>f3f_1>f_3f1โ€‹>f3โ€‹ ์ด์–ด์•ผ ํ•˜๊ณ  ๊ทธ ๊ฒฐ๊ณผ๋กœ patch์˜ ์ค‘์‹ฌ์— ์žˆ๋Š” ๊ฐ’ ์„ฑ๋ถ„์ด ๋”์šฑ ๋งŽ์ด ํฌํ•จ๋œ๋‹ค.

๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•์€ ๋„คํŠธ์›Œํฌ์˜ ๋ชจ๋“  ๋ถ€๋ถ„์„ ํ•™์Šตํ•  ์ˆ˜ ์—†์—ˆ์ง€๋งŒ, ์ด ๋ฐฉ๋ฒ•์€ ๋„คํŠธ์›Œํฌ๊ฐ€ feedforward์ด๊ณ  ๋ชจ๋“  ๋ถ€๋ถ„์„ ํ•™์Šต ํ•  ๋•Œ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ reconstruction ๊ณผ์ •์—์„œ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋ณด๋‹ค ๋”์šฑ ๋งŽ์€ pixel ์ •๋ณด๋ฅผ ์ด์šฉํ•˜์œผ๋กœ, ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ผ ์ˆ˜ ์žˆ๋‹ค ์ฃผ์žฅํ•œ๋‹ค.

The high-res dictionary is sparser because it contains bases of relatively high resolution. The low-res dictionary holds bases of an image merely upscaled by bicubic interpolation, so its quality is low; since the non-linear mapping has already improved the quality, relatively few bases suffice to carry the same information about the image, so the dictionary can be sparser. If n1 and n2 were instead set equal, there would be a risk of overfitting.

Training

The parameters are W1, W2, W3, B1, B2, B3. MSE is used as the loss function, which amounts to optimizing PSNR (peak signal-to-noise ratio).

L(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \| F(\mathbf{Y}_i; \Theta) - \mathbf{X}_i \|^2
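The MSE-PSNR connection can be made concrete with a small sketch: PSNR is a decreasing function of MSE, so minimizing one maximizes the other (`max_val` below assumes pixel values in [0, 1]):

```python
import numpy as np

def mse(pred, target):
    """Mean squared error, the SRCNN training loss."""
    return np.mean((pred - target) ** 2)

def psnr(pred, target, max_val=1.0):
    # PSNR = 10 * log10(MAX^2 / MSE): lower MSE means higher PSNR
    return 10 * np.log10(max_val ** 2 / mse(pred, target))

target = np.zeros((4, 4))
close = target + 0.01   # small uniform error -> MSE = 1e-4
far = target + 0.1      # larger uniform error -> MSE = 1e-2
print(psnr(close, target))  # approximately 40.0 dB
print(psnr(far, target))    # approximately 20.0 dB
```

Because log10 is monotonic, any parameter update that decreases the MSE loss necessarily increases PSNR, which is why the paper equates the two objectives.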

์ตœ์ ํ™” ๋ฐฉ์‹์€ SGD์ด๊ณ  momentum=0.9๋กœ ํ•œ๋‹ค.

\Delta_{i+1} = 0.9\,\Delta_i - \eta \frac{\partial L}{\partial W_i^l}, \quad W_{i+1}^l = W_i^l + \Delta_{i+1}

์ฒซ ์‹์€ momentum 0.9์˜ SGD์ด๊ณ , ๋‘ ๋ฒˆ์งธ ์‹์€ update ์‹์ด๋‹ค.

๋„คํŠธ์›Œํฌ์˜ ์ฒซ ๋‘ layer์˜ learning rate๋Š” 10e-4์ด๊ณ  ๋งˆ์ง€๋ง‰ layer๋Š” 10e-5๋กœ ํ•œ๋‹ค.

filter์˜ weight๋Š” gaussian distribution์œผ๋กœ ์ดˆ๊ธฐํ™”(mean=0, stddev=0.001)ํ•˜๊ณ , bias๋Š” 0์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•œ๋‹ค.

f1, f2, f3 are 9, 1, 5 respectively, and n1 = 64, n2 = 32.

๋ฐ์ดํ„ฐ๋Š” ์‚ฌ์ง„์„ randomํ•˜๊ฒŒ cropํ•ด ๋งŒ๋“ ๋‹ค. convolution ์—ฐ์‚ฐ ๊ฒฐ๊ณผ, ๊ฐ filter์˜ ํฌ๊ธฐ๋งŒํผ ์ถœ๋ ฅ ์ด๋ฏธ์ง€๊ฐ€ ์ค„์–ด๋“ค๊ธฐ ๋•Œ๋ฌธ์— (forgโˆ’f1โˆ’f2โˆ’f3+3)2(f_{org}-f_{1}-f_{2}-f_{3}+3)^2(forgโ€‹โˆ’f1โ€‹โˆ’f2โ€‹โˆ’f3โ€‹+3)2 ํฌ๊ธฐ์— ํ•ด๋‹นํ•˜๋Š” ์ค‘์‹ฌ๋ถ€ ์›๋ณธ ์ด๋ฏธ์ง€์™€ ๋Œ€์กฐํ•ด loss๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค.


Fig. 3. An illustration of sparse-coding-based methods in the view of a convolutional neural network.