Appendix - Positional Encoding Distance Proof

$$
PE_{pos}=\begin{cases} \sin\left(pos/10000^{2i/d_{model}}\right) & \text{dim}=2i \\ \cos\left(pos/10000^{2i/d_{model}}\right) & \text{dim}=2i+1\end{cases}
$$

Given this definition, the positional encoding vector at a specific position p is

$$
PE_p=\begin{bmatrix} \sin(p/10000^{2/d}) \\ \cos(p/10000^{2/d}) \\ \sin(p/10000^{4/d}) \\ \cos(p/10000^{4/d}) \\ \vdots \\ \sin(p/10000) \\ \cos(p/10000) \end{bmatrix}=\begin{bmatrix} \sin(\omega_1 p) \\ \cos(\omega_1 p) \\ \sin(\omega_2 p) \\ \cos(\omega_2 p) \\ \vdots \\ \sin(\omega_{d/2} p) \\ \cos(\omega_{d/2} p) \end{bmatrix}
$$

where $\omega_i = 1/10000^{2i/d}$ and we write $d$ for $d_{model}$.
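To make the construction concrete, here is a minimal NumPy sketch that builds this vector, following the convention above where the pair index i runs from 1 to d/2. The function name `pos_encoding` and the assumption that d is even are mine, not from the paper.

```python
import numpy as np

def pos_encoding(p, d):
    """Sinusoidal positional encoding for position p with even dimension d."""
    i = np.arange(1, d // 2 + 1)        # pair index i = 1, ..., d/2
    omega = 1.0 / 10000 ** (2 * i / d)  # omega_i = 1 / 10000^{2i/d}
    pe = np.empty(d)
    pe[0::2] = np.sin(omega * p)        # dim 2i   -> sin(omega_i * p)
    pe[1::2] = np.cos(omega * p)        # dim 2i+1 -> cos(omega_i * p)
    return pe
```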

To compute the distance between encodings whose positions differ by k, define the following rotation matrix.

$$
M_i(k)=\begin{bmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{bmatrix}
$$

Then, by the angle-addition identities $\sin(\omega_i(p+k)) = \cos(\omega_i k)\sin(\omega_i p) + \sin(\omega_i k)\cos(\omega_i p)$ and $\cos(\omega_i(p+k)) = \cos(\omega_i k)\cos(\omega_i p) - \sin(\omega_i k)\sin(\omega_i p)$, the positional encoding vector k positions later can be written as a block-diagonal rotation of $PE_p$:

$$
PE_{p+k}=\begin{bmatrix} M_1(k) & & & \\ & M_2(k) & & \\ & & \ddots & \\ & & & M_{d/2}(k) \end{bmatrix}\begin{bmatrix} \sin(\omega_1 p) \\ \cos(\omega_1 p) \\ \sin(\omega_2 p) \\ \cos(\omega_2 p) \\ \vdots \\ \sin(\omega_{d/2} p) \\ \cos(\omega_{d/2} p) \end{bmatrix}
$$
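This identity is easy to check numerically. A short sketch, reusing `pos_encoding` from the first code block; the helper name `rotation_block` is mine, for illustration only:

```python
def rotation_block(k, d):
    """Block-diagonal matrix with the 2x2 rotations M_i(k) along the diagonal."""
    i = np.arange(1, d // 2 + 1)
    omega = 1.0 / 10000 ** (2 * i / d)
    M = np.zeros((d, d))
    for j, w in enumerate(omega):
        c, s = np.cos(w * k), np.sin(w * k)
        M[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]
    return M

d, p, k = 8, 5, 3
# Rotating PE_p by the block-diagonal matrix reproduces PE_{p+k}.
assert np.allclose(rotation_block(k, d) @ pos_encoding(p, d), pos_encoding(p + k, d))
```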

Computing the difference for a single pair index i, and using that $M_i(k)$ is orthogonal ($M_i(k)^\top M_i(k) = I$):

$$
D_{k,i}=PE_{p+k,i}-PE_{p,i}=\left(M_i(k)-I\right)\begin{bmatrix} \sin(\omega_i p) \\ \cos(\omega_i p) \end{bmatrix}
$$

$$
\begin{aligned}
D_{k,i}^\top D_{k,i} &= \begin{bmatrix} \sin(\omega_i p) & \cos(\omega_i p) \end{bmatrix}\left(M_i(k)^\top-I\right)\left(M_i(k)-I\right)\begin{bmatrix} \sin(\omega_i p) \\ \cos(\omega_i p) \end{bmatrix} \\
&= \begin{bmatrix} \sin(\omega_i p) & \cos(\omega_i p) \end{bmatrix}\left(2I-M_i(k)-M_i(k)^\top\right)\begin{bmatrix} \sin(\omega_i p) \\ \cos(\omega_i p) \end{bmatrix} \\
&= \begin{bmatrix} \sin(\omega_i p) & \cos(\omega_i p) \end{bmatrix}\begin{bmatrix} 2-2\cos(\omega_i k) & 0 \\ 0 & 2-2\cos(\omega_i k) \end{bmatrix}\begin{bmatrix} \sin(\omega_i p) \\ \cos(\omega_i p) \end{bmatrix} \\
&= 2-2\cos(\omega_i k)
\end{aligned}
$$

Summing this over all i and taking the square root yields the Euclidean distance:

$$
\sqrt{\sum_{i=1}^{d/2}\left(2-2\cos(\omega_i k)\right)}=\sqrt{d-2\sum_{i=1}^{d/2} \cos(\omega_i k)}
$$

This value depends only on the offset k, independent of the position p. Therefore, any two positions separated by the same offset always have the same distance between their positional encoding vectors!

As a representative case, for positions one apart, k = 1 and the distance is

$$
\sqrt{d-2\sum_{i=1}^{d/2} \cos(\omega_i)}
$$
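A quick numerical check of both the p-independence claim and the closed form above, again reusing `pos_encoding` from the first sketch and taking k = 1:

```python
d, k = 8, 1
i = np.arange(1, d // 2 + 1)
omega = 1.0 / 10000 ** (2 * i / d)
closed_form = np.sqrt(d - 2 * np.sum(np.cos(omega * k)))

# The distance is the same for every starting position p, and matches the formula.
dists = [np.linalg.norm(pos_encoding(p + k, d) - pos_encoding(p, d)) for p in range(10)]
assert np.allclose(dists, closed_form)
```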