RoPE is a popular implementation of Relative Positional Encoding, used in Large Language Models such as LLaMA.

In order to generalize our results in 2D to any $\mathbf{x}_i \in \mathbb{R}^d$ where $d$ is even, we divide the $d$-dimensional space into $d/2$ sub-spaces and combine them by virtue of the linearity of the inner product, turning $f_{\{q,k\}}$ into:

$$
f_{\{q,k\}}(\mathbf{x}_m, m) = \mathbf{R}^d_{\Theta,m} \mathbf{W}_{\{q,k\}} \mathbf{x}_m
$$
where
$$
\mathbf{R}^d_{\Theta,m} =
\begin{pmatrix}
\cos m\theta_1 & -\sin m\theta_1 & 0 & 0 & \cdots & 0 & 0 \\
\sin m\theta_1 & \cos m\theta_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cos m\theta_2 & -\sin m\theta_2 & \cdots & 0 & 0 \\
0 & 0 & \sin m\theta_2 & \cos m\theta_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \cos m\theta_{d/2} & -\sin m\theta_{d/2} \\
0 & 0 & 0 & 0 & \cdots & \sin m\theta_{d/2} & \cos m\theta_{d/2}
\end{pmatrix}
$$
is the rotary matrix with pre-defined parameters $\Theta = \{\theta_i = 10000^{-2(i-1)/d},\ i \in [1, 2, \dots, d/2]\}$. A graphic illustration of RoPE is shown in Figure (1).
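To make the block-diagonal structure concrete, the following is a minimal NumPy sketch of applying $\mathbf{R}^d_{\Theta,m}$ to a $d$-dimensional vector without materializing the full matrix; the function name `rope_rotate` and the `base` argument are illustrative choices, not part of the original formulation.

```python
import numpy as np

def rope_rotate(x, m, base=10000.0):
    """Apply the rotary matrix R^d_{Theta,m} to the last dimension of x.

    Each coordinate pair (x_{2i-1}, x_{2i}) is rotated by the angle
    m * theta_i, with theta_i = base^{-2(i-1)/d} as in the parameter set Theta.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE assumes an even dimension d"
    theta = base ** (-2.0 * np.arange(d // 2) / d)   # theta_1, ..., theta_{d/2}
    cos, sin = np.cos(m * theta), np.sin(m * theta)  # per-sub-space rotation terms
    x1, x2 = x[..., 0::2], x[..., 1::2]              # the d/2 two-dimensional sub-spaces
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin             # first row of each 2x2 rotation block
    out[..., 1::2] = x1 * sin + x2 * cos             # second row of each 2x2 rotation block
    return out
```

Because each $2 \times 2$ block acts on its sub-space independently, applying the rotation costs $O(d)$ rather than $O(d^2)$, which is why implementations typically avoid building $\mathbf{R}^d_{\Theta,m}$ explicitly.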
Applying our RoPE to self-attention in Equation (2), we obtain:

$$
\mathbf{q}_m^{\top} \mathbf{k}_n
= \left(\mathbf{R}^d_{\Theta,m} \mathbf{W}_q \mathbf{x}_m\right)^{\top} \left(\mathbf{R}^d_{\Theta,n} \mathbf{W}_k \mathbf{x}_n\right)
= \mathbf{x}_m^{\top} \mathbf{W}_q^{\top} \mathbf{R}^d_{\Theta,n-m} \mathbf{W}_k \mathbf{x}_n
$$
where $\mathbf{R}^d_{\Theta,n-m} = \left(\mathbf{R}^d_{\Theta,m}\right)^{\top} \mathbf{R}^d_{\Theta,n}$. Note that $\mathbf{R}^d_{\Theta}$ is an orthogonal matrix, which ensures stability during the process of encoding position information.
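As a quick numeric illustration of the relative-position property and of the orthogonality noted above, one can check, reusing the hypothetical `rope_rotate` sketch from before, that the inner product of rotated query and key vectors depends only on $n - m$ and that the rotation preserves norms:

```python
import numpy as np

d, m, n = 8, 5, 12
rng = np.random.default_rng(0)
q, k = rng.standard_normal(d), rng.standard_normal(d)   # stand-ins for W_q x_m and W_k x_n

score_abs = rope_rotate(q, m) @ rope_rotate(k, n)        # (R_m q)^T (R_n k)
score_rel = q @ rope_rotate(k, n - m)                    # q^T R_{n-m} k
print(np.allclose(score_abs, score_rel))                 # True: only n - m matters

# Orthogonality: the rotation never changes vector norms, so encoding position is stable.
print(np.allclose(np.linalg.norm(rope_rotate(q, m)), np.linalg.norm(q)))  # True
```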

