RoPE is a popular Relative Positional Encoding scheme used in Large Language Models such as LLaMA.
In order to generalize our results in 2D to any $\mathbf{x}_i \in \mathbb{R}^d$ where $d$ is even, we divide the $d$-dimensional space into $d/2$ sub-spaces and combine them by the linearity of the inner product, turning $f_{\{q,k\}}$ into:

$$f_{\{q,k\}}(\mathbf{x}_m, m) = \mathbf{R}^d_{\Theta,m}\,\mathbf{W}_{\{q,k\}}\,\mathbf{x}_m$$
where

$$\mathbf{R}^d_{\Theta,m} = \begin{pmatrix}
\cos m\theta_1 & -\sin m\theta_1 & 0 & 0 & \cdots & 0 & 0 \\
\sin m\theta_1 & \cos m\theta_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cos m\theta_2 & -\sin m\theta_2 & \cdots & 0 & 0 \\
0 & 0 & \sin m\theta_2 & \cos m\theta_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \cos m\theta_{d/2} & -\sin m\theta_{d/2} \\
0 & 0 & 0 & 0 & \cdots & \sin m\theta_{d/2} & \cos m\theta_{d/2}
\end{pmatrix}$$

is the rotary matrix with pre-defined parameters $\Theta = \{\theta_i = 10000^{-2(i-1)/d},\ i \in [1, 2, \ldots, d/2]\}$. A graphic illustration of RoPE is shown in Figure (1).
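The block-diagonal structure is straightforward to reproduce. The following NumPy sketch builds $\mathbf{R}^d_{\Theta,m}$ and applies $f_{\{q,k\}}$ to random projections of toy vectors; the helper name `rotary_matrix` and the toy sizes are illustrative choices, not part of the method itself.

```python
import numpy as np

def rotary_matrix(m: int, d: int, base: float = 10000.0) -> np.ndarray:
    """Block-diagonal rotary matrix R^d_{Theta,m} for position m (d must be even)."""
    # Pre-defined frequencies theta_i = base^{-2(i-1)/d}, i = 1, ..., d/2
    theta = base ** (-2.0 * np.arange(d // 2) / d)
    R = np.zeros((d, d))
    for i, t in enumerate(theta):
        c, s = np.cos(m * t), np.sin(m * t)
        # 2x2 rotation acting on the i-th two-dimensional sub-space
        R[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, -s], [s, c]]
    return R

# f_{q,k}(x_m, m) = R^d_{Theta,m} W_{q,k} x_m on toy data
d = 8
rng = np.random.default_rng(0)
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
x_m, x_n = rng.normal(size=d), rng.normal(size=d)
q_m = rotary_matrix(5, d) @ (W_q @ x_m)  # query at position m = 5
k_n = rotary_matrix(9, d) @ (W_k @ x_n)  # key at position n = 9
```

In practice the full $d \times d$ matrix is never materialized; since it is block-diagonal, the same rotation is applied element-wise to the $d/2$ two-dimensional sub-spaces, which is how efficient implementations typically realize it.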
Applying our RoPE to self-attention in Equation (2), we obtain:

$$\mathbf{q}_m^\top \mathbf{k}_n = \left(\mathbf{R}^d_{\Theta,m}\mathbf{W}_q\mathbf{x}_m\right)^\top \left(\mathbf{R}^d_{\Theta,n}\mathbf{W}_k\mathbf{x}_n\right) = \mathbf{x}_m^\top \mathbf{W}_q^\top \mathbf{R}^d_{\Theta,n-m}\mathbf{W}_k\mathbf{x}_n$$

where $\mathbf{R}^d_{\Theta,n-m} = \left(\mathbf{R}^d_{\Theta,m}\right)^\top \mathbf{R}^d_{\Theta,n}$. Note that $\mathbf{R}^d_{\Theta,m}$ is an orthogonal matrix, which ensures stability during the process of encoding position information.
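Both claims can be checked numerically. The snippet below continues the sketch above (reusing `rotary_matrix` and the toy tensors with $m = 5$, $n = 9$): orthogonality gives $\mathbf{R}^\top\mathbf{R} = \mathbf{I}$, and $(\mathbf{R}^d_{\Theta,m})^\top \mathbf{R}^d_{\Theta,n} = \mathbf{R}^d_{\Theta,n-m}$ is what makes the attention score depend only on the relative offset $n - m$.

```python
# Continuing the sketch above with positions m = 5 and n = 9.
R_m, R_n = rotary_matrix(5, d), rotary_matrix(9, d)

# Orthogonality: R^T R = I, so the rotation never changes vector norms.
assert np.allclose(R_m.T @ R_m, np.eye(d))

# Relative property: R_m^T R_n = R_{n-m}.
assert np.allclose(R_m.T @ R_n, rotary_matrix(9 - 5, d))

# Hence q_m^T k_n = x_m^T W_q^T R_{n-m} W_k x_n depends only on n - m.
assert np.isclose(q_m @ k_n, x_m @ W_q.T @ rotary_matrix(9 - 5, d) @ W_k @ x_n)
```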