Hoai-Chau Tran
LLaMA 3.1
Sep 15, 2024
1 min read
read_later