Hoai-Chau Tran
Tag: GPU
1 item with this tag.
Sep 15, 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
llm
compression
decoding_algorithms
GPU