#rope

Python
from PyImageSearch
2 months ago

KV Cache Optimization via Multi-Head Latent Attention - PyImageSearch

Multi-head Latent Attention compresses per-head KV tensors into shared low-rank latents, cutting KV cache memory and compute while preserving attention quality.
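
A minimal sketch of the compression idea in the summary above, in plain Python. It assumes a single down-projection shared by all heads and separate per-head up-projections. All names (`latent_kv_sketch`, `w_down`, `w_up_k`, `w_up_v`) and dimensions are hypothetical, chosen for illustration and not taken from the linked article. Positional-encoding handling and the attention computation itself are left out.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix standing in for learned weights (illustration only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def latent_kv_sketch(seq_len=6, d_model=32, n_heads=4, d_head=8, d_latent=4, seed=0):
    rng = random.Random(seed)
    x = rand_matrix(seq_len, d_model, rng)  # token hidden states

    # Hypothetical weights: one shared down-projection, per-head up-projections.
    w_down = rand_matrix(d_model, d_latent, rng)
    w_up_k = [rand_matrix(d_latent, d_head, rng) for _ in range(n_heads)]
    w_up_v = [rand_matrix(d_latent, d_head, rng) for _ in range(n_heads)]

    # Only this low-rank latent would be cached: shape (seq_len, d_latent).
    latent = matmul(x, w_down)

    # Per-head keys and values are rebuilt from the cached latent when needed.
    keys = [matmul(latent, w) for w in w_up_k]
    values = [matmul(latent, w) for w in w_up_v]

    # Cached element counts: full per-head K and V versus the shared latent.
    full_cache = 2 * seq_len * n_heads * d_head
    latent_cache = seq_len * d_latent
    return latent, keys, values, full_cache, latent_cache

latent, keys, values, full_cache, latent_cache = latent_kv_sketch()
print(f"full KV cache: {full_cache} values, latent cache: {latent_cache} values")
```

With these toy dimensions the cached element count drops from `2 * seq_len * n_heads * d_head` to `seq_len * d_latent`. The actual savings depend on the latent width a given model chooses.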