
"DeepSeek introduces its experimental V3.2-Exp model with sparse attention technology. The innovation promises to process long texts much more efficiently, while maintaining virtually identical output quality to the previous V3.1-Terminus model. Chinese AI company DeepSeek has launched V3.2-Exp, an intermediate step towards its next-generation architecture. The experimental version builds on the V3.1-Terminus model, introducing DeepSeek Sparse Attention (DSA). This sparse attention technology is expected to improve training and inference in long contexts significantly."
"The core of the update lies in the sparse attention mechanism. This technology selects only relevant parts of long texts for processing, drastically reducing the computing power required. Traditional attention mechanisms view each word in relation to all other words, which requires exponentially more computing power for long texts. According to DeepSeek, DSA achieves "fine-grained sparse attention" for the first time. The system maintains model quality while substantially improving efficiency in long contexts. For developers, this means faster training and cheaper inference for extensive documents."
DeepSeek released V3.2-Exp, an experimental intermediate model that incorporates DeepSeek Sparse Attention (DSA) to improve long-context handling. DSA selects only the relevant parts of long texts for processing, substantially reducing compute during training and inference while preserving output quality comparable to V3.1-Terminus. V3.2-Exp is available on HuggingFace with day-0 support from vLLM and runs across hardware from Nvidia H200 to AMD chips. Local inference code and conversion tools are provided, though adjustments to GPU configuration and expert settings are required. Benchmarks show identical MMLU-Pro scores at 85.0 and improved Codeforces performance (2121 vs 2046).
Read at Techzine Global