#low-resource-languages

from InfoQ
4 days ago

Hugging Face Introduces mmBERT, a Multilingual Encoder for 1,800+ Languages

Hugging Face has released mmBERT, a new multilingual encoder trained on more than 3 trillion tokens across 1,833 languages. The model builds on the ModernBERT architecture and is the first to significantly improve upon XLM-R, a long-time baseline for multilingual understanding tasks. mmBERT uses a progressive training schedule instead of training on all languages at once. It starts with 60 high-resource languages, expands to 110, and finally includes all 1,833 languages.
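The progressive schedule described above can be sketched as a simple step-dependent language pool. This is a minimal illustrative sketch, not mmBERT's actual training code: the phase cutoffs (60% and 90% of training) and the placeholder language IDs are assumptions; only the pool sizes (60, 110, 1,833) come from the announcement.

```python
# Sketch of a progressive language schedule, mmBERT-style:
# start with 60 high-resource languages, expand to 110,
# then include all 1,833. Phase boundaries are assumed.

def languages_for_phase(step, total_steps, all_languages):
    """Return the language pool active at a given training step."""
    progress = step / total_steps
    if progress < 0.6:        # phase 1 cutoff (assumed)
        return all_languages[:60]
    elif progress < 0.9:      # phase 2 cutoff (assumed)
        return all_languages[:110]
    return all_languages      # phase 3: all languages

# Placeholder language IDs for illustration
langs = [f"lang_{i}" for i in range(1833)]
print(len(languages_for_phase(100, 1000, langs)))  # early training
print(len(languages_for_phase(700, 1000, langs)))  # mid training
print(len(languages_for_phase(950, 1000, langs)))  # final phase
```

One design point: keeping the pool small early lets the model build stable representations from data-rich languages before rarer languages, which contribute far fewer tokens, enter the mix.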
Artificial intelligence
Digital life
from Fortune Asia
2 months ago

The world's best AI models operate in English. Other languages, even major ones like Cantonese, risk falling further behind

AI translation models struggle with languages that have limited online data, leading to mistranslations and inaccuracies.
Scala
from Hackernoon
11 months ago

Why Lua Is the Ideal Benchmark for Testing Quantized Code Models | HackerNoon

Lua presents unique challenges for quantized model performance due to its low-resource status and unconventional programming paradigms.