#model-architecture

Python
from PyImageSearch
2 days ago

Building and Training a Kimi-K2 Model Using DeepSeek-V3 Components - PyImageSearch

Agentic intelligence lets LLMs perceive, plan, reason, and act through interaction; Kimi-K2 delivers strong benchmark and leaderboard performance thanks to its architectural innovations.
Python
from PyImageSearch
1 month ago

Autoregressive Model Limits and Multi-Token Prediction in DeepSeek-V3 - PyImageSearch

Multi-Token Prediction (MTP) in DeepSeek-V3 trains the model to forecast several future tokens at each position rather than only the next one, improving training efficiency and contextual understanding.
Artificial intelligence
from HackerNoon
2 years ago

Unpacking phi-3-mini: Architecture Driving Phone-Deployable LLM Power | HackerNoon

The phi-3-mini and phi-3-small models are transformer architectures that combine long context windows with optimization techniques for both training and inference.
Artificial intelligence
from HackerNoon
1 year ago

Alternative Architectures for Multi-Token Prediction in LLMs | HackerNoon

The proposed architectures show significant gains in scalability and performance on multi-token prediction tasks.