#distributed-inference

from InfoQ
1 week ago

NVIDIA Dynamo Addresses Multi-Node LLM Inference Challenges

This challenge is driving innovation in the inference stack, and that's where Dynamo comes in. Dynamo is an open-source framework for distributed LLM inference that manages execution across GPUs and nodes. It splits inference into its distinct phases, prefill and decode, separating compute-bound work from memory-bound work, and dynamically allocates GPU resources to raise utilization while keeping latency low. This lets infrastructure teams scale inference capacity responsively, absorbing demand spikes without permanently overprovisioning expensive GPU hardware.
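The prefill/decode split described above can be sketched as a toy scheduler. This is an illustrative model only, not Dynamo's actual API: the class and function names (`PrefillWorker`, `DecodeWorker`, `serve`) are hypothetical, and the "KV cache" here is a plain list standing in for real attention state.

```python
# Hypothetical sketch of disaggregated serving: prefill (compute-bound)
# and decode (memory-bound) run on separate worker pools, mirroring the
# phase split the article describes. Names are illustrative, not Dynamo's API.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)  # filled during prefill
    output: list = field(default_factory=list)

class PrefillWorker:
    """Compute-bound phase: processes the whole prompt once to build the KV cache."""
    def run(self, req: Request) -> Request:
        # Stand-in for computing attention KV state over every prompt token.
        req.kv_cache = list(req.prompt.split())
        return req

class DecodeWorker:
    """Memory-bound phase: generates one token per step, reusing the KV cache."""
    def run(self, req: Request) -> Request:
        for i in range(req.max_new_tokens):
            token = f"tok{i}"          # stand-in for a sampled token
            req.output.append(token)
            req.kv_cache.append(token)  # cache grows by one entry per step
        return req

def serve(req: Request, prefill: PrefillWorker, decode: DecodeWorker) -> Request:
    # A router runs prefill, hands off the KV cache, then schedules decode,
    # which in a disaggregated setup may run on a different GPU pool.
    return decode.run(prefill.run(req))
```

Because the two phases stress different resources, running them on separate pools lets each pool be sized and scaled independently, which is the utilization win the summary points to.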
Artificial intelligence
from InfoWorld
1 month ago

Perplexity's open-source tool to run trillion-parameter models without costly upgrades

TransferEngine enables full-speed GPU-to-GPU communication across AWS and Nvidia hardware, letting trillion-parameter models run on older H100 and H200 systems.
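Portability across AWS and Nvidia interconnects suggests a thin abstraction that selects an RDMA backend per NIC while exposing one transfer interface. The sketch below is an assumption about that shape, not TransferEngine's real API; the backend names and `pick_backend` helper are hypothetical.

```python
# Illustrative sketch (not TransferEngine's actual API): one transfer
# interface with the RDMA backend chosen by the NIC present on the node,
# so the same code path covers AWS EFA and Nvidia ConnectX hardware.
class RdmaBackend:
    name = "generic"
    def write(self, dst_gpu: int, buf: bytes) -> str:
        # Stand-in for a one-sided RDMA write into a remote GPU's memory.
        return f"{self.name}->gpu{dst_gpu}:{len(buf)}B"

class EfaBackend(RdmaBackend):       # AWS Elastic Fabric Adapter
    name = "efa"

class ConnectXBackend(RdmaBackend):  # Nvidia ConnectX NIC
    name = "connectx"

def pick_backend(nic: str) -> RdmaBackend:
    # Detect the NIC once at startup; callers never branch on hardware.
    backends = {"efa": EfaBackend, "connectx": ConnectXBackend}
    return backends.get(nic, RdmaBackend)()
```

The design point is that model-parallel code issues the same `write` calls everywhere; only the backend binding differs per deployment, which is what lets the same trillion-parameter setup span mixed hardware generations.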