A Former Apple Luminary Sets Out to Create the Ultimate GPU Software
Briefly

"Still, developers say that bringing code from Nvidia's CUDA to ROCm isn't a smooth process, which means they typically focus on building for just one chip vendor. "ROCm is amazing, it's open source, but it runs on one vendor's hardware," Lattner told the crowd at AMD's Advancing AI event in June. Then he made his pitch for why Modular's software is more portable and makes GPUs that much faster."
"Part of Modular's value proposition is that it can ship software for optimizing GPUs even faster than Nvidia, as there might be a months-long gap between when Nvidia ships a new GPU and when it releases an "attention kernel"-a critical part of the GPU software. "Right now Modular is complimentary to AMD and Nvidia, but over time you could see both of those companies feeling threatened by ROCm or CUDA not being the best software that sits on top of their chips," says Munichiello."
""Mapping an algorithm to a GPU is an insanely difficult thing to do. There are a hundred million software devs, 10,000 who write GPU kernels, and maybe a hundred who can do it well." Mako is building AI agents to optimize coding for GPUs. Some developers think that's the future for the industry, rather than building a universal compiler or a new programming language like Modular. Mako just raised $8.5 million in seed funding from Flybridge Capital and the startup accelerator Neo."
Porting code from Nvidia's CUDA to AMD's ROCm remains difficult, so developers typically optimize for a single chip vendor. ROCm is open source but limited to one vendor's hardware, while Modular's software aims to provide cross-vendor portability and faster GPU performance. Part of Modular's pitch is that it can ship optimized GPU software faster than the vendors themselves, closing the months-long gap between a new GPU's release and critical software such as attention kernels. That positioning could eventually threaten Nvidia and AMD if CUDA and ROCm are no longer the best software sitting atop their chips, though cloud customers may resist paying for an extra software layer. GPU kernel development remains a highly specialized skill, and startups like Mako are instead building AI agents to automate GPU code optimization, backed by recent seed funding.
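For context on the term "attention kernel": it is the GPU routine that computes scaled dot-product attention, the core operation inside transformer models. The sketch below is a plain NumPy reference of that math, not Modular's, Nvidia's, or AMD's implementation; hand-written GPU kernels compute the same thing but fuse and tile it for a specific chip's memory hierarchy, which is why they take so long to produce.

```python
import numpy as np

def attention(q, k, v):
    """Reference (CPU) computation of scaled dot-product attention.

    A GPU "attention kernel" is a hand-tuned implementation of exactly
    this math, optimized for one vendor's hardware.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                            # weighted sum of value vectors

# Tiny illustrative example: 4 queries/keys/values of dimension 8.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
print(attention(q, k, v).shape)  # (4, 8)
```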
Read at WIRED