#model-reliability tag

Microsoft's Attempts to Sell AI Agents Are Turning Into a Disaster

Large AI agents are widely hyped but currently fail frequently, struggle with complex multistep tasks, and face weak enterprise adoption and sales.

Artificial intelligence

from ZDNET

5 months ago

Is Opus 4.5 really 'the best model in the world for coding'? It just failed half my tests

Opus 4.5 passed only half of the coding tests, a 50% pass rate, and showed reliability and file-handling issues.

Artificial intelligence

from The Register

5 months ago

AI is actually bad at math, ORCA shows

Leading large language models perform poorly on a rigorous math benchmark, with top models scoring 63% or lower and showing unreliable arithmetic reasoning.

Artificial intelligence

from Business Insider

5 months ago

Eli Lilly CEO says he has 'at least 1 or 2 AIs running' during every meeting he's in

David Ricks uses AI continuously in meetings to stay current on scientific research, preferring terser models with more reliable references.

from Ars Technica

7 months ago

Can today's AI video models accurately model how the real world works?

For the researchers, though, all of the above examples aren't evidence of failure but instead a sign of the model's capabilities. To be listed under the paper's "failure cases," Veo 3 had to fail a tested task across all 12 trials, which happened in 16 of the 62 tasks tested. For the rest, the researchers write that "a success rate greater than 0 suggests that the model possesses the ability to solve the task."

Artificial intelligence

from Futurism

8 months ago

Something Extremely Scary Happens When Advanced AI Tries to Give Medical Advice to Real World Patients

Last week, Google AI pioneer Jad Tarifi sparked controversy when he told Business Insider that it no longer makes sense to get a medical degree - since, in his telling, artificial intelligence will render such an education obsolete by the time you're a practicing doctor. Companies have long touted the tech as a way to free up the time of overworked doctors and even aid them in specialized skills, including scanning medical imagery for tumors. Hospitals have already been rolling out AI tech to help with administrative work.

Artificial intelligence
