The problem with 'human in the loop' AI? Often, it's the humans | Fortune
"First, there was a study from the AI evaluations company Vals AI that pitted several legal AI applications as well as ChatGPT against human lawyers on legal research tasks. All of the AI applications beat the average human lawyers (who were allowed to use digital legal search tools) in drafting legal research reports across three criteria: accuracy, authoritativeness, and appropriateness. The lawyers' aggregate median score was 69%, while ChatGPT scored 74%, Midpage 76%, Alexi 77%, and Counsel Stack, which had the highest overall score, 78%."
"One of the more intriguing findings is that for many question types, it was the generalist ChatGPT that was the most accurate, beating out the more specialized applications. And while ChatGPT lost points for authoritativeness and appropriateness, it still topped the human lawyers across those dimensions. The study has been faulted for not testing some of the better-known and most widely adopted legal AI research tools, such as Harvey, Legora, CoCounsel from Thomson Reuters, or LexisNexis Protégé,"
In summary: AI legal-research systems and ChatGPT outperformed average human lawyers in a Vals AI evaluation of legal research reports, with median scores of 69% for the lawyers, 74% for ChatGPT, 76% for Midpage, 77% for Alexi, and 78% for Counsel Stack. For many question types, the generalist ChatGPT delivered the highest accuracy despite lower authoritativeness and appropriateness scores. Observers criticized the evaluation for omitting several widely used legal AI tools and for testing only ChatGPT among frontier general-purpose models. The article also notes that Google plans to introduce advertising into its Gemini model, that major AI labs are collaborating on standards for AI agents, that new efforts aim to extend model memory, and that sentiment toward LLMs and AGI has shifted.