AI devs close to scraping bottom of data barrel
"Training data is the Achilles heel of massive new AI models, as detailed by George Lee, co-head of the Goldman Sachs Global Institute, in a recent webcast on data's role in AI. "The quality of the outputs from these models, particularly in enterprise settings, is highly dependent on the quality of the data that you're sourcing and referencing," Lee said."
""We've already run out of data," Raphael said. "When you read about the new models, the undertone of what people say, like with models like Deepseek, is how did they do that with less money? One of the big hypotheses is they trained against another model.' The interesting thing is going to be how previous models then shape what the next iteration of the world looks like, he added,"
"One danger is model collapse, where the performance of an AI system degrades once it is trained on its own previously generated data outputs, leading to a model losing previously learned nuances, while errors accumulate and get amplified with each new generation. But when asked if this might hold back or even torpedo the unrealized potential of upcoming AI developments like autonomous agents, Raphael said he didn't think it would be a roadblock to future advances."
Massive AI models demand vast volumes of high-quality training data, but publicly available datasets are becoming depleted. Developers increasingly turn to synthetic data or to training on outputs from existing models, which risks creating feedback loops. Training on prior model outputs can produce model collapse, where performance deteriorates, nuances are lost, and errors amplify across generations. Significant quantities of valuable, high-quality data remain locked inside enterprise systems behind firewalls. Unlocking and responsibly leveraging that trapped enterprise data could provide the quality inputs needed to sustain future model performance and reduce reliance on purely synthetic or model-generated datasets.
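The feedback loop behind model collapse is easy to see in a toy setting. The sketch below is not from the article; it is a minimal illustration (assuming only NumPy) in which each "generation" fits a simple Gaussian model to samples produced by the previous generation's model, so sampling error compounds and the learned distribution gradually narrows and drifts away from the original data.

```python
# Minimal toy sketch of model collapse: each generation is "trained"
# (mean/std estimated) only on samples drawn from the previous
# generation's fitted Gaussian, never on the original data again.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: the "real" data, a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(1, 31):
    # "Train" on the current data: estimate mean and standard deviation.
    mu, sigma = data.mean(), data.std()
    if generation % 5 == 0:
        print(f"gen {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    # The next generation sees only this model's outputs.
    data = rng.normal(loc=mu, scale=sigma, size=50)
```

Over successive generations the estimated spread tends to shrink and the mean wanders, a small-scale analogue of the lost nuance and amplified error described above.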
Read at The Register