Researchers at the Epoch artificial intelligence (AI) research and forecasting organization warn of the potential depletion of data for training AI language algorithms as early as 2026.
Creating more powerful and capable language models requires finding ever more training text.
AI researchers categorize this data as high quality or low quality; Epoch's Pablo Villalobos said high-quality text is the preferred training data because researchers want models to replicate the language of high-quality sources.
The University of Southern California's Swabha Swayamdipta said data shortages could prompt a "net positive" redefinition of low and high quality that benefits language models.
Researchers also may invent methods for extending the life of training data, with Swayamdipta suggesting a model could be trained on the same data multiple times.
From MIT Technology Review
Abstracts Copyright © 2022 SmithBucklin, Washington, DC, USA