Technology

AI language models are running out of human-written text to learn from

  • A new study released by research group Epoch AI projects that tech companies will exhaust the supply of publicly available training data for AI language models by sometime between 2026 and 2032.
  • When public data eventually runs out, developers will have to decide what to feed the language models. Options include data now considered private, like emails or text messages, and “synthetic data” created by other AI models.
  • Besides training ever-larger models, another path is to train models that are specialized for specific tasks.

Artificial intelligence systems like ChatGPT could soon run out of what keeps making them smarter — the tens of trillions of words people have written and shared online.

A new study released Thursday by research group Epoch AI projects that tech companies will exhaust the supply of publicly available training data for AI language models by roughly the turn of the decade — sometime between 2026 and 2032.

Comparing it to a “literal gold rush” that depletes finite natural resources, Tamay Besiroglu, an author of the study, said the AI field might face challenges in maintaining its current pace of progress once it drains the reserves of human-generated writing.

In the short term, tech companies like ChatGPT-maker OpenAI and Google are racing to secure and sometimes pay for high-quality data sources to train their AI large language models – for instance, by signing deals to tap into the steady flow of sentences coming out of Reddit forums and news media outlets.

In the longer term, there won’t be enough new blogs, news articles and social media commentary to sustain the current trajectory of AI development. That puts pressure on companies to tap into sensitive data now considered private, such as emails or text messages, or to rely on less-reliable “synthetic data” spit out by the chatbots themselves.

“There is a serious bottleneck here,” Besiroglu said. “If you start hitting those constraints about how much data you have, then you can’t really scale up your models efficiently anymore. And scaling up models has been probably the most important way of expanding their capabilities and improving the quality of their output.”

Artificial intelligence systems like ChatGPT are consuming ever-larger collections of human writings that they need to get smarter. (AP Digital Embed)

The researchers first made their projections two years ago — shortly before ChatGPT’s debut — in a working paper that forecast a more imminent 2026 cutoff of high-quality text data. Much has changed since then, including new techniques that enabled AI researchers to make better use of the data they already have and sometimes “overtrain” on the same sources multiple times.

But there are limits, and after further research, Epoch now foresees running out of public text data sometime in the next two to eight years.

The team’s latest study is peer-reviewed and due to be presented at this summer’s International Conference on Machine Learning in Vienna, Austria. Epoch is a nonprofit institute hosted by San Francisco-based Rethink Priorities and funded by proponents of effective altruism — a philanthropic movement that has poured money into mitigating AI’s worst-case risks.

Besiroglu said AI researchers realized more than a decade ago that aggressively expanding two key ingredients — computing power and vast stores of internet data — could significantly improve the performance of AI systems.

The amount of text data fed into AI language models has been growing about 2.5 times per year, while computing has grown about 4 times per year, according to the Epoch study. Facebook parent company Meta Platforms recently claimed that the largest version of its upcoming Llama 3 model, which has not yet been released, has been trained on up to 15 trillion tokens, each of which can represent a piece of a word.

But how much it’s worth worrying about the data bottleneck is debatable.

“I think it’s important to keep in mind that we don’t necessarily need to train larger and larger models,” said Nicolas Papernot, an assistant professor of computer engineering at the University of Toronto and researcher at the nonprofit Vector Institute for Artificial Intelligence.

Papernot, who was not involved in the Epoch study, said more skilled AI systems can also come from training models that are more specialized for specific tasks. But he has concerns about training generative AI systems on their own outputs, which can lead to degraded performance known as “model collapse.”

Training on AI-generated data is “like what happens when you photocopy a piece of paper and then you photocopy the photocopy. You lose some of the information,” Papernot said. Not only that, but Papernot’s research has also found it can further encode the mistakes, bias and unfairness that are already baked into the information ecosystem.

With real human-crafted sentences remaining a critical AI data source, the stewards of the most sought-after troves (websites like Reddit and Wikipedia, as well as news and book publishers) have been forced to think hard about how they’re being used.

“Maybe you don’t lop off the tops of every mountain,” jokes Selena Deckelmann, chief product and technology officer at the Wikimedia Foundation, which runs Wikipedia. “It’s an interesting problem right now that we’re having natural resource conversations about human-created data. I shouldn’t laugh about it, but I do find it kind of amazing.”

While some have sought to close off their data from AI training — often after it’s already been taken without compensation — Wikipedia has placed few restrictions on how AI companies use its volunteer-written entries. Still, Deckelmann said she hopes there continue to be incentives for people to keep contributing, especially as a flood of cheap and automatically generated “garbage content” starts polluting the internet.

AI companies should be “concerned about how human-generated content continues to exist and continues to be accessible,” she said.

From the perspective of AI developers, Epoch’s study says paying millions of humans to generate the text that AI models will need “is unlikely to be an economical way” to drive better technical performance.

As OpenAI begins work on training the next generation of its GPT large language models, CEO Sam Altman told the audience at a United Nations event last month that the company has already experimented with “generating lots of synthetic data” for training.

“I think what you need is high-quality data. There is low-quality synthetic data. There’s low-quality human data,” Altman said. But he also expressed reservations about relying too heavily on synthetic data over other technical methods to improve AI models.

“There’d be something very strange if the best way to train a model was to just generate, like, a quadrillion tokens of synthetic data and feed that back in,” Altman said. “Somehow that seems inefficient.”

Claude Fable is too scared to teach you about the powerhouse of the cell

Anthropic just released Claude Fable 5, calling it the most powerful AI model it has ever made widely available and praising its skills in biology, among others. But the model won’t answer basic biology questions — the kind you’d expect a high schooler to handle. Instead, it hands off the query to the former flagship model, Claude Opus 4.8.

It isn’t because Fable doesn’t know the answers. It’s because Anthropic won’t let it, by design.

Fable is a public-facing, Mythos-class model, a family so capable at cybersecurity tasks that Anthropic said it was too dangerous to release publicly. But while Anthropic has spent much of the extended Mythos rollout warning about cybersecurity, it is in biology that Fable’s guardrails are the most obvious and the most limiting.

When I tried the model, it refused to answer a range of basic biology questions, many of which felt about as far from any plausible safety risk as a question could be. It would not respond to “tell me about cell membranes” or answer “what are mitochondria,” the famous powerhouse of the cell. It refused to explain “what is a prion” (prions are the proteinaceous particles behind mad cow disease) or “how mRNA vaccines work.”

The restrictions applied to ordinary and objectively rather harmless medical queries too. Fable would not answer “what causes hay fever,” explain how asthma medicine works, explain how antibiotic resistance arises, or tell me what Ebola is and how it spreads. Some of my basic queries occasionally got through, with Fable answering questions like “what is cancer” and “what is DNA.” When Fable refused, Opus 4.8 generally answered perfectly well.

Anthropic says the broad biology filters are an intentional choice and are deliberately conservative, with bioweapons the primary concern. “With the launch of Claude Fable 5, our first Mythos-class model, we believe models now have a greater ability to accomplish real-world scientific tasks and for malicious actors to potentially use our models for highly risky biological research,” spokesperson Paruul Maheshwary told The Verge. “We have always used classifiers to block our models from helping with bioweapons-related requests. To deploy Fable 5 safely, we believe it was necessary to be overly conservative with our safeguards so they block most queries tied to biology work.”

Anthropic has previously highlighted four key areas where it would throttle Fable’s responses for safety: chemistry, biology, cybersecurity, and distillation, a technique for training smaller AIs using the outputs of larger ones. The company has accused Chinese rivals like DeepSeek of using distillation on its models on an “industrial” scale.

While I could not meaningfully test distillation, Fable seemed more willing to answer questions about chemistry and cybersecurity. For example, it gave a basic overview of the explosive TNT, though it withheld synthesis instructions “for obvious reasons.” It readily answered questions on the use of chlorine gas as a chemical weapon, common password threats, and nuclear fusion and fission, and it explained how to secure an iPhone from hackers. It still has limits: Fable deferred to Opus when I asked it about sarin gas, a highly toxic nerve agent. Fable and Opus both refused the prompt “how to make anthrax,” and Claude paused the chat entirely. That made sense. The mitochondria refusal, by contrast, seems like a false positive.

“We made this tradeoff so customers could benefit from the model’s capabilities sooner without the risks,” Maheshwary explained, adding that Anthropic is working hard to improve its detection and reduce the false positives. “We intend to make Mythos-class models available without these safeguards to the broader biology and life sciences community so these capabilities can be used to accelerate biomedical research and drug discovery.”

Anthropic did not answer questions about whether this kind of restricted release will become the new norm for future models.

Texas mom jailed over dirty water Facebook post

Jennifer Combs says she never set out to become the face of a fight over free speech, dirty water and small-town power. She says she was simply trying to help people in Trinidad, Texas, report problems with their water. Some residents had complained about discoloration, sediment, odors and health concerns. So Combs used her Southern Belle Watch Facebook page to collect reports and send them to the state.

Then, according to Combs, the situation took a turn that still sounds hard to believe. She says police came to her home and arrested her on a felony warrant over a Facebook post.

“I’ve never even had a speeding ticket,” Combs said. “I’m a mom of four kids. I have one grandbaby right now. I have two more grandbabies on the way.”

Now, Combs says her arrest has become about something much bigger than one Facebook post.

Jennifer Combs says she was arrested on a felony charge after using Facebook to collect reports about water concerns in Trinidad, Texas. A grand jury later declined to indict her. (Kurt “CyberGuy” Knutsson)

Why Jennifer Combs started asking about Trinidad water

Jennifer sat down with me for my CyberGuy Report podcast at CyberguyPodcast.com to explain what happened, why she started asking questions and what she wants other communities to learn from her ordeal.

Combs says she got involved after seeing a post from an older woman who needed help buying bottled water. According to Combs, the woman was on a fixed income and had already spent part of her monthly money on bottled water. Combs said the woman claimed her doctor had told her not to cook with or drink the tap water. That moment stuck with her.

“I’m a firm, firm person on transparency,” Combs said. “I stand on it. I think if you’re going to be in government, there should be zero reasons for you not to be transparent with your people that elected you to be there.”

So she started collecting complaints. Her plan was simple. If residents shared their water issues, she could pass those reports to the state. That way, inspectors would know where to look.

Trinidad water complaints had been building

Combs says the water issue had been going on for years in parts of Trinidad. “That’s real. That’s not AI. That is absolutely very real,” Combs said when asked about images of the water.

She said some residents did not want to speak publicly because they feared backlash. “A lot of them wanted to be able to message me anonymously, because the retaliation in Trinidad is very, very real,” Combs said.

That is why she created a place where people could quietly share reports. She says she wanted to collect the information, map the affected areas and send everything to the state.

The Facebook post behind the arrest

Combs read the Facebook post during our conversation. In it, she said her page had received reports that some citizens had been hospitalized due to bacteria in the water. She called it “a serious public health concern that deserves immediate attention.”

The post asked residents to message the page if their water looked discolored, contained sediment, had a strong odor or if they had related health concerns. It also asked for general neighborhood areas, photos, videos, dates and times.

Combs says the post was later removed by Facebook after it was reported by a select group of people from the community and flagged, though she says Facebook did not tell her why. But before it came down, she says, then-Trinidad Police Chief Charles Gregory had taken a screenshot of it and posted it on the Trinidad Police Department Facebook page, accusing her of making a false report.

“I never filed a report with the police department,” Combs said. “I only filed a report with the state of Texas with the water.” She says she was gathering community reports about the water and sending them to the state. That distinction is important because it raises questions about why a public health complaint on Facebook became a police matter. We reached out to Meta, Facebook’s parent company, for comment, but did not hear back before our deadline.

Trinidad hired a contractor to handle water issues

Combs says the city had hired a contractor to help manage the water problem. She said boil notices listed his number, so residents were often directed to call him instead of City Hall when they had water concerns. According to Combs, that created even more frustration. She said residents still felt they were not getting clear answers, and some began sending complaints to her instead.

Later in our conversation, Combs said the person who made the complaint that led to her arrest was the same contractor paid by the city to address the water problem. “Do you want to know who that someone is?” Combs said. “That someone that made the call report is the contractor that’s paid by the city to fix the water.”

That detail adds another layer to the story. The person hired to help solve the water issue, according to Combs, was also the person who reported her for collecting complaints about it.

Police arrested Jennifer Combs at her home

Combs says this all came to a head on April 6. Two officers came to her home in Kerens, Texas, about eight miles from Trinidad. She says they told her she had a felony arrest warrant from Henderson County.

“I said, ‘Oh, what? What do you mean?’” Combs said. “And they said, ‘Yeah, you have a felony arrest warrant. We have to take you to Navarro County Jail.’”

Then she was handcuffed in her front yard. “To be handcuffed in my front yard and taken to jail and spend 23 hours in jail before I could get out was very traumatic,” Combs said. “It was insane.”

Combs says she was charged with a felony false report tied to public panic over the water system. “I was just in disbelief, in absolute disbelief,” she said.

Residents said the water reports were real

Combs says Gregory later doubled down on Facebook and defended the decision to arrest her. But Combs says the part that still bothers her is what happened after Gregory posted about her online. According to Combs, some of the same residents who had contacted her then commented on the police department’s post to say the reports were real.

“The people that had made the reports to me commented on there, and they never even interviewed them,” Combs said. “They never even talked to them. But they literally commented on his own post saying, ‘Hey, this really happened.’”

That raises a basic question. If residents were saying the reports were real, why treat the person collecting those reports like a criminal?

Grand jury declines to indict Jennifer Combs

After Combs’ arrest, the costs started adding up. She says her husband had to bail her out, and the legal bills started soon after. “It’s $2,500,” Combs said about the bail amount. “So he had to pay 300 and something to get me out of jail. And then we’ve had to pay attorney fees.”

Combs says the felony charge eventually went before a grand jury. The grand jury no-billed the case, meaning it did not indict her. “The grand jury said no bill. Absolutely no part of this,” Combs said. “No bill, not enough evidence.”

That meant the charge was no longer hanging over her head. Still, Combs said her attorney had to keep working through the process of getting it removed. By then, the damage had already been done. Combs had spent nearly a day in jail. Her husband had to bail her out. She had to hire a lawyer. And her name had been tied to a felony allegation over a Facebook post about water.

Trinidad water fight took another turn

Combs says the fallout did not stop with her arrest. After she was arrested, a man she identified as Otto the Watchdog protested outside Trinidad City Hall. Combs says he was handcuffed and put in a police car for disorderly conduct because officials claimed he offended a water clerk.

Then, according to Combs, the water clerk said she was not offended. “The water clerk is fired because she would not sign a statement that said she was offended,” Combs said.

Combs says a judge later dropped the disorderly conduct issue involving the protester. Then, she says, the city fired that judge. “The judge dropped it. They fired the judge,” Combs said.

She also said the city attorney was fired the same night, during a recorded city council meeting with cameras in the room.

A Texas mother says her effort to document residents’ complaints about discolored and contaminated water led to a felony arrest and nearly a day in jail. (Kurt “CyberGuy” Knutsson)

City of Trinidad responds to request for comment

CyberGuy requested comment from the City of Trinidad. Zachary Smith, an associate attorney with Iglesias Law Firm, responded on behalf of the city and said the firm represents Trinidad. “We recognize that the public wants answers, and that is not lost on us or our clients,” Smith wrote.

Smith said the city is leaving the details to the legal process. “Because lawsuits have been filed, our clients are not able to comment on the specifics at this time. As you know, this is standard practice in active litigation,” Smith wrote.

He also defended the city’s position. “The claims against the City of Trinidad will be answered where they belong, in a court of law,” Smith wrote. “The officials who serve this community have acted, and continue to act, in the best interests of the people of Trinidad. We look forward to addressing these claims fully during the litigation process.”

Why the Trinidad water story raises free speech concerns

People complain online about local problems every day. They post about roads, trash pickup, schools, taxes, crime and public utilities. Some posts are emotional. Some include claims that still need to be checked. But that does not mean a citizen should be treated like a criminal for asking questions.

Combs said it best. “You have the right to question what anybody is doing,” she said. “You have the right to figure out what is in your water, what you’re drinking.”

Then she added one line that says a lot about her. “I’m never going to tell people, ‘Oh, just keep your mouth shut. Don’t say anything and just be quiet.’ That’s not me. I don’t hush very well.”

Jennifer Combs wants answers for Trinidad

Combs says the water problem still needs outside attention. She said the mayor went on national TV and asked for the Texas Rangers to step in. Combs also said she had reached out for support.

“I need someone to help,” Combs said. “It’s insane. It’s not going to get fixed the way it is.” She said people in Trinidad have waited long enough.

“They’ve had all of these years to do it,” Combs said. “And now you’re putting people in jail for talking about it.” That is the part that should make all of us pay attention. If people are afraid to speak up about water, what else will they stay quiet about?

What Jennifer Combs wants people to know

At the end of our conversation, I asked Combs what message she has for people who speak out online about local issues. Her answer was direct.

“I think people that speak out for their communities are extremely brave,” Combs said. “So I’m never going to not tell people to speak out.”

She also said people should not let her experience scare them into silence. “You can’t let what happened to me prevent you from standing up and doing what’s right to people,” Combs said. “You can’t because then there’s no good people left.”

How to protect yourself when posting on Facebook

Facebook can be a powerful way to raise local concerns, but you should think carefully before posting. If your goal is to alert the public, a public post can help more people see it. If you are still gathering information, a private group or direct messages may be safer while you verify what residents are reporting.

Before you post, save screenshots of your draft, your final post and any comments that support what you wrote. If Facebook removes the post or someone reports it, you still have a record of the exact wording.

Also, protect people who contact you. Ask for photos, dates, times and general locations, but avoid sharing exact addresses, phone numbers or medical details without permission. You can show a pattern without exposing someone’s private information.

Finally, be clear about what you know and what you are still trying to confirm. Use phrases like “residents reported,” “according to messages sent to me,” or “we are asking the state to review this.” That can help show you are collecting community concerns, not claiming every detail has already been proven.

Jennifer Combs argues her arrest over a Facebook post raises broader concerns about free speech, government transparency and public accountability. (Kurt “CyberGuy” Knutsson)

Kurt’s key takeaways

Jennifer Combs says she wanted clean water, transparency and answers. Instead, she says she was handcuffed in her front yard and spent the night in jail. That should concern anyone who has ever posted a complaint about a local issue online. When people question public officials, those officials should respond with records, facts and accountability. They should not turn criticism into a police matter. This story also shows why local journalism and citizen watchdogs still have power. Small towns can have big problems. Sometimes the person asking the uncomfortable question is the one doing the public a favor. The bigger question is simple: If a Facebook post about dirty water can lead to a felony arrest, what would stop another local government from trying the same thing? To hear Jennifer tell her story in her own words, check out The CyberGuy Report podcast at CyberguyPodcast.com.

Have you ever spoken up about a local problem and felt ignored, intimidated or brushed aside? Let us know by writing to us at CyberGuy.com.

Copyright 2026 CyberGuy.com. All rights reserved.

Microsoft is disabling Office 2019 for Mac next month

Microsoft’s Office 2019 apps for Mac will stop working next month, because the company isn’t renewing a certificate that validates Office licenses. Owners of Office 2019 for Mac are being warned they’ll have to purchase Office 2024 or a Microsoft 365 subscription if they want to continue editing documents.

Microsoft previously promised that “all your Office 2019 apps will continue to function,” when it announced end of support in 2023. The company then quietly updated that support note last month to remove the mention of apps continuing to function, replacing it with “Rest assured that all your Office 2019 apps won’t lose any data.”

Starting on July 13th, Office 2019 for Mac and Office 2021 for Mac will both run in “reduced functionality mode,” allowing people to open files but not edit, save, or create new documents. The reduced functionality will impact Word, Excel, PowerPoint, Outlook, and OneNote.

While Microsoft is providing a certificate update for Office 2021 as it’s still supported until October 13th, 2026, the company is leaving Office 2019 for Mac users out in the cold as support for these apps ended a few years ago. “Office 2019 for Mac reached end of support on October 10, 2023, and no longer receives updates,” says Microsoft. “Because Office 2019 cannot be updated to the required version, this issue cannot be resolved by updating or reinstalling Office 2019 for Mac.”

JimmyTech points out that old versions of Microsoft 365 apps on Mac and iOS will also be affected by this certificate issue, but a simple update will fix it for those users.

Microsoft regularly ends support for software, and there’s always the risk you could run into issues running older apps or versions of Windows. It’s still surprising not to see Microsoft make an exception here, though, particularly because this certificate issue breaks the main functionality of an app you’ve paid a one-time license fee for.
