Science
When A.I.’s Output Is a Threat to A.I. Itself

The internet is becoming awash in words and images generated by artificial intelligence.
Sam Altman, OpenAI’s chief executive, wrote in February that the company generated about 100 billion words per day — a million novels’ worth of text, every day, an unknown share of which finds its way onto the internet.
A.I.-generated text may show up as a restaurant review, a dating profile or a social media post. And it may show up as a news article, too: NewsGuard, a group that tracks online misinformation, recently identified over a thousand websites that churn out error-prone A.I.-generated news articles.
In reality, with no foolproof methods to detect this kind of content, much will simply remain undetected.
All this A.I.-generated information can make it harder for us to know what’s real. And it also poses a problem for A.I. companies. As they trawl the web for new data to train their next models on — an increasingly challenging task — they’re likely to ingest some of their own A.I.-generated content, creating an unintentional feedback loop in which what was once the output from one A.I. becomes the input for another.
In the long run, this cycle may pose a threat to A.I. itself. Research has shown that when generative A.I. is trained on a lot of its own output, it can get a lot worse.
Here’s a simple illustration of what happens when an A.I. system is trained on its own output, over and over again:
This is part of a data set of 60,000 handwritten digits.
When we trained an A.I. to mimic those digits, its output looked like this.
This new set was made by an A.I. trained on the previous A.I.-generated digits. What happens if this process continues?
After 20 generations of training new A.I.s on their predecessors’ output, the digits blur and start to erode.
After 30 generations, they converge into a single shape.
While this is a simplified example, it illustrates a problem on the horizon.
Imagine a medical-advice chatbot that lists fewer diseases that match your symptoms, because it was trained on a narrower spectrum of medical knowledge generated by previous chatbots. Or an A.I. history tutor that ingests A.I.-generated propaganda and can no longer separate fact from fiction.
Just as a copy of a copy can drift away from the original, when generative A.I. is trained on its own content, its output can also drift away from reality, growing further apart from the original data that it was intended to imitate.
In a paper published last month in the journal Nature, a group of researchers in Britain and Canada showed how this process results in a narrower range of A.I. output over time — an early stage of what they called “model collapse.”
The eroding digits we just saw show this collapse. When untethered from human input, the A.I. output dropped in quality (the digits became blurry) and in diversity (they grew similar).
How an A.I. that draws digits “collapses” after being trained on its own output
If only some of the training data were A.I.-generated, the decline would be slower or more subtle. But it would still occur, researchers say, unless the synthetic data was complemented with a lot of new, real data.
Degenerative A.I.
In one example, the researchers trained a large language model on its own sentences over and over again, asking it to complete the same prompt after each round.
When they asked the A.I. to complete a sentence that started with “To cook a turkey for Thanksgiving, you…,” at first, it responded like this:
Even at the outset, the A.I. “hallucinates.” But when the researchers further trained it on its own sentences, it got a lot worse…
An example of text generated by an A.I. model.
After two generations, it started simply printing long lists.
An example of text generated by an A.I. model after being trained on its own sentences for 2 generations.
And after four generations, it began to repeat phrases incoherently.
An example of text generated by an A.I. model after being trained on its own sentences for 4 generations.
“The model becomes poisoned with its own projection of reality,” the researchers wrote of this phenomenon.
This problem isn’t just confined to text. Another team of researchers at Rice University studied what would happen when the kinds of A.I. that generate images are repeatedly trained on their own output — a problem that could already be occurring as A.I.-generated images flood the web.
They found that glitches and image artifacts started to build up in the A.I.’s output, eventually producing distorted images with wrinkled patterns and mangled fingers.
When A.I. image models are trained on their own output, they can produce distorted images, mangled fingers or strange patterns.
A.I.-generated images by Sina Alemohammad and others.
“You’re kind of drifting into parts of the space that are like a no-fly zone,” said Richard Baraniuk, a professor who led the research on A.I. image models.
The researchers found that the only way to stave off this problem was to ensure that the A.I. was also trained on a sufficient supply of new, real data.
While selfies are certainly not in short supply on the internet, there could be categories of images where A.I. output outnumbers genuine data, they said.
For example, A.I.-generated images in the style of van Gogh could outnumber actual photographs of van Gogh paintings in A.I.’s training data, and this may lead to errors and distortions down the road. (Early signs of this problem will be hard to detect because the leading A.I. models are closed to outside scrutiny, the researchers said.)
Why collapse happens
All of these problems arise because A.I.-generated data is often a poor substitute for the real thing.
This is sometimes easy to see, like when chatbots state absurd facts or when A.I.-generated hands have too many fingers.
But the differences that lead to model collapse aren’t necessarily obvious — and they can be difficult to detect.
When generative A.I. is “trained” on vast amounts of data, what’s really happening under the hood is that it is assembling a statistical distribution — a set of probabilities that predicts the next word in a sentence, or the pixels in a picture.
For example, when we trained an A.I. to imitate handwritten digits, its output could be arranged into a statistical distribution that looks like this:
Distribution of A.I.-generated data, with examples of the initial A.I. output. (The distribution shown here is simplified for clarity.)
The peak of this bell-shaped curve represents the most probable A.I. output — in this case, the most typical A.I.-generated digits. The tail ends describe output that is less common.
Notice that when the model was trained on human data, it had a healthy spread of possible outputs, which you can see in the width of the curve above.
But after it was trained on its own output, this is what happened to the curve:
Distribution of A.I.-generated data when trained on its own output
It gets taller and narrower. As a result, the model becomes more and more likely to produce a smaller range of output, and the output can drift away from the original data.
Meanwhile, the tail ends of the curve — which contain the rare, unusual or surprising outcomes — fade away.
This is a telltale sign of model collapse: Rare data becomes even rarer.
If this process went unchecked, the curve would eventually become a spike:
Distribution of A.I.-generated data when trained on its own output
This was when all of the digits became identical, and the model completely collapsed.
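This narrowing is easy to reproduce numerically. Here is a toy simulation of our own (it is not code from the Nature paper): fit a bell curve to a finite sample, draw a new sample from the fit, refit, and repeat. Because each finite sample slightly under-represents the rare tails, the estimated spread tends to drift toward zero, ending in a spike.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                        # a deliberately small sample per generation
mu, sigma = 0.0, 1.0          # generation 0: the "real" data distribution

for gen in range(31):
    if gen % 5 == 0:
        print(f"generation {gen:2d}: spread (sigma) = {sigma:.3f}")
    draws = rng.normal(mu, sigma, n)       # output of the current model
    mu, sigma = draws.mean(), draws.std()  # the next model fits only that output
```

With larger samples the decline is slower, but without an infusion of fresh real data the drift tends to run in only one direction.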
Why it matters
This doesn’t mean generative A.I. will grind to a halt anytime soon.
The companies that make these tools are aware of these problems, and they will notice if their A.I. systems start to deteriorate in quality.
But it may slow things down. As existing sources of data dry up or become contaminated with A.I. “slop,” researchers say, it will become harder for newcomers to compete.
A.I.-generated words and images are already beginning to flood social media and the wider web. They’re even hiding in some of the data sets used to train A.I., the Rice researchers found.
“The web is becoming increasingly a dangerous place to look for your data,” said Sina Alemohammad, a graduate student at Rice who studied how A.I. contamination affects image models.
Big players will be affected, too. Computer scientists at N.Y.U. found that when there is a lot of A.I.-generated content in the training data, it takes more computing power to train A.I. — which translates into more energy and more money.
“Models won’t scale anymore as they should be scaling,” said Julia Kempe, the N.Y.U. professor who led this work.
The leading A.I. models already cost tens to hundreds of millions of dollars to train, and they consume staggering amounts of energy, so this can be a sizable problem.
‘A hidden danger’
Finally, there’s another threat posed by even the early stages of collapse: an erosion of diversity.
And it’s an outcome that could become more likely as companies try to avoid the glitches and “hallucinations” that often occur with A.I. data.
This is easiest to see when the data matches a form of diversity that we can visually recognize — people’s faces:
This set of A.I. faces was created by the same Rice researchers who produced the distorted faces above. This time, they tweaked the model to avoid visual glitches.
A grid of A.I.-generated faces showing variations in their poses, expressions, ages and races.
This is the output after they trained a new A.I. on the previous set of faces. At first glance, it may seem like the model changes worked: The glitches are gone.
After one generation of training on A.I. output, the A.I.-generated faces appear more similar.
After two generations …
After two generations of training on A.I. output, the A.I.-generated faces are less diverse than the original image.
After three generations …
After three generations of training on A.I. output, the A.I.-generated faces grow more similar.
After four generations, the faces all appeared to converge.
After four generations of training on A.I. output, the A.I.-generated faces appear almost identical.
This drop in diversity is “a hidden danger,” Mr. Alemohammad said. “You might just ignore it and then you don’t understand it until it’s too late.”
Just as with the digits, the changes are clearest when most of the data is A.I.-generated. With a more realistic mix of real and synthetic data, the decline would be more gradual.
But the problem is relevant to the real world, the researchers said, and will inevitably occur unless A.I. companies go out of their way to avoid their own output.
Related research shows that when A.I. language models are trained on their own words, their vocabulary shrinks and their sentences become less varied in their grammatical structure — a loss of “linguistic diversity.”
And studies have found that this process can amplify biases in the data and is more likely to erase data pertaining to minorities.
Ways out
Perhaps the biggest takeaway of this research is that high-quality, diverse data is valuable and hard for computers to emulate.
One solution, then, is for A.I. companies to pay for this data instead of scooping it up from the internet, ensuring both human origin and high quality.
OpenAI and Google have made deals with some publishers or websites to use their data to improve A.I. (The New York Times sued OpenAI and Microsoft last year, alleging copyright infringement. OpenAI and Microsoft say their use of the content qualifies as fair use under copyright law.)
Better ways to detect A.I. output would also help mitigate these problems.
Google and OpenAI are working on A.I. “watermarking” tools, which introduce hidden patterns that can be used to identify A.I.-generated images and text.
But watermarking text is challenging, researchers say, because these watermarks can’t always be reliably detected and can easily be subverted (they may not survive being translated into another language, for example).
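To make the idea concrete, here is a toy sketch of one published watermarking approach (a “green list” scheme in the spirit of academic work by Kirchenbauer and colleagues; it is not Google’s or OpenAI’s actual tool). A hash of the previous word marks about half of all possible next words as “green”; a watermarking generator quietly favors green words, and a detector flags text whose green share is improbably high.

```python
import hashlib

def is_green(prev_word: str, word: str) -> bool:
    # Pseudorandomly assign roughly half the vocabulary to a "green list,"
    # keyed on the preceding word.
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    # Detector: ordinary text should score near 0.5; text from a generator
    # that favors green words will score noticeably higher.
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(is_green(a, b) for a, b in pairs) / len(pairs)

print(green_fraction("the quick brown fox jumps over the lazy dog"))
```

The sketch also shows why such marks are fragile: paraphrasing or translating a passage replaces the word pairs, and the signal vanishes.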
A.I. slop is not the only data problem that A.I. companies face. Another is that there are only so many words on the internet.
Some experts estimate that the largest A.I. models have been trained on a few percent of the available pool of text on the internet. They project that these models may run out of public data to sustain their current pace of growth within a decade.
“These models are so enormous that the entire internet of images or conversations is somehow close to being not enough,” Professor Baraniuk said.
To meet their growing data needs, some companies are considering using today’s A.I. models to generate data to train tomorrow’s models. But researchers say this can lead to unintended consequences (such as the drop in quality or diversity that we saw above).
There are certain contexts where synthetic data can help A.I.s learn — for example, when output from a larger A.I. model is used to train a smaller one, or when the correct answer can be verified, like the solution to a math problem or the best strategies in games like chess or Go.
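As a minimal illustration of the verifiable-answer case, imagine keeping a model’s synthetic arithmetic examples only when a mechanical check confirms them. (This is a hypothetical setup of ours, not an experiment from the studies above.)

```python
import random

def model_answer(a: int, b: int) -> int:
    # Stand-in for an A.I.'s answer to "what is a + b?"; wrong ~20% of the time.
    return a + b + (1 if random.random() < 0.2 else 0)

verified = []
for _ in range(1000):
    a, b = random.randint(0, 99), random.randint(0, 99)
    answer = model_answer(a, b)
    if answer == a + b:  # keep only answers that check out exactly
        verified.append((f"{a} + {b}", answer))

print(f"kept {len(verified)} of 1,000 synthetic examples for training")
```

Because the check filters out the model’s mistakes before they re-enter the training data, the surviving synthetic examples do not recycle errors, which is what makes verifiable domains a special case.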
And new research suggests that when humans curate synthetic data (for example, by ranking A.I. answers and choosing the best one), it can alleviate some of the problems of collapse.
Companies are already spending a lot on curating data, Professor Kempe said, and she believes this will become even more important as they learn about the problems of synthetic data.
But for now, there’s no replacement for the real thing.
About the data
To produce the images of A.I.-generated digits, we followed a procedure outlined by researchers. We first trained a type of neural network known as a variational autoencoder using a standard data set of 60,000 handwritten digits.
We then trained a new neural network using only the A.I.-generated digits produced by the previous neural network, and repeated this process in a loop 30 times.
To create the statistical distributions of A.I. output, we used each generation’s neural network to create 10,000 drawings of digits. We then used the first neural network (the one that was trained on the original handwritten digits) to encode these drawings as a set of numbers, known as a “latent space” encoding. This allowed us to quantitatively compare the output of different generations of neural networks. For simplicity, we used the average value of this latent space encoding to generate the statistical distributions shown in the article.
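For readers who want to experiment, here is a minimal PyTorch sketch of that generational loop. The architecture and hyperparameters are illustrative stand-ins rather than the exact settings behind the article’s graphics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
LATENT = 16  # size of the latent space; illustrative

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(784, 400), nn.ReLU())
        self.mu = nn.Linear(400, LATENT)
        self.logvar = nn.Linear(400, LATENT)
        self.dec = nn.Sequential(
            nn.Linear(LATENT, 400), nn.ReLU(),
            nn.Linear(400, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

def train(model, loader, epochs=5):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for (x,) in loader:
            x = x.to(DEVICE)
            recon, mu, logvar = model(x)
            # Standard VAE objective: reconstruction error plus KL divergence.
            rec = F.binary_cross_entropy(recon, x, reduction="sum")
            kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
            opt.zero_grad()
            (rec + kld).backward()
            opt.step()

def sample(model, n=60000):
    # Generate digits by decoding random latent vectors.
    with torch.no_grad():
        z = torch.randn(n, LATENT, device=DEVICE)
        return model.dec(z).cpu()

# Generation 0 trains on the real handwritten digits.
data = datasets.MNIST(".", download=True).data.float().div(255).view(-1, 784)

for gen in range(30):
    model = VAE().to(DEVICE)
    loader = DataLoader(TensorDataset(data), batch_size=128, shuffle=True)
    train(model, loader)
    data = sample(model)  # each new generation sees only A.I.-generated digits
    print(f"generation {gen + 1} trained")
```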

Science
Racing to Save California’s Elephant Seals From Bird Flu

For the last few years, the Marine Mammal Center has been testing any patients with bird-flu-like symptoms, which include respiratory and neurological problems, for the virus.
Science
Lawmakers ask Newsom and waste agency to follow the law on plastic legislation

California lawmakers are taking aim at proposed rules to implement a state law designed to curb plastic waste, saying the draft regulations proposed by CalRecycle undermine the letter and intent of the legislation.
In a letter to Gov. Gavin Newsom and two of his top administrators, the lawmakers said CalRecycle exceeded its authority by drafting regulations that don’t abide by the terms set out by the law, Senate Bill 54.
“While we support many changes in the current draft regulations, we have identified several provisions that are inconsistent with the governing statute … and where CalRecycle has exceeded its authority under the law,” the lawmakers wrote in the letter to Newsom, California Environmental Protection Agency chief Yana Garcia, and Zoe Heller, director of the state’s Department of Resources Recycling and Recovery, or CalRecycle.
The letter, which was written by Sen. Catherine Blakespear (D-Encinitas) and Sen. Benjamin Allen (D-Santa Monica), was signed by 21 other lawmakers, including Sen. John Laird (D-Santa Cruz) and Assemblymembers Al Muratsuchi (D-Rolling Hills Estates) and Monique Limón (D-Goleta).
CalRecycle submitted informal draft regulations two weeks ago that are designed to implement the law, which Allen wrote and Newsom signed in 2022.
The lawmakers’ concerns are directed at the draft regulations’ potential approval of polluting recycling technologies — which the language of the law expressly prohibits — as well as the document’s expansive exemption for products and packaging that fall under the purview of the U.S. Department of Agriculture and the Food and Drug Administration.
The inclusion of such blanket exemptions is “not only contrary to the statute but also risks significantly increasing the program’s costs,” the lawmakers wrote. They said the new regulations allow “producers to unilaterally determine which products are subject to the law, without a requirement or process to back up such a claim.”
Daniel Villaseñor, a spokesman for the governor, said in an email that Newsom “was clear when he asked CalRecycle to restart these regulations that they should work to minimize costs for small businesses and families, and these rules are a step in the right direction …”
At a workshop held at the agency’s headquarters in Sacramento this week, CalRecycle staff responded to similar criticisms, and underscored that these are informal draft regulations, which means they can be changed.
“I know from comments we’ve already been receiving that some of the provisions, as we have written them … don’t quite come across in the way that we intended,” said Karen Kayfetz, chief of CalRecycle’s Product Stewardship branch, adding that she was hopeful “a robust conversation” could help highlight areas where interpretations of the regulations’ language differ from the agency’s intent.
“It was not our intent, of course, to ever go outside of the statute, and so to the extent that it may be interpreted in the language that we’ve provided, that there are provisions that extend beyond … it’s our wish to narrow that back down,” she said.
These new draft regulations are the expedited result of the agency’s attempt to satisfy Newsom’s concerns about the law, which he said could increase costs to California households if not properly implemented.
Newsom rejected the agency’s first attempt at drafting regulations — the result of nearly three years of negotiations by scores of stakeholders, including plastic producers, package developers, agricultural interests, environmental groups, municipalities, recycling companies and waste haulers — and ordered the waste agency to start the process over.
Critics say the new draft regulations cater to industry and could result in even higher costs both to California households, which have seen large increases in their residential waste hauling fees, and to the state’s various jurisdictions, which are tasked with cleaning up plastic waste and debris clogging the state’s rivers, highways, beaches and parks.
The law is modeled on a series of legislative efforts described as Extended Producer Responsibility laws, which are designed to shift the cost of waste removal and disposal from the state’s jurisdictions and taxpayers to the industries that produce the waste — theoretically incentivizing a circular economy, in which product and packaging producers develop materials that can be reused, recycled or composted.
Science
U.S. just radically changed its COVID vaccine recommendations: How will it affect you?
As promised, federal health officials have dropped longstanding recommendations that healthy children and healthy pregnant women should get the COVID-19 vaccines.
“The COVID-19 vaccine schedule is very clear. The vaccine is not recommended for pregnant women. The vaccine is not recommended for healthy children,” the U.S. Department of Health and Human Services said in a post on X on Friday.
In formal documents, health officials offer “no guidance” on whether pregnant women should get the vaccine, and ask that parents talk with a healthcare provider before getting the vaccine for their children.
The decision was made in a way that is still expected to require insurers to pay for COVID-19 vaccines for children should their parents still want the shots for them.
The new vaccine guidelines were posted to the website of the U.S. Centers for Disease Control and Prevention late Thursday.
The insurance question
It wasn’t immediately clear whether insurers will still be required under federal law to pay for vaccinations for pregnant women.
The Trump administration’s decision came amid criticism from officials at the nation’s leading organizations for pediatricians and obstetricians. Some doctors said there is no new evidence to support removing the recommendation that healthy pregnant women and healthy children should get the COVID vaccine.
“This situation continues to make things unclear and creates confusion for patients, providers and payers,” the American College of Obstetricians and Gynecologists said in a statement Friday.
Earlier in the week, the group’s president, Dr. Steven Fleischman, said the science hasn’t changed: the COVID-19 vaccine is safe during pregnancy and protects both the mom-to-be and her infant after birth.
“It is very clear that COVID-19 infection during pregnancy can be catastrophic,” Fleischman said in a statement.
Dr. Susan Kressly, president of the American Academy of Pediatrics, criticized the recommendation change as being rolled out in a “conflicting, confusing” manner, with “no explanation of the evidence used to reach their conclusions.”
“For many families, the COVID vaccine will remain an important way they protect their child and family from this disease and its complications, including long COVID,” Kressly said in a statement.
Some experts said the Trump administration should have waited to hear recommendations from a committee of doctors and scientists that typically advises the U.S. Centers for Disease Control and Prevention on immunization recommendations, which is set to meet in late June.
California’s view
The California Department of Public Health on Thursday said it supported the longstanding recommendation that “COVID-19 vaccines be available for all persons aged 6 months and older who wish to be vaccinated.”
The changes come as the CDC has faced an exodus of senior leaders and has lacked an acting director. Typically, as was the case during the first Trump administration and in the Biden administration, it is the CDC director who makes final decisions on vaccine recommendations. The CDC director has traditionally accepted the consensus viewpoint of the CDC’s panel of doctors and scientists serving on the Advisory Committee on Immunization Practices.
Even with the longstanding recommendations, vaccination rates were relatively low for children and pregnant women. As of late April, 13% of children, and 14.4% of pregnant women, had received the latest updated COVID-19 vaccine, according to the CDC. About 23% of adults overall received the updated vaccine, as did 27.8% of seniors age 65 and over.
The CDC estimates that since October, there have been 31,000 to 50,000 COVID deaths and between 270,000 and 430,000 COVID hospitalizations.
Here are some key points about the CDC’s decision:
New vaccination guidance for healthy children
Previously, the CDC’s guidance was simple: everyone ages 6 months and up should get an updated COVID vaccination. The most recent version was unveiled in September, and is officially known as the 2024-25 COVID-19 vaccine.
As of Thursday, the CDC, on its pediatric immunization schedule page, says that for healthy children — those age 6 months to 17 years — decisions about COVID vaccination should come from “shared clinical decision-making,” which is “informed by a decision process between the healthcare provider and the patient or parent/guardian.”
“Where the parent presents with a desire for their child to be vaccinated, children 6 months and older may receive COVID-19 vaccination, informed by the clinical judgment of a healthcare provider and personal preference and circumstances,” the CDC says.
The vaccine-skeptic secretary of Health and Human Services, Robert F. Kennedy Jr., contended in a video posted on Tuesday there was a “lack of any clinical data to support the repeat booster strategy in children.”
However, an earlier presentation by CDC staff said that, in general, getting an updated vaccine provides both children and adults additional protection from COVID-related emergency room and urgent care visits.
Dr. Peter Chin-Hong, a UC San Francisco infectious diseases expert, said he would have preferred the CDC retain its broader recommendation that everyone age 6 months and up get the updated vaccine.
“It’s simpler,” Chin-Hong said. He added that he has seen no new data suggesting that children shouldn’t be getting the updated COVID vaccine.
A guideline that involves “shared decision-making,” Chin-Hong said, “is a very nebulous recommendation, and it doesn’t result in a lot of people getting vaccines.”
Kressly, of the American Academy of Pediatrics, said the shared clinical decision-making model is challenging to implement “because it lacks clear guidance for the conversations between a doctor and a family. Doctors and families need straightforward, evidence-based guidance, not vague, impractical frameworks.”
Some experts had been worried that the CDC would make a decision that would’ve ended the federal requirement that insurers cover the cost of COVID-19 vaccines for children. The out-of-pocket cost for a COVID-19 vaccine can reach around $200.
New vaccine guidance for pregnant women
In its adult immunization schedule for people who have medical conditions, the CDC now says it has “no guidance” on whether pregnant women should get the COVID-19 vaccine.
In his 58-second video on Tuesday, Kennedy did not explain why he thought pregnant women should not be recommended to get vaccinated against COVID-19.
Chin-Hong, of UCSF, called the decision to drop the vaccination recommendation for pregnant women “100%” wrong.
Pregnancy brings with it a relatively compromised immune system. Pregnant women have “a high chance of getting infections, and they get more serious disease — including COVID,” Chin-Hong said.
A pregnant woman getting vaccinated also protects the newborn. “You really need the antibodies in the pregnant person to go across the placenta to protect the newborn,” Chin-Hong said.
It’s especially important, Chin-Hong and others say, because infants under 6 months of age can’t be vaccinated against COVID-19, and they have as high a risk of severe complications as do seniors age 65 and over.
Not the worst-case scenario for vaccine proponents
Earlier in the week, some experts worried the new rules would allow insurers to stop covering the cost of the COVID vaccine for healthy children.
Their worries were sparked by the video message on Tuesday, in which Kennedy said that “the COVID vaccine for healthy children and healthy pregnant women has been removed from the CDC recommended immunization schedule.”
By late Thursday, the CDC came out with its formal decision — the agency dropped the recommendation for healthy children, but still left the shot on the pediatric immunization schedule.
Leaving the COVID-19 vaccine on the immunization schedule “means the vaccine will be covered by insurance” for healthy children, the American Academy of Pediatrics said in a statement.
How pharmacies and insurers are responding
There are some questions that don’t have immediate answers. Will some vaccine providers start requiring doctor’s notes in order for healthy children and healthy pregnant women to get vaccinated? Will it be harder for children and pregnant women to get vaccinated at a pharmacy?
In a statement, CVS Pharmacy said it “follows federal guidance and state law regarding vaccine administration and are monitoring any changes that the government may make regarding vaccine eligibility.” The insurer Aetna, which is owned by CVS, is also monitoring any changes federal officials make to COVID-19 vaccine eligibility “and will evaluate whether coverage adjustments are needed.”
Blue Shield of California said it will not change its practices on covering COVID-19 vaccines.
“Despite the recent federal policy change on COVID-19 vaccinations for healthy children and pregnant women, Blue Shield of California will continue to cover COVID-19 vaccines for all eligible members,” the insurer said in a statement. “The decision on whether to receive a COVID-19 vaccine is between our member and their provider. Blue Shield does not require prior authorization for COVID-19 vaccines.”
Under California law, health plans regulated by the state Department of Managed Health Care must cover COVID-19 vaccines without requiring prior authorization, the agency said Friday. “If consumers access these services from a provider in their health plan’s network, they will not need to pay anything for these services,” the statement said.