Science
When A.I.’s Output Is a Threat to A.I. Itself

The internet is becoming awash in words and images generated by artificial intelligence.
Sam Altman, OpenAI’s chief executive, wrote in February that the company generated about 100 billion words per day — a million novels’ worth of text, every day, an unknown share of which finds its way onto the internet.
A.I.-generated text may show up as a restaurant review, a dating profile or a social media post. And it may show up as a news article, too: NewsGuard, a group that tracks online misinformation, recently identified over a thousand websites that churn out error-prone A.I.-generated news articles.
In reality, with no foolproof methods to detect this kind of content, much will simply remain undetected.
All this A.I.-generated information can make it harder for us to know what’s real. And it also poses a problem for A.I. companies. As they trawl the web for new data to train their next models on — an increasingly challenging task — they’re likely to ingest some of their own A.I.-generated content, creating an unintentional feedback loop in which what was once the output from one A.I. becomes the input for another.
In the long run, this cycle may pose a threat to A.I. itself. Research has shown that when generative A.I. is trained on a lot of its own output, it can get a lot worse.
Here’s a simple illustration of what happens when an A.I. system is trained on its own output, over and over again:
This is part of a data set of 60,000 handwritten digits.
When we trained an A.I. to mimic those digits, its output looked like this.
This new set was made by an A.I. trained on the previous A.I.-generated digits. What happens if this process continues?
After 20 generations of training new A.I.s on their predecessors’ output, the digits blur and start to erode.
After 30 generations, they converge into a single shape.
While this is a simplified example, it illustrates a problem on the horizon.
Imagine a medical-advice chatbot that lists fewer diseases that match your symptoms, because it was trained on a narrower spectrum of medical knowledge generated by previous chatbots. Or an A.I. history tutor that ingests A.I.-generated propaganda and can no longer separate fact from fiction.
Just as a copy of a copy can drift away from the original, when generative A.I. is trained on its own content, its output can also drift away from reality, growing further apart from the original data that it was intended to imitate.
In a paper published last month in the journal Nature, a group of researchers in Britain and Canada showed how this process results in a narrower range of A.I. output over time — an early stage of what they called “model collapse.”
The eroding digits we just saw show this collapse. When untethered from human input, the A.I. output dropped in quality (the digits became blurry) and in diversity (they grew similar).
How an A.I. that draws digits “collapses” after being trained on its own output
If only some of the training data were A.I.-generated, the decline would be slower or more subtle. But it would still occur, researchers say, unless the synthetic data was complemented with a lot of new, real data.
Degenerative A.I.
In one example, the researchers trained a large language model on its own sentences over and over again, asking it to complete the same prompt after each round.
When they asked the A.I. to complete a sentence that started with “To cook a turkey for Thanksgiving, you…,” at first, it responded like this:
Even at the outset, the A.I. “hallucinates.” But when the researchers further trained it on its own sentences, it got a lot worse…
An example of text generated by an A.I. model.
After two generations, it started simply printing long lists.
An example of text generated by an A.I. model after being trained on its own sentences for 2 generations.
And after four generations, it began to repeat phrases incoherently.
An example of text generated by an A.I. model after being trained on its own sentences for 4 generations.
“The model becomes poisoned with its own projection of reality,” the researchers wrote of this phenomenon.
This problem isn’t just confined to text. Another team of researchers at Rice University studied what would happen when the kinds of A.I. that generate images are repeatedly trained on their own output — a problem that could already be occurring as A.I.-generated images flood the web.
They found that glitches and image artifacts started to build up in the A.I.’s output, eventually producing distorted images with wrinkled patterns and mangled fingers.
When A.I. image models are trained on their own output, they can produce distorted images, mangled fingers or strange patterns.
A.I.-generated images by Sina Alemohammad and others.
“You’re kind of drifting into parts of the space that are like a no-fly zone,” said Richard Baraniuk, a professor who led the research on A.I. image models.
The researchers found that the only way to stave off this problem was to ensure that the A.I. was also trained on a sufficient supply of new, real data.
While selfies are certainly not in short supply on the internet, there could be categories of images where A.I. output outnumbers genuine data, they said.
For example, A.I.-generated images in the style of van Gogh could outnumber actual photographs of van Gogh paintings in A.I.’s training data, and this may lead to errors and distortions down the road. (Early signs of this problem will be hard to detect because the leading A.I. models are closed to outside scrutiny, the researchers said.)
Why collapse happens
All of these problems arise because A.I.-generated data is often a poor substitute for the real thing.
This is sometimes easy to see, like when chatbots state absurd facts or when A.I.-generated hands have too many fingers.
But the differences that lead to model collapse aren’t necessarily obvious — and they can be difficult to detect.
When generative A.I. is “trained” on vast amounts of data, what’s really happening under the hood is that it is assembling a statistical distribution — a set of probabilities that predicts the next word in a sentence, or the pixels in a picture.
For example, when we trained an A.I. to imitate handwritten digits, its output could be arranged into a statistical distribution that looks like this:
Distribution of A.I.-generated data, with examples of initial A.I. output. (The distribution shown here is simplified for clarity.)
The peak of this bell-shaped curve represents the most probable A.I. output — in this case, the most typical A.I.-generated digits. The tail ends describe output that is less common.
Notice that when the model was trained on human data, it had a healthy spread of possible outputs, which you can see in the width of the curve above.
But after it was trained on its own output, this is what happened to the curve:
Distribution of A.I.-generated data when trained on its own output
It gets taller and narrower. As a result, the model becomes more and more likely to produce a smaller range of output, and the output can drift away from the original data.
Meanwhile, the tail ends of the curve — which contain the rare, unusual or surprising outcomes — fade away.
This is a telltale sign of model collapse: Rare data becomes even rarer.
If this process went unchecked, the curve would eventually become a spike:
Distribution of A.I.-generated data when trained on its own output
This was when all of the digits became identical, and the model completely collapsed.
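The mechanics behind that narrowing can be reproduced in miniature. The sketch below is our own illustration, not the researchers' code: each "generation" fits a simple bell curve to a finite sample drawn from the previous generation's curve, and because every fit only ever sees its predecessor's output, the estimated spread tends to drift downward until the distribution is effectively a spike. The sample size and number of generations are arbitrary illustrative choices.

```python
# A toy analogue of model collapse, using only NumPy: each "generation"
# is a Gaussian fitted to samples drawn from the previous generation's model.
import numpy as np

rng = np.random.default_rng(seed=42)

mean, std = 0.0, 1.0   # Generation 0: the "real" data has a healthy spread.
sample_size = 100      # Each generation is trained on a finite sample.

for generation in range(501):
    if generation % 100 == 0:
        print(f"generation {generation:3d}: mean={mean:+.3f}, std={std:.3f}")
    # Generate synthetic data from the current model ...
    samples = rng.normal(mean, std, size=sample_size)
    # ... then fit the next generation's model to that synthetic data alone.
    mean, std = samples.mean(), samples.std()
```

Run long enough, the printed spread typically shrinks by orders of magnitude: values in the tails stop being sampled, so later fits never see them, a bare-bones analogue of rare data becoming rarer.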
Why it matters
This doesn’t mean generative A.I. will grind to a halt anytime soon.
The companies that make these tools are aware of these problems, and they will notice if their A.I. systems start to deteriorate in quality.
But it may slow things down. As existing sources of data dry up or become contaminated with A.I. “slop,” researchers say, it will become harder for newcomers to compete.
A.I.-generated words and images are already beginning to flood social media and the wider web. They’re even hiding in some of the data sets used to train A.I., the Rice researchers found.
“The web is becoming increasingly a dangerous place to look for your data,” said Sina Alemohammad, a graduate student at Rice who studied how A.I. contamination affects image models.
Big players will be affected, too. Computer scientists at N.Y.U. found that when there is a lot of A.I.-generated content in the training data, it takes more computing power to train A.I. — which translates into more energy and more money.
“Models won’t scale anymore as they should be scaling,” said Julia Kempe, the N.Y.U. professor who led this work.
The leading A.I. models already cost tens to hundreds of millions of dollars to train, and they consume staggering amounts of energy, so this can be a sizable problem.
‘A hidden danger’
Finally, there’s another threat posed by even the early stages of collapse: an erosion of diversity.
And it’s an outcome that could become more likely as companies try to avoid the glitches and “hallucinations” that often occur with A.I. data.
This is easiest to see when the data matches a form of diversity that we can visually recognize — people’s faces:
This set of A.I. faces was created by the same Rice researchers who produced the distorted faces above. This time, they tweaked the model to avoid visual glitches.
A grid of A.I.-generated faces showing variations in their poses, expressions, ages and races.
This is the output after they trained a new A.I. on the previous set of faces. At first glance, it may seem like the model changes worked: The glitches are gone.
After one generation of training on A.I. output, the A.I.-generated faces appear more similar.
After two generations …
After two generations of training on A.I. output, the A.I.-generated faces are less diverse than the original image.
After three generations …
After three generations of training on A.I. output, the A.I.-generated faces grow more similar.
After four generations, the faces all appeared to converge.
After four generations of training on A.I. output, the A.I.-generated faces appear almost identical.
This drop in diversity is “a hidden danger,” Mr. Alemohammad said. “You might just ignore it and then you don’t understand it until it’s too late.”
Just as with the digits, the changes are clearest when most of the data is A.I.-generated. With a more realistic mix of real and synthetic data, the decline would be more gradual.
But the problem is relevant to the real world, the researchers said, and will inevitably occur unless A.I. companies go out of their way to avoid their own output.
Related research shows that when A.I. language models are trained on their own words, their vocabulary shrinks and their sentences become less varied in their grammatical structure — a loss of “linguistic diversity.”
And studies have found that this process can amplify biases in the data and is more likely to erase data pertaining to minorities.
Ways out
Perhaps the biggest takeaway of this research is that high-quality, diverse data is valuable and hard for computers to emulate.
One solution, then, is for A.I. companies to pay for this data instead of scooping it up from the internet, ensuring both human origin and high quality.
OpenAI and Google have made deals with some publishers or websites to use their data to improve A.I. (The New York Times sued OpenAI and Microsoft last year, alleging copyright infringement. OpenAI and Microsoft say their use of the content is considered fair use under copyright law.)
Better ways to detect A.I. output would also help mitigate these problems.
Google and OpenAI are working on A.I. “watermarking” tools, which introduce hidden patterns that can be used to identify A.I.-generated images and text.
But watermarking text is challenging, researchers say, because these watermarks can’t always be reliably detected and can easily be subverted (they may not survive being translated into another language, for example).
Accidental contamination with A.I. slop is not the only reason that companies may need to be wary of synthetic data. Another problem is that there are only so many words on the internet.
Some experts estimate that the largest A.I. models have been trained on a few percent of the available pool of text on the internet. They project that these models may run out of public data to sustain their current pace of growth within a decade.
“These models are so enormous that the entire internet of images or conversations is somehow close to being not enough,” Professor Baraniuk said.
To meet their growing data needs, some companies are considering using today’s A.I. models to generate data to train tomorrow’s models. But researchers say this can lead to unintended consequences (such as the drop in quality or diversity that we saw above).
There are certain contexts where synthetic data can help A.I.s learn — for example, when output from a larger A.I. model is used to train a smaller one, or when the correct answer can be verified, like the solution to a math problem or the best strategies in games like chess or Go.
And new research suggests that when humans curate synthetic data (for example, by ranking A.I. answers and choosing the best one), it can alleviate some of the problems of collapse.
Companies are already spending a lot on curating data, Professor Kempe said, and she believes this will become even more important as they learn about the problems of synthetic data.
But for now, there’s no replacement for the real thing.
About the data
To produce the images of A.I.-generated digits, we followed a procedure outlined by researchers. We first trained a type of neural network known as a variational autoencoder using a standard data set of 60,000 handwritten digits.
We then trained a new neural network using only the A.I.-generated digits produced by the previous neural network, and repeated this process in a loop 30 times.
To create the statistical distributions of A.I. output, we used each generation’s neural network to create 10,000 drawings of digits. We then used the first neural network (the one that was trained on the original handwritten digits) to encode these drawings as a set of numbers, known as a “latent space” encoding. This allowed us to quantitatively compare the output of different generations of neural networks. For simplicity, we used the average value of this latent space encoding to generate the statistical distributions shown in the article.
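For readers who want to experiment with this procedure, here is a condensed sketch of that loop in PyTorch. It is our illustration of the steps described above, not the code used to produce the article's graphics; the network architecture, training settings and the per-drawing "latent score" are illustrative assumptions.

```python
# A condensed sketch of the generational-training loop described above,
# assuming PyTorch and torchvision. All hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
LATENT_DIM = 16

class VAE(nn.Module):
    """A small fully connected variational autoencoder for flattened 28x28 digits."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(784, 400), nn.ReLU())
        self.mu = nn.Linear(400, LATENT_DIM)
        self.logvar = nn.Linear(400, LATENT_DIM)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM, 400), nn.ReLU(),
            nn.Linear(400, 784), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus the usual KL penalty toward a unit Gaussian.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

def train_vae(images, epochs=10, batch_size=128, lr=1e-3):
    """Train a fresh VAE on a tensor of flattened digits with values in [0, 1]."""
    model = VAE().to(DEVICE)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loader = DataLoader(images, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x in loader:
            x = x.to(DEVICE)
            recon, mu, logvar = model(x)
            loss = vae_loss(recon, x, mu, logvar)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

@torch.no_grad()
def sample_digits(model, n=60_000):
    """Generate new digit images by decoding random latent vectors."""
    z = torch.randn(n, LATENT_DIM, device=DEVICE)
    return model.dec(z).cpu()

# Generation 0 trains on the real handwritten digits.
mnist = datasets.MNIST(root="data", download=True, transform=transforms.ToTensor())
real = torch.stack([img for img, _ in mnist]).view(-1, 784)

data, first_model = real, None
for generation in range(31):
    model = train_vae(data)
    if first_model is None:
        first_model = model      # Generation 0's encoder is the fixed yardstick.
    data = sample_digits(model)  # The next generation trains only on this output.

    # Encode 10,000 of this generation's drawings with the first encoder and
    # reduce each drawing to a single number, mirroring the distributions above.
    with torch.no_grad():
        mu, _ = first_model.encode(data[:10_000].to(DEVICE))
    scores = mu.mean(dim=1)
    print(f"generation {generation:2d}: spread of latent scores = {scores.std().item():.4f}")
```

With settings like these, the printed spread of latent scores typically narrows from one generation to the next, which is the same narrowing shown in the charts above.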

Science
Former Cedars-Sinai OB-GYN surrenders license after sexual abuse complaints

Former Cedars-Sinai Medical Center obstetrician-gynecologist Barry J. Brock has surrendered his medical license following an accusation of negligent care from the state medical board.
Brock, 75, signed an agreement late last month to give up the license he has held since 1978, rather than contest an accusation the Medical Board of California filed in September regarding a former patient’s treatment. The surrender took effect on Wednesday.
While Brock “doesn’t admit any factual allegations,” his attorney Tracy Green said, he elected to surrender his license rather than invest time and money in a hearing.
Under the terms of the agreement, Brock is barred from legally practicing medicine in California for the rest of his life.
Brock retired from medicine in August. Since then, at least 176 women have filed lawsuits alleging that Cedars-Sinai and other facilities where Brock worked knowingly concealed his sexual abuses and misconduct, including medically unjustifiable procedures that at times resulted in lasting physical complications.
Brock has denied all allegations of impropriety. The OB-GYN was a member of the Cedars-Sinai physician network until 2018 and retained his clinical privileges there until mid-2024.
Cedars-Sinai confirmed in July that it suspended Brock’s hospital privileges after receiving “concerning complaints” from former patients. His privileges were terminated a few months later.
“The type of behavior alleged about Dr. Barry Brock is counter to Cedars-Sinai’s core values and the trust we strive to earn every day with our patients,” the medical center said in a statement. “We recognize the legal process must now take its course, and we remain committed to Cedars-Sinai’s sacred healing mission.”
The accusation that led to the surrender of his license focused on a patient who sought treatment in 2018 for a blighted ovum, a form of miscarriage in which the fertilized egg fails to develop into an embryo.
According to the complaint, the patient reported to Brock’s office in September 2018 for a dilation and curettage to remove remaining tissues from her uterus.
Brock ordered the patient to undress in front of him, the complaint stated, and didn’t wear gloves during the procedure, which was done without a chaperone present.
The patient experienced severe pain during the visit and bled for two months afterward, the complaint said, and no follow-up care was provided. When she visited a physician’s assistant in November 2018, the complaint said, she learned that Brock had failed to complete the dilation and curettage successfully, and she had to undergo the process a second time to remove the remaining tissue.
The complaint alleged that Brock didn’t administer sufficient pain medication and failed to properly complete the procedure or follow up with pathology findings.
While Brock’s license surrender resolves this accusation, he still faces the civil lawsuits.
Suits were filed on behalf of 167 women last year, and nine more women sued the former physician earlier this month, alleging that Brock groped their breasts and genitals inappropriately during appointments, often with bare hands, and made sexually harassing comments.
“This is why these civil lawsuits and these women coming forward … are so, so important. He can’t avoid this,” said Lisa Esser, an attorney representing the nine plaintiffs. “He’s going to be held accountable.”
Science
State rescinds suspension efforts for troubled nursing home in Hollywood
The California Public Health Department has dropped efforts to suspend the license of a Hollywood nursing home whose actions were found to have led to two patient deaths in recent years.
Brier Oak on Sunset was among seven Los Angeles County facilities that received notice last month that the state was moving to suspend their licenses.
At the time, the state believed all seven companies had received at least two “AA” violations within the last two years, a spokesperson for the Public Health Department said.
An AA violation is a relatively rare penalty issued for errors that contribute substantially to a resident’s death. California law allows the suspension or revocation of a nursing home’s license once a facility gets two such violations within a 24-month period.
Although Brier Oak received its AA violation notices 22 months apart, the residents’ deaths took place about 26 months apart, state records show.
“We recently determined that Brier Oak’s Notice was based on citation issuance date, not the date of the incidents that gave rise to the citations,” the health department said in a statement. “Therefore, this Notice of Suspension has been rescinded.”
Brier Oak on Sunset didn’t immediately respond to a request for comment.
The state investigation found that staff oversights at Brier Oak led to the deaths of two residents in 2022 and 2024.
In August 2024, a patient died after rolling off a bed while her nurse was tending to a different patient, the state said in its citation report, which noted that paramedics found the woman lying on the floor in a pool of blood.
In May 2022, a patient died roughly 50 hours after her admission to Brier Oak. An investigation determined that staff neglected to administer crucial medications, the state said.
In a September 2022 phone interview, the patient’s family member told state investigators that “Resident 1 ‘did not get her medications for two days [from admission] and staff let her die,’” the state wrote in its report. The family member continued: “She did not deserve to die.”
The patient’s family was awarded $1.29 million in arbitration this month after a judge found that the facility was severely understaffed at the time of her arrival and should not have admitted her.
“Respondent’s Facility acted with recklessness in that they knew it was highly probable that their conduct would cause harm, and they knowingly disregarded this risk,” Superior Court Judge Terry A. Green wrote in the interim arbitration award.
License suspension efforts are still proceeding against Antelope Valley Care Center in Lancaster, Ararat Nursing Facility in Mission Hills, Golden Haven Care Center in Glendale, Kei-Ai Los Angeles Healthcare Center in Lincoln Park, Santa Anita Convalescent Hospital in Temple City and Seacrest Post-Acute Care Center in San Pedro.
Attorneys for Ararat said that the suspension was “unwarranted” and that it will be appealing. The other facilities didn’t respond to requests for comment.
Science
A Near-Full ‘Strawberry Moon’ Will Shine Again on Wednesday Night

Night sky observers are being treated this week to a view of a red-tinted full moon — known in June as a “strawberry moon” — a phenomenon that occurs when the moon sits low on the southern horizon.
This summer, the reddish color is particularly pronounced because the moon is sitting at the lowest position it will reach for about 19 years.
The strawberry moon’s colorful hues were visible Tuesday night, and it reached its brightest point Wednesday around 4 a.m. Eastern time.
Each month’s full moon has a name.
According to folklore, the name “strawberry moon” came from Algonquin Native American tribes to commemorate strawberry gathering season. Another name for the full moon in June is “rose moon,” which may have come from Europe.
“Most of the traditional names we use seem to come from Native American usage, but some are clearly European in origin, like the one in December, called ‘the moon before yule,’ a reference to Christmas,” said James Lattis, a historian of astronomy at the University of Wisconsin-Madison.
The moon will not sit this low on the southern horizon again for about 19 years.
Summer full moons are always low relative to winter full moons in the Northern Hemisphere, and therefore are more reddish in color, Dr. Lattis said. That’s because viewing the moon through the atmosphere gives it a reddish hue, much like the colors visible during a sunrise or sunset, he said.
“If one looks straight up into the sky, there’s less atmosphere,” he said. “If you’re looking through the horizon, you’re looking through the most atmosphere.”
The strawberry moon will still be “visually full” for observers on Wednesday night.
Dr. Lattis said that he had viewed the moon on Tuesday night in Wisconsin, and that it was notable for the pinkish hue it had from smoke in the air from wildfires. He said the sight may not be as dramatic elsewhere.
“I hate to discourage anybody from going out and looking at the moon — it’s a wonderful thing to do, and a lot of times, if you don’t give somebody a reason, they’ll never do it,” he said. “But it’s just another full moon.”