Business

Column: These Apple researchers just showed that AI bots can't think, and possibly never will

Published November 1, 2024

See if you can solve this arithmetic problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)

You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.

The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis has nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”
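The arithmetic behind both answers can be spelled out in a few lines of Python. This is only an illustrative sketch: the numbers come straight from the kiwi problem, while the variable names are invented here and are not taken from the Apple paper.

```python
# Quantities taken from the kiwi word problem quoted in the column.
friday = 44
saturday = 58
sunday = 2 * friday           # "double the number of kiwis he did on Friday"
smaller_than_average = 5      # a distractor: size does not change the count

# Correct reading: every kiwi counts, whatever its size.
correct_total = friday + saturday + sunday

# Misled reading: the irrelevant clause is treated as a subtraction.
misled_total = correct_total - smaller_than_average

print(correct_total)  # 190
print(misled_total)   # 185
```

The only difference between the two totals is whether the clause about size is treated as an operation, which is the mistake the column describes.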

Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.

The Apple findings were published earlier this month in a technical paper that has attracted widespread attention in AI labs and the lay press, not only because the results are well-documented, but also because the researchers work for the nation’s leading high-tech consumer company — and one that has just rolled out a suite of purported AI features for iPhone users.

“The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results,” says Gary Marcus, a critic of how AI systems have been marketed as reliably, well, “intelligent.”

Indeed, Apple’s conclusion matches earlier studies that have found that large language models, or LLMs, don’t actually “think” so much as match language patterns in materials they’ve been fed as part of their “training.” When it comes to abstract reasoning — “a key aspect of human intelligence,” in the words of Melanie Mitchell, an expert in cognition and intelligence at the Santa Fe Institute — the models fall short.

“Even very young children are adept at learning abstract rules from just a few examples,” Mitchell and colleagues wrote last year after subjecting GPT bots to a series of analogy puzzles. Their conclusion was that “a large gap in basic abstract reasoning still remains between humans and state-of-the-art AI systems.”

That’s important because LLMs such as GPT underlie the AI products that have captured the public’s attention. But the LLMs tested by the Apple team were consistently misled by the language patterns they were trained on.

The Apple researchers set out to answer the question, “Do these models truly understand mathematical concepts?” as one of the lead authors, Mehrdad Farajtabar, put it in a thread on X. Their answer is no. They also pondered whether the shortcomings they identified can be easily fixed, and their answer is also no: “Can scaling data, models, or compute fundamentally solve this?” Farajtabar asked in his thread. “We don’t think so!”

The Apple research, along with other findings about AI bots’ cogitative limitations, is a much-needed corrective to the sales pitches coming from companies hawking their AI models and systems, including OpenAI and Google’s DeepMind lab.

The promoters generally depict their products as dependable and their output as trustworthy. In fact, their output is consistently suspect, posing a clear danger when they’re used in contexts where the need for rigorous accuracy is absolute, say in healthcare applications.

That’s not always the case. “There are some problems which you can make a bunch of money on without having a perfect solution,” Marcus told me. Consider recommendation engines powered by AI, such as those that steer buyers on Amazon to products they might also like. If those systems get a recommendation wrong, it’s no big deal; a customer might spend a few dollars on a book he or she didn’t like.

“But a calculator that’s right only 85% of the time is garbage,” Marcus says. “You wouldn’t use it.”

The potential for damagingly inaccurate outputs is heightened by AI bots’ natural language capabilities, with which they offer even absurdly inaccurate answers with convincingly cocksure elan. Often they double down on their errors when challenged.

These errors are typically described by AI researchers as “hallucinations.” The term may make the mistakes seem almost innocuous, but in some applications, even a minuscule error rate can have severe ramifications.

That’s what academic researchers concluded in a recently published analysis of Whisper, an AI-powered speech-to-text tool developed by OpenAI, which can be used to transcribe medical discussions or jailhouse conversations monitored by correction officials.

The researchers found that about 1.4% of Whisper-transcribed audio segments in their sample contained hallucinations, among them wholly fabricated statements added to the transcribed conversation, including portrayals of “physical violence or death … [or] sexual innuendo,” and demographic stereotyping.

That may sound like a minor flaw, but the researchers observed that the errors could be incorporated in official records such as transcriptions of court testimony or prison phone calls — which could lead to official decisions based on “phrases or claims that a defendant never said.”
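A rough calculation shows why a rate of about 1.4% is not as small as it sounds. The sketch below is an illustration, not part of the researchers’ analysis: the segment counts are hypothetical, and it assumes that hallucinations occur independently from one segment to the next, which real transcripts may not satisfy.

```python
# Rate reported in the column: about 1.4% of transcribed segments.
HALLUCINATION_RATE = 0.014


def expected_bad_segments(num_segments: int) -> float:
    """Expected number of segments containing a hallucination."""
    return num_segments * HALLUCINATION_RATE


def chance_of_at_least_one(num_segments: int) -> float:
    """Probability that at least one segment is affected.

    ASSUMPTION: segments are independent of one another.
    """
    return 1.0 - (1.0 - HALLUCINATION_RATE) ** num_segments


segments_in_archive = 10_000   # hypothetical archive size
segments_in_one_call = 100     # hypothetical length of a single recording

print(round(expected_bad_segments(segments_in_archive)))        # 140
print(round(chance_of_at_least_one(segments_in_one_call), 2))
```

Under these assumptions, a 10,000-segment archive would be expected to contain about 140 affected segments, and a single 100-segment recording would have roughly a three-in-four chance of containing at least one.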

Updates to Whisper in late 2023 improved its performance, the researchers said, but the updated Whisper “still regularly and reproducibly hallucinated.”

That hasn’t deterred AI promoters from unwarranted boasting about their products. In an Oct. 29 tweet, Elon Musk invited followers to submit “x-ray, PET, MRI or other medical images to Grok [the AI application for his X social media platform] for analysis.” Grok, he wrote, “is already quite accurate and will become extremely good.”

It should go without saying that, even if Musk is telling the truth (not an absolutely certain conclusion), any system used by healthcare providers to analyze medical images needs to be a lot better than “extremely good,” however one might define that standard.

That brings us to the Apple study. It’s proper to note that the researchers aren’t critics of AI as such but believers that its limitations need to be understood. Farajtabar was formerly a senior research scientist at DeepMind, where another author interned under him; other co-authors hold advanced degrees and professional experience in computer science and machine learning.

The team plied their subject AI models with questions drawn from a popular collection of more than 8,000 grade school arithmetic problems testing schoolchildren’s understanding of addition, subtraction, multiplication and division. When the problems incorporated clauses that might seem relevant but weren’t, the models’ performance plummeted.

That was true of all the models, including versions of the GPT bots developed by OpenAI, Meta’s Llama, Microsoft’s Phi-3, Google’s Gemma and several models developed by the French lab Mistral AI.

Some did better than others, but all showed a decline in performance as the problems became more complex. One problem involved a basket of school supplies including erasers, notebooks and writing paper. That requires a solver to multiply the number of each item by its price and add them together to determine how much the entire basket costs.

When the bots were also told that “due to inflation, prices were 10% cheaper last year,” the bots reduced the cost by 10%. That produces a wrong answer, since the question asked what the basket would cost now, not last year.
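The error is easy to reproduce with made-up numbers. In the sketch below, the quantities and prices are hypothetical, since the column does not give the figures used in the study; only the three item types and the 10% clause come from the text. Prices are kept in whole cents to avoid floating-point rounding.

```python
# HYPOTHETICAL basket: item -> (quantity, current price in cents).
# These figures are invented for illustration and are not from the study.
basket = {
    "erasers": (10, 75),
    "notebooks": (5, 250),
    "writing paper": (2, 400),
}

# Correct answer: multiply each quantity by today's price and add them up.
cost_now = sum(quantity * price for quantity, price in basket.values())

# Misled answer: the remark about last year's prices is irrelevant to the
# question, but applying it anyway knocks 10% off the total.
cost_misled = cost_now * 90 // 100

print(cost_now)     # 2800 cents, i.e. $28.00
print(cost_misled)  # 2520 cents, i.e. $25.20
```

The question asks only for `cost_now`; the discounted figure answers a question about last year that nobody posed.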

Why did this happen? The answer is that LLMs are developed, or trained, by feeding them huge quantities of written material scraped from published works or the internet — not by trying to teach them mathematical principles. LLMs function by gleaning patterns in the data and trying to match a pattern to the question at hand.

But they become “overfitted to their training data,” Farajtabar explained via X. “They memorized what is out there on the web and do pattern matching and answer according to the examples they have seen. It’s still a [weak] type of reasoning but according to other definitions it’s not a genuine reasoning capability.” (The brackets are his.)

That’s likely to impose boundaries on what AI can be used for. In mission-critical applications, humans will almost always have to be “in the loop,” as AI developers say, vetting answers for obvious or dangerous inaccuracies or providing guidance to keep the bots from misinterpreting their data, misstating what they know, or filling gaps in their knowledge with fabrications.

To some extent, that’s comforting, for it means that AI systems can’t accomplish much without having human partners at hand. But it also means that we humans need to be aware of the tendency of AI promoters to overstate their products’ capabilities and conceal their limitations. The issue is not so much what AI can do, but what users can be gulled into thinking it can do.

“These systems are always going to make mistakes because hallucinations are inherent,” Marcus says. “The ways in which they approach reasoning are an approximation and not the real thing. And none of this is going away until we have some new technology.”

Business

Nike to Cut 1,400 Jobs as Part of Its Turnaround Plan

Published April 23, 2026

Press Room

Nike is cutting about 1,400 jobs in its operations division, mostly from its technology department, the company said Thursday.

In a note to employees, Venkatesh Alagirisamy, the chief operating officer of Nike, said that management was nearly done reorganizing the business for its turnaround plan, and that the goal was to operate with “more speed, simplicity and precision.”

“This is not a new direction,” Mr. Alagirisamy told employees. “It is the next phase of the work already underway.”

Nike, the world’s largest sportswear company, is trying to recover after missteps, in which the brand leaned into lifestyle products and away from performance shoes and apparel, led to a prolonged sales slump. Elliott Hill, the chief executive, has worked to realign the company around sports and speed up product development to create more breakthrough innovations.

In March, Nike told investors that it expected sales to fall this year, with growth in North America offset by poor performance in Asia, where the brand is struggling to rejuvenate sales in China. Executives said at the time that more volatility brought on by the war in the Middle East and rising oil prices might continue to affect its business.

The reorganization has involved cuts across many parts of the organization, including at its headquarters in Beaverton, Ore. Nike slashed some corporate staff last year and eliminated nearly 800 jobs at distribution centers in January.

“You never want to have to go through any sort of layoffs, but to re-center the company, we’re doing some of that,” Mr. Hill said in an interview earlier this year.

Mr. Alagirisamy told employees that Nike was reshaping its technology team and centering employees at its headquarters and a tech center in Bengaluru, India. The layoffs will affect workers across North America, Europe and Asia.

The cuts will also affect staffing in Nike’s factories for Air, the company’s proprietary cushioning system. Employees who work on the supply chain for raw materials will also experience changes as staff is integrated into footwear and apparel teams.

Nike’s Converse brand, which has struggled for years to revive sales, will move some of its engineering resources closer to the factories they support, the company said.

Mr. Alagirisamy said the moves were necessary to optimize Nike’s supply chain, deploy technology faster and bolster relationships with suppliers.

Business

Senate committee kills bill mandating insurance coverage for wildfire safe homes

Published April 23, 2026

Press Room

A bill that would have required insurers to offer coverage to homeowners who take steps to reduce wildfire risk on their property died in the Legislature.

The Senate Insurance Committee on Monday voted down the measure, SB 1076, one of the most ambitious bills spurred by the devastating January 2025 wildfires.

The vote came despite fire victims and others rallying at the state Capitol in support of the measure, authored by state Sen. Sasha Renée Pérez (D-Pasadena), whose district includes the Eaton fire zone.

The Insurance Coverage for Fire-Safe Homes Act originally would have required insurers to offer and renew coverage for any home that meets wildfire-safety standards adopted by the insurance commissioner starting Jan. 1, 2028.

It also threatened insurers with a five-year ban from the sale of home or auto insurance if they did not comply, though it allowed for exceptions.

However, faced with strong opposition from the insurance industry, Pérez had agreed to amend the bill so it would have established community-wide pilot projects across the state to better understand the most effective way to limit property and insurance losses from wildfires.

Insurers would have had to offer four years of coverage to homeowners in successful pilot projects.

Denni Ritter, a vice president of the American Property Casualty Insurance Assn., told the committee that her trade group opposed the bill.

“While we appreciate the intent behind those conversations, those concepts do not remove our opposition, because they retain the same core flaw — substituting underwriting judgment and solvency safeguards with a statutory mandate to accept risk,” she said.

In voting against the bill, Sen. Laura Richardson (D-San Pedro) said: “Last I heard, in the United States, we don’t require any company to do anything. That’s the difference between capitalism and communism, frankly.”

The remarks against the measure prompted committee Chair Sen. Steve Padilla (D-Chula Vista) to chastise committee members in opposition.

“I’m a little perturbed, and I’m a little disappointed, because you have someone who is trying to work with industry, who is trying to get facts and data,” he said.

Monday’s vote was the fourth time a bill that would have required insurers to offer coverage to so-called “fire hardened” homes failed in the Legislature since 2020, according to an analysis by insurance committee staff.

Fire hardening includes measures such as cutting back brush, installing fire resistant roofs and closing eaves to resist fire embers.

Pérez’s legislation was thought to have a better chance of passage because it followed the most catastrophic wildfires in U.S. history, which damaged or destroyed more than 18,000 structures and killed 31 people.

The bill was co-sponsored by the Los Angeles advocacy group Consumer Watchdog and Every Fire Survivor’s Network, a community group, formerly called the Eaton Fire Survivors Network, that was founded in Altadena after the fires.

But it also had broad support from groups such as the California Apartment Association, the California Nurses Association and California Environmental Voters.

Leading up to the fires, many insurers, citing heightened fire risk, had dropped policyholders in fire-prone neighborhoods. That forced those homeowners onto the California FAIR Plan, the state’s insurer of last resort, which offers limited but costly policies.

A Times analysis found that in the Palisades and Eaton fire zones, the FAIR Plan’s rolls nearly doubled from 2020 to 2024, from 14,272 to 28,440. Mandating coverage has been seen as a way of reducing FAIR Plan enrollment.

“I’m disappointed this bill died in committee. Fire survivors deserved better,” Pérez said in a statement.

Also failing Monday in the committee was SB 982, a bill authored by Sen. Scott Wiener (D-San Francisco). It would have authorized California’s attorney general to sue fossil fuel companies to recover losses from climate-induced disasters. It was opposed by the oil and gas industry.

Passing the committee were two other Pérez bills. SB 877 requires insurers to provide more transparency in the claims process. SB 878 imposes a penalty on insurers who don’t make claims payments on time.

Another bill, SB 1301, authored by insurance commissioner candidate Sen. Ben Allen (D-Pacific Palisades), also passed. It protects policyholders from unexplained and abrupt policy non-renewals.

Business

How We Cover the White House Correspondents’ Dinner

Published April 23, 2026

Press Room

Times Insider explains who we are and what we do, and delivers behind-the-scenes insights into how our journalism comes together.

Politicians in Washington and the reporters who cover them have an often adversarial relationship.

But on the last Saturday in April, they gather for an irreverent celebration of press freedom and the First Amendment at the Washington Hilton Hotel: the White House Correspondents’ Association dinner.

Hosted by the association, an organization that helps ensure access for media outlets covering the presidency, the dinner attracts Hollywood stars; politicians from both parties; and representatives of more than 100 networks, newspapers, magazines and wire services.

While The Times will have two reporters in the ballroom covering the event, the company no longer buys seats at the party, said Richard W. Stevenson, the Washington bureau chief. The decision goes back almost two decades; the last dinner The Times attended as an organization was in 2007.

“We made a judgment back then that the event had become too celebrity-focused and was undercutting our need to demonstrate to readers that we always seek to maintain a proper distance from the people we cover, many of whom attend as guests,” he said.

It’s a decision, he added, that “we have stuck by through both Republican and Democratic administrations, although we support the work of the White House Correspondents’ Association.”

Susan Wessling, The Times’s Standards editor, said the policy is a product of the organization’s desire to maintain editorial independence.

“We don’t want to leave readers with any questions about our independence and credibility by seeming to be overly friendly with people whose words and actions we need to report on,” she said.

The celebrity mentalist Oz Pearlman is headlining the evening, in lieu of the usual comedy set by the likes of Stephen Colbert and Hasan Minhaj, but all eyes will be on President Trump, who will make his first appearance at the dinner as president.

Mr. Trump has boycotted the event since 2011, when he was the butt of punchlines delivered by President Barack Obama and the talk show host Seth Meyers mocking his hair, his reality TV show and his preoccupation with the “birther” movement.

Last month, though, Mr. Trump, who has a contentious relationship with the media, announced his intention to attend this year’s dinner, where he will speak to a room full of the same reporters he often derides as “enemies of the people.”

Times reporters will be there to document the highs, the lows and the reactions in the room. A reporter for the Styles desk has also been assigned to cover the robust roster of after-parties around Washington.

Some off-duty reporters from The Times will also be present at this late-night circuit, though everyone remains cognizant of their roles, said Patrick Healy, The Times’s assistant managing editor for Standards and Trust.

“If they’re reporting, there’s a notebook or recorder out as usual,” he said. “If they’re not, they’re pros who know they’re always identifiable as Times journalists.”

For most of The Times’s reporters and editors, though, the evening will be experienced from home.

“The rest of us will be able to follow the coverage,” Mr. Stevenson said, “without having to don our tuxes or gowns.”