Business

Column: These Apple researchers just showed that AI bots can't think, and possibly never will

Published

November 1, 2024

See if you can solve this arithmetic problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)

You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.

The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis has nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”
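
For readers who want to see the two answers side by side, here is a minimal sketch of the kiwi arithmetic. The numbers come from the problem as stated; the variable names are illustrative assumptions, and the second calculation only mimics the mistake described here, not how any particular model works internally.

```python
# Kiwi counts taken from the problem statement.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# Distractor: five kiwis were smaller than average.
# Size does not change how many kiwis Oliver has.
smaller_than_average = 5

# Correct reading: simply add the three days.
correct_total = friday + saturday + sunday

# Mistaken reading: subtract the undersized kiwis as if they did not count.
distracted_total = correct_total - smaller_than_average

print(correct_total)     # 190
print(distracted_total)  # 185
```

The only difference between the two totals is whether the irrelevant clause about size is allowed to touch the count.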

Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.

The Apple findings were published earlier this month in a technical paper that has attracted widespread attention in AI labs and the lay press, not only because the results are well-documented, but also because the researchers work for the nation’s leading high-tech consumer company — and one that has just rolled out a suite of purported AI features for iPhone users.

“The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results,” says Gary Marcus, a critic of how AI systems have been marketed as reliably, well, “intelligent.”

Indeed, Apple’s conclusion matches earlier studies that have found that large language models, or LLMs, don’t actually “think” so much as match language patterns in materials they’ve been fed as part of their “training.” When it comes to abstract reasoning — “a key aspect of human intelligence,” in the words of Melanie Mitchell, an expert in cognition and intelligence at the Santa Fe Institute — the models fall short.

“Even very young children are adept at learning abstract rules from just a few examples,” Mitchell and colleagues wrote last year after subjecting GPT bots to a series of analogy puzzles. Their conclusion was that “a large gap in basic abstract reasoning still remains between humans and state-of-the-art AI systems.”

That’s important because LLMs such as GPT underlie the AI products that have captured the public’s attention. But the LLMs tested by the Apple team were consistently misled by the language patterns they were trained on.

The Apple researchers set out to answer the question, “Do these models truly understand mathematical concepts?” as one of the lead authors, Mehrdad Farajtabar, put it in a thread on X. Their answer is no. They also pondered whether the shortcomings they identified can be easily fixed, and their answer is also no: “Can scaling data, models, or compute fundamentally solve this?” Farajtabar asked in his thread. “We don’t think so!”

The Apple research, along with other findings about AI bots’ cogitative limitations, is a much-needed corrective to the sales pitches coming from companies hawking their AI models and systems, including OpenAI and Google’s DeepMind lab.

The promoters generally depict their products as dependable and their output as trustworthy. In fact, their output is consistently suspect, posing a clear danger when they’re used in contexts where the need for rigorous accuracy is absolute, say in healthcare applications.

That’s not always the case. “There are some problems which you can make a bunch of money on without having a perfect solution,” Marcus told me. Recommendation engines powered by AI, such as those that steer buyers on Amazon to products they might also like, are one example. If those systems get a recommendation wrong, it’s no big deal; a customer might spend a few dollars on a book he or she didn’t like.

“But a calculator that’s right only 85% of the time is garbage,” Marcus says. “You wouldn’t use it.”

The potential for damagingly inaccurate outputs is heightened by AI bots’ natural language capabilities, with which they offer even absurdly inaccurate answers with convincingly cocksure elan. Often they double down on their errors when challenged.

These errors are typically described by AI researchers as “hallucinations.” The term may make the mistakes seem almost innocuous, but in some applications, even a minuscule error rate can have severe ramifications.

That’s what academic researchers concluded in a recently published analysis of Whisper, an AI-powered speech-to-text tool developed by OpenAI, which can be used to transcribe medical discussions or jailhouse conversations monitored by correction officials.

The researchers found that about 1.4% of Whisper-transcribed audio segments in their sample contained hallucinations, in which wholly fabricated statements were added to the transcribed conversation, including portrayals of “physical violence or death … [or] sexual innuendo,” and demographic stereotyping.

That may sound like a minor flaw, but the researchers observed that the errors could be incorporated in official records such as transcriptions of court testimony or prison phone calls — which could lead to official decisions based on “phrases or claims that a defendant never said.”

Updates to Whisper in late 2023 improved its performance, the researchers said, but the updated Whisper “still regularly and reproducibly hallucinated.”

That hasn’t deterred AI promoters from unwarranted boasting about their products. In an Oct. 29 tweet, Elon Musk invited followers to submit “x-ray, PET, MRI or other medical images to Grok [the AI application for his X social media platform] for analysis.” Grok, he wrote, “is already quite accurate and will become extremely good.”

It should go without saying that, even if Musk is telling the truth (not an absolutely certain conclusion), any system used by healthcare providers to analyze medical images needs to be a lot better than “extremely good,” however one might define that standard.

That brings us to the Apple study. It’s proper to note that the researchers aren’t critics of AI as such but believers that its limitations need to be understood. Farajtabar was formerly a senior research scientist at DeepMind, where another author interned under him; other co-authors hold advanced degrees and professional experience in computer science and machine learning.

The team plied their subject AI models with questions drawn from a popular collection of more than 8,000 grade school arithmetic problems testing schoolchildren’s understanding of addition, subtraction, multiplication and division. When the problems incorporated clauses that might seem relevant but weren’t, the models’ performance plummeted.

That was true of all the models, including versions of the GPT bots developed by OpenAI, Meta’s Llama, Microsoft’s Phi-3, Google’s Gemma and several models developed by the French lab Mistral AI.

Some did better than others, but all showed a decline in performance as the problems became more complex. One problem involved a basket of school supplies including erasers, notebooks and writing paper. That requires a solver to multiply the number of each item by its price and add them together to determine how much the entire basket costs.

When the bots were also told that “due to inflation, prices were 10% cheaper last year,” the bots reduced the cost by 10%. That produces a wrong answer, since the question asked what the basket would cost now, not last year.
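
The basket problem can be sketched the same way. The column does not give the quantities or prices, so every number below is a hypothetical chosen for illustration, as are the names `basket` and `basket_cost_cents`. Prices are kept in whole cents to avoid floating-point rounding.

```python
# Hypothetical basket: (quantity, unit price in cents).
# These figures are invented for illustration; they are not from the Apple paper.
basket = {
    "eraser": (4, 75),
    "notebook": (3, 250),
    "writing paper": (2, 400),
}


def basket_cost_cents(items):
    """Multiply each item's quantity by its price and add the results."""
    return sum(quantity * price for quantity, price in items.values())


# The question asks what the basket costs now.
cost_now_cents = basket_cost_cents(basket)

# Distractor: "prices were 10% cheaper last year."
# Applying that discount answers a different question (last year's cost).
distracted_cents = cost_now_cents * 90 // 100

print(f"Cost now: ${cost_now_cents / 100:.2f}")           # Cost now: $18.50
print(f"Distracted answer: ${distracted_cents / 100:.2f}")  # Distracted answer: $16.65
```

Under these assumed prices, the sentence about last year changes nothing in the correct calculation; it only matters if the solver mistakes it for part of the question.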

Why did this happen? The answer is that LLMs are developed, or trained, by feeding them huge quantities of written material scraped from published works or the internet — not by trying to teach them mathematical principles. LLMs function by gleaning patterns in the data and trying to match a pattern to the question at hand.

But they become “overfitted to their training data,” Farajtabar explained via X. “They memorized what is out there on the web and do pattern matching and answer according to the examples they have seen. It’s still a [weak] type of reasoning but according to other definitions it’s not a genuine reasoning capability.” (The brackets are his.)

That’s likely to impose boundaries on what AI can be used for. In mission-critical applications, humans will almost always have to be “in the loop,” as AI developers say—vetting answers for obvious or dangerous inaccuracies or providing guidance to keep the bots from misinterpreting their data, misstating what they know, or filling gaps in their knowledge with fabrications.

To some extent, that’s comforting, for it means that AI systems can’t accomplish much without having human partners at hand. But it also means that we humans need to be aware of the tendency of AI promoters to overstate their products’ capabilities and conceal their limitations. The issue is not so much what AI can do, but how users can be gulled into misjudging what it can do.

“These systems are always going to make mistakes because hallucinations are inherent,” Marcus says. “The ways in which they approach reasoning are an approximation and not the real thing. And none of this is going away until we have some new technology.”

Business

Walmart’s EV chargers are coming to California with discounts for members

Published

July 8, 2026

Press Room

Walmart is rapidly expanding its network of electric vehicle chargers designed for customers to use while they shop.

The network could help fill gaps in EV infrastructure in states with greater need for chargers. Walmart, which has more than 5,000 locations in the U.S. and hundreds in California, says more than 90% of Americans live within 10 miles of one of its stores.

The chargers also offer an incentive for customers to choose Walmart — Walmart Plus members will receive a 10% discount off an average price of $0.46 per kilowatt-hour of energy at the company’s chargers.

Walmart chargers are already available at more than 75 locations in 17 states, with Texas boasting the most charging stations, followed by Florida and Arizona.

Matthew Nelson, Walmart’s director of energy policy, said last week on LinkedIn that the network will soon reach 29 states, including California.

“We are delivering on the promise of affordable, reliable and convenient charging,” Nelson said in his post.

According to Walmart’s website, six charging stations are coming to California soon, though the company did not offer a specific timeline.

The chargers will be installed at stores in Antelope, Brea, Fresno, Stockton, Suisun City and Vallejo.

Most charging sites in California will include eight to 16 fast-charging stalls, said Walmart spokesperson Kelsey Bohl.

The company first announced plans in April 2023 to install its own EV chargers at Walmart and Sam’s Club stores, with a goal of installing thousands of chargers by 2030. Partnering with ABB E-Mobility and Alpitronic, it added 25 new charging sites this past May and six more in June.

“Walmart is building a leading retail-integrated EV fast-charging network, focused on delivering an affordable, reliable and convenient charging experience where customers already shop,” Bohl said in an emailed statement. “Customers can charge while they shop, access stations through the Walmart app they already use, and benefit from affordable pricing.”

The charging stations already available include 612 individual charging stalls using 400-kilowatt chargers. Each stall has a dual charging cord with both Combined Charging System and North American Charging Standard connectors. The North American Charging Standard connectors, designed by Tesla, are smaller and lighter than the Combined Charging System connectors.

The primary way to pay for the chargers is through the Walmart app, but the company is also experimenting with built-in credit card readers to allow those without the app to use the stations.

Customers can check charger availability on the Walmart app. The company said the chargers will be available 24 hours a day.

Business

Waymo reports teen riders for bad behavior and delivers them to the police

Published

July 7, 2026

Press Room

Robotaxis could be turning into robocops.

A self-driving Waymo reported two teens to San Mateo, Calif., police on Monday after they were found drinking alcohol and shooting toy guns in the back of the vehicle.

According to a social media post from the San Mateo Police Department, officers detained two 15-year-olds after the Waymo they were riding in contacted the department and stopped in a parking lot until law enforcement arrived.

“Parents do you know where your teens are?” the San Mateo Police Department wrote on Facebook following the incident. “Waymo does!”

Officers removed both teens from the vehicle and determined they were using toy guns to shoot Orbeez out the windows. Orbeez are small, water-absorbing beads sold at toy stores.

“Toy guns, water guns, and BB guns all pose real dangers, especially to an untrained eye,” the Police Department said. “The simple handling of them can cause fear in [passersby].”

A video posted on Facebook shows at least five officers and a police dog responding to the scene and approaching the Waymo with their weapons raised.

Waymo did not immediately respond to a request for comment.

Waymo vehicles have internal cameras and microphones that may be used in an emergency or to “promote safety and security,” according to Waymo’s online support page.

The cameras are also used to ensure the vehicles are clean and to help find lost items, according to the support page.

The company said it does not use facial recognition or other biometric identification technologies to identify individuals.

“In more urgent circumstances, support may access live video during a trip,” the Waymo page said.

The San Mateo Police Department’s Facebook post has garnered nearly 60 comments, with one user accusing Waymo of “snitching.”

“At least they got a designated driver?!” one user commented.

Business

Commentary: How right-wing anti-transgender attacks led to a Supreme Court ruling upholding sex discrimination

Published

July 7, 2026

Press Room

At the Supreme Court, the unfounded fear of boys masquerading as girls in youth sports rolled the clock back on gender equality.

On the surface, the Supreme Court’s June 30 opinion upholding state laws barring transgender girls from women’s and girls’ sports teams looks like a victory for women’s rights.

The 6-3 opinion by Justice Brett M. Kavanaugh certainly presents itself that way. “Females and males have inherent physical differences relevant to athletic performance,” Kavanaugh wrote. “Therefore, in contact sports, forcing female athletes to compete against males can create significant safety risks.” He also asserted that “forcing female athletes to compete against males can undermine competitive fairness.”

The ruling applied to prohibitions enacted in Idaho and West Virginia against “biological” males’ participation on women’s teams in public schools. Federal judges in both states overturned the bans. The Supreme Court majority restored them. The ruling essentially upholds similar bans enacted in 25 other states.

Kavanaugh, like Donald Trump and others in the anti-transgender camp, maintained that one’s gender is an immutable fact of life, established even before birth.

Anything else, Trump stated in an executive order he issued on inauguration day 2025, could only be the product of “gender ideology extremism.” The U.S., his order stated, recognizes “two sexes, male and female. These sexes are not changeable and are grounded in fundamental and incontrovertible reality.” That’s a “biological truth,” he declared.

In his own version of this overconfident and factually insupportable conclusion, Kavanaugh wrote: “As all agree, females and males have inherent physical differences relevant to athletic performance.”

Science recognizes that some people are “born with sex traits that don’t fit into typical male or female patterns,” to cite a discussion on the Cleveland Clinic web page on the topic “intersex.” The condition “may involve chromosomes, hormones, reproductive organs or genitals.”

From a psychological standpoint, medical science recognizes “gender dysphoria” as a real condition often requiring counseling and medical intervention such as the use of puberty blockers and hormones to stave off the development of secondary sex characteristics until the condition can be resolved.

No one disputes that there are physical differences between the sexes. Few would dispute that on average or even at the median, males may be bigger and more powerful than females, or that in certain contact sports the difference may be telling and on occasion dangerous.

But that’s not the same as asserting that the physical differences between males and females necessarily mean that men will prevail over women in all competitions or that their participation will endanger women.

The International Olympic Committee — in a policy statement Kavanaugh cited incompletely — says that in “most running and swimming events,” males have a 10% to 12% advantage over women. That’s a range that would accommodate the full spectrum of outcomes — transgender females win, cisfemales win, they tie. (The “cis” prefix denotes those living consistent with their birth gender.)

West Virginia and Idaho addressed this ambiguity by banning transgender women from all girls’ teams. So under their rules transgender girls can’t play football or soccer with cisgirls. But what’s the argument in favor of banning them from the 100-yard dash, or cross-country track, or diving, or archery?

But something else is going on here. The Supreme Court’s ruling was almost preordained, given the years-long campaign by conservatives to demonize transgender individuals as if they’re members of an alien species.

It will be recalled that during his presidential campaign, Trump spun a despicable fantasy in which children were kidnapped in school and secretly subjected to sex-change operations.

Trump’s executive order wiped out policies aimed at protecting transgender adults from discrimination. He moved to outlaw gender-affirming medical therapies for anyone under 19 by cutting off federal funding for healthcare institutions that provide such care.

He banned transgender individuals from serving in the military and ordered federal prison officials to move transgender inmates into the general populations consistent with their birth genders, which exposes them to physical assault. (Federal Judge Royce Lamberth of Washington, D.C., has blocked the government from transferring three transgender women into the male prison population or terminating their hormone treatments.)

I wrote during Trump’s first term, when his anti-transgender policies were still gestating, that the goal was to show that “one can target any community, as long as it doesn’t have a strong political voice or political power. These are the actions of bullies and cowards, pretending to be strong.”

Last year, the Supreme Court struck its first blow against transgender rights by upholding a Tennessee law banning transgender care, including puberty blockers and hormone therapy, for minors. Similar laws have been enacted in 25 other states. The majority in that ruling by Chief Justice John G. Roberts Jr. was identical to the one in the June 30 ruling — Roberts, Kavanaugh, and Justices Clarence Thomas, Samuel A. Alito Jr., Neil M. Gorsuch and Amy Coney Barrett.

Who are the targets of this ideological campaign? They number only about 1.6 million U.S. adults, or one-half of 1% of the U.S. population. About 300,000 adolescents ages 13 to 17, or 1.4%, identify as transgender, according to a study by UCLA School of Law.

In West Virginia, as Justice Sonia Sotomayor observed in her dissenting opinion, “there was no record of any transgender person participating in school sports in the State, let alone any ‘problem’ with transgender students … creating unfair competition or unsafe conditions.”

In endorsing the flat bans directed at transgender women in Idaho and West Virginia, Kavanaugh argued that any attempt to implement case-by-case judgments of students’ requests to join sports teams inconsistent with their biological gender would create “an enormous practical and administrability problem.”

Is that so? That wasn’t the case in Maine, where the annual K-12 population is more than 170,000. There, a committee was charged with determining whether a student’s participation in a sport consistent with their gender identity but inconsistent with their biological sex would “result in an unfair athletic advantage” or present a risk of injury to others. The committee held 56 hearings from 2013 through 2021, or an average of seven per year. During the entire time span, only four involved transgender girls. (The outcome of those hearings couldn’t be learned.)

It was Maine’s policy, one might recall, that provoked a confrontation between Trump and Maine Gov. Janet Mills at the White House last year, when Trump threatened to withhold federal funding from the state unless it barred transgender students from competing on women’s sports teams. “We’ll see you in court,” Mills snapped.

Whether the Idaho and West Virginia laws genuinely protect girls from unfair competition is questionable. (The Idaho law is styled the “Fairness in Women’s Sports Act.”) In practice, the laws may subject women in public schools to “invasive sex verification procedures,” as educational expert George Theoharis of Syracuse University wrote after the court ruling.

They’re also based on a retrograde view of women as fragile creatures needing men’s protection, Theoharis wrote — “the same logic that has historically been used to justify excluding women from making their own healthcare decisions and girls from rigorous math and science; that physically demanding work is simply beyond them.” (There don’t appear to be any state laws barring transgender women from competing in men’s sports.)

Becky Pepper-Jackson, the plaintiff in the West Virginia case, in which she is identified only as B.P.J., is the only transgender girl who sought to join girls’ teams (track and cross-country) in the state. That was in 2021, just after West Virginia passed its law and she was about to enter sixth grade. She didn’t appear to pose any competitive risk to others on the track and cross-country teams she applied to join: Her lawyers told the Supreme Court that on those no-cut teams, she “came in near the back.”

Anyway, she had not gone through male puberty, which theoretically might have endowed her with a competitive advantage, because she had been taking puberty blockers and female hormones.

Thanks to the court’s ruling, Sotomayor observed in a dissent joined by Justices Elena Kagan and Ketanji Brown Jackson, West Virginia can deny Becky access to school sports “because it thinks they have an inherent athletic advantage, even if the facts show that they do not.”

B.P.J., Sotomayor wrote, “cannot practice on girls’ teams, even if she would not take anyone’s spot in an eventual competition, even if everyone who tries out for the team makes it, and even if having the chance to participate could aid immensely in treating B. P. J.’s gender dysphoria.”

So whose interest was really protected by the Supreme Court?