Column: These Apple researchers just showed that AI bots can't think, and possibly never will

See if you can solve this arithmetic problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)
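
To make the arithmetic explicit, here is a minimal Python sketch of the correct computation; the variable names are mine, and the commented-out line shows the miscalculation the failing bots effectively performed.

```python
# Oliver's kiwi tally: the fruit's size is a distractor, not an operand.
friday = 44
saturday = 58
sunday = 2 * friday           # "double the number he did on Friday" = 88
smaller_than_average = 5      # irrelevant: small kiwis still count as kiwis

total = friday + saturday + sunday
print(total)                  # 190, the correct answer

# The failing models effectively computed this instead:
# wrong = friday + saturday + sunday - smaller_than_average   # 185
```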

You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.

The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis has nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”

Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.

The Apple findings were published earlier this month in a technical paper that has attracted widespread attention in AI labs and the lay press, not only because the results are well-documented, but also because the researchers work for the nation’s leading high-tech consumer company — and one that has just rolled out a suite of purported AI features for iPhone users.

“The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results,” says Gary Marcus, a critic of how AI systems have been marketed as reliably, well, “intelligent.”

Indeed, Apple’s conclusion matches earlier studies that have found that large language models, or LLMs, don’t actually “think” so much as match language patterns in materials they’ve been fed as part of their “training.” When it comes to abstract reasoning — “a key aspect of human intelligence,” in the words of Melanie Mitchell, an expert in cognition and intelligence at the Santa Fe Institute — the models fall short.

“Even very young children are adept at learning abstract rules from just a few examples,” Mitchell and colleagues wrote last year after subjecting GPT bots to a series of analogy puzzles. Their conclusion was that “a large gap in basic abstract reasoning still remains between humans and state-of-the-art AI systems.”

That’s important because LLMs such as GPT underlie the AI products that have captured the public’s attention. But the LLMs tested by the Apple team were consistently misled by the language patterns they were trained on.

The Apple researchers set out to answer the question, “Do these models truly understand mathematical concepts?” as one of the lead authors, Mehrdad Farajtabar, put it in a thread on X. Their answer is no. They also pondered whether the shortcomings they identified can be easily fixed, and their answer is also no: “Can scaling data, models, or compute fundamentally solve this?” Farajtabar asked in his thread. “We don’t think so!”

The Apple research, along with other findings about AI bots’ cogitative limitations, is a much-needed corrective to the sales pitches coming from companies hawking their AI models and systems, including OpenAI and Google’s DeepMind lab.

The promoters generally depict their products as dependable and their output as trustworthy. In fact, their output is consistently suspect, posing a clear danger when they’re used in contexts where the need for rigorous accuracy is absolute, say in healthcare applications.

That’s not always the case. “There are some problems which you can make a bunch of money on without having a perfect solution,” Marcus told me. Recommendation engines powered by AI, such as those that steer buyers on Amazon to products they might also like, are one example. If those systems get a recommendation wrong, it’s no big deal; a customer might spend a few dollars on a book he or she didn’t like.

“But a calculator that’s right only 85% of the time is garbage,” Marcus says. “You wouldn’t use it.”

The potential for damagingly inaccurate outputs is heightened by AI bots’ natural language capabilities, which allow them to deliver even absurdly inaccurate answers with convincingly cocksure élan. Often they double down on their errors when challenged.

These errors are typically described by AI researchers as “hallucinations.” The term may make the mistakes seem almost innocuous, but in some applications, even a minuscule error rate can have severe ramifications.

That’s what academic researchers concluded in a recently published analysis of Whisper, an AI-powered speech-to-text tool developed by OpenAI, which can be used to transcribe medical discussions or jailhouse conversations monitored by corrections officials.

The researchers found that about 1.4% of Whisper-transcribed audio segments in their sample contained hallucinations, including wholly fabricated statements inserted into the transcribed conversation, among them portrayals of “physical violence or death … [or] sexual innuendo” and demographic stereotyping.

That may sound like a minor flaw, but the researchers observed that the errors could be incorporated in official records such as transcriptions of court testimony or prison phone calls — which could lead to official decisions based on “phrases or claims that a defendant never said.”

Updates to Whisper in late 2023 improved its performance, the researchers said, but the updated Whisper “still regularly and reproducibly hallucinated.”

That hasn’t deterred AI promoters from unwarranted boasting about their products. In an Oct. 29 tweet, Elon Musk invited followers to submit “x-ray, PET, MRI or other medical images to Grok [the AI application for his X social media platform] for analysis.” Grok, he wrote, “is already quite accurate and will become extremely good.”

It should go without saying that, even if Musk is telling the truth (not an absolutely certain conclusion), any system used by healthcare providers to analyze medical images needs to be a lot better than “extremely good,” however one might define that standard.

That brings us to the Apple study. It’s proper to note that the researchers aren’t critics of AI as such but believers that its limitations need to be understood. Farajtabar was formerly a senior research scientist at DeepMind, where another author interned under him; other co-authors hold advanced degrees and professional experience in computer science and machine learning.

The team plied their subject AI models with questions drawn from a popular collection of more than 8,000 grade school arithmetic problems testing schoolchildren’s understanding of addition, subtraction, multiplication and division. When the problems incorporated clauses that might seem relevant but weren’t, the models’ performance plummeted.

That was true of all the models, including versions of the GPT bots developed by OpenAI, Meta’s Llama, Microsoft’s Phi-3, Google’s Gemma and several models developed by the French lab Mistral AI.

Advertisement

Some did better than others, but all showed a decline in performance as the problems became more complex. One problem involved a basket of school supplies including erasers, notebooks and writing paper. Solving it requires multiplying the number of each item by its price and adding the results together to determine how much the entire basket costs.

When the bots were also told that “due to inflation, prices were 10% cheaper last year,” the bots reduced the cost by 10%. That produces a wrong answer, since the question asked what the basket would cost now, not last year.
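
Here, too, a worked sketch makes the distractor concrete. This is my own minimal Python illustration; the quantities and prices are invented, since the column doesn’t give the actual figures.

```python
# Hypothetical quantities and unit prices; the column doesn't specify them.
erasers = 12 * 0.50
notebooks = 6 * 2.25
writing_paper = 3 * 1.75

basket_now = erasers + notebooks + writing_paper
print(round(basket_now, 2))   # correct: what the basket costs today

# The bots applied the distractor clause anyway:
# wrong = basket_now * 0.90   # "prices were 10% cheaper last year"
# That answers a question nobody asked: last year's cost, not today's.
```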

Why did this happen? The answer is that LLMs are developed, or trained, by feeding them huge quantities of written material scraped from published works or the internet — not by trying to teach them mathematical principles. LLMs function by gleaning patterns in the data and trying to match a pattern to the question at hand.

But they become “overfitted to their training data,” Farajtabar explained via X. “They memorized what is out there on the web and do pattern matching and answer according to the examples they have seen. It’s still a [weak] type of reasoning but according to other definitions it’s not a genuine reasoning capability.” (The brackets are his.)

That’s likely to impose boundaries on what AI can be used for. In mission-critical applications, humans will almost always have to be “in the loop,” as AI developers say, vetting answers for obvious or dangerous inaccuracies or providing guidance to keep the bots from misinterpreting their data, misstating what they know, or filling gaps in their knowledge with fabrications.

To some extent, that’s comforting, for it means that AI systems can’t accomplish much without having human partners at hand. But it also means that we humans need to be aware of the tendency of AI promoters to overstate their products’ capabilities and conceal their limitations. The issue is not so much what AI can do as what users can be gulled into thinking it can do.

“These systems are always going to make mistakes because hallucinations are inherent,” Marcus says. “The ways in which they approach reasoning are an approximation and not the real thing. And none of this is going away until we have some new technology.”

The Container Store files for bankruptcy amid stiff competition

The Container Store has filed for Chapter 11 bankruptcy protection amid steep losses, slumping sales and increased competition.

Business in its stores and online will continue as usual while it restructures, the Texas-based home goods, storage and custom closets chain said late Sunday. Customer deposits for in-home services will be honored, and merchandise orders will be delivered as normal.

“The Container Store is here to stay,” Chief Executive Satish Malhotra said in a statement. “Our strategy is sound, and we believe the steps we are taking today will allow us to continue to advance our business.”

The Container Store peaked in its 2021 fiscal year, when the company exceeded $1 billion in sales for the first time and posted record earnings as consumers spent heavily on home remodeling and redecorating projects during months of pandemic quarantine. A national de-cluttering craze, set off by organization expert Marie Kondo, also benefited the chain.

But since then, the Container Store has struggled.

Part of the company’s struggles stems from competition from rivals including Target, Walmart and Amazon, which often sell similarly stylish storage items at a lower price point. And with housing prices and mortgage rates remaining stubbornly high, many prospective home buyers have been forced to wait on the sidelines, dampening demand for the wide range of products and services that come with outfitting a new property.

For the three months ended Sept. 28, the Container Store reported a loss of $16.1 million. Sales totaled $196.6 million, down 10.5% compared with the same quarter a year earlier. Same-store sales fell 12.5%.

Founded in 1978, the Container Store operates more than 100 stores around the country. In Los Angeles County, it has locations in Century City, El Segundo, Pasadena and Woodland Hills.

It filed for bankruptcy protection in the Southern District of Texas, two weeks after the New York Stock Exchange notified the company that its shares would be suspended for failing to maintain an average global market capitalization of at least $15 million over 30 consecutive trading days.

The Container Store said it expected to confirm a plan of reorganization within 35 days and emerge from bankruptcy soon after as a private company. The company said at least 90% of its term loan lenders had pledged $40 million in new money financing.

The Chapter 11 process does not include Elfa, a separate customized closet business based in Sweden, which is owned by the Container Store.

In an email to customers Monday, Malhotra said the company had felt “the impact of the challenging macro-economic environment” but reassured them that “our obligations to you will be fulfilled as expected.”

“You can feel confident that any orders, deposits or business you have with us are safe,” he said.

It has been a tough month for large-format retail chains. Last week Party City filed for Chapter 11 bankruptcy and said it would close all of its roughly 700 stores nationwide, and Big Lots said it would begin going-out-of-business sales at about 870 stores after a deal to sell the company fell through.

Judge enters default judgment in suit against Kanye West's private school

A judge entered a default judgment against Kanye West’s Christian private school in Los Angeles Superior Court on Wednesday in connection with a lawsuit filed by a former employee.

Isaiah Meadows, Yeezy Christian Academy’s former assistant principal, sought a default judgment in his wrongful termination and unpaid wages lawsuit against the school — later rebranded Donda Academy — and other defendants for failure to appear through licensed attorneys.

The judge, Christopher K. Lui, granted Meadows’ motion. He also ordered that the answers given by the defendants (Yeezy Christian Academy, Donda Services LLC and Strokes Canyon LLC) in response to Meadows’ complaint be stricken.

Last year, a lawyer representing West and the three other defendants denied “each and every allegation” of Meadows’ complaint in a filing with the court.

In August, Brian Blumfield, West’s most recent attorney, who was representing the music mogul and the other business entities in the matter, sought to withdraw from the case on the grounds that the defendants had terminated the relationship in June and had refused to speak to or pay him, according to court filings. The judge granted the request.

Meadows had alleged that he brought many of the school’s health and safety issues to the attention of West and the school’s director, but that they were left unaddressed. Meadows was later fired.

According to the complaint, a skylight in one of the classrooms didn’t have glass, allowing rain to fall into the building. West reportedly did not like glass. “Water would soak into the floor, which would lead to a moldy smell for the next few days,” the complaint said.

Electrical and telephone wires were also allegedly left exposed, and on one occasion an electrical fire started near a student dining area.

In 2020, Meadows was offered a $165,000 salary, according to the suit. However, he claimed that West later reneged on a promise to pay his rent after doing so for three months; Meadows had relocated with his family from North Hollywood to Calabasas to work at the school.

The rent payments ended in February 2021, Meadows claimed, after he “was suspended after calling for meetings and raising concerns regarding operations of the school.”

Meadows alleged that his salary was then cut and that he was later demoted to working as a teacher’s assistant and physical education teacher. That April, he sent an email outlining his concerns about his pay and that of other staff members.

Nearly two weeks before the new school year was to start in 2022, Meadows was told that he was being terminated “with no explanation as to why.”

The suit is one of at least five filed against West and Donda Academy since 2023. The suits allege a hostile workplace created by West’s conduct, with claims of discrimination, antisemitism and retaliation, as well as various health and safety issues at the school’s property, which was located first in Calabasas, then in Simi Valley and finally in Chatsworth.

Donda Academy abruptly shut down in October 2022, amid a cascade of fallout from West’s antisemitic comments, which led a number of his business partners such as the Gap and Adidas to sever ties with him.

There were reports that the school reopened shortly thereafter; however, according to the California Department of Education, the school has been closed since June of this year.

Santa, aka the IRS, might be dropping $1,400 into your stocking this year

Everyone’s favorite Christmas gift giver, the Internal Revenue Service, has announced that it will be doling out more than $2 billion in checks to Americans this month as part of its effort to make sure everyone received their stimulus payments from 2021.

The federal tax agency said an internal review showed many Americans had never received their economic impact payments, which were supposed to go out following the filing of 2021 tax returns. Because of this, the agency is paying out the money it still owes Americans who never received their checks.

Although most eligible Americans received their stimulus payments, checks will be sent to those who qualified but filed a 2021 tax return that left the space for the recovery rebate credit blank.

Those people are eligible for up to $1,400 from the federal government. The payments should be received by late January 2025, at the latest.

“These payments are an example of our commitment to go the extra mile for taxpayers. Looking at our internal data, we realized that 1 million taxpayers overlooked claiming this complex credit when they were actually eligible,” said IRS Commissioner Danny Werfel. “To minimize headaches and get this money to eligible taxpayers, we’re making these payments automatic, meaning these people will not be required to go through the extensive process of filing an amended return to receive it.”

Stimulus payments of $1,400 were sent out to Americans as part of a $1.9-trillion COVID-19 relief bill. Millions of Americans were eligible for the payments.

To get a check, Americans were required to make less than $75,000 per year as individuals or less than $150,000 as a household.
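
As a quick illustration of the cutoff as the column states it, here is a minimal Python sketch; the function name is my own invention, and the real IRS rules phase the credit out gradually above these thresholds rather than cutting off sharply.

```python
# Simplified eligibility check mirroring the article's summary.
# (Hypothetical helper; actual IRS rules include gradual phase-outs.)
def eligible_for_payment(annual_income: float, household: bool) -> bool:
    limit = 150_000 if household else 75_000
    return annual_income < limit

print(eligible_for_payment(60_000, household=False))    # True
print(eligible_for_payment(160_000, household=True))    # False
```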
