
Column: These Apple researchers just showed that AI bots can't think, and possibly never will


See if you can solve this arithmetic problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)

You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.


The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis has nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”
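Worked out in a few lines of Python, the arithmetic looks like this (a minimal sketch; the variable names are ours, and the point is simply how the irrelevant detail invites the wrong subtraction):

```python
# Oliver's kiwi problem, worked out step by step.
friday = 44
saturday = 58
sunday = 2 * friday            # "double the number of kiwis he did on Friday" -> 88
undersized = 5                 # irrelevant detail: smaller kiwis still count as kiwis

correct_total = friday + saturday + sunday        # 44 + 58 + 88 = 190
distracted_total = correct_total - undersized     # 185, the wrong answer many models gave

print(correct_total, distracted_total)            # 190 185
```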

Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.

The Apple findings were published earlier this month in a technical paper that has attracted widespread attention in AI labs and the lay press, not only because the results are well-documented, but also because the researchers work for the nation’s leading high-tech consumer company — and one that has just rolled out a suite of purported AI features for iPhone users.


“The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results,” says Gary Marcus, a critic of how AI systems have been marketed as reliably, well, “intelligent.”

Indeed, Apple’s conclusion matches earlier studies that have found that large language models, or LLMs, don’t actually “think” so much as match language patterns in materials they’ve been fed as part of their “training.” When it comes to abstract reasoning — “a key aspect of human intelligence,” in the words of Melanie Mitchell, an expert in cognition and intelligence at the Santa Fe Institute — the models fall short.

“Even very young children are adept at learning abstract rules from just a few examples,” Mitchell and colleagues wrote last year after subjecting GPT bots to a series of analogy puzzles. Their conclusion was that “a large gap in basic abstract reasoning still remains between humans and state-of-the-art AI systems.”

That’s important because LLMs such as GPT underlie the AI products that have captured the public’s attention. But the LLMs tested by the Apple team were consistently misled by the language patterns they were trained on.

The Apple researchers set out to answer the question, “Do these models truly understand mathematical concepts?” as one of the lead authors, Mehrdad Farajtabar, put it in a thread on X. Their answer is no. They also pondered whether the shortcomings they identified can be easily fixed, and their answer is also no: “Can scaling data, models, or compute fundamentally solve this?” Farajtabar asked in his thread. “We don’t think so!”


The Apple research, along with other findings about AI bots’ cognitive limitations, is a much-needed corrective to the sales pitches coming from companies hawking their AI models and systems, including OpenAI and Google’s DeepMind lab.

The promoters generally depict their products as dependable and their output as trustworthy. In fact, their output is consistently suspect, posing a clear danger when these systems are used in contexts where the need for rigorous accuracy is absolute, say in healthcare applications.

That’s not always the case. “There are some problems which you can make a bunch of money on without having a perfect solution,” Marcus told me. Recommendation engines powered by AI — those that steer buyers on Amazon to products they might also like, for example — are one of them. If those systems get a recommendation wrong, it’s no big deal; a customer might spend a few dollars on a book he or she didn’t like.

“But a calculator that’s right only 85% of the time is garbage,” Marcus says. “You wouldn’t use it.”

The potential for damagingly inaccurate outputs is heightened by AI bots’ natural language capabilities, with which they deliver even absurdly wrong answers with convincingly cocksure elan. Often they double down on their errors when challenged.


These errors are typically described by AI researchers as “hallucinations.” The term may make the mistakes seem almost innocuous, but in some applications, even a minuscule error rate can have severe ramifications.

That’s what academic researchers concluded in a recently published analysis of Whisper, an AI-powered speech-to-text tool developed by OpenAI, which can be used to transcribe medical discussions or jailhouse conversations monitored by correction officials.

The researchers found that about 1.4% of Whisper-transcribed audio segments in their sample contained hallucinations, including wholly fabricated statements added to the transcribed conversation, among them portrayals of “physical violence or death … [or] sexual innuendo,” and demographic stereotyping.

That may sound like a minor flaw, but the researchers observed that the errors could be incorporated in official records such as transcriptions of court testimony or prison phone calls — which could lead to official decisions based on “phrases or claims that a defendant never said.”

Updates to Whisper in late 2023 improved its performance, the researchers said, but the updated Whisper “still regularly and reproducibly hallucinated.”


That hasn’t deterred AI promoters from unwarranted boasting about their products. In an Oct. 29 tweet, Elon Musk invited followers to submit “x-ray, PET, MRI or other medical images to Grok [the AI application for his X social media platform] for analysis.” Grok, he wrote, “is already quite accurate and will become extremely good.”

It should go without saying that, even if Musk is telling the truth (not an absolutely certain conclusion), any system used by healthcare providers to analyze medical images needs to be a lot better than “extremely good,” however one might define that standard.

That brings us to the Apple study. It’s proper to note that the researchers aren’t critics of AI as such but believers that its limitations need to be understood. Farajtabar was formerly a senior research scientist at DeepMind, where another author interned under him; other co-authors hold advanced degrees and professional experience in computer science and machine learning.

The team plied their subject AI models with questions drawn from a popular collection of more than 8,000 grade school arithmetic problems testing schoolchildren’s understanding of addition, subtraction, multiplication and division. When the problems incorporated clauses that might seem relevant but weren’t, the models’ performance plummeted.

That was true of all the models, including versions of the GPT bots developed by OpenAI, Meta’s Llama, Microsoft’s Phi-3, Google’s Gemma and several models developed by the French lab Mistral AI.


Some did better than others, but all showed a decline in performance as the problems became more complex. One problem involved a basket of school supplies including erasers, notebooks and writing paper. Solving it requires multiplying the number of each item by its price and adding the products together to determine how much the entire basket costs.

When the bots were also told that “due to inflation, prices were 10% cheaper last year,” the bots reduced the cost by 10%. That produces a wrong answer, since the question asked what the basket would cost now, not last year.
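As a concrete illustration (the quantities and prices below are hypothetical, since the column doesn’t give the actual figures), the correct computation and the distracted one differ only in an unwarranted final discount:

```python
# Hypothetical quantities and unit prices -- the actual problem's figures aren't given here.
basket = {
    "erasers":       {"quantity": 24, "price": 0.50},
    "notebooks":     {"quantity": 10, "price": 1.25},
    "writing paper": {"quantity": 5,  "price": 2.00},
}

# Correct answer: multiply each item's quantity by its price and sum.
cost_now = sum(item["quantity"] * item["price"] for item in basket.values())

# The distractor says prices were 10% cheaper LAST year. That changes nothing
# about today's cost, yet the bots applied the discount anyway.
wrong_answer = cost_now * 0.90

print(cost_now, wrong_answer)   # 34.5 and 31.05 with these made-up numbers
```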

Why did this happen? The answer is that LLMs are developed, or trained, by feeding them huge quantities of written material scraped from published works or the internet — not by trying to teach them mathematical principles. LLMs function by gleaning patterns in the data and trying to match a pattern to the question at hand.

But they become “overfitted to their training data,” Farajtabar explained via X. “They memorized what is out there on the web and do pattern matching and answer according to the examples they have seen. It’s still a [weak] type of reasoning but according to other definitions it’s not a genuine reasoning capability.” (The brackets are his.)

That’s likely to impose boundaries on what AI can be used for. In mission-critical applications, humans will almost always have to be “in the loop,” as AI developers say — vetting answers for obvious or dangerous inaccuracies or providing guidance to keep the bots from misinterpreting their data, misstating what they know, or filling gaps in their knowledge with fabrications.


To some extent, that’s comforting, for it means that AI systems can’t accomplish much without having human partners at hand. But it also means that we humans need to be aware of the tendency of AI promoters to overstate their products’ capabilities and conceal their limitations. The issue is not so much what AI can do, but what users can be gulled into thinking it can do.

“These systems are always going to make mistakes because hallucinations are inherent,” Marcus says. “The ways in which they approach reasoning are an approximation and not the real thing. And none of this is going away until we have some new technology.”



Comcast considers spinning off cable channels like MSNBC. But analysts have doubts


NBCUniversal parent Comcast Corp. is considering spinning off its cable networks into a separate company as the media giant continues to grapple with massive changes in the overall linear television business.

Folding the cable networks into their own company owned by Comcast shareholders could “position them to take advantage of opportunities in the changing media landscape and create value for our shareholders,” President Michael Cavanagh told analysts Thursday during Comcast’s third fiscal quarter earnings call.

“Like many of our peers in media, we are experiencing the effects of the transition in our video businesses and have been studying the best path forward for these assets,” he said, adding that the company was not ready to talk about specifics yet, but would update investors when there were “firm conclusions.”

NBCUniversal’s cable networks include USA Network, Bravo, MSNBC and Syfy.

Comcast also said it would consider partnerships for its streaming business, which has lost billions of dollars since launch.


The consideration comes as the cable television business is undergoing upheaval. As customers have turned to streaming services, they are continuing to cut the cord, leading to major concerns for legacy TV businesses. Paramount Global and Warner Bros. Discovery recently wrote down the value of their cable channel segments by billions of dollars.

But analysts were skeptical of such a move by Comcast to unload its cable channel assets.

These networks are able to maintain carriage and increase their fee rates with distributors in part because they’re bundled with broadcast network NBC. In addition, much of the content from the cable networks feeds into NBCUniversal’s streaming service, Peacock, Ric Prentiss, managing director at Raymond James, wrote in a Thursday note to clients.

“Splitting off these declining assets may be alluring, but we think there are complexities and dis-synergies in doing so,” he wrote. “Perhaps a private equity owner would find them attractive, but we think a standalone public stock may not perform well.”

Media analyst Rich Greenfield of LightShed Ventures was even more blunt, noting the extreme complexity of such a potential deal.


“We suspect this is much ado about nothing,” he wrote in a note to clients. “When you see a dismal future, with no path to growth, you sound the alarm and explore strategic alternatives.”

Walt Disney Co. Chief Executive Bob Iger previously floated the idea of spinning off the Burbank entertainment giant’s linear TV businesses, but later walked the comments back.

Comcast’s stock rose 3% to $43.67 after the company reported generally positive earnings, thanks in part to a boost from the Summer Olympics. The shares are flat year-to-date.



Shohei Ohtani leads the way in Dodgers setting merchandise sales record after World Series win


Dodgers fans were ready to celebrate the second Walker Buehler struck out Alex Verdugo for the final out of the World Series.

They were also ready to spend — and they did so more than any other fan base of a title-winning team in at least 10 years.

After clinching their eighth World Series title with a 7-6 win over the New York Yankees on Wednesday night, the Dodgers set a Fanatics sales record for first-hour sales of a team’s merchandise, across any sport, after claiming a championship.

The statistics are based on the amount of money spent on all merchandise, including jerseys, T-shirts, collectibles and more. Fanatics has been the official online sportswear retailer for most North American sports leagues — including MLB, NFL, NBA, WNBA and NHL — for more than a decade.

The Dodgers beat the sales record they set in 2020, after they won their first World Series title since 1988. Over the first 12 hours of sales following Wednesday’s game, Fanatics said, sales of Dodgers merchandise were up 20% from four years ago.


The company also released the top five Dodgers players in merchandise sales since the World Series ended. The name at the top of the list is no surprise — Japanese superstar Shohei Ohtani, who signed with the Dodgers in the offseason and was making his first postseason appearance after spending the first six years of his MLB career with the Angels.

First baseman and World Series MVP Freddie Freeman was No. 2. Playing through a serious ankle injury that limited his availability earlier in the playoffs, Freeman became the first player to hit a walk-off grand slam in the World Series. He hit a home run in each of the first four games of the series and drove in a record-tying 12 runs overall.

The top five was rounded out by right fielder Mookie Betts, rookie pitcher Yoshinobu Yamamoto and veteran pitcher Clayton Kershaw, who did not pitch during the postseason because of a toe injury.



Looking for new activities? Google wants you to turn to its navigation app


Search giant Google wants people to use its navigation app for more than just finding directions and avoiding traffic.

The tech giant is adding generative AI features to Google Maps so people can easily get recommendations for places to go and activities to do.

With 2 billion people using Google Maps every month, the company envisions people also will turn to the navigation app for inspiration, executives said at a press event at the company’s Street View Garage in Palo Alto on Wednesday.

Miriam Daniel, vice president and general manager of Google Maps, said the search giant has the ability to combine billions of pieces of information the company collects about the world and user reviews with generative AI.

“When we bring all this together, we will transform the way users interact with maps,” she said.


Rather than just finding directions or asking Google Maps to find the nearest gas station, users will be able to type out queries such as “things to do with friends at night in Boston” and get answers through the app. They then will see results, curated with the help of Google’s generative AI chatbot and model known as Gemini, that may include speakeasies or live music. Once the user taps on a result for a business, for example, they’ll see a summary of reviews by users in addition to photos and videos of the place.

The AI-powered tools are rolling out this week on Apple and Android devices in the United States.

Google’s latest AI-powered updates underscore how the tech giant is responding to challenges to the company’s dominance in search. As the battle for the future of search heats up, the rise of AI tools such as OpenAI’s ChatGPT that can quickly summarize search results has the potential to reshape how people find and sift through information online.

Tech companies such as Meta, Apple and Microsoft have been responding to this change by infusing more generative AI tools into their products.

Google is no exception. At the company’s press event, a giant Google Map location icon, a blue Rivian vehicle and Google’s Street View cameras used to capture images of various locations filled the space.


As tech titans gather a trove of data about their users to power new generative AI tools, concerns about privacy, misinformation and copyright are some of the top issues companies have had to address.

Google also has faced scrutiny from regulators on its power over people’s lives, with a federal judge ruling in August that the company has an illegal monopoly on the online search market.

Daniel said when Google Maps provides users answers to their questions, the company isn’t using individualized information to provide personal results but contextual ones. For example, if a user asks Google Maps for things to do this weekend and it’s October, some of the suggestions might include seasonal activities such as pumpkin picking and going to a haunted house.

“We really take this seriously in making sure we’re using generative AI responsibly,” she said.

Google also is testing more AI-powered tools in another one of its popular navigation apps: Waze. Users will be able to tap a reporting button and tell the app that there’s a car accident ahead simply by speaking. Waze also will alert users when they’re near a school zone so they can be more careful about driving.


Developers are using Google’s AI technology to build new features in other products. Electric vehicle manufacturer Rivian used Google data so people can see summaries of restaurants, shops and supermarkets from the car’s infotainment screen, a tool that will be rolled out starting next month.
