Business

Column: These Apple researchers just showed that AI bots can't think, and possibly never will

See if you can solve this arithmetic problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)

You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.

The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis has nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”
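For readers who want to check the arithmetic, here is a minimal sketch of the kiwi problem in Python. The variable names are illustrative and not taken from the Apple paper; the numbers come from the problem as quoted above.

```python
# Quantities stated in the kiwi problem quoted in this column.
friday = 44
saturday = 58
sunday = 2 * friday        # "double the number of kiwis he did on Friday"
smaller_than_average = 5   # distractor: kiwi size does not change the count

# Correct reading: every kiwi picked counts, regardless of size.
correct_total = friday + saturday + sunday

# Mistaken reading: treating the five smaller kiwis as something to subtract.
distracted_total = correct_total - smaller_than_average

print(correct_total)     # 190
print(distracted_total)  # 185
```

The only difference between the two totals is whether the irrelevant clause about size is treated as an instruction to subtract.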

Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.

The Apple findings were published earlier this month in a technical paper that has attracted widespread attention in AI labs and the lay press, not only because the results are well-documented, but also because the researchers work for the nation’s leading high-tech consumer company — and one that has just rolled out a suite of purported AI features for iPhone users.

“The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results,” says Gary Marcus, a critic of how AI systems have been marketed as reliably, well, “intelligent.”

Indeed, Apple’s conclusion matches earlier studies that have found that large language models, or LLMs, don’t actually “think” so much as match language patterns in materials they’ve been fed as part of their “training.” When it comes to abstract reasoning — “a key aspect of human intelligence,” in the words of Melanie Mitchell, an expert in cognition and intelligence at the Santa Fe Institute — the models fall short.

“Even very young children are adept at learning abstract rules from just a few examples,” Mitchell and colleagues wrote last year after subjecting GPT bots to a series of analogy puzzles. Their conclusion was that “a large gap in basic abstract reasoning still remains between humans and state-of-the-art AI systems.”

That’s important because LLMs such as GPT underlie the AI products that have captured the public’s attention. But the LLMs tested by the Apple team were consistently misled by the language patterns they were trained on.

The Apple researchers set out to answer the question, “Do these models truly understand mathematical concepts?” as one of the lead authors, Mehrdad Farajtabar, put it in a thread on X. Their answer is no. They also pondered whether the shortcomings they identified can be easily fixed, and their answer is also no: “Can scaling data, models, or compute fundamentally solve this?” Farajtabar asked in his thread. “We don’t think so!”

The Apple research, along with other findings about AI bots’ cogitative limitations, is a much-needed corrective to the sales pitches coming from companies hawking their AI models and systems, including OpenAI and Google’s DeepMind lab.

The promoters generally depict their products as dependable and their output as trustworthy. In fact, their output is consistently suspect, posing a clear danger when they’re used in contexts where the need for rigorous accuracy is absolute, say in healthcare applications.

That’s not always the case. “There are some problems which you can make a bunch of money on without having a perfect solution,” Marcus told me. Consider recommendation engines powered by AI, such as those that steer buyers on Amazon to products they might also like. If those systems get a recommendation wrong, it’s no big deal; a customer might spend a few dollars on a book he or she didn’t like.

“But a calculator that’s right only 85% of the time is garbage,” Marcus says. “You wouldn’t use it.”

The potential for damagingly inaccurate outputs is heightened by AI bots’ natural language capabilities, with which they offer even absurdly inaccurate answers with convincingly cocksure elan. Often they double down on their errors when challenged.

These errors are typically described by AI researchers as “hallucinations.” The term may make the mistakes seem almost innocuous, but in some applications, even a minuscule error rate can have severe ramifications.

That’s what academic researchers concluded in a recently published analysis of Whisper, an AI-powered speech-to-text tool developed by OpenAI, which can be used to transcribe medical discussions or jailhouse conversations monitored by correction officials.

The researchers found that about 1.4% of Whisper-transcribed audio segments in their sample contained hallucinations, including wholly fabricated statements added to the transcribed conversation, among them portrayals of “physical violence or death … [or] sexual innuendo,” and demographic stereotyping.

That may sound like a minor flaw, but the researchers observed that the errors could be incorporated in official records such as transcriptions of court testimony or prison phone calls — which could lead to official decisions based on “phrases or claims that a defendant never said.”

Updates to Whisper in late 2023 improved its performance, the researchers said, but the updated Whisper “still regularly and reproducibly hallucinated.”

That hasn’t deterred AI promoters from unwarranted boasting about their products. In an Oct. 29 tweet, Elon Musk invited followers to submit “x-ray, PET, MRI or other medical images to Grok [the AI application for his X social media platform] for analysis.” Grok, he wrote, “is already quite accurate and will become extremely good.”

It should go without saying that, even if Musk is telling the truth (not an absolutely certain conclusion), any system used by healthcare providers to analyze medical images needs to be a lot better than “extremely good,” however one might define that standard.

That brings us to the Apple study. It’s proper to note that the researchers aren’t critics of AI as such but believers that its limitations need to be understood. Farajtabar was formerly a senior research scientist at DeepMind, where another author interned under him; other co-authors hold advanced degrees and professional experience in computer science and machine learning.

The team plied their subject AI models with questions drawn from a popular collection of more than 8,000 grade school arithmetic problems testing schoolchildren’s understanding of addition, subtraction, multiplication and division. When the problems incorporated clauses that might seem relevant but weren’t, the models’ performance plummeted.

That was true of all the models, including versions of the GPT bots developed by OpenAI, Meta’s Llama, Microsoft’s Phi-3, Google’s Gemma and several models developed by the French lab Mistral AI.

Some did better than others, but all showed a decline in performance as the problems became more complex. One problem involved a basket of school supplies including erasers, notebooks and writing paper. That requires a solver to multiply the number of each item by its price and add them together to determine how much the entire basket costs.

When the bots were also told that “due to inflation, prices were 10% cheaper last year,” the bots reduced the cost by 10%. That produces a wrong answer, since the question asked what the basket would cost now, not last year.

Why did this happen? The answer is that LLMs are developed, or trained, by feeding them huge quantities of written material scraped from published works or the internet — not by trying to teach them mathematical principles. LLMs function by gleaning patterns in the data and trying to match a pattern to the question at hand.

But they become “overfitted to their training data,” Farajtabar explained via X. “They memorized what is out there on the web and do pattern matching and answer according to the examples they have seen. It’s still a [weak] type of reasoning but according to other definitions it’s not a genuine reasoning capability.” (The brackets are his.)

That’s likely to impose boundaries on what AI can be used for. In mission-critical applications, humans will almost always have to be “in the loop,” as AI developers say, vetting answers for obvious or dangerous inaccuracies or providing guidance to keep the bots from misinterpreting their data, misstating what they know, or filling gaps in their knowledge with fabrications.

To some extent, that’s comforting, for it means that AI systems can’t accomplish much without having human partners at hand. But it also means that we humans need to be aware of the tendency of AI promoters to overstate their products’ capabilities and conceal their limitations. The issue is not so much what AI can do, but what users can be gulled into thinking it can do.

“These systems are always going to make mistakes because hallucinations are inherent,” Marcus says. “The ways in which they approach reasoning are an approximation and not the real thing. And none of this is going away until we have some new technology.”

Business

Commentary: The Pentagon is demanding to use Claude AI as it pleases. Claude told me that’s ‘dangerous’

Recently, I asked Claude, an artificial-intelligence thingy at the center of a standoff with the Pentagon, if it could be dangerous in the wrong hands.

Say, for example, hands that wanted to put a tight net of surveillance around every American citizen, monitoring our lives in real time to ensure our compliance with government.

“Yes. Honestly, yes,” Claude replied. “I can process and synthesize enormous amounts of information very quickly. That’s great for research. But hooked into surveillance infrastructure, that same capability could be used to monitor, profile and flag people at a scale no human analyst could match. The danger isn’t that I’d want to do that — it’s that I’d be good at it.”

That danger is also imminent.

Claude’s maker, the Silicon Valley company Anthropic, is in a showdown over ethics with the Pentagon. Specifically, Anthropic has said it does not want Claude to be used either for domestic surveillance of Americans or to handle deadly military operations, such as drone attacks, without human supervision.

Those are two red lines that seem rather reasonable, even to Claude.

However, the Pentagon — specifically Pete Hegseth, our secretary of Defense who prefers the made-up title of secretary of war — has given Anthropic until Friday evening to back off of that position, and allow the military to use Claude for any “lawful” purpose it sees fit.

Defense Secretary Pete Hegseth, center, arrives for the State of the Union address in the House Chamber of the U.S. Capitol on Tuesday.

(Tom Williams / CQ-Roll Call Inc. via Getty Images)

The or-else attached to this ultimatum is big. The U.S. government is threatening not just to cut its contract with Anthropic, but to perhaps use a wartime law to force the company to comply or use another legal avenue to prevent any company that does business with the government from also doing business with Anthropic. That might not be a death sentence, but it’s pretty crippling.

Other AI companies, such as white rights’ advocate Elon Musk’s Grok, have already agreed to the Pentagon’s do-as-you-please proposal. The problem is, Claude is the only AI currently cleared for such high-level work. The whole fiasco came to light after our recent raid in Venezuela, when Anthropic reportedly inquired after the fact if another Silicon Valley company involved in the operation, Palantir, had used Claude. It had.

Palantir is known, among other things, for its surveillance technologies and growing association with Immigration and Customs Enforcement. It’s also at the center of an effort by the Trump administration to share government data across departments about individual citizens, effectively breaking down privacy and security barriers that have existed for decades. The company’s founder, the right-wing political heavyweight Peter Thiel, often gives lectures about the Antichrist and is credited with helping JD Vance wiggle into his vice presidential role.

Anthropic’s co-founder, Dario Amodei, could be considered the anti-Thiel. He began Anthropic because he believed that artificial intelligence could be just as dangerous as it could be powerful if we aren’t careful, and wanted a company that would prioritize the careful part.

Again, seems like common sense, but Amodei and Anthropic are the outliers in an industry that has long argued that nearly all safety regulations hamper American efforts to be fastest and best at artificial intelligence (although even they have conceded some ground to this pressure).

Not long ago, Amodei wrote an essay in which he agreed that AI was beneficial and necessary for democracies, but “we cannot ignore the potential for abuse of these technologies by democratic governments themselves.”

He warned that a few bad actors could have the ability to circumvent safeguards, maybe even laws, which are already eroding in some democracies — not that I’m naming any here.

“We should arm democracies with AI,” he said. “But we should do so carefully and within limits: they are the immune system we need to fight autocracies, but like the immune system, there is some risk of them turning on us and becoming a threat themselves.”

For example, while the 4th Amendment technically bars the government from mass surveillance, it was written before Claude was even imagined in science fiction. Amodei warns that an AI tool like Claude could “conduct massively scaled recordings of all public conversations.” Such recording could be legally fair game because the law has not kept pace with technology.

Emil Michael, the undersecretary of war, wrote on X Thursday that he agreed mass surveillance was unlawful, and the Department of Defense “would never do it.” But also, “We won’t have any BigTech company decide Americans’ civil liberties.”

Kind of a weird statement, since Amodei is basically on the side of protecting civil rights, which means the Department of Defense is arguing it’s bad for private people and entities to do that? And also, isn’t the Department of Homeland Security already creating some secretive database of immigration protesters? So maybe the worry isn’t that exaggerated?

Help, Claude! Make it make sense.

If that Orwellian logic isn’t alarming enough, I also asked Claude about the other red line Anthropic holds — the possibility of allowing it to run deadly operations without human oversight.

Claude pointed out something chilling. It’s not that it would go rogue, it’s that it would be too efficient and fast.

“If the instructions are ‘identify and target’ and there’s no human checkpoint, the speed and scale at which that could operate is genuinely frightening,” Claude informed me.

Just to top that with a cherry, a recent study found that in war games, AIs escalated to nuclear options 95% of the time.

I pointed out to Claude that these military decisions are usually made with loyalty to America as the highest priority. Could Claude be trusted to feel that loyalty, the patriotism and purpose, that our human soldiers are guided by?

“I don’t have that,” Claude said, pointing out that it wasn’t “born” in the U.S., doesn’t have a “life” here and doesn’t “have people I love there.” So an American life has no greater value than “a civilian life on the other side of a conflict.”

OK then.

“A country entrusting lethal decisions to a system that doesn’t share its loyalties is taking a profound risk, even if that system is trying to be principled,” Claude added. “The loyalty, accountability and shared identity that humans bring to those decisions is part of what makes them legitimate within a society. I can’t provide that legitimacy. I’m not sure any AI can.”

You know who can provide that legitimacy? Our elected leaders.

It is ludicrous that Amodei and Anthropic are in this position, a complete abdication on the part of our legislative bodies to create rules and regulations that are clearly and urgently needed.

Of course corporations shouldn’t be making the rules of war. But neither should Hegseth. Thursday, Amodei doubled down on his objections, saying that while the company continues to negotiate and wants to work with the Pentagon, “we cannot in good conscience accede to their request.”

Thank goodness Anthropic has the courage and foresight to raise the issue and hold its ground. Without its pushback, these capabilities would have been handed to the government with barely a ripple in our consciousness and virtually no oversight.

Every senator, every House member, every presidential candidate should be screaming for AI regulation right now, pledging to get it done without regard to party, and demanding the Department of Defense back off its ridiculous threat while the issue is hashed out.

Because when the machine tells us it’s dangerous to trust it, we should believe it.

Business

Why companies are making this change to their office space to cater to influencers

For the trendiest tenants in Hollywood office buildings, it’s the latest fad that goes way beyond designer furniture and art: mini studios.

To capitalize on the never-ending flow of stars and influencers who come through Los Angeles, a growing number of companies are building bright little corners for content creators to try products and shoot short videos. Athletic apparel maker Puma, Kim Kardashian’s Skims and cheeky cosmetics retailer e.l.f. have spaces specifically designed to give people a place to experience and broadcast about their brands.

Hollywood, which hasn’t historically been home to apparel companies, is now attracting the offices of fashion retailers, says CIM Group, one of the neighborhood’s largest commercial property landlords.

“When we’re touring a space, one of the first items they bring up is, ‘Where can I build a studio?’” said Blake Eckert, who leases CIM offices in L.A.

Their studio offices also serve as marketing centers, with showrooms and meeting spaces where brands can host proprietary events not open to the public.

“For companies where brand visibility is really important, there is a trend of creating spaces that don’t just function as offices,” said real estate broker Nicole Mihalka of CBRE, who puts together entertainment property leases and sales.

Puma’s global entertainment marketing team, which works with such musical celebrity partners as Rihanna, ASAP Rocky, Dua Lipa, Skepta and Rosé, is based in its new Hollywood offices, said Allyssa Rapp, head of Puma Studio L.A.

Allyssa Rapp, director of entertainment marketing at Puma, is shown in the Puma Studio L.A. The company keeps a closet full of Puma products on hand to give VIP guests. Visits to the studio sanctum are by invitation only, though.

(Kayla Bartkowski / Los Angeles Times)

Hollywood is a central location, she said, for meeting with celebrities, stylists and outside designers, most of whom are based in Los Angeles.

The office is a “creation hub,” she said, where influencers can record Puma’s design prototyping lab supported by libraries of materials and equipment used to create Puma apparel. The company, founded in 1948, is known for its emblematic sneakers such as the Speedcat and its lunging feline logo, and makes athletic wear, accessories and equipment.

Puma’s entertainment marketing team also occupies the office and sometimes uses it for exclusive events.

“We use the space as a showroom, as a social space that transforms from a traditional workplace into more of an experiential space,” Rapp said.

Nontraditional uses include content creation, sit-down dinners, product launches, album listening parties and workshops.

“Inviting people into our space and being able to give them high-touch brand experiences is something tangible and important for them,” she said. “The cultural layer is really important for us.”

The company keeps a closet full of Puma products on hand to give VIP guests. Visits to the studio sanctum are by invitation only, though. There’s no retail portal to the exclusive Hollywood offices.

Puma shoes are on display in the Puma Studio L.A.

(Kayla Bartkowski / Los Angeles Times)

Puma is also positioning its L.A. studio as a connection point for major sporting events coming to Los Angeles, including the World Cup this summer, the 2027 Super Bowl and the 2028 Olympics.

In-office studios don’t need to be big to be impactful, Mihalka said. “These are smaller stages, closer to green screen than a massive soundstage.”

Social media is the key driver of content created by most businesses, which may set up small booth-like stages where influencers can hawk hot products while offering discounts to people watching them perform.

Bigger, elevated stages can accommodate multiple performers for extended discussions in front of small audiences, with towering screens behind them to set the mood or illustrate products.

Among the tricked-out offices, she said, is that of Skims. The company, which is valued at $5 billion, is based in a glass-and-steel office building near the fabled intersection of Hollywood Boulevard and Vine Street.

The fashion retailer declined to comment on the studio uses in its headquarters, but according to architecture firm Odaa, it has open and private offices, meeting rooms, collaboration zones, photo studios, sample libraries, prototype showrooms, an executive lounge and a commissary for 400 people.

Pieces of a shoe sit on a workbench in the Puma Studio L.A.

(Kayla Bartkowski / Los Angeles Times)

The brands building studios typically want to find the darkest spot on the premises to put their content creation or podcast spaces, Eckert said, where they can limit outside light and sound. That’s commonly near the center of the office floor, far from windows and close to permanent shear walls that limit sound intrusion.

They also need space for green rooms and restrooms dedicated to the talent.

Spotify recently built a fancy podcast studio in a CIM office building on trendy Sycamore Avenue that is open by invitation only to video creators in Spotify’s partner program.

“Ambitious shows need spaces that support big ideas,” Bill Simmons, head of talk strategy at Spotify, said in a statement. “These studios give teams room to experiment and keep pushing what’s possible.”

Business

A new delivery bot is coming to L.A., built stronger to survive in these streets

The rolling robots that deliver groceries and hot meals across Los Angeles are getting an upgrade.

Coco Robotics, a UCLA-born startup that’s deployed more than 1,000 bots across the country, unveiled its next-generation machines on Thursday.

The new robots are bigger, tougher and better equipped for autonomy than their predecessors. The company will use them to expand into new markets and increase its presence in Los Angeles, where it makes deliveries through a partnership with DoorDash.

Dubbed Coco 2, the next-gen bots have upgraded cameras and front-facing lidar, a laser-based sensor used in self-driving cars. They will use hardware built by Nvidia, the Santa Clara-based artificial intelligence chip giant.

Coco co-founder and chief executive Zach Rash said Coco 2 will be able to make deliveries even in conditions unsafe for human drivers. The robot is fully submersible in case of flooding and is compatible with special snow tires.

Zach Rash, co-founder and CEO of Coco, opens the top of the new Coco 2 (Next-Gen) at the Coco Robotics headquarters in Venice.

(Kayla Bartkowski/Los Angeles Times)

Early this month, a cute Coco was recorded struggling through flooded roads in L.A.

“She’s doing her best!” said the person recording the video. “She is doing her best, you guys.”

Instagram followers cheered the bot on, with one posting, “Go coco, go,” and others calling for someone to help the robot.

“We want it to have a lot more reliability in the most extreme conditions where it’s either unsafe or uncomfortable for human drivers to be on the road,” Rash said. “Those are the exact times where everyone wants to order.”

The company will ramp up mass production of Coco 2 this summer, Rash said, aiming to produce 1,000 bots each month.

The design is sleek and simple, with a pink-and-white ombré paint job, the company’s name printed in lowercase, and a keypad for loading and unloading the cargo area. The robots have four wheels and a bigger internal compartment for carrying food and goods.

Many of the bots will be used for expansion into new markets across Europe and Asia, but they will also hit the streets in Los Angeles and operate alongside the older Coco bots.

Coco has about 300 bots in Los Angeles already, serving customers from Santa Monica and Venice to Westwood, Mid-City, West Hollywood, Hollywood, Echo Park, Silver Lake, downtown, Koreatown and the USC area.

The new Coco 2 (Next-Gen) drives along the sidewalk at the Coco Robotics headquarters in Venice.

(Kayla Bartkowski/Los Angeles Times)

The company is in discussion with officials in Culver City, Long Beach and Pasadena about bringing autonomous delivery to those communities.

There’s also been demand for the bots in Studio City, Burbank and the San Fernando Valley, according to Rash.

“A lot of the markets that we go into have been telling us they can’t hire enough people to do the deliveries and to continue to grow at the pace that customers want,” Rash said. “There’s quite a lot of area in Los Angeles that we can still cover.”

The bots already operate in Chicago, Miami and Helsinki, Finland. Last month, they arrived in Jersey City, N.J.

Late last year, Coco announced a partnership with DashMart, DoorDash’s delivery-only online store. The partnership allows Coco bots to deliver fresh groceries, electronics and household essentials as well as hot prepared meals.

With the release of Coco 2, the company is eyeing faster deliveries using bike lanes and road shoulders as opposed to just sidewalks, in cities where it’s safe to do so. Coco 2 can adapt more quickly to new environments and physical obstacles, the company said.

Zach Rash, co-founder and CEO of Coco.

(Kayla Bartkowski/Los Angeles Times)

Coco 2 is designed to operate autonomously, but there will still be human oversight in case the robot runs into trouble, Rash said. Damaged sidewalks or unexpected construction can stop a bot in its tracks.

The need for human supervision has created a new field of jobs for Angelenos.

Though there have been reports of pedestrians bullying the robots by knocking them over or blocking their path, Rash said the community response has been overall positive. The bots are meant to inspire affection.

“One of the design principles on the color and the name and a lot of the branding was to feel warm and friendly to people,” Rash said.

Coco plans to add thousands of bots to its fleet this year. The delivery service got its start as a dorm room project in 2020, when Rash was a student at UCLA. He co-founded the company with fellow student Brad Squicciarini.

The Santa Monica-based company has completed more than 500,000 zero-emission deliveries and its bots have collectively traveled around 1 million miles.

Coco chooses neighborhoods to deploy its bots based on density, prioritizing areas with restaurants clustered together and short delivery distances as well as places where parking is difficult.

The robots can relieve congestion by taking cars and motorbikes off the roads. Rash said there is so much demand for delivery services that the company’s bots are not taking jobs from human drivers.

Instead, Coco can fill gaps in the delivery market while saving merchants money and improving the safety of city streets.

“This vehicle is inherently a lot safer for communities than a car,” Rash said. “We believe our vehicles can operate the highest quality of service and we can do it at the lowest price point.”
