Business
Column: These Apple researchers just showed that AI bots can't think, and possibly never will
See if you can solve this arithmetic problem:
Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?
If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)
You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.
The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis has nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”
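The arithmetic, and the mistake, fit in a few lines of Python. This is an illustrative sketch and not code from the Apple paper. The function names are hypothetical.

```python
# Illustrative sketch of the kiwi problem (numbers from the column).
FRIDAY = 44
SATURDAY = 58
SUNDAY = 2 * FRIDAY            # "double the number of kiwis he did on Friday"
SMALLER_THAN_AVERAGE = 5       # irrelevant detail: small kiwis still count


def correct_total() -> int:
    """Size does not change the count, so the extra clause is ignored."""
    return FRIDAY + SATURDAY + SUNDAY


def distracted_total() -> int:
    """Mimics the reported error: treating the irrelevant clause as a subtraction."""
    return FRIDAY + SATURDAY + SUNDAY - SMALLER_THAN_AVERAGE


print(correct_total())     # 190
print(distracted_total())  # 185
```

The gap between the two functions is a single line. The model has to recognize that the clause about size carries no numerical weight.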
Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.
The Apple findings were published earlier this month in a technical paper that has attracted widespread attention in AI labs and the lay press, not only because the results are well-documented, but also because the researchers work for the nation’s leading high-tech consumer company — and one that has just rolled out a suite of purported AI features for iPhone users.
“The fact that Apple did this has gotten a lot of attention, but nobody should be surprised at the results,” says Gary Marcus, a critic of how AI systems have been marketed as reliably, well, “intelligent.”
Indeed, Apple’s conclusion matches earlier studies that have found that large language models, or LLMs, don’t actually “think” so much as match language patterns in materials they’ve been fed as part of their “training.” When it comes to abstract reasoning — “a key aspect of human intelligence,” in the words of Melanie Mitchell, an expert in cognition and intelligence at the Santa Fe Institute — the models fall short.
“Even very young children are adept at learning abstract rules from just a few examples,” Mitchell and colleagues wrote last year after subjecting GPT bots to a series of analogy puzzles. Their conclusion was that “a large gap in basic abstract reasoning still remains between humans and state-of-the-art AI systems.”
That’s important because LLMs such as GPT underlie the AI products that have captured the public’s attention. But the LLMs tested by the Apple team were consistently misled by the language patterns they were trained on.
The Apple researchers set out to answer the question, “Do these models truly understand mathematical concepts?” as one of the lead authors, Mehrdad Farajtabar, put it in a thread on X. Their answer is no. They also pondered whether the shortcomings they identified can be easily fixed, and their answer is also no: “Can scaling data, models, or compute fundamentally solve this?” Farajtabar asked in his thread. “We don’t think so!”
The Apple research, along with other findings about AI bots’ cognitive limitations, is a much-needed corrective to the sales pitches coming from companies hawking their AI models and systems, including OpenAI and Google’s DeepMind lab.
The promoters generally depict their products as dependable and their output as trustworthy. In fact, their output is consistently suspect, posing a clear danger when they’re used in contexts where the need for rigorous accuracy is absolute, say, in healthcare applications.
That’s not always the case. “There are some problems which you can make a bunch of money on without having a perfect solution,” Marcus told me. Recommendation engines powered by AI, such as those that steer buyers on Amazon to products they might also like, are one example. If those systems get a recommendation wrong, it’s no big deal; a customer might spend a few dollars on a book he or she didn’t like.
“But a calculator that’s right only 85% of the time is garbage,” Marcus says. “You wouldn’t use it.”
The potential for damagingly inaccurate outputs is heightened by AI bots’ natural language capabilities, with which they offer even absurdly inaccurate answers with convincingly cocksure elan. Often they double down on their errors when challenged.
These errors are typically described by AI researchers as “hallucinations.” The term may make the mistakes seem almost innocuous, but in some applications, even a minuscule error rate can have severe ramifications.
That’s what academic researchers concluded in a recently published analysis of Whisper, an AI-powered speech-to-text tool developed by OpenAI, which can be used to transcribe medical discussions or jailhouse conversations monitored by corrections officials.
The researchers found that about 1.4% of Whisper-transcribed audio segments in their sample contained hallucinations. These included wholly fabricated statements added to the transcribed conversations, such as portrayals of “physical violence or death … [or] sexual innuendo,” and demographic stereotyping.
That may sound like a minor flaw, but the researchers observed that the errors could be incorporated in official records such as transcriptions of court testimony or prison phone calls — which could lead to official decisions based on “phrases or claims that a defendant never said.”
Updates to Whisper in late 2023 improved its performance, the researchers said, but the updated Whisper “still regularly and reproducibly hallucinated.”
That hasn’t deterred AI promoters from unwarranted boasting about their products. In an Oct. 29 tweet, Elon Musk invited followers to submit “x-ray, PET, MRI or other medical images to Grok [the AI application for his X social media platform] for analysis.” Grok, he wrote, “is already quite accurate and will become extremely good.”
It should go without saying that, even if Musk is telling the truth (not an absolutely certain conclusion), any system used by healthcare providers to analyze medical images needs to be a lot better than “extremely good,” however one might define that standard.
That brings us to the Apple study. It’s proper to note that the researchers aren’t critics of AI as such but believers that its limitations need to be understood. Farajtabar was formerly a senior research scientist at DeepMind, where another author interned under him; other co-authors hold advanced degrees and professional experience in computer science and machine learning.
The team plied their subject AI models with questions drawn from a popular collection of more than 8,000 grade school arithmetic problems testing schoolchildren’s understanding of addition, subtraction, multiplication and division. When the problems incorporated clauses that might seem relevant but weren’t, the models’ performance plummeted.
That was true of all the models, including versions of the GPT bots developed by OpenAI, Meta’s Llama, Microsoft’s Phi-3, Google’s Gemma and several models developed by the French lab Mistral AI.
Some did better than others, but all showed a decline in performance as the problems became more complex. One problem involved a basket of school supplies including erasers, notebooks and writing paper. That requires a solver to multiply the number of each item by its price and add them together to determine how much the entire basket costs.
When the bots were also told that “due to inflation, prices were 10% cheaper last year,” the bots reduced the cost by 10%. That produces a wrong answer, since the question asked what the basket would cost now, not last year.
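The error can be made concrete with a minimal Python sketch of the basket problem. The column does not give the actual quantities or prices, so the figures below are hypothetical. Only the structure comes from the text: multiply each quantity by its price, add the results, and ignore last year's 10% discount.

```python
# Minimal sketch of the school-supplies problem.
# Quantities and prices are HYPOTHETICAL; the column does not give the real ones.
# Prices are kept in whole cents to avoid floating-point rounding.
BASKET = {
    "eraser": (24, 50),          # (quantity, price per item in cents)
    "notebook": (10, 300),
    "writing paper": (8, 250),
}


def current_cost_cents(basket: dict[str, tuple[int, int]]) -> int:
    """What the question asks: today's cost, the sum of quantity * price."""
    return sum(qty * price for qty, price in basket.values())


def distracted_cost_cents(basket: dict[str, tuple[int, int]]) -> int:
    """The reported mistake: cutting today's total by last year's 10%."""
    return current_cost_cents(basket) * 9 // 10


def dollars(cents: int) -> str:
    """Format a whole-cent amount as dollars."""
    return f"${cents / 100:.2f}"


print(dollars(current_cost_cents(BASKET)))     # $62.00
print(dollars(distracted_cost_cents(BASKET)))  # $55.80
```

The sentence about inflation describes last year's prices. It has no bearing on the current total, so the correct answer comes from the first function alone.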
Why did this happen? The answer is that LLMs are developed, or trained, by feeding them huge quantities of written material scraped from published works or the internet — not by trying to teach them mathematical principles. LLMs function by gleaning patterns in the data and trying to match a pattern to the question at hand.
But they become “overfitted to their training data,” Farajtabar explained via X. “They memorized what is out there on the web and do pattern matching and answer according to the examples they have seen. It’s still a [weak] type of reasoning but according to other definitions it’s not a genuine reasoning capability.” (The brackets are his.)
That’s likely to impose boundaries on what AI can be used for. In mission-critical applications, humans will almost always have to be “in the loop,” as AI developers say, vetting answers for obvious or dangerous inaccuracies or providing guidance to keep the bots from misinterpreting their data, misstating what they know, or filling gaps in their knowledge with fabrications.
To some extent, that’s comforting, for it means that AI systems can’t accomplish much without having human partners at hand. But it also means that we humans need to be aware of the tendency of AI promoters to overstate their products’ capabilities and conceal their limitations. The issue is not so much what AI can do, but what users can be gulled into thinking it can do.
“These systems are always going to make mistakes because hallucinations are inherent,” Marcus says. “The ways in which they approach reasoning are an approximation and not the real thing. And none of this is going away until we have some new technology.”
How Iran War Is Threatening Global Oil and Gas Supplies
Ships near the Strait of Hormuz before and after attacks began
Every day, around 80 oil and gas tankers typically pass through the Strait of Hormuz, the narrow waterway off Iran’s southern coast that carries a fifth of the world’s oil and a significant amount of natural gas.
On Monday, just two oil and gas tankers appear to have crossed the strait, according to a New York Times analysis of shipping activity from Kpler, an industry data firm. Since then, one tanker has passed through.
“It’s a de facto closure,” said Dan Pickering, chief investment officer of Pickering Energy Partners, a Houston financial services firm. “You’ve got a significant number of vessels on either side of the strait but no one is willing to go through.”
Tankers have been staying away from Hormuz since the U.S.-Israeli attacks on Iran that began on Saturday. A prolonged conflict could ripple broadly across the global economy, threatening the energy supplies of countries halfway around the world and stoking inflation.
International oil prices have climbed 12 percent since the fighting began, trading Tuesday around $81 a barrel, and natural gas prices have surged in Europe and in Asia.
A senior Iranian military official threatened on Monday to “set on fire” any ships traveling through the Strait of Hormuz. Vessels in the region have already come under attack. Several oil and gas facilities have also been struck or affected by nearby shelling, though the damage did not initially appear to be catastrophic.
Where ships and energy facilities have been damaged
A fire broke out Tuesday at a major energy hub in Fujairah, United Arab Emirates, from the falling debris of a downed drone, the authorities said. On Monday, Qatar halted production of liquefied natural gas, which is natural gas that has been cooled into a liquid so that it can be transported on ships, after attacks on its facilities.
The sharp reduction in tanker traffic is reducing the supply of oil and gas to world markets, pushing up prices for both commodities. And the longer that ships stay away from the Strait of Hormuz, the less oil and gas get out to the world, which could raise prices even more.
Shipping companies have paused their tankers to protect their crew and cargo, and because insurance companies are charging significantly more to cover vessels in the conflict area.
On Tuesday, President Trump said that “if necessary,” the U.S. Navy would begin escorting tankers through the strait. He also said a U.S. government agency would begin offering “political risk insurance” to shipping lines in the area.
In addition to tankers, other large vessels regularly go through the strait, including car carriers and container ships. In normal conditions, nearly 160 make the trip each day.
Some ships in the region turn off the devices that broadcast their positions, while others transmit false locations — making it hard to give a full picture of the traffic in the strait.
The Shiva is a small oil tanker that has repeatedly faked its location, according to TankerTrackers.com, which tracks global oil shipments. It is suspected of carrying sanctioned Iranian oil, according to Kpler. The Shiva was one of the two tankers that crossed the strait on Monday.
The oil and gas that typically move through the strait come from big producing countries like Saudi Arabia, Iraq, Iran and the United Arab Emirates, and are exported around the world.
Where tankers moving through the Strait have traveled
In 2024, more than 80 percent of the oil and gas transported through the Strait of Hormuz went to Asia. China, India, Japan and South Korea were the top importers, according to the U.S. Energy Information Administration.
Countries have energy stockpiles that could last them into the coming months, but a continued shutdown of the strait could damage their economies.
Several big disruptions have roiled supply chains in recent years, but the tanker standstill in the Strait of Hormuz could have an outsize impact.
Paramount credit downgraded to ‘junk’ status over debt worries
Paramount Skydance’s jubilation over its come-from-behind victory to claim Warner Bros. Discovery has entered a new phase:
Call it the deal-debt hangover.
Two major ratings agencies have raised concerns about Paramount’s credit because of the enormous debt the David Ellison-led company will have to shoulder — at least $79 billion — once it absorbs the larger Warner Bros. Discovery, bringing CNN, HBO, TBS and Cartoon Network into the Paramount fold.
Fitch Ratings said Monday that it placed Paramount on its “negative” ratings watch, and downgraded its credit to BB+ from BBB-, which puts the company’s credit into “junk” territory. Fitch said it took action due to “uncertainty” surrounding Paramount’s $110-billion deal for Warner Bros. Discovery, which the boards of both companies approved on Friday.
S&P Global Ratings took similar action.
To finance the Warner takeover, Ellison’s billionaire father, Larry Ellison, has agreed to guarantee the $45.7 billion in equity needed. Bank of America, Citibank and Apollo Global have agreed to provide Paramount with more than $54 billion in debt financing.
“Potential credit risks include the prospective debt-funded structure, Fitch’s expectation of materially elevated leverage and limited visibility on post-transaction financial policy and capital structure,” Fitch said.
Late last week, Paramount sent $2.8 billion to Netflix as a “termination fee” to officially end the streaming giant’s pursuit of Warner Bros. That payment paved the way for the Warner and Paramount boards to enter into the new merger agreement.
Paramount hopes the merger will be wrapped up by the end of September. It needs the approval of Warner Bros. Discovery shareholders and regulators, including the European Union.
Paramount executives acknowledged this week the new company would emerge with $79 billion in debt — a considerably higher total than what Warner Bros. Discovery had following its spinoff from AT&T. That 2022 transaction left Warner Bros. Discovery with nearly $55 billion of debt, a burden that led to endless waves of cost-cutting, including thousands of layoffs and dozens of canceled projects.
Warner still has $33.5 billion in debt, a lingering legacy that will be passed on to Paramount.
Paramount plans to restructure about $15 billion in Warner Bros. Discovery’s existing debt.
Paramount told Wall Street it would find more than $6 billion in cost cuts or “synergies” within three years — a number that has weighed heavily on entertainment industry workers, particularly in Los Angeles.
Hollywood already is reeling from previous mergers in addition to a sharp pullback in film and television production locally as filmmakers chase tax credits offered overseas and in other states, including New York and New Jersey.
Some entertainment executives, including Netflix Co-Chief Executive Ted Sarandos, have speculated that Paramount will need to find more than $10 billion in cost cuts to make the math work. More recently, Sarandos went higher, telling Bloomberg News that Paramount may need $16 billion in cuts.
Cognizant of widespread fears about additional layoffs, Paramount Chief Operating Officer Andrew Gordon took steps this week to try to tamp down such concerns.
Gordon is a former Goldman Sachs banker and a former executive with RedBird Capital Partners, an investor in Paramount and the proposed Warner Bros. deal. He joined Paramount last August as part of the Ellison takeover.
During a conference call Monday with analysts, Gordon said Paramount would look beyond the workforce for cuts because the company wants to maintain its film and TV production levels.
Paramount plans to look for cost savings by consolidating the “technology stacks and cloud providers” for its streaming services, including Paramount+ and HBO Max, Gordon said. The company also would search for reductions in corporate overhead, marketing expenses, procurement, business services and “optimizing the combined real estate footprint.”
It’s unclear whether Paramount would sell the historic Melrose Avenue lot or simply centralize the sprawling operations onto the Warner Bros. and Paramount lots in Burbank and Hollywood.
Workers are scattered throughout the region.
HBO, owned by Warner Bros. Discovery, maintains its West Coast headquarters in Culver City; CBS television stations operate from CBS’ former lot off Radford Avenue in Studio City; and the executive teams for CBS Entertainment and Paramount’s cable channels are located in a high-rise off Gower Street and Sunset Boulevard, blocks from the Paramount movie studio lot.
“The combination of PSKY and WBD could create a materially stronger business than either individual entity,” Standard & Poor’s said in its note to investors. “However, this transaction presents unique challenges because it would involve the combination of three companies, with the smallest, Skydance, being the controlling entity.”
David Ellison’s production firm, Skydance Media, was the entity that bought Paramount, creating Paramount Skydance.
Ellison has not announced what the combined company will be called.
Paramount shares closed down more than 6% Tuesday to $12.45.
Warner Bros. Discovery fell 1% to $28.20. Netflix added less than 1% to close at $97.70.
Commentary: Trump Media’s financial report revives doubts for investors
So much Trump-related news has appeared lately on the airwaves and in web pixels — what with Iran and Epstein and Minnesota and so on — that inevitably a nugget will fall between the cracks.
That seems to have been the fate of the most recent annual financial report of Trump Media and Technology Group, which covered calendar year 2025 and was issued Friday.
Trump Media, which is 52% owned by Donald Trump and trades on Nasdaq with a ticker symbol based on his initials (DJT), is the holding company for Trump’s social media platform, Truth Social.
The annual financial disclosure has garnered minimal press coverage. That’s a pity, because it makes fascinating reading, though not in a good way.
Here are the top and bottom lines from the 10-K annual report: Trump Media lost $712.1 million last year on revenue of about $3.7 million. That’s quite a bit worse than its performance in 2024, when it lost $409 million on revenue of about $3.6 million. The company attributed most of the flood of red ink to “loss from investments,” of which more in a moment.
Truth Social isn’t an especially strong keystone of this operation. The platform is chiefly an outlet for Trump’s social media ramblings and the occasional official White House statement. But no one has to sign in to Truth Social to see them; they’re almost invariably picked up by the news media or reposted by users on other platforms such as X.
That might explain Truth Social’s relatively scrawny user base. The platform is estimated to have about 2 million active users, according to the analytical firm Search Logistics. By comparison, X has about 450 million monthly active users and Facebook has more than 2.9 billion.
It’s no mystery, then, why TMTG disdains “traditional performance metrics like average revenue per user, ad impressions and pricing, or active user accounts, including monthly and daily active users,” according to its annual report.
Relying on those metrics, which are used to judge TMTG’s social media rivals, “might not align with the best interests of TMTG or its stockholders, as it could lead to short-term decision-making at the expense of long-term innovation and value creation.”
Instead, the company says it should be evaluated based on “its commitment to a robust business plan that includes introducing innovative features, new products, new technologies.” But it also acknowledges that, at its heart, TMTG is a proxy for “the reputation and popularity of President Donald J. Trump.” The company warns that “the value of TMTG’s brand may diminish if the popularity of President Donald J. Trump were to suffer.”
How has that played out in real time? Trump Media notched its highest closing price as a public company, $66.22, on March 27, 2024, the day after its initial public offering. In midday trading Monday, the shares were quoted at $11.08, for a loss of 83% since the IPO.
One can’t quibble with stock market price quotes; nor can one finagle annual profit and loss statements, at least not without receiving questions, and perhaps lawsuit complaints, from attentive investors and the Securities and Exchange Commission.
In recent months, TMTG has engaged in a number of baroque financial transactions.
In May, the company announced that it was planning to raise $3.5 billion from institutions to invest in bitcoin, with the money to come from issues of common and preferred shares. The goal was to climb onto the cryptocurrency train, which Trump himself was fueling by, among other things, issuing an executive order promoting the expansion of crypto in the U.S. and denigrating enforcement efforts by the Biden administration as reflecting a “war on cryptocurrency.”
Under Trump, federal regulators have dropped numerous investigations related to cryptocurrencies. Trump has also talked about creating a government crypto strategic reserve, which would entail large government purchases of bitcoin and other cryptocurrencies; a March 3 announcement on that subject briefly sent bitcoin prices soaring by nearly 20%, though they promptly fell back.
Then there’s TMTG’s relationship with Crypto.com, a Singapore-based crypto “service provider” best known to Angelenos unfamiliar with the crypto world as the firm with naming rights to the Los Angeles arena that hosts the NBA Lakers and Clippers, WNBA Sparks and NHL Kings.
In August, Crypto.com and TMTG announced a deal in which TMTG would pursue a crypto treasury strategy consisting mostly of Cronos tokens, a cryptocurrency sponsored by Crypto.com. The initial infusion would consist of 6.4 billion Cronos valued at $1 billion, or about 15.6 cents per Cronos.
As of Dec. 31, TMTG said in its 10-K, it owned 756.1 million Cronos, acquired at a cost of about $114 million, or 15 cents each. By year’s end, they were worth only about nine cents each, for a paper loss of about $46 million. In trading this week, Cronos was quoted at about 7.6 cents, producing a paper loss for TMTG of about $56.5 million, or roughly half the investment.
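The paper-loss figures can be checked with a few lines of Python. The inputs are the figures reported above: 756.1 million tokens, a cost of about $114 million, and prices of 9 and 7.6 cents. The sketch also shows that 6.4 billion Cronos valued at $1 billion works out to about 15.6 cents apiece. Variable names are illustrative, and the results are approximate because the reported inputs are themselves rounded.

```python
# Rough check of TMTG's Cronos figures (inputs as reported in the column;
# all are rounded, so outputs are approximate).
TOKENS_HELD = 756.1e6   # Cronos held as of Dec. 31
COST_BASIS = 114e6      # approximate acquisition cost, in dollars


def paper_loss(price_per_token: float) -> float:
    """Cost basis minus the market value of the holdings at a given price."""
    return COST_BASIS - TOKENS_HELD * price_per_token


deal_price = 1e9 / 6.4e9  # August deal: 6.4 billion Cronos valued at $1 billion

print(f"August deal price: {deal_price * 100:.1f} cents")              # 15.6 cents
print(f"Average cost: {COST_BASIS / TOKENS_HELD * 100:.1f} cents")     # 15.1 cents
print(f"Loss at 9 cents: ${paper_loss(0.09) / 1e6:.1f} million")       # $46.0 million
print(f"Loss at 7.6 cents: ${paper_loss(0.076) / 1e6:.1f} million")    # $56.5 million
```

A loss of about $56.5 million on a cost of about $114 million is just under 50%, which matches "roughly half the investment."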
The financial maneuvering involved in this trade is a little dizzying. The initial transaction was a 50% stock, 50% cash trade in which Crypto.com bought $50 million in TMTG stock and TMTG bought $105 million in Cronos. Who gained in this deal? It’s almost impossible to say.
Crypto.com did gain, if not purely in cash, then arguably through the Trump administration’s good graces.
On March 27, the SEC formally closed an investigation of the company that it had launched during the Biden administration, when the agency was headed by a known crypto skeptic, Gary Gensler. Trump appointed a crypto-friendly regulator, Paul Atkins, as Gensler’s successor.
It’s reasonable to note that as a business model, crypto treasuries have been in vogue over the last year or so, allowing investors to play the crypto market without all the complexities of actually buying and holding the digital assets by buying shares in treasury companies.
I asked Crypto.com whether the steady decline in Cronos’ price suggested that the hookup with TMTG wasn’t bearing fruit. “The fluctuation in value during this time period is consistent with the entire crypto market, which is typical in a bear market,” company spokeswoman Victoria Davis told me by email.
Davis also asserted that the SEC’s investigation of the company had been closed by Gensler, “not the current administration” (i.e., Trump). That’s misleading, at best. Gensler put the investigation on hold after the 2024 election, when it became clear that Trump was going to be in charge.
Crypto.com’s March 27 announcement of the formal end of the case attributed the action to “the current SEC leadership” and blamed the case on “the previous administration.” I asked Davis to explain the discrepancy but got no reply.
TMTG, like Crypto.com, attributed the decline in Cronos’ value to the secular bear market raging in the entire cryptocurrency space, a reflection of “temporary price swings across the crypto market,” said TMTG spokeswoman Shannon Devine. She said the price decline “will not diminish our enthusiasm for the enormous potential of the [CRONOS] ecosystem.”
Trump’s coziness with crypto companies hasn’t gone unnoticed by Democrats on the House Judiciary Committee, who issued a scathing report on the topic in November. (The White House scoffed at the report, saying that Trump “only acts in the best interests of the American public.”)
In mid-December, TMTG launched yet another remaking — this time, plunging into the business of fusion power. The instrument is TAE Technologies, a Foothill Ranch-based company working to develop the technology of nuclear fusion as a clean energy source. According to a Dec. 18 announcement, TMTG and TAE will merge, creating what they say is a $6-billion company.
According to the announcement, TMTG will contribute $200 million to the merged company when the deal closes in mid-2026, and an additional $100 million subsequently. Following the merger, TMTG said last month, it will consider spinning off Truth Social into a new publicly traded company.
These arrangements are murky. TAE is privately held and the value of Truth Social is conjectural at best, so TMTG shareholders could be hard-pressed to assess their gains or losses from the merger and spin-off.
What makes them even murkier is the speculative nature of fusion as an electrical power source. Although numerous companies have leaped into the field — and TAE, which has been backed by Alphabet, the parent of Google, is among the oldest — none has shown the capability of generating electrical power at commercial scale with the elusive technology.
Although some researchers say that fusion could become a technically and economically feasible power source within 10 years, only in 2022 did fusion researchers (at Lawrence Livermore National Laboratory) achieve the goal of using fusion to produce more energy than is required to sustain a reaction. They managed it for less than a billionth of a second.
Others working on the technology have expressed doubts that fusion could become a viable power source before the 2040s. The technical challenges, including how to convert the energy produced by a fusion reactor into electricity, remain daunting.
All this points to the fundamental question of what TMTG is supposed to be. TMTG’s original mission, according to its own publicity statements, was to build Truth Social into an alternative social media platform “to end Big Tech’s assault on free speech by opening up the Internet.”
Spinning off Truth Social would set that goal aside. TMTG is on its way to becoming a hodgepodge of crypto, fusion and other investments selected without regard to whether they fit together or are even achievable. The only constant is Trump himself.
If you want to invest in him, TMTG may be the best way to do it. But judging from its latest financial disclosure, that’s not the same as being a good way to do it.