Technology

Meta got caught gaming AI benchmarks

Published

April 8, 2025

Over the weekend, Meta dropped two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 Flash “across a broad range of widely reported benchmarks.”

Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where humans compare outputs from different systems and vote on the best one. In Meta’s press release, the company highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s 4o and just under Gemini 2.5 Pro. (A higher Elo score means the model wins more often in the arena when going head-to-head with competitors.)
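To make the link between rating and win rate concrete, here is a minimal sketch using the classic Elo expected-score formula. The article does not describe how LMArena computes its scores, so the formula and its 400-point scale are assumptions, and the rival rating of 1400 is a hypothetical value chosen purely for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the classic Elo formula (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


maverick = 1417.0  # score cited in Meta's press release
rival = 1400.0     # hypothetical opponent rating, for illustration only

p = expected_score(maverick, rival)
print(f"Expected win rate vs. a {rival:.0f}-rated model: {p:.1%}")
```

Under these assumptions, a 17-point lead works out to a win rate of about 52 percent, only slightly better than a coin flip. Small gaps near the top of a leaderboard therefore reflect narrow differences in head-to-head preference.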

The achievement seemed to position Meta’s open-weight Llama 4 as a serious challenger to the state-of-the-art, closed models from OpenAI, Anthropic, and Google. Then, AI researchers digging through Meta’s documentation discovered something unusual.

In fine print, Meta acknowledges that the version of Maverick tested on LMArena isn’t the same as what’s available to the public. According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality,” TechCrunch first reported.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. “Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that, we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”

A spokesperson for Meta, Ashley Gabriel, said in an emailed statement that “we experiment with all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” Gabriel said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”

While what Meta did with Maverick isn’t explicitly against LMArena’s rules, the site has shared concerns about gaming the system and taken steps to “prevent overfitting and benchmark leakage.” When companies can submit specially tuned versions of their models for testing while releasing different versions to the public, rankings on benchmarks like LMArena become less meaningful as indicators of real-world performance.

“It’s the most widely respected general benchmark because all of the other ones suck,” independent AI researcher Simon Willison tells The Verge. “When Llama 4 came out, the fact that it came second in the arena, just after Gemini 2.5 Pro — that really impressed me, and I’m kicking myself for not reading the small print.”

Shortly after Meta released Maverick and Scout, the AI community started talking about a rumor that Meta had also trained its Llama 4 models to perform better on benchmarks while hiding their real limitations. VP of generative AI at Meta, Ahmad Al-Dahle, addressed the accusations in a post on X: “We’ve also heard claims that we trained on test sets — that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”

Some also noticed that Llama 4 was released at an odd time. Saturday doesn’t tend to be when big AI news drops. After someone on Threads asked why Llama 4 was released over the weekend, Meta CEO Mark Zuckerberg replied: “That’s when it was ready.”

“It’s a very confusing release generally,” says Willison, who closely follows and documents AI models. “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s path to releasing Llama 4 wasn’t exactly smooth. According to a recent report from The Information, the company repeatedly pushed back the launch due to the model failing to meet internal expectations. Those expectations are especially high after DeepSeek, an open-source AI startup from China, released an open-weight model that generated a ton of buzz.

Ultimately, using an optimized model in LMArena puts developers in a difficult position. When selecting models like Llama 4 for their applications, they naturally look to benchmarks for guidance. But as is the case for Maverick, those benchmarks can reflect capabilities that aren’t actually available in the models that the public can access.

As AI development accelerates, this episode shows how benchmarks are becoming battlegrounds. It also shows how Meta is eager to be seen as an AI leader, even if that means gaming the system.

Update, April 7th: The story was updated to add Meta’s statement.

Technology

Super Bowl LX ads: all AI everything

Published

February 7, 2026

Press Room

Super Bowl LX is nearly here, with the Seattle Seahawks taking on the New England Patriots. While Bad Bunny will be the star of the halftime show, AI could be the star of the commercial breaks, much like crypto was a few years ago.

Super Bowl LX is set to kick off at 6:30PM ET/3:30PM PT on Sunday, February 8th at Levi’s Stadium in Santa Clara, California.

Technology

How to protect a loved one’s identity after death

Published

February 7, 2026

Press Room

When someone you love dies, the to-do list can feel endless. There are legal steps, financial paperwork and emotional weight all happening at once. What many families do not realize is that identity protection rarely makes those lists, even though it should.

Scammers actively target the identities of people who have died. They rely on delays, data gaps and the assumption that someone else is handling it. Janet from Indiana recently reached out with a question many families quietly worry about but rarely ask.

My husband just passed away in December. There are lists upon lists of things to do to wrap up his estate, but nothing that tells me how to lock down his identity now that he’s gone so that fraudsters cannot use it. Maybe our government is efficient enough to report to all of the credit bureaus that he is deceased, but I don’t want to bet my financial security on it. We both have our credit frozen with all three agencies, but is there more that I should do? Thank you.
— Janet in Indiana

Janet’s instincts are exactly right. The system often does not work as cleanly as people expect.

What the government and credit bureaus do and don’t do

When someone dies, Social Security is usually notified by the funeral home. That step helps, but it does not automatically secure a person’s financial identity.

Here is what often surprises families:

Credit bureaus are not synchronized in real time
A death notice does not instantly stop fraud attempts
Scammers specifically target recently deceased individuals
Gaps between systems create opportunities for misuse

In short, relying on automation alone leaves room for problems.

What you’ve already done right

Before adding more steps, it is worth acknowledging what Janet already did correctly.

Credit freezes with all three bureaus
Early awareness of identity risks
Taking action before fraud appears

That combination puts Janet well ahead of most families. When speed matters, a credit lock, which is different from a freeze, gives you instant on/off control.

Steps to protect a loved one’s identity after death

Once the immediate paperwork is underway, these practical steps help close the gaps scammers look for. None of them is super complicated, but together they create a much stronger layer of protection.

1) Add a deceased flag to credit files

Even with a credit freeze in place, this step adds another layer of protection that lenders see immediately.

Contact Equifax, Experian and TransUnion and ask them to mark the credit file as deceased. Each bureau may request:

A copy of the death certificate

Proof that you are the surviving spouse or executor

Once the flag is added, fraudulent applications become much harder to process because lenders are alerted upfront. A credit lock provides the same blocking effect but with real-time control, which can matter when you’re managing a deceased person’s estate or responding quickly to lender requests.

2) Monitor identity activity while you manage everything else

This is where many checklists fall short. Credit freezes and deceased flags help, but identity misuse can still surface in other ways.

Fraud attempts may appear as:

Account takeovers
Unauthorized credit inquiries
Use of personal data outside traditional credit

That is why ongoing monitoring still matters.

Why identity theft protection helps at this stage

An identity theft protection service watches how personal information is being used rather than just tracking credit scores, which makes it especially useful after a loss. Such a service typically:

Monitors for misuse tied to your loved one’s information
Sends alerts if something suspicious appears
Includes fraud support if action is needed
Reduces the burden of constant manual checks

One of the best parts of my pick for top identity theft service is its all-in-one approach to safeguarding your personal and financial life. It includes identity theft insurance of up to $1 million per adult to cover eligible losses and legal fees, plus 24/7 U.S.-based fraud resolution support with dedicated case managers ready to help restore your identity fast. It also combines three-bureau credit monitoring with an instant credit lock that lets you quickly lock down your Experian file right from the app.

See my tips and best picks on how to protect yourself from identity theft at Cyberguy.com.

3) Secure sensitive documents during estate administration

Estate administration often requires sharing paperwork, which is where identity leaks can happen.

Lock down and limit access to:

Death certificate copies
Social Security numbers
Old tax returns
Insurance and pension records

Only share what is required and keep track of where documents go.

4) Watch mail and phone calls for warning signs

Small signals often reveal fraud attempts early.

Pay close attention to:

Bills or collection notices in their name
Credit card or loan offers
Bank or government letters you did not expect
Calls asking to verify personal information

If something feels off, pause before responding and verify the source independently.

Kurt’s key takeaways

Protecting a loved one’s identity after death is one more responsibility no one prepares you for. It is not about mistrusting the system. It is about protecting yourself during a time when you are already carrying enough. Janet’s question reflects what many families experience quietly. Identity protection does not end when life does, and scammers know that grief creates gaps. Taking a few extra steps now can spare you months or even years of stress later. You are not being overly cautious. You are being careful at a moment when the system does not always move fast enough to keep up with real life.

If you have handled an estate or are planning ahead, have you taken steps to protect a loved one’s identity after death, or is this something you are just learning about now? Let us know by writing to us at Cyberguy.com.

Kurt “CyberGuy” Knutsson is an award-winning tech journalist with a deep love of technology, gear and gadgets that make life better. He contributes to Fox News and FOX Business, beginning mornings on “FOX & Friends.” Got a tech question? Get Kurt’s free CyberGuy Newsletter, share your voice, a story idea or comment at CyberGuy.com.

Technology

Apple might let you use ChatGPT from CarPlay

Published

February 7, 2026

Press Room

CarPlay users could soon be able to use their chatbot of choice instead of Siri. As Bloomberg reports, Apple is working to add support for CarPlay voice control apps from OpenAI, Anthropic, Google, and others. Previously, users who wanted to access third-party chatbots in the car would need to go through their iPhone, but soon they may be able to talk with ChatGPT, Claude, or Gemini directly in CarPlay.

However, Apple reportedly “won’t let users replace the Siri button on CarPlay or the wake word that summons the service.” So, users will need to manually open their preferred chatbot’s app. Developers will be able to set their apps to automatically start voice mode whenever they’re opened, though, which could help streamline the experience.

According to Bloomberg, the addition of third-party chatbots in CarPlay could roll out “within the coming months,” but hasn’t been officially announced yet. The rumored update follows Apple’s announcement last month that Google Gemini will power an updated version of Siri, which is slated to arrive sometime this year.