
Technology

Meta got caught gaming AI benchmarks


Over the weekend, Meta dropped two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 Flash “across a broad range of widely reported benchmarks.”

Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where humans compare outputs from different systems and vote on the best one. In Meta’s press release, the company highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s 4o and just under Gemini 2.5 Pro. (A higher Elo score means the model wins more often in the arena when going head-to-head with competitors.)
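For readers unfamiliar with how arena-style leaderboards work, the standard Elo expected-score formula converts a rating gap into a win probability. Here is a minimal sketch; the 1417 figure is Maverick’s reported LMArena score, while the opponent rating is purely illustrative:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Evenly matched models each win about half the time.
print(expected_score(1417, 1417))  # 0.5

# A 100-point rating gap translates to roughly a 64% win rate.
print(round(expected_score(1417, 1317), 2))  # 0.64
```

This is why small rating gaps near the top of the leaderboard are significant: each point difference compounds into a measurable edge in head-to-head votes.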

The achievement seemed to position Meta’s open-weight Llama 4 as a serious challenger to the state-of-the-art, closed models from OpenAI, Anthropic, and Google. Then, AI researchers digging through Meta’s documentation discovered something unusual.

In fine print, Meta acknowledges that the version of Maverick tested on LMArena isn’t the same as what’s available to the public. According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality,” TechCrunch first reported.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. “Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that, we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”


A spokesperson for Meta, Ashley Gabriel, said in an emailed statement that “we experiment with all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” Gabriel said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”

While what Meta did with Maverick isn’t explicitly against LMArena’s rules, the site has shared concerns about gaming the system and taken steps to “prevent overfitting and benchmark leakage.” When companies can submit specially tuned versions of their models for testing while releasing different versions to the public, benchmark rankings like LMArena’s become less meaningful as indicators of real-world performance.

“It’s the most widely respected general benchmark because all of the other ones suck,” independent AI researcher Simon Willison tells The Verge. “When Llama 4 came out, the fact that it came second in the arena, just after Gemini 2.5 Pro — that really impressed me, and I’m kicking myself for not reading the small print.”

Shortly after Meta released Maverick and Scout, the AI community started talking about a rumor that Meta had also trained its Llama 4 models to perform better on benchmarks while hiding their real limitations. VP of generative AI at Meta, Ahmad Al-Dahle, addressed the accusations in a post on X: “We’ve also heard claims that we trained on test sets — that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”


Some also noticed that Llama 4 was released at an odd time. Saturday doesn’t tend to be when big AI news drops. After someone on Threads asked why Llama 4 was released over the weekend, Meta CEO Mark Zuckerberg replied: “That’s when it was ready.”

“It’s a very confusing release generally,” says Willison, who closely follows and documents AI models. “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s path to releasing Llama 4 wasn’t exactly smooth. According to a recent report from The Information, the company repeatedly pushed back the launch due to the model failing to meet internal expectations. Those expectations are especially high after DeepSeek, an open-source AI startup from China, released an open-weight model that generated a ton of buzz.

Ultimately, using an optimized model in LMArena puts developers in a difficult position. When selecting models like Llama 4 for their applications, they naturally look to benchmarks for guidance. But as is the case for Maverick, those benchmarks can reflect capabilities that aren’t actually available in the models that the public can access.


As AI development accelerates, this episode shows how benchmarks are becoming battlegrounds. It also shows how Meta is eager to be seen as an AI leader, even if that means gaming the system.

Update, April 7th: The story was updated to add Meta’s statement.

Technology

Two of my favorite color e-book readers are the cheapest they’ve been in months


Color isn’t essential in an e-reader, but let’s be honest, it’s a nice perk that can bring digital books, magazines, comics, cookbooks, and other publications to life. The catch is that color e-book readers tend to be substantially pricier, which makes today’s deals stand out. Right now, the Kindle Colorsoft (16GB) and Kobo Libra Colour are matching their lowest prices to date, with the Amazon e-reader going for $169.99 ($80 off) at Amazon and Best Buy, and the Libra Colour going for $199.99 ($30 off) via Rakuten’s online storefront.

At their core, both are excellent e-readers with 7-inch, 300ppi E Ink displays, which drop to 150ppi when viewing color. The Colorsoft’s display is slightly more vibrant in most instances, but the difference isn’t dramatic. Each also offers IPX8 water resistance, so you don’t need to worry about spills and can comfortably read in the bath or by the pool.

Which one makes more sense for you largely depends on where you buy your books, how much storage you need, and whether you like to take notes. The Colorsoft is great if you’re heavily embedded in Amazon’s ecosystem, as buying and accessing Kindle books is intuitive and doesn’t require any sideloading. As the more affordable option in Amazon’s lineup, the standard Colorsoft delivers a nearly identical reading experience to the Signature Edition, and it supports Amazon’s “Send to Alexa Plus” feature, which lets you send notes or documents to Amazon’s AI-powered assistant for summaries, to-do lists, reminders, and more. The downside is that it lacks wireless charging and an auto-adjusting front light — which are standard on the step-up model — and comes with 16GB of storage instead of 32GB.

That said, if I didn’t already own so many Kindle books, the Libra Colour would be my pick. It offers double the storage at 32GB and includes intuitive physical page-turn buttons. You can also write notes while reading, given that it offers stylus support, and it includes built-in notebook templates, as well as the ability to convert handwriting to typed text. It also supports EPUB and a wider range of file formats, and lets you save articles for offline reading with Instapaper. And it also offers adjustable warm lighting, which makes reading at night a little easier on the eyes.


Technology

Robot plays tennis with humans in real time



A humanoid robot is now rallying tennis shots with a human in real time. It runs without a script or remote control, so it can react instantly on a tennis court.

The robot stands about 4 feet tall, giving it a compact, human-like frame. Galbot Robotics released a video showing its robot going shot-for-shot with a human player. The system behind it is called LATENT and runs on the Unitree G1.

And it is not just returning the ball. It is moving, adjusting and competing during live play.

Sign up for my FREE CyberGuy Report
Get my best tech tips, urgent security alerts, and exclusive deals delivered straight to your inbox. Plus, you’ll get instant access to my Ultimate Scam Survival Guide – free when you join my CYBERGUY.COM newsletter.



A humanoid robot rallies tennis shots with a human in real time, reacting without scripts or remote control during live play. (Galbot Robotics)

Why this tennis robot is different from others

Most athletic robots you have seen follow scripts. They perform pre-programmed actions or rely on a remote control. This one operates differently. It reacts to a human opponent in real time, tracking fast-moving balls, shifting across the court and returning shots with surprising accuracy. It also adjusts to changing trajectories and unpredictable shots during rallies. Researchers say it can sustain long rallies with millisecond-level reactions and full-body coordination. That marks a major step forward.

How the AI learned to play tennis

Training a robot to play tennis is extremely complex. Tennis involves:

  • Ball speeds of up to 67 miles per hour
  • Split-second racket contact
  • Constant movement across a large court

Capturing complete human gameplay data is difficult. So the researchers used a different method.

Training the robot using motion fragments

Instead of recording full matches, they focused on small segments of movement:

  • Forehands
  • Backhands
  • Side steps

They gathered about five hours of motion data from five players. The sessions took place on a compact 10-by-16-foot court, a space with less than one-seventeenth the area of a standard tennis court.
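The area comparison checks out with quick arithmetic, assuming the standard doubles-court dimensions of 78 by 36 feet:

```python
# A standard (doubles) tennis court is 78 ft long and 36 ft wide;
# the training space described in the article was 10 by 16 ft.
standard_area = 78 * 36   # 2808 sq ft
training_area = 10 * 16   # 160 sq ft

ratio = standard_area / training_area
print(ratio)  # 17.55 -- the training space is under 1/17 of a full court
```

In other words, the robot had to generalize from motion captured in a fraction of the space it would later play in.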


Humanoid robots designed by Galbot Robotics select items from a shelf at the Shanghai New Expo Center in Shanghai, China, on July 26, 2025. Galbot Robotics also designed the tennis-playing robot that learns movement fragments and applies them in live competition. (Ying Tang/NurPhoto via Getty Images)

How the robot plays tennis during live rallies

The system first learns individual movements. Then it combines them into coordinated sequences. That allows the robot to:

  • Move toward the ball
  • Strike it with control
  • Recover and reposition

To improve performance, the team trained the model in simulation. They varied physical conditions such as mass, friction and aerodynamics. This helps the robot adapt to real-world unpredictability. As a result, the system responds dynamically instead of following a fixed routine. 

How well does it actually perform against humans?

In testing, the system achieved up to 96% success on forehand shots in simulation. In real-world trials, the robot can sustain rallies with a human and consistently return the ball across the net.


Watching the demo, it appears competitive. At times, the robot places shots away from the human player. That suggests more than a simple reaction. It points toward early forms of decision-making.

There are still limits. The robot can look unstable at times. Its motion is not yet as fluid as a trained athlete. High or unpredictable shots may still present challenges. Even so, the progress is clear.

Why this matters beyond tennis

This breakthrough goes far beyond tennis. It shows how robots can learn complex human skills without perfect data. The same approach could apply to:

  • Football
  • Badminton
  • Industrial work
  • Search and rescue

Any task that lacks complete motion data could benefit from this method. That is the bigger picture.



A robot dances at the launch ceremony of a Galbot Robotics retail store in Beijing, China, on August 7, 2025. The company has also designed a 4-foot robot that returns tennis shots with millisecond reactions and full-body coordination. (VCG/VCG via Getty Images)

Could robots compete with humans one day?

The path forward is becoming clearer. Today, the robot rallies. Next, it competes. In time, robots could train with or challenge professional athletes. Exhibition matches between humans and machines may become part of the sport. That future no longer feels far away.


Kurt’s key takeaways

This demo shows how quickly things are changing. Robots are no longer stuck following scripts. They can now react, adjust and compete in real situations. What used to feel far off is starting to show up right in front of us.


So here is the question: If a robot could outplay you on the court, would you still want to compete, or would you rather train with it? Let us know by writing to us at Cyberguy.com.


Copyright 2026 CyberGuy.com.  All rights reserved.


Technology

AI influencer awards season is upon us


First came the AI beauty pageant. Then the AI music contests. Now, there is an award for AI Personality of the Year — perhaps the inevitable next step for the AI influencer economy as it transforms from quirky novelty into a serious and lucrative industry.

The contest, a joint venture between generative AI studio OpenArt and AI-powered creator platform Fanvue, with backing from AI voice company ElevenLabs, opens on Monday and runs for a month. The organizers said it is intended to “celebrate the creative talent ‘behind’ AI Influencers” and recognize their growing commercial and cultural clout.

Contestants will compete for a total prize fund of $20,000, which will be split between an overall winner and individual categories of fitness, lifestyle, comedian, music and dance entertainer, and fictional cartoon, anime, or fantasy personality. Victors will be celebrated at an event in May that the organizers are dubbing the “‘Oscars’ for AI personalities.”

To enter, you must develop your AI influencer on OpenArt’s platform and submit it at www.AIpersonality.ai. You’ll be asked for social media handles across TikTok, X, YouTube, and Instagram, as well as the story behind the character, your motivations for creating it, and details of any brand work.

Among those assessing contestants are 13‑time Emmy‑winning comedy writer Gil Rief, the creators of Spanish AI model Aitana Lopez, and Christopher “Topher” Townsend, the MAGA rapper behind AI-generated gospel singer Solomon Ray. According to a copy of the judges’ briefing seen by The Verge, contestants will be scored on four criteria: quality, social clout, brand appeal, and the inspiration behind the avatar. Specific points include reliably engaging with followers, portraying a consistent look across social channels, accurate details like having the “right number of fingers and thumbs,” and having “an authentic narrative” behind the avatar.


The contest is open to established creators and novices alike, though existing AI influencers will still need to submit material produced on OpenArt’s platform, Matt Jones, head of brand at Fanvue, told The Verge.

Despite being designed to celebrate creators of virtual influencers, Jones said that entrants don’t need to publicly identify themselves. “If a person who created this amazing piece of work wants nothing to do with the press or to expose themselves or to have their name out there, that’s obviously fine,” he said. “There would be no need to thrust anybody into the limelight here. We would just celebrate the piece of work.”

That creators can remain anonymous feels odd for a contest judging authenticity, particularly in an AI influencer ecosystem built on fictional people, fake personas, and fabricated backstories. That same anonymity has also helped grifts flourish with little accountability, from the AI white nationalist rapper Danny Bones to MAGA fantasy girl Jessica Foster.

There’s familiar baggage, too: persistent questions about originality, about whether AI-generated work or even a likeness has been lifted from real creators, and about whether these tools simply reproduce the same old biases in synthetic form. Organizer Fanvue has already faced criticism on this front: in 2024, a Guardian columnist described its “Miss AI” beauty pageant as something that “take(s) every toxic gendered beauty norm and bundle(s) them up into a completely unrealistic package.”

To Fanvue’s Jones, creators inevitably leave something of themselves in the AI characters they make. “You can’t help but put a little bit of yourself into the stories that you tell and the characters that you make,” he said, urging creators to “lean into that.” The idea feels at home in the influencer economy: not strictly real, but a form of synthetic authenticity the internet already knows how to handle.

