
Technology

Meta got caught gaming AI benchmarks


Over the weekend, Meta dropped two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 Flash “across a broad range of widely reported benchmarks.”

Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where humans compare outputs from different systems and vote on the best one. In Meta’s press release, the company highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s 4o and just under Gemini 2.5 Pro. (A higher Elo score means the model wins more often in the arena when going head-to-head with competitors.)
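For readers unfamiliar with how arena-style leaderboards work, the standard Elo expected-score formula converts a rating gap into a win probability. Here is a minimal sketch; the 1417 figure is Maverick’s reported LMArena score, while the opponent rating is purely illustrative:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Evenly matched models each win about half the time.
print(expected_score(1417, 1417))  # 0.5

# A 100-point rating gap translates to roughly a 64% win rate.
print(round(expected_score(1417, 1317), 2))  # 0.64
```

This is why small rating gaps near the top of the leaderboard are significant: each point difference compounds into a measurable edge in head-to-head votes.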

The achievement seemed to position Meta’s open-weight Llama 4 as a serious challenger to the state-of-the-art, closed models from OpenAI, Anthropic, and Google. Then, AI researchers digging through Meta’s documentation discovered something unusual.

In fine print, Meta acknowledges that the version of Maverick tested on LMArena isn’t the same as what’s available to the public. According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality,” TechCrunch first reported.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. “Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that, we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”


A spokesperson for Meta, Ashley Gabriel, said in an emailed statement that “we experiment with all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” Gabriel said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”

While what Meta did with Maverick isn’t explicitly against LMArena’s rules, the site has shared concerns about gaming the system and taken steps to “prevent overfitting and benchmark leakage.” When companies can submit specially tuned versions of their models for testing while releasing different versions to the public, benchmark rankings like LMArena’s become less meaningful as indicators of real-world performance.

“It’s the most widely respected general benchmark because all of the other ones suck,” independent AI researcher Simon Willison tells The Verge. “When Llama 4 came out, the fact that it came second in the arena, just after Gemini 2.5 Pro — that really impressed me, and I’m kicking myself for not reading the small print.”

Shortly after Meta released Maverick and Scout, the AI community started talking about a rumor that Meta had also trained its Llama 4 models to perform better on benchmarks while hiding their real limitations. VP of generative AI at Meta, Ahmad Al-Dahle, addressed the accusations in a post on X: “We’ve also heard claims that we trained on test sets — that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”


Some also noticed that Llama 4 was released at an odd time. Saturday doesn’t tend to be when big AI news drops. After someone on Threads asked why Llama 4 was released over the weekend, Meta CEO Mark Zuckerberg replied: “That’s when it was ready.”

“It’s a very confusing release generally,” says Willison, who closely follows and documents AI models. “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s path to releasing Llama 4 wasn’t exactly smooth. According to a recent report from The Information, the company repeatedly pushed back the launch due to the model failing to meet internal expectations. Those expectations are especially high after DeepSeek, an open-source AI startup from China, released an open-weight model that generated a ton of buzz.

Ultimately, using an optimized model in LMArena puts developers in a difficult position. When selecting models like Llama 4 for their applications, they naturally look to benchmarks for guidance. But as is the case for Maverick, those benchmarks can reflect capabilities that aren’t actually available in the models that the public can access.


As AI development accelerates, this episode shows how benchmarks are becoming battlegrounds. It also shows how Meta is eager to be seen as an AI leader, even if that means gaming the system.

Update, April 7th: The story was updated to add Meta’s statement.

Technology

Two of my favorite color e-book readers are the cheapest they’ve been in months


Color isn’t essential in an e-reader, but let’s be honest, it’s a nice perk that can bring digital books, magazines, comics, cookbooks, and other publications to life. The catch is that color e-book readers tend to be substantially pricier, which makes today’s deals stand out. Right now, the Kindle Colorsoft (16GB) and Kobo Libra Colour are matching their lowest prices to date, with the Amazon e-reader going for $169.99 ($80 off) at Amazon and Best Buy, and the Libra Colour going for $199.99 ($30 off) via Rakuten’s online storefront.

At their core, both are excellent e-readers with 7-inch, 300ppi E Ink displays, which drop to 150ppi when viewing color. The Colorsoft’s display is slightly more vibrant in most instances, but the difference isn’t dramatic. Each also offers IPX8 water resistance, so you don’t need to worry about spills and can comfortably read in the bath or by the pool.

Which one makes more sense for you largely depends on where you buy your books, how much storage you need, and whether you like to take notes. The Colorsoft is great if you’re heavily embedded in Amazon’s ecosystem, as buying and accessing Kindle books is intuitive and doesn’t require any sideloading. As the more affordable option in Amazon’s lineup, the standard Colorsoft delivers a nearly identical reading experience to the Signature Edition, and it supports Amazon’s “Send to Alexa Plus” feature, which lets you send notes or documents to Amazon’s AI-powered assistant for summaries, to-do lists, reminders, and more. The downside is that it lacks wireless charging and an auto-adjusting front light — which are standard on the step-up model — and comes with 16GB of storage instead of 32GB.

That said, if I didn’t already own so many Kindle books, the Libra Colour would be my pick. It offers double the storage at 32GB and includes intuitive physical page-turn buttons. You can also write notes while reading, given that it offers stylus support, and it includes built-in notebook templates, as well as the ability to convert handwriting to typed text. It also supports EPUB and a wider range of file formats, and lets you save articles for offline reading with Instapaper. And it also offers adjustable warm lighting, which makes reading at night a little easier on the eyes.


Technology

Robot plays tennis with humans in real time



A humanoid robot is now rallying tennis shots with a human in real time. It runs without a script or remote control, so it can react instantly on a tennis court.

The robot stands about 4 feet tall, giving it a compact, human-like frame. Galbot Robotics released a video showing its robot going shot-for-shot with a human player. The system behind it is called LATENT and runs on the Unitree G1.

And it is not just returning the ball. It is moving, adjusting and competing during live play.

Sign up for my FREE CyberGuy Report
Get my best tech tips, urgent security alerts, and exclusive deals delivered straight to your inbox. Plus, you’ll get instant access to my Ultimate Scam Survival Guide – free when you join my CYBERGUY.COM newsletter.



A humanoid robot rallies tennis shots with a human in real time, reacting without scripts or remote control during live play. (Galbot Robotics)

Why this tennis robot is different from others

Most athletic robots you have seen follow scripts. They perform pre-programmed actions or rely on a remote control. This one operates differently. It reacts to a human opponent in real time, tracking fast-moving balls, shifting across the court and returning shots with surprising accuracy. It also adjusts to changing trajectories and unpredictable shots during rallies. Researchers say it can sustain long rallies with millisecond-level reactions and full-body coordination. That marks a major step forward.

How the AI learned to play tennis

Training a robot to play tennis is extremely complex. Tennis involves:

  • Ball speeds of up to 67 miles per hour
  • Split-second racket contact
  • Constant movement across a large court

Capturing complete human gameplay data is difficult. So the researchers used a different method.

Training the robot using motion fragments

Instead of recording full matches, they focused on small segments of movement:

  • Forehands
  • Backhands
  • Side steps

They gathered about five hours of motion data from five players. The sessions took place on a compact 10-by-16-foot court, a space with less than one-seventeenth the area of a standard tennis court.
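The area comparison checks out with quick arithmetic, assuming the standard doubles-court dimensions of 78 by 36 feet:

```python
# A standard (doubles) tennis court is 78 ft long and 36 ft wide;
# the training space described in the article was 10 by 16 ft.
standard_area = 78 * 36   # 2808 sq ft
training_area = 10 * 16   # 160 sq ft

ratio = standard_area / training_area
print(ratio)  # 17.55 -- the training space is under 1/17 of a full court
```

In other words, the robot had to generalize from motion captured in a fraction of the space it would later play in.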


Humanoid robots designed by Galbot Robotics select items from a shelf at the Shanghai New Expo Center in Shanghai, China, on July 26, 2025. Galbot Robotics also designed the tennis-playing robot that learns movement fragments and applies them in live competition. (Ying Tang/NurPhoto via Getty Images)

How the robot plays tennis during live rallies

The system first learns individual movements. Then it combines them into coordinated sequences. That allows the robot to:

  • Move toward the ball
  • Strike it with control
  • Recover and reposition

To improve performance, the team trained the model in simulation. They varied physical conditions such as mass, friction and aerodynamics. This helps the robot adapt to real-world unpredictability. As a result, the system responds dynamically instead of following a fixed routine. 

How well does it actually perform against humans?

In testing, the system achieved up to 96% success on forehand shots in simulation. In real-world trials, the robot can sustain rallies with a human and consistently return the ball across the net.


Watching the demo, it appears competitive. At times, the robot places shots away from the human player. That suggests more than a simple reaction. It points toward early forms of decision-making.

There are still limits. The robot can look unstable at times. Its motion is not yet as fluid as a trained athlete. High or unpredictable shots may still present challenges. Even so, the progress is clear.

Why this matters beyond tennis

This breakthrough goes far beyond tennis. It shows how robots can learn complex human skills without perfect data. The same approach could apply to:

  • Football
  • Badminton
  • Industrial work
  • Search and rescue

Any task that lacks complete motion data could benefit from this method. That is the bigger picture.



A robot dances at the launch ceremony of a Galbot Robotics retail store in Beijing, China, on August 7, 2025. The company has also designed a 4-foot robot that returns tennis shots with millisecond reactions and full-body coordination. (VCG/VCG via Getty Images)

Could robots compete with humans one day?

The path forward is becoming clearer. Today, the robot rallies. Next, it competes. In time, robots could train with or challenge professional athletes. Exhibition matches between humans and machines may become part of the sport. That future no longer feels far away.


Kurt’s key takeaways

This demo shows how quickly things are changing. Robots are no longer stuck following scripts. They can now react, adjust and compete in real situations. What used to feel far off is starting to show up right in front of us.


So here is the question: If a robot could outplay you on the court, would you still want to compete, or would you rather train with it? Let us know by writing to us at Cyberguy.com.


Copyright 2026 CyberGuy.com.  All rights reserved.


Technology

AI influencer awards season is upon us


First came the AI beauty pageant. Then the AI music contests. Now, there is an award for AI Personality of the Year — perhaps the inevitable next step for the AI influencer economy as it transforms from quirky novelty into a serious and lucrative industry.

The contest, a joint venture between generative AI studio OpenArt and AI-powered creator platform Fanvue, with backing from AI voice company ElevenLabs, opens on Monday and runs for a month. The organizers said it is intended to “celebrate the creative talent ‘behind’ AI Influencers” and recognize their growing commercial and cultural clout.

Contestants will compete for a total prize fund of $20,000, which will be split between an overall winner and individual categories of fitness, lifestyle, comedian, music and dance entertainer, and fictional cartoon, anime, or fantasy personality. Victors will be celebrated at an event in May that the organizers are dubbing the “‘Oscars’ for AI personalities.”

To enter, you must develop your AI influencer on OpenArt’s platform and submit it at www.AIpersonality.ai. You’ll be asked for social media handles across TikTok, X, YouTube, and Instagram, as well as the story behind the character, your motivations for creating it, and details of any brand work.

Among those assessing contestants are 13‑time Emmy‑winning comedy writer Gil Rief, the creators of Spanish AI model Aitana Lopez, and Christopher “Topher” Townsend, the MAGA rapper behind AI-generated gospel singer Solomon Ray. According to a copy of the judges’ briefing seen by The Verge, contestants will be scored on four criteria: quality, social clout, brand appeal, and the inspiration behind the avatar. Specific points include reliably engaging with followers, portraying a consistent look across social channels, accurate details like having the “right number of fingers and thumbs,” and having “an authentic narrative” behind the avatar.


The contest is open to established creators and novices alike, though existing AI influencers will still need to submit material produced on OpenArt’s platform, Matt Jones, head of brand at Fanvue, told The Verge.

Despite being designed to celebrate creators of virtual influencers, Jones said that entrants don’t need to publicly identify themselves. “If a person who created this amazing piece of work wants nothing to do with the press or to expose themselves or to have their name out there, that’s obviously fine,” he said. “There would be no need to thrust anybody into the limelight here. We would just celebrate the piece of work.”

That creators can remain anonymous feels odd for a contest judging authenticity, particularly in an AI influencer ecosystem built on fictional people, fake personas, and fabricated backstories. That same anonymity has also helped grifts flourish with little accountability, from the AI white nationalist rapper Danny Bones to MAGA fantasy girl Jessica Foster.

There’s familiar baggage, too: persistent questions about originality, about whether AI-generated work or even a likeness has been lifted from real creators, and about whether these tools simply reproduce the same old biases in synthetic form. Organizer Fanvue has already faced criticism on this front: in 2024, a Guardian columnist described its “Miss AI” beauty pageant as something that “take(s) every toxic gendered beauty norm and bundle(s) them up into a completely unrealistic package.”

To Fanvue’s Jones, creators inevitably leave something of themselves in the AI characters they make. “You can’t help but put a little bit of yourself into the stories that you tell and the characters that you make,” he said, urging creators to “lean into that.” The idea feels at home in the influencer economy: not strictly real, but a form of synthetic authenticity the internet already knows how to handle.

