Technology

Meta got caught gaming AI benchmarks

Published

April 8, 2025

Over the weekend, Meta dropped two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 Flash “across a broad range of widely reported benchmarks.”

Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where humans compare outputs from different systems and vote on the best one. In Meta’s press release, the company highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s 4o and just under Gemini 2.5 Pro. (A higher Elo score means the model wins more often in the arena when going head-to-head with competitors.)
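To make the link between a rating and a win rate concrete, here is a minimal sketch of the classic Elo expected-score formula. It assumes the conventional 400-point logistic scale; LMArena's exact rating method may differ, and the rival rating below is a hypothetical value chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Estimated probability that A beats B under the classic Elo logistic model.

    The 400-point scale is the textbook convention and is an assumption here,
    not a confirmed detail of how LMArena computes its leaderboard.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


maverick = 1417  # score cited in Meta's press release
rival = 1400     # hypothetical opponent rating, for illustration only

# A 17-point lead corresponds to only a small edge over a coin flip.
print(f"Estimated head-to-head win rate: {expected_score(maverick, rival):.1%}")
```

Under this model, two equally rated systems each win half the time, and small rating gaps translate into small differences in expected win rate.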

The achievement seemed to position Meta’s open-weight Llama 4 as a serious challenger to the state-of-the-art, closed models from OpenAI, Anthropic, and Google. Then, AI researchers digging through Meta’s documentation discovered something unusual.

In fine print, Meta acknowledges that the version of Maverick tested on LMArena isn’t the same as what’s available to the public. According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality,” TechCrunch first reported.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. “Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that, we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”

A spokesperson for Meta, Ashley Gabriel, said in an emailed statement that “we experiment with all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” Gabriel said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”

While what Meta did with Maverick isn’t explicitly against LMArena’s rules, the site has shared concerns about gaming the system and taken steps to “prevent overfitting and benchmark leakage.” When companies can submit specially tuned versions of their models for testing while releasing different versions to the public, benchmark rankings like LMArena’s become less meaningful as indicators of real-world performance.

“It’s the most widely respected general benchmark because all of the other ones suck,” independent AI researcher Simon Willison tells The Verge. “When Llama 4 came out, the fact that it came second in the arena, just after Gemini 2.5 Pro — that really impressed me, and I’m kicking myself for not reading the small print.”

Shortly after Meta released Maverick and Scout, the AI community started talking about a rumor that Meta had also trained its Llama 4 models to perform better on benchmarks while hiding their real limitations. VP of generative AI at Meta, Ahmad Al-Dahle, addressed the accusations in a post on X: “We’ve also heard claims that we trained on test sets — that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”

Some also noticed that Llama 4 was released at an odd time. Saturday doesn’t tend to be when big AI news drops. After someone on Threads asked why Llama 4 was released over the weekend, Meta CEO Mark Zuckerberg replied: “That’s when it was ready.”

“It’s a very confusing release generally,” says Willison, who closely follows and documents AI models. “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s path to releasing Llama 4 wasn’t exactly smooth. According to a recent report from The Information, the company repeatedly pushed back the launch due to the model failing to meet internal expectations. Those expectations are especially high after DeepSeek, an open-source AI startup from China, released an open-weight model that generated a ton of buzz.

Ultimately, using an optimized model in LMArena puts developers in a difficult position. When selecting models like Llama 4 for their applications, they naturally look to benchmarks for guidance. But as is the case for Maverick, those benchmarks can reflect capabilities that aren’t actually available in the models that the public can access.

As AI development accelerates, this episode shows how benchmarks are becoming battlegrounds. It also shows how Meta is eager to be seen as an AI leader, even if that means gaming the system.

Update, April 7th: The story was updated to add Meta’s statement.

Technology

A warrantless wiretap law is about to expire — but surveillance networks aren’t actually ‘going dark’

Published

June 11, 2026

Congress has failed to pass a three-week extension of Section 702 of the Foreign Intelligence Surveillance Act (FISA), with the House voting 218-198 against reauthorizing the controversial warrantless wiretapping authority through July 2nd. After a short-term extension earlier this year, the spying program now appears set to lapse for at least a week. This is the nightmare scenario FISA’s proponents have been warning about — but it doesn’t actually mean the US has lost its surveillance capabilities.

Proponents of a clean extension claim a lapse will hinder intelligence agencies’ efforts to thwart potential terrorist attacks, with surveillance networks “going dark.” Sen. Tom Cotton (R-AR) stressed the importance of reauthorizing Section 702 ahead of the World Cup. House Speaker Mike Johnson (R-LA) has said even a brief lapse would be disastrous. “Democrats in the Senate are playing political games right now with the lives of Americans,” he told reporters Wednesday. “It’s a very dangerous situation.”

In March, the FISA court recertified surveillance under Section 702 until 2027. The Brennan Center for Justice notes that a lapse won’t allow telecom companies to flout requests to hand over communications information to the NSA and other spy agencies. In 2008, after Yahoo failed to comply with a Section 702 request during a lapse, the FISA court ruled that the directives issued under Section 702 are effective while the certification is in place — even in the event of a lapse.

“The phrase ‘going dark’ is significantly misleading,” Andrea Sawka Fiegl, the senior policy director for media and technology at Common Cause, said on a Tuesday press call. Fiegl added that companies don’t choose whether they participate in surveillance under Section 702. If they don’t comply after being served with a directive, they face fines starting at $250,000 a day.

“The ‘going dark’ framing is basically a pressure tactic designed to strip Congress of its leverage to negotiate reforms by creating this false binary,” Fiegl said. “There is ample time for Congress to consider and pass reforms.”

Among those reforms are a warrant requirement for queries involving US persons, including so-called “backdoor searches” in which intelligence agencies identify a foreign target with ties to a US person, and then search that person’s communications, thus granting them access to their desired US target. Reformers also want to prohibit intelligence agencies from buying Americans’ data from private brokers to get around warrant requirements.

“Every day that Section 702 is in effect without reforms is a day that Americans’ rights are under threat,” Sen. Ron Wyden (D-OR) said in a statement Wednesday night, after Senate Republicans blocked his request for a five-week extension of Section 702 with new transparency requirements. “If there is going to be an extension of these authorities, there needs to be some guardrails or at least some transparency that would allow Congress and the American people to understand the abuses that have taken place and the need for reforms.”

Though President Donald Trump and Republican leaders in both chambers have called for a clean reauthorization of Section 702, there’s bipartisan appetite for reform — and a handful of Republican holdouts stand in the way of a clean reauthorization. Most Democrats — even some who have supported reauthorization in the past — have objected to a clean extension due to Trump’s appointment of Bill Pulte as acting director of national intelligence.

Technology

12 biggest Apple WWDC 2026 takeaways you need to know

Published

June 11, 2026

Apple used WWDC 2026, its annual developers conference, to lay out what is coming next for your iPhone, Mac, iPad, Apple Watch and Vision Pro. This year’s keynote also carried extra weight because it marked Tim Cook’s final WWDC as Apple CEO before John Ternus takes over in September.

Still, the biggest story for users was software. Apple put Siri AI and Apple Intelligence at the center of the keynote, while also announcing iOS 27 support for older iPhones, new child safety tools, faster performance and smarter features across everyday apps.

The updates range from big changes, like Siri AI, to smaller fixes that could still make a difference. You may notice them when your phone finds a photo faster, shares a file quicker or helps clean up a weak password.

Here are the 12 biggest takeaways from Apple’s WWDC 2026 keynote.

Apple CEO Tim Cook holds an iPhone 17 Pro and an iPhone Air during an event at the Steve Jobs Theater on Apple’s campus in Cupertino, Calif., on Sept. 9, 2025. (Manuel Orbegozo/Reuters)

1) Siri AI is the biggest announcement

The headline from WWDC 2026 is Siri AI. Apple says it rebuilt Siri around Apple Intelligence so it can handle more complex requests and carry on longer conversations.

The new Siri still works in familiar ways, including “Hey Siri.” Apple also showed a dedicated Siri app where you can return to past conversations. That means a longer answer or planning session does not disappear after one interaction.

Siri can also sound a lot more expressive. Apple says you can customize Siri’s voice by adjusting its pace and expressivity until it feels right for you.

During the keynote, Apple showed Siri answering a question about a local concert. From there, Siri helped with tickets, created a reminder for the lottery opening and played a song from the artist.

Apple also showed Siri using what was already on the screen. In one demo, Siri identified a location along the Santa Cruz coast from an image. Then it found a friend’s address from Messages and helped create a route with a stop along the way.

In another example, Siri searched Photos for images from a recent trip. It narrowed the results to specific family members and added those photos to a shared family album.

On Mac, Apple showed Siri working inside Spotlight and context menus. Siri compared selected files, turned the information into a table and used details from Messages and Mail to help draft an email.

2) Apple Intelligence now has a Google connection

One of the most surprising moments came when Apple said it worked with Google on the next generation of Apple Foundation models.

Apple said it used technologies behind Google’s Gemini family of models to help create new models for Apple Intelligence. Those models are designed to run on-device and through Private Cloud Compute.

Apple is still presenting the experience as Apple Intelligence. Still, the Google connection is important. It shows Apple is willing to lean on outside AI technology to make its own system stronger.

Apple says the new models bring better reasoning, image understanding, speech support and image generation.

3) iOS 27 keeps older iPhones in the game

Apple confirmed that iOS 27 will support the same iPhone models as iOS 26, going back to iPhone 11.

That is good news if you are not rushing to buy a new phone. Some of Apple’s biggest software updates will still reach older devices.

Apple also said it brought an improved CPU scheduler to older iPhones going back to iPhone 11. That system helps your phone manage processing power as you move between tasks.

In everyday terms, Apple says older iPhones should feel more responsive. That could help when you switch apps, search for photos or use several features at once.

4) Apple says your devices should feel faster

Apple did not spend the keynote only chasing new AI features. It also talked about speed. The company said iPhone and iPad apps can launch up to 30% faster. New photos may appear in your library up to 70% faster. AirDrop transfers may be up to 80% faster. On iPad, browsing files and moving them to an external drive may be up to five times faster.

Waiting for an app to open is annoying. So is taking a photo, then waiting for it to appear. Faster AirDrop could also make file sharing feel less clunky.

Apple also said it improved network transitions. Your iPhone should be smarter about moving between Wi-Fi and cellular. That could help in places where your phone clings to a weak Wi-Fi network, even though cellular would work better.

Apple’s WWDC 2026 keynote focused on Siri AI, Apple Intelligence and software updates coming to iPhone, Mac, iPad, Apple Watch and Vision Pro. (Cheng Xin/Getty Images)

5) Liquid Glass is getting easier to read

Apple also revisited Liquid Glass, the visual design system it introduced last year. This time, Apple said it refined Liquid Glass so that complex content behind it is easier to read. The goal is better contrast and clearer separation between controls and background content.

Apple is also adding a new slider in Settings. You can adjust Liquid Glass from ultra clear to fully tinted. That gives you more control. Some of you may like the transparent look. Others may want a stronger tint so buttons and text stand out.

On Mac, Apple is also bringing back more structure. Toolbars look more uniform. Sidebars stretch to the edge of app windows. Sidebar icons regain color. Windows also have a more consistent shape. The message is clear. Apple still likes the look of Liquid Glass, but it knows readability matters.

6) Apple is giving parents more control

Apple devoted a major part of WWDC 2026 to kids, teens and parental controls. The company says the most important first step is creating a Child Account. That account automatically turns on age-based safeguards, including adult website blocking, media limits and App Store restrictions. Apple also said parents can convert an existing account into a Child Account.

This year, Apple is adding a more guided setup process. Parents can decide which apps a child can use right away, then add more as the child is ready. In other words, a child may need Messages or school apps before they are ready for broader web access.

Apple also expanded Ask to Buy. Parents can now review app requests in Messages. A new Ask to Browse feature lets kids request permission before visiting a new website in Safari. Ask to Browse and Ask to Buy are both on by default for kids under 13. Parents can also turn them on for teens.

7) Screen Time is getting more flexible

Screen Time is getting a new look and more flexible controls. Apple says parents will see a clearer view of how kids use their devices. They can also adjust access faster.

A new Time Allowances feature gives parents suggested limits for app categories such as Entertainment, Games and Social Media. Apple says those recommendations are based on a child’s age and developed with clinical and child development experts, including the American Academy of Pediatrics. Parents can still adjust the limits themselves.

Apple also added schedules. That means parents can decide which apps are available during different parts of the day. For example, a parent could allow learning apps during school hours and entertainment apps later. Weekend settings can also be different from weekday settings. That flexibility matters because families do not all handle screen time the same way.

8) Safari can organize your messy tabs

Safari is getting Apple Intelligence features that could help with one of the most common browsing problems: too many tabs. Safari can now organize open tabs into topics. If you are researching a vacation, comparing products or planning a project, Safari can group related pages together. It can also add new related tabs to a topic as you keep browsing. That could help anyone who leaves tabs open because they are afraid of losing something important.

Safari is also adding Notify Me. You can ask Safari to watch a page for a change, then close the tab. Apple gave examples like waiting for camp signups or a product to come back in stock. When Safari detects the update, it sends you a notification. That may sound small. For tab hoarders, it could be a big relief.

9) Passwords can help fix weak accounts

Apple is also bringing Apple Intelligence into the Passwords app. That could be a big help because weak and reused passwords are still one of the easiest ways for scammers to break into accounts. Passwords already warns you when a password may be weak or compromised. Now, Apple says it can help update eligible accounts to stronger passwords with one tap.

That is the part that may get more people to act. Most of us know we should clean up old passwords. The hassle is getting it done. You have to visit the site, sign in, hunt for the account settings and create a better password.

Apple says Passwords can use Safari to handle supported password changes for you. That could make it much easier to fix risky accounts before they become a problem. Just do not treat it like a set-it-and-forget-it tool. After changing a password, make sure it is saved correctly and know where to find it later.

10) Visual Intelligence is spreading across Apple devices

Visual Intelligence is becoming a bigger part of Apple’s AI plan. On iPhone, Apple is adding a Siri mode inside the Camera app. You can point your camera at something, tap the shutter button and let Siri respond to what it sees.

Apple showed examples like getting nutritional insights from food and helping split a restaurant bill with Apple Cash.

On Mac, Visual Intelligence works through a keyboard shortcut. You can select something on your display, then ask Siri about it.

On iPad, Visual Intelligence connects with screenshots. On Vision Pro, Apple showed Siri answering questions about objects someone was looking at.

This could make Apple Intelligence feel more useful because it connects to what is in front of you. It is not limited to typing a question into a chat window.

Apple CEO Tim Cook delivered his final WWDC keynote as Apple CEO, announcing smarter features included in the tech company’s next big software update. (Josh Edelson / AFP via Getty Images)

11) Apple Intelligence is moving deeper into everyday apps

Apple also showed how Apple Intelligence will show up inside the apps you already use. This is where the update could become more useful in everyday life. Instead of making you open a separate AI tool, Apple is building these features into places like Messages, Mail, Calendar, Phone, Home and Shortcuts. In Messages, Apple says it can understand the context of a conversation and offer one-tap suggestions. For example, it could help create a reminder or note from a message. If someone asks for photos, Messages can help find the right shots by recognizing keywords, locations and people in your library. Mail is getting more capable suggestions, too. Apple says those suggestions will be based on the email you are reading and can help you take action with your favorite apps, including third-party apps.

Calendar is also getting a more natural way to add events. You can type what you want in plain language, and Calendar can fill in details as you go. Apple showed it identifying a contact, adding a location and creating a title. It can also adjust a recurring event when you describe the change. The Phone app may get one of the more useful upgrades. With Call Context, your iPhone can surface helpful details when you call a business. Apple gave the example of calling an airline and having your confirmation code appear from Mail when the call starts. Apple says the feature looks at who you are calling, not what you are saying, and runs entirely on your device.

The Home app is getting smarter about notifications and cameras. Apple says it can understand related accessory alerts as one activity, so you get one notification that keeps updating. For compatible cameras, the Home app can also summarize recorded clips, pull up related footage and let you search by what was captured. Shortcuts may become less intimidating, too. Instead of building an automation step by step, you can describe what you want. Apple showed an example where Shortcuts could message a partner with an ETA when someone leaves work. That is the bigger point here. Apple Intelligence is not only about Siri answering questions. Apple wants it to handle small tasks that usually require digging, tapping or searching inside the apps people already use.

12) Photos and image creation are getting AI upgrades

Apple also announced several visual creation features. Image Playground is getting a major upgrade with more powerful image models. Apple says it can create higher-quality images in many styles, including photorealistic images. It can also use people from your Photos library, create images in different dimensions and help make Messages backgrounds, contact posters and Lock Screen wallpapers.

Apple also said you can refine images by describing changes. You can touch part of an image, then move it, resize it or add details. Photos is getting its own AI tools. Apple said Clean Up is improving. It also announced Extend, which can expand a photo beyond its original frame. Another feature, Spatial Reframing, lets you adjust the framing of a photo after you take it.

That could be very useful when a photo is close to perfect, but the edges feel off. These features show where Apple is headed. Your photo library is becoming more editable and easier to search with help from Apple Intelligence.

Important limits to know

Siri AI will not arrive for everyone at the same time. Apple said Siri AI will be available in beta later this year. Developers can try it first. It starts in English, with more languages to follow. Apple also said Siri AI will not initially be available in the EU on iOS and iPadOS. In China, Siri AI and other new Apple Intelligence features will not be available while Apple works through regulatory requirements.

Some Apple Intelligence features will also have daily usage limits. That includes image generation and other features that rely on Apple’s server-based models. Apple says people with most iCloud+ subscription plans will get increased access. Some features may also depend on region, language and device support.

Other WWDC 2026 updates worth noting

Apple also announced several smaller updates that may be useful.

Shared Albums can now include contributions from friends on Android or Windows. They also support full-resolution sharing.
The Health app is adding support for perimenopause and menopause. It can notify people when cycle patterns may suggest perimenopause. It also adds symptom logging and educational information.
AirPods are getting custom EQ so you can personalize their sound.
Apple Vision Pro can turn panoramas into spatial scenes with depth and realism. You can also use those panoramas as your environment.
Maps is improving Flyover with sharper detail using aerial imagery and vision intelligence models.

These may not be the headline features. Still, they could end up being the updates some of you use most.

What this means for you

Apple is trying to make AI feel like part of your device instead of another app you need to open. That means Siri could search your photos, understand messages, draft emails, compare files, summarize camera clips and help you act inside apps. Safari could organize tabs. Passwords could fix weak accounts. Calendar could understand a normal sentence. Shortcuts could become easier for people who never wanted to build automations. That sounds convenient. It also requires trust.

Apple says its approach is privacy-first. The company says Apple Intelligence uses on-device processing and Private Cloud Compute, so your data is only used to complete your request. Apple also says outside experts can verify those privacy promises. Still, you should pay attention to the features you enable. AI becomes more useful when it understands your personal context. That same access makes it more important to know what your device can search and use. The promise is less friction. The question is how much access you are comfortable giving.

Kurt’s key takeaways

Apple’s WWDC 2026 keynote felt like a reset for Siri and Apple Intelligence. Apple is trying to turn Siri into a more useful assistant that can understand what is on your screen and help inside the apps you already use. I also like that Apple focused on everyday frustrations, from faster apps and better AirDrop to smarter search, stronger passwords and improved parental controls. Still, Siri AI has to prove itself outside a keynote demo. Some features will have limits, and some regions will have delays. To me, Apple is finally saying it is serious about AI. Now it has to prove it on the devices people already own.

Would you trust Siri AI to search your messages, photos, files and apps to get things done for you, or does that feel like too much access? Let us know by writing to us at Cyberguy.com

Kurt “CyberGuy” Knutsson is an award-winning tech journalist who has a deep love of technology, gear and gadgets that make life better with his contributions for Fox News & FOX Business beginning mornings on “FOX & Friends.” Got a tech question? Get Kurt’s free CyberGuy Newsletter, share your voice, a story idea or comment at CyberGuy.com.

Technology

Bluesky is getting ‘communities’

Published

June 11, 2026

Bluesky will be getting “communities,” which will function as smaller spaces where you can “go deeper and hang out with people who care about the same stuff” sometime this year, according to head of product Alex Benzer. They will be built on the decentralized AT Protocol that underpins Bluesky, with Benzer saying that “it’s a new structure for everyone” that’s part of the “Atmosphere” (a shorthand for the AT Protocol ecosystem).

Benzer listed out a “few ideas we have in mind so far” in a thread. “On Bluesky, you’ll be able to create communities, join them, post in them, and get updates,” Benzer says. “The core features on Bluesky stay simple. The magic comes from communities also existing on the open web. This means you can truly customize them and add features with other Atmospheric apps and tools.”

Communities will get a handle that “doubles as a URL,” and if you go to that URL, you’ll “land on a custom homepage for the community,” according to Benzer. “Builders can also host a completely custom experience there instead.” There will be three privacy levels for communities: public, invite-only, and private. And each community would have its own feed, Benzer says.

Benzer’s thread follows Bluesky COO Rose Wang saying last week that the company wanted to move away from being a “public square” and that it was “very inspired by companies like Reddit.” Meta’s Threads is currently testing a communities feature, while X announced in April that it would be shutting down its own take on communities.