Business

Column: It’s not just Zoom: How websites and apps harvest your data to build AI

When, earlier this month, Zoom users realized that the company had updated its terms of service to allow it to use data collected from video calls to train its artificial intelligence systems, the backlash was swift. Celebrities, politicians and academics threatened to quit the service. Zoom quickly backtracked.

These are tense times. Many are worried, and quite rightfully so, that AI companies are threatening their livelihoods: that AI services like OpenAI’s ChatGPT, Google’s Bard and Midjourney have ingested work that artists, writers, photographers and content creators have put online, and can now emulate and produce it for cheap.

Other anxieties are more diffuse. We’re not yet entirely certain what these AI companies are capable of, exactly, or to what ends their products will be used. We worry that AI can be used to mimic our digital profiles, our voices, our identities. We worry about scams and exploitation.

Which is why the outrage against Zoom’s policy makes perfect sense — videoconferencing is one of the most intimate, personal and data-rich services we use. Whenever we Zoom — or FaceTime or Google Meet — we are transmitting detailed information about our faces, homes and voices to our friends, family and colleagues; the notion that data would be mined to train an AI that could be used for any purpose a tech company saw fit is disconcerting, to say the least.

And it raises the question: What kind of info are we comfortable forking over to the AIs, if any? Right now we are in the midst of a destabilizing moment. It’s alarming, yes, but it’s also an opportunity to renegotiate what we do and do not want to hand over to tech giants that have been gathering our personal data for decades now. But to make those sorts of decisions, first we have to know where we stand. What are the websites and apps we use every day doing with our data? Are they using it to train their AI systems? What can we do about it if so?

A good rule of thumb, to begin with: If you are posting pictures or words to a public-facing platform or website, chances are that information is going to be scraped by a system crawling the internet gathering data for AI companies, and very likely used to train an AI model of one kind or another. If it hasn’t already.

WEBSITES

If you have a website for your business or a personal blog, or write for a company that publishes stories or copy online, that information is getting hoovered up and put to work training an AI, no doubt about it. Unless, that is, the website owner has put in certain safeguards to keep AI crawlers out, but more on that in a second.

The sort of AI that has made headlines this year, including OpenAI’s ChatGPT and DALL-E, Google’s Bard and Meta’s LLaMa, is known as generative AI; the text-based systems among them are more technically known as large language models, or LLMs. Simply put, these systems work by “training” on large data sets of images and words. Very large data sets: Google’s “Colossal Clean Crawled Corpus,” or C4, spans 15 million websites.

Earlier this year, investigative reporters at the Washington Post teamed up with the Allen Institute for AI to analyze the kinds of websites that were scraped up to build that data set, which has played a major role in training many of the AI products you’re most familiar with. (Newer AI products have been trained on data sets that are even bigger than that.)

Everything from Wikipedia entries to Kickstarter projects to New York Times stories to personal blogs was scanned for use in amassing the AI data set. Perhaps we should see it as a badge of honor that we here at the Los Angeles Times provided C4 with the 6th-largest amount of training data of any site on the web. (Or maybe we should, you know, ask for some compensation for our contributions.) The largest source of data in C4, by some margin, is the U.S. patent office. My own embarrassing personal website, brianmerchant.org, was scraped by the AI crawler and deposited into C4 — when you chat with an AI bot, just bear in mind that it may be 1/15,000,000th the online CV of Brian Merchant.

OK, so let’s say you don’t want OpenAI building ChatGPT-7 with fresh posts from your personal blog, or your copywriters’ finely crafted prose. What can you do?

Well, just this week, OpenAI announced its latest web-crawling tool, GPTBot, along with instructions on how to block it. Website owners and admins who want to block future crawling should add an entry for the GPTBot user agent to their site’s robots.txt file and tell it to “Disallow: /”. As some have noted, not all crawlers obey such commands, but it’s a start. Still, any data that have already been scraped will not be removed from those data sets.
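
For site owners who want to see what that looks like in practice, here is a minimal sketch. The two-line robots.txt entry is the one OpenAI has published for GPTBot; everything around it, including the helper name is_allowed and the example.com address, is illustrative only. The Python uses the standard library's robots.txt parser to confirm that the rule turns GPTBot away while leaving other crawlers alone.

```python
from urllib.robotparser import RobotFileParser

# The entry a site owner would place in robots.txt at the root of the site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-abiding crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"  # hypothetical address
    print(is_allowed(ROBOTS_TXT, "GPTBot", page))        # False: blocked
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # True: no rule applies
```

Keep in mind that this only shows what a well-behaved crawler would conclude from the file. It does nothing to stop a crawler that ignores robots.txt.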

Furthermore, the web crawlers looking for data aren’t supposed to penetrate paywalls or any websites requiring passwords for entry, so putting your site under lock and key should keep it out of AI training sets.

So that’s the open web — what about apps?

First off, the same principle that goes for the web goes for 99% of apps out there — if you are creating something to post publicly, on a digital platform, chances are it’s going into one AI crawler or another, or already has. Remember, most social media apps have, from the beginning, predicated their entire business models on encouraging you to produce content that they will analyze and use to sell you ads with automated systems. Nothing is sacred here, or even truly private, unless the service in question offers end-to-end encryption or particularly good privacy settings.

TIKTOK

Take TikTok, which is one of the most-downloaded apps in the world and boasts over a billion users. It has run on AI and machine learning from the start: Its much-discussed algorithm, which serves users the content it thinks they’ll want most, is built on battle-tested techniques such as computer vision. Every post submitted to TikTok is scanned, stored and analyzed by AI, and trains the algorithm to get better at sending you content it thinks you’ll like.

Beyond that, we don’t have much information about what ByteDance, the Chinese company that owns TikTok, might plan to do with all the data it’s processed. But they’ve got a vast trove of it — from users and creators alike — and a lot is possible.

INSTAGRAM

Now, with Instagram, we know that your posts have been fed into an AI training system operated by Meta, the company that owns Instagram and Facebook. News broke in 2018 that the company had scraped billions of Instagram posts for AI data training purposes. The company said it was using those data to improve object recognition and its computer vision systems, but who knows.

FACEBOOK

Technically, Facebook prohibits scraping, so the biggest crawlers probably haven’t scooped up your posts for wider use in products like ChatGPT. But Meta itself is very much in the AI game, just like all the major tech giants — it has trained its own proprietary system, LLaMa — and it’s not clear what the company itself is doing with your posts. But we do know that it’s been earmarking user posts for AI processing in the recent past. In 2019, Reuters reported that Facebook contractors were looking at posts, even those set as private, in order to label them for AI training.

TWITTER/X

Like Facebook, X-née-Twitter has technically prohibited scraping of its posts, making it harder for bots to get at them. But owner Elon Musk has said that he’s interested in charging the AI scrapers for access, and in using the posts to train X’s own nascent AI efforts.

“We will use the public tweets — obviously not anything private — for training,” Musk said in a Twitter Spaces chat in July, “just like everyone else has.”

REDDIT

The popular and massive web forum Reddit has been scraped for data plenty. But recently, its CEO, Steve Huffman, has said that he intends to start charging AI scrapers for access. So, yes, if you post on Reddit, you’re feeding the bots.

We could keep going down the line — but this sampling should help make the gist of the matter clear: Almost everything is up for grabs if you’re creating content online for public consumption.

So that leaves at least one big question: What about messages, posts and work you make with digital tools for private consumption?

The reason the Zoom issue turned into a mini-scandal is that it’s a service not usually meant for public-facing use. And this is where it gets more complicated. It’s case by case, and if you really want to be sure about whether the products you’re using are harvesting your words or work for AI training, you’re going to have to dive into some terms of service yourself, or seek out products built with privacy in mind.

GOOGLE / GMAIL

Let’s start with a big one. It’s easy to forget that until a few years ago, Google’s AI read your email. In order to serve you better ads, the search giant’s automated systems combed your Gmail for data. Google says it doesn’t do that anymore, and claims that anything you create in the Workspace products you might use, such as Docs or Sheets, won’t be used to train AI without your consent. Nonetheless, authors are uneasy about the prospect that their drafts will wind up training an AI, and quite reasonably so.

GRAMMARLY

Grammarly, the popular grammar and spell-checking tool, explicitly states that any text you place in its system can be used to train AI systems in perpetuity. Every customer, its terms of service say, “acknowledges that a fundamental component of the Service is the use of machine learning…. Customer hereby grants us the right to use, during and after the Subscription Term, aggregated and anonymized Customer Data to improve the Services, including to train our algorithms internally through machine learning techniques.”

In other words, you’re handing Grammarly AI training material every time you check your spelling.

APPLE MESSAGES

Apple’s in the AI game too, though it doesn’t publicly flaunt it as much. And it insists that the kind of machine learning it’s interested in is what’s known as on-device AI — instead of taking your data and adding it to large data sets stored on the cloud, its automated systems live locally on the chips in your device.

Apple harnesses machine learning to do things like improve autocorrect in your text messages, recognize the shape of your face, pick out friends and family members in your camera roll, automatically adjust noise cancellation on your AirPods when it’s loud, and ID that plant you just snapped on a hike. So Apple’s machine learning systems are reading your texts and scanning your photos, but only within the confines of your iPhone; unlike most of its competitors, it’s not sending that information to the cloud.

ZOOM

And finally, we return to Zoom. Because I have one last point to add to the dust-up that got us started here. Which is, while Zoom may have added one little line to its terms of service indicating that it will not use your on-call data for its AI services — unless the host of your call has consented, which is a pretty major exception — it can still keep your data for just about everything else.

Here’s the part that still remains very much in effect, every time you boot up Zoom:

“You agree to grant and hereby grant Zoom a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license and all other rights required or necessary to redistribute, publish, import, access, use, store, transmit, review, disclose, preserve, extract, modify, reproduce, share, use, display, copy, distribute, translate, transcribe, create derivative works, and process Customer Content.”

In other words, they can do just about anything they want with our private recorded conversations, except for training AI without our consent. That still seems rather onerous!

And therein, ultimately, lies the rub.

So much of what the tech industry is doing with AI is not orders of magnitude more invasive or exploitative than what they’ve been doing all along — they’re incremental amplifications. The tech giants have harvested, hoarded, scraped and sold our personal data for well over a decade now, and this is just another step.

But we should be grateful that it’s a genuinely unnerving one: It gives us a chance to demand more from the companies that have erected the digital infrastructure, services and playgrounds we spend so much of our time on, even depend on. It gives the opportunity for us to renegotiate what we should consider socially — and economically — acceptable in how our data are taken and used.

Adobe, for instance — whose Beta users automatically opt in to having their work help train AI — has promised to pay creators who opt into a program that trains AI on their works. Few have seen any returns, as of yet, but it’s an idea, at least.

The best solution, right now, if you want to keep your words, images and likeness away from AI is to use encrypted apps and services that are good on privacy.

Instead of using Zoom for texting and video calls, use Signal, which is widely available, popular and boasts end-to-end encryption. For email, try a service like Proton Mail, which does not rely on advertising for revenue, and puts privacy first. If you have a blog or a personal site, you can use robots.txt to tell OpenAI not to scrape it. You can put up a paywall, or require a password to enter.

If you’re a developer or a product manager working on a project, in good faith, that relies on gathering other people’s data, seek consent first. And by all means, keep making noise when other folks don’t. We have a real chance to reevaluate and reestablish a true doctrine of consent online, and set new standards — before our words are sucked up and mutated and integrated into the chat-borg bots of the future.

SEC probes B. Riley loan to founder, deals with Franchise Group

B. Riley Financial Inc. received more demands for information from federal regulators about its dealings with now-bankrupt Franchise Group as well as a personal loan for Chairman and co-founder Bryant Riley.

The Los Angeles-based investment firm and Riley each received additional subpoenas in November from the U.S. Securities and Exchange Commission seeking documents and information about Franchise Group, or FRG, the retail company that was once one of its biggest investments before its collapse last year, according to a long-delayed quarterly filing. The agency also wants to know more about Riley’s pledge of B. Riley shares as collateral for a personal loan, the filing shows.

B. Riley previously received SEC subpoenas in July for information about its dealings with ex-FRG chief executive Brian Kahn, part of a long-running probe that has rocked B. Riley and helped push its shares to their lowest in more than a decade. Bryant Riley, who founded the company in 1997 and built it into one of the biggest U.S. investment firms beyond Wall Street, has been forced to sell assets and raise cash to ease creditors’ concerns.

The firm and Riley “are responding to the subpoenas and are fully cooperating with the SEC,” according to the filing. The company said the subpoenas don’t mean the SEC has determined any violations of law have occurred.

Shares in B. Riley jumped more than 25% in New York trading after the company’s overdue quarterly filing gave investors their first formal look at the firm’s performance in more than half a year. The data included a net loss of more than $435 million for the three months ended June 30. The shares through Monday had plunged more than 80% in the past 12 months, trading for less than $4 each.

B. Riley and Kahn — a longstanding client and friend of Riley’s — teamed up in 2023 to take FRG private in a $2.8-billion deal. The transaction soon came under pressure when Kahn was tagged as an unindicted co-conspirator by authorities in the collapse of an unrelated hedge fund called Prophecy Asset Management, which led to a fraud conviction for one of the fund’s executives.

Kahn has said he didn’t do anything wrong, that he wasn’t aware of any fraud at Prophecy and that he was among those who lost money in the collapse. But federal investigations into his role have spilled over into his dealings with B. Riley and its chairman, who have said internal probes found they “had no involvement with, or knowledge of, any alleged misconduct concerning Mr. Kahn or any of his affiliates.”

FRG filed for Chapter 11 bankruptcy in November, a move that led to hundreds of millions of dollars of losses for B. Riley. The collapse made Riley “personally sick,” he said at the time.

One of the biggest financial problems to arise from the FRG deal was a loan that B. Riley made to Kahn for about $200 million, which was secured against FRG shares. With that company’s collapse into bankruptcy in November wiping out equity holders, the value of the remaining collateral for this debt has now dwindled to only about $2 million, the filing shows.

Griffin writes for Bloomberg.

Starbucks Reverses Its Open-Door Policy for Bathroom Use and Lounging

Starbucks will require people visiting its coffee shops to buy something in order to stay or to use its bathrooms, the company announced in a letter sent to store managers on Monday.

The new policy, outlined in a Code of Conduct, will be enacted later this month and applies to the company’s cafes, patios and bathrooms.

“Implementing a Coffeehouse Code of Conduct is something most retailers already have and is a practical step that helps us prioritize our paying customers who want to sit and enjoy our cafes or need to use the restroom during their visit,” Jaci Anderson, a Starbucks spokeswoman, said in an emailed statement.

Ms. Anderson said that by outlining expectations for customers the company “can create a better environment for everyone.”

The Code of Conduct will be displayed in every store and will prohibit behaviors including discrimination, harassment, smoking and panhandling.

People who violate the rules will be asked to leave the store, and employees may call law enforcement, the policy says.

Before implementation of the new policy begins on Jan. 27, store managers will be given 40 hours to prepare stores and workers, according to the company. There will also be training sessions for staff.

This training time will be used to prepare for other new practices, too, including asking customers if they want their drink to stay or to go and offering unlimited free refills of hot or iced coffee to customers who order a drink to stay.

The changes are part of an attempt by the company to prioritize customers and make the stores more inviting, Sara Trilling, the president of Starbucks North America, said in a letter to store managers.

“We know from customers that access to comfortable seating and a clean, safe environment is critical to the Starbucks experience they love,” she wrote. “We’ve also heard from you, our partners, that there is a need to reset expectations for how our spaces should be used, and who uses them.”

The changes come as the company responds to declining sales, falling stock prices and grumbling from activist investors. In August, the company appointed a new chief executive, Brian Niccol.

Mr. Niccol outlined changes the company needed to make in a video in October. “We will simplify our overly complex menu, fix our pricing architecture and ensure that every customer feels Starbucks is worth it every single time they visit,” he said.

The new purchase requirement reverses a policy Starbucks instituted in 2018 that said people could use its cafes and bathrooms even if they had not bought something.

The earlier policy was introduced a month after two Black men were arrested in a Philadelphia Starbucks while waiting to meet another man for a business meeting.

Officials said that the men had asked to use the bathroom, but that an employee had refused the request because they had not purchased anything. An employee then called the police, and part of the ensuing encounter was recorded on video and viewed by millions of people online, prompting boycotts and protests.

In 2022, Howard Schultz, the Starbucks chief executive at the time, said that the company was reconsidering the open-bathroom policy.

'TikTok refugees' unexpectedly turn to Chinese alternative as ban looms

TikTok users concerned about a looming ban are finding solace in a strange place.

Days ahead of a Supreme Court decision that could determine whether the popular short-video app shuts down starting Sunday, a number of users appear to be turning to an app called RedNote — more commonly known to its majority-Chinese audience by its Chinese name, Xiaohongshu.

It’s a surprising choice since Xiaohongshu is Chinese-owned, and such ties are the reason U.S. lawmakers moved to ban TikTok in the U.S., citing privacy and national security concerns.

Also, Xiaohongshu is dominated by Chinese-language content, and that content is subject to censorship by Chinese government officials, something alien to most U.S. users.

But by embracing a Chinese social media and lifestyle app similar to Instagram, some U.S. TikTok users say they are protesting what they believe is the unfair ban of the ubiquitous app.

“I think America is trying to bully China into selling to an American owner. A lot of us just don’t want to give in to them,” said Samantha Manassero, a 39-year-old nurse in L.A. who downloaded Xiaohongshu on Sunday night after watching content creators on TikTok pitch it as a comparable app. “I think some of it is literally just pettiness.”

Last year, Congress passed a bill that requires TikTok’s owner, ByteDance, to sell the app to a U.S.-approved owner or face a nationwide ban. As soon as Wednesday, the Supreme Court is expected to uphold the legality of the ban.

It is unclear whether Xiaohongshu, which was started in 2013, will become a viable alternative to TikTok or whether the recent migration to the Chinese platform accounts for a significant share of TikTok’s 170 million U.S. users.

But a surge in new users made Xiaohongshu the top free download on Apple’s App Store this week. No. 2 on the charts was another social media app developed by ByteDance, Lemon8. It’s unclear whether either app will be subjected to the same U.S. government scrutiny as TikTok.

It is also difficult to determine exactly how many U.S. TikTok users have created accounts on Xiaohongshu or how many will stay on it. While many Xiaohongshu regulars have welcomed the influx of Americans identifying themselves as “TikTok refugees,” the app’s interface is largely in Chinese, making it difficult to navigate for non-native speakers.

Chinese apps are subject to stringent censorship on discussions that the Chinese government deems politically sensitive. These topics can range from illegal activities to LGBTQ+ rights to Winnie the Pooh, images of which have been used to mock Chinese President Xi Jinping.

The Chinese version of TikTok, called Douyin, has different content restrictions and is only available for mobile download in China. ByteDance has argued that TikTok, which is used by the rest of the world, is a separate entity from Douyin and not beholden to the Chinese Communist Party.

That did not stop President-elect Donald Trump from proposing a ban of TikTok in 2020, or President Biden from signing it into law in 2024.

The legality of such a ban has been questioned several times. Last month, in an about-face, Trump, who has 14.8 million followers on TikTok, filed a legal brief requesting to stay the ban so he can negotiate a deal once he takes office.

As TikTok faces an uncertain future, Xiaohongshu’s latest arrivals were eager to try out the new app despite its foreign nature.

Manassero, who posts videos about healthcare and power lifting to about 7,000 followers on TikTok, said she already has a much larger audience of 26,000 on Instagram. However, she was motivated to create an account on Xiaohongshu partly out of frustration at the U.S. government’s determination to outlaw TikTok.

“I don’t know what I’m doing, I don’t know what I’m reading, I’m just pressing buttons,” Manassero said in her first video post. The next morning, her account had received 5,000 views and 3,500 new followers. By Tuesday, the hashtag “Tiktok refugee” had received more than 90 million views and 2 million comments.

TikTokers sought each other out with introductions, follow requests and shared tips on how to navigate the app’s Chinese functions. On Monday, more than 190,000 viewers joined a live chat named “TikTok Refugees Club,” and held discussions in English about what a TikTok ban would mean and future plans for social media content. In the comments, users greeted new arrivals and lamented they could not understand each other.

“Maybe you can learn how to speak Chinese,” one user wrote in English.

“Where’s the translator?” another viewer asked in Chinese.

On Tuesday, the Wall Street Journal reported that Chinese officials had discussed the possibility of selling TikTok to a trusted non-Chinese party such as Elon Musk, who already owns social media platform X. However, analysts said that ByteDance is unlikely to agree to a sale of the underlying algorithm that powers the app, meaning the platform under a new owner could still look drastically different.

Manassero and other TikTokers expressed distaste at the prospect of migrating to U.S. tech platforms such as Instagram or X that could benefit from an influx of users if TikTok shuts down.

“We don’t want to turn around and make a bunch of billionaires even more rich,” she said. “I would honestly rather the app get shut down than be owned by Elon Musk.”

Though she is still trying to figure out how to use Xiaohongshu and message people back, Manassero said she would likely stay on the Chinese lifestyle app regardless of whether the TikTok ban goes through.

“The response has been so friendly and nice. It’s good energy,” she said. “This feels like the early TikTok days: a little more organic, so it’s fun.”
