Technology
How to Run Your Own ChatGPT-Like LLM for Free (and in Private)
The power of large language models (LLMs) such as ChatGPT, generally made possible by cloud computing, is obvious, but have you ever thought about running an AI chatbot on your own laptop or desktop? Depending on how modern your system is, you can likely run LLMs on your own hardware. But why would you want to?
Well, maybe you want to fine-tune a tool for your own data. Perhaps you want to keep your AI conversations private and offline. You may just want to see what AI models can do without the companies running cloud servers shutting down any conversation topics they deem unacceptable. With a ChatGPT-like LLM on your own hardware, all of these scenarios are possible.
And hardware is less of a hurdle than you might think. The latest LLMs are optimized to work with Nvidia graphics cards and with Macs using Apple M-series processors—even low-powered Raspberry Pi systems. And as new AI-focused hardware comes to market, like the integrated NPU of Intel’s “Meteor Lake” processors or AMD’s Ryzen AI, locally run chatbots will be more accessible than ever before.
Thanks to platforms like Hugging Face and communities like Reddit’s r/LocalLLaMA, the software models behind sensational tools like ChatGPT now have open-source equivalents—in fact, more than 200,000 different models are available at this writing. Plus, thanks to tools like Oobabooga’s Text Generation WebUI, you can access them in your browser using clean, simple interfaces similar to ChatGPT, Bing Chat, and Google Bard.
The software models behind sensational tools like ChatGPT now have open-source equivalents—in fact, more than 200,000 different models are available.
So, in short: Locally run AI tools are freely available, and anyone can use them. However, none of them is ready-made for non-technical users, and the category is new enough that you won’t find many easy-to-digest guides or instructions on how to download and run your own LLM. It’s also important to remember that a local LLM won’t be nearly as fast as a cloud-server platform because its resources are limited to your system alone.
Nevertheless, we’re here to help the curious with a step-by-step guide to setting up your own ChatGPT alternative on your own PC. Our guide uses a Windows machine, but the tools listed here are generally available for Mac and Linux systems as well, though some extra steps may be involved when using different operating systems.
Some Warnings About Running LLMs Locally
First, however, a few caveats—scratch that, a lot of caveats. As we said, these models are free, made available by the open-source community. They rely on a lot of other software, which is usually also free and open-source. That means everything is maintained by a hodgepodge of solo programmers and teams of volunteers, along with a few massive companies like Facebook and Microsoft. The point is that you’ll encounter a lot of moving parts, and if this is your first time working with open-source software, don’t expect it to be as simple as downloading an app on your phone. Instead, it’s more like installing a bunch of software before you can even think about downloading the final app you want—which then still may not work. And no matter how thorough and user-friendly we try to make this guide, you may run into obstacles that we can’t address in a single article.
Also, finding answers can be a real pain. The online communities devoted to these topics are usually helpful in solving problems. Often, someone’s solved the problem you’re encountering in a conversation you can find online with a little searching. But where is that conversation? It might be on Reddit, in an FAQ, on a GitHub page, in a user forum on HuggingFace, or somewhere else entirely.
AI is quicksand. Everything moves whip-fast, and the environment undergoes massive shifts on a constant basis.
It’s worth repeating that open-source AI is moving fast. Every day new models are released, and the tools used to interact with them change almost as often, as do the underlying training methods and data, and all the software undergirding that. As a topic to write about or to dive into, AI is quicksand. Everything moves whip-fast, and the environment undergoes massive shifts on a constant basis. So much of the software discussed here may not last long before newer and better LLMs and clients are released.
Bottom line: Proceed at your own risk. There’s no Geek Squad to call for help with open-source software; it’s not all professionally maintained; and you’ll find no handy manual to read or customer service department to turn to—just a bunch of loosely organized online communities.
Finally, once you get it all running, these AI models have varying degrees of polish, but they all carry the same warnings: Don’t trust what they say at face value, because it’s often wrong. Never look to an AI chatbot to help make your health or financial decisions. The same goes for writing your school essays or your website articles. Also, if the AI says something offensive, try not to take it personally. It’s not a person passing judgment or spewing questionable opinions; it’s a statistical word generator made to spit out mostly legible sentences. If any of this sounds too scary or tedious, this may not be a project for you.
Select Your Hardware
Before you begin, you’ll need to know a few things about the machine on which you want to run an LLM. Is it a Windows PC, a Mac, or a Linux box? This guide, again, will focus on Windows, but most of the resources referenced offer additional options and instructions for other operating systems.
You also need to know whether your system has a discrete GPU or relies on its CPU’s integrated graphics. Plenty of open-source LLMs can run solely on your CPU and system memory, but most are made to leverage the processing power of a dedicated graphics chip and its extra video RAM. Gaming laptops, desktops, and workstations are better suited to these applications, since they have the powerful graphics hardware these models often rely on.
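If you have an Nvidia card, the `nvidia-smi` command-line tool reports how much VRAM it has—useful to know before picking a model. As a minimal sketch (the helper name and sample line are our own, not part of any tool discussed here), this parses the "used / total" memory pair from a line of `nvidia-smi` output:

```python
import re

def parse_vram_mib(nvidia_smi_line: str) -> int:
    """Extract total VRAM in MiB from an nvidia-smi memory readout,
    which prints memory as 'used / total', e.g. '1024MiB / 12282MiB'."""
    matches = re.findall(r"(\d+)MiB", nvidia_smi_line)
    if len(matches) < 2:
        raise ValueError("no 'used / total' MiB pair found")
    return int(matches[-1])  # the last figure is the total

# A 12GB laptop GPU reports roughly 12282 MiB total
print(parse_vram_mib("| 1024MiB / 12282MiB |"))  # → 12282
```

On a CPU-only or AMD system you’d check system RAM or use `rocm-smi` instead; the point is simply to know your memory budget up front.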
Gaming laptops and mobile workstations offer the best hardware for running LLMs at home. (Credit: Molly Flores)
In our case, we’re using a Lenovo Legion Pro 7i Gen 8 gaming notebook, which combines a potent Intel Core i9-13900HX CPU, 32GB of system RAM, and a powerful Nvidia GeForce RTX 4080 mobile GPU with 12GB of dedicated VRAM.
If you’re on a Mac or Linux system, are CPU-dependent, or are using AMD instead of Intel hardware, be aware that while the general steps in this guide are correct, you may need extra steps and additional or different software to install. And the performance you see could be markedly different from what we discuss here.
Set Up Your Environment and Required Dependencies
To start, you must download some necessary software: Microsoft Visual Studio 2019. Any up-to-date release of Visual Studio 2019 will work (but not the newer annual releases that followed it), and we recommend getting the latest 2019 version directly from Microsoft.
(Credit: Brian Westover/Microsoft)
Personal users can safely skip the Enterprise and Professional versions and install just the BuildTools edition of the software.
Find the latest version of Visual Studio 2019 and download the BuildTools version (Credit: Brian Westover/Microsoft)
After choosing that, be sure to select “Desktop Development with C++.” This step is essential in order for other pieces of software to work properly.
Be sure to select “Desktop development with C++.” (Credit: Brian Westover/Microsoft)
Begin your download and kick back: Depending on your internet connection, it could take several minutes before the software is ready to launch.
(Credit: Brian Westover/Microsoft)
Download Oobabooga’s Text Generation WebUI Installer
Next, you need to download the Text Generation WebUI tool from Oobabooga. (Yes, it’s a silly name, but the GitHub project makes an easy-to-install and easy-to-use interface for AI stuff, so don’t get hung up on the moniker.)
(Credit: Brian Westover/Oobabooga)
To download the tool, you can either navigate through the GitHub page or go directly to the collection of one-click installers Oobabooga has made available. We’ve installed the Windows version, but this is also where you’ll find installers for Linux and macOS. Download the zip file shown below.
(Credit: Brian Westover/Oobabooga)
Create a new file folder someplace on your PC that you’ll remember and name it AI_Tools or something similar. Do not use any spaces in the folder name, since that will mess up some of the automated download and install processes of the installer.
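The no-spaces rule matters because the installer’s scripts treat a space as a path separator. A quick sketch of the check (the helper is ours, purely illustrative):

```python
from pathlib import Path

def safe_install_dir(name: str) -> Path:
    """Reject folder names containing spaces, which break the
    one-click installer's automated download and install scripts."""
    if " " in name:
        raise ValueError(f"folder name {name!r} contains spaces; use underscores instead")
    return Path.home() / name

print(safe_install_dir("AI_Tools").name)  # → AI_Tools
```

So "AI_Tools" is fine, while "AI Tools" would cause the batch scripts to fail partway through.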
(Credit: Brian Westover/Microsoft)
Then, extract the contents of the zip file you just downloaded into your new AI_Tools folder.
Run the Text Generation WebUI Installer
Once the zip file has been extracted to your new folder, look through the contents. You should see several files, including one called start_windows.bat. Double-click it to begin installation.
Depending on your system settings, you might get a warning about Windows Defender or another security tool blocking this action, because it’s not from a recognized software vendor. (We haven’t experienced or seen anything reported online to indicate that there’s any problem with these files, but we’ll repeat that you do this at your own risk.) If you wish to proceed, select “More info” to confirm whether you want to run start_windows.bat. Click “Run Anyway” to continue the installation.
(Credit: Brian Westover/Microsoft)
Now, the installer will open up a command prompt (CMD) and begin installing the dozens of software pieces necessary to run the Text Generation WebUI tool. If you’re unfamiliar with the command-line interface, just sit back and watch.
(Credit: Brian Westover/Microsoft)
First, you’ll see a lot of text scroll by, followed by simple progress bars made up of hashtag or pound symbols, and then a text prompt will appear. It will ask you what your GPU is, giving you a chance to indicate whether you’re using Nvidia, AMD, or Apple M series silicon or just a CPU alone. You should already have figured this out before downloading anything. In our case, we select A, because our laptop has an Nvidia GPU.
(Credit: Brian Westover/Microsoft)
Once you’ve answered the question, the installer will handle the rest. You’ll see plenty of text scroll by, followed first by simple text progress bars and then by more graphically pleasing pink and green progress bars as the installer downloads and sets up everything it needs.
(Credit: Brian Westover/Microsoft)
At the end of this process (which may take up to an hour), you’ll be greeted by a warning message surrounded by asterisks. This warning will tell you that you haven’t downloaded any large language model yet. That’s good news! It means that Text Generation WebUI is just about done installing.
(Credit: Brian Westover/Microsoft)
At this point you’ll see some text in green that reads “Info: Loading the extension gallery.” Your installation is complete, but don’t close the command window yet.
(Credit: Brian Westover/Microsoft)
Copy and Paste the Local Address for WebUI
Immediately below the green text, you’ll see another line that says “Running on local URL: http://127.0.0.1:7860.” Just click that URL text, and it will open your web browser, serving up the Text Generation WebUI—your interface for all things LLM.
(Credit: Brian Westover/Microsoft)
You can save this URL somewhere or bookmark it in your browser. Even though Text Generation WebUI is accessed through your browser, it runs locally, so it’ll work even if your Wi-Fi is turned off. Everything in this web interface is local, and the data generated should be private to you and your machine.
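The reason it works offline is the address itself: 127.0.0.1 is the loopback address, so traffic between your browser and WebUI never leaves your machine. You can verify what the URL points at with a couple of lines of standard-library Python (a sketch, assuming the default port 7860):

```python
from urllib.parse import urlparse

url = "http://127.0.0.1:7860"
parts = urlparse(url)
# 127.0.0.1 is the loopback (localhost) address: requests to it are
# handled entirely on your own machine, with no network round trip.
print(parts.hostname, parts.port)  # → 127.0.0.1 7860
```

If the hostname were anything other than a loopback or LAN address, the interface would not be private to your machine.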
(Credit: Brian Westover/Oobabooga)
Close and Reopen WebUI
Once you’ve successfully accessed the WebUI to confirm it’s installed correctly, go ahead and close both the browser and your command window.
In your AI_Tools folder, open up the same start_windows batch file that we ran to install everything. It will reopen the CMD window but, instead of going through that whole installation process, will load up a small bit of text including the green text from before telling you that the extension gallery is loaded. That means the WebUI is ready to open again in your browser.
(Credit: Brian Westover/Oobabooga)
Use the same local URL you copied or bookmarked earlier, and you’ll be greeted once again by the WebUI interface. This is how you will open the tool in the future, leaving the CMD window open in the background.
Select and Download an LLM
Now that you have the WebUI installed and running, it’s time to find a model to load. As we said, you’ll find thousands of free LLMs you can download and use with WebUI, and the process of installing one is pretty straightforward.
If you want a curated list of the most recommended models, you can check out a community like Reddit’s r/LocalLLaMA, which includes a community wiki page that lists several dozen models. It also includes information about what different models are built for, as well as data about which models are supported by different hardware. (Some LLMs specialize in coding tasks, while others are built for natural text chat.)
These lists will all end up sending you to Hugging Face, which has become a repository of LLMs and resources. If you came here from Reddit, you were probably directed straight to a model card, which is a dedicated information page about a specific downloadable model. These cards provide general information (like the datasets and training techniques that were used), a list of files to download, and a community page where people can leave feedback as well as request help and bug fixes.
At the top of each model card is a big, bold model name. In our case, we used the WizardLM 7B Uncensored model made by Eric Hartford. He uses the screen name ehartford, so the model’s listed location is “ehartford/WizardLM-7B-Uncensored,” exactly how it’s listed at the top of the model card.
Next to the title is a little copy icon. Click it, and it will save the properly formatted model name to your clipboard.
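That "user/model" id is all WebUI needs, because every Hugging Face model card lives at a predictable URL built from it. As an illustration (the helper function is ours, not part of WebUI), the mapping looks like this:

```python
def hf_model_card_url(repo_id: str) -> str:
    """Map a 'user/model' id, as copied from a model card,
    to the Hugging Face page it came from."""
    user, _, model = repo_id.partition("/")
    if not user or not model:
        raise ValueError("expected an id of the form 'user/model'")
    return f"https://huggingface.co/{user}/{model}"

print(hf_model_card_url("ehartford/WizardLM-7B-Uncensored"))
# → https://huggingface.co/ehartford/WizardLM-7B-Uncensored
```

This is why the copy icon matters: a typo in either half of the id means the downloader can’t resolve the repository.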
(Credit: Brian Westover/Hugging Face)
Back in WebUI, go to the model tab and paste that model name into the field labeled “Download custom model or LoRA.” Hit Download, and the software will start downloading the necessary files from Hugging Face.
(Credit: Brian Westover/Oobabooga)
If successful, you’ll see an orange progress bar pop up in the WebUI window and several progress bars will appear in the command window you left open in the background.
(Credit: Brian Westover/Oobabooga)
(Credit: Brian Westover/Oobabooga)
Once it’s finished (again, be patient), the WebUI progress bar will disappear and it will simply say “Done!” instead.
Load Your Model and Settings in WebUI
Once you’ve got a model downloaded, you need to load it up in WebUI. To do this, select it from the drop-down menu at the upper left of the model tab. (If you have multiple models downloaded, this is where you choose one to use.)
Before you can use the model, you need to allocate some system or graphics memory (or both) to running it. While you can tweak and fine-tune nearly anything you want in these models, including memory allocation, we’ve found that setting it at roughly two-thirds of both GPU and CPU memory works best. That leaves enough unused memory for your other PC functions while still giving the LLM enough memory to track and hold a longer conversation.
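The two-thirds figure is a rule of thumb, not a hard requirement, but it’s easy to apply to your own machine. A minimal sketch of the arithmetic (the function name and default are ours):

```python
def suggested_allocation(total_gb: float, fraction: float = 2 / 3) -> float:
    """Return a rounded memory budget for the LLM, leaving the rest
    of the machine's memory free for the OS and other applications."""
    return round(total_gb * fraction, 1)

# Our test machine: 12GB of VRAM and 32GB of system RAM
print(suggested_allocation(12))  # → 8.0  (GPU memory slider)
print(suggested_allocation(32))  # → 21.3 (CPU memory slider)
```

If you find the model running out of room mid-conversation, nudging the fraction up is the first thing to try.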
(Credit: Brian Westover/Oobabooga)
Once you’ve allocated memory, hit the Save Settings button to save your choice, and it will default to that memory allocation every time. If you ever want to change it, you can simply reset it and press Save Settings again.
Enjoy Your LLM!
With your model loaded up and ready to go, it’s time to start chatting with your ChatGPT alternative. Navigate within WebUI to the Text Generation tab. Here you’ll see the actual text interface for chatting with the AI. Enter text into the box, hit Enter to send it, and wait for the bot to respond.
(Credit: Brian Westover/Oobabooga)
Here, we’ll say again, is where you’ll experience a little disappointment: Unless you’re using a super-duper workstation with multiple high-end GPUs and massive amounts of memory, your local LLM won’t be anywhere near as quick as ChatGPT or Google Bard. The bot will spit out fragments of words (called tokens) one at a time, with a noticeable delay between each.
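You can put rough numbers on that delay. Generation speed is usually quoted in tokens per second, and the arithmetic below is a sketch using illustrative figures (the 5 tokens/sec rate is a plausible local-LLM speed we’re assuming, not a benchmark):

```python
def estimated_response_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream a reply at a given generation rate."""
    return num_tokens / tokens_per_second

# A ~150-token paragraph at 5 tokens/sec takes half a minute—
# versus a few seconds for the same reply from a cloud service.
print(estimated_response_time(150, 5))  # → 30.0
```

A token is typically a word fragment of a few characters, so a paragraph of a few hundred words can easily run to several hundred tokens.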
However, with a little patience, you can have full conversations with the model you’ve downloaded. You can ask it for information, play chat-based games, even give it one or more personalities. Plus, you can use the LLM with the assurance that your conversations and data are private, which gives peace of mind.
You’ll encounter a ton of content and concepts to explore while starting with local LLMs. As you use WebUI and different models more, you’ll learn more about how they work. If you don’t know your text from your tokens, or your GPTQ from a LoRA, these are ideal places to start immersing yourself in the world of machine learning.
Technology
Amazon’s Echo Spot alarm clock is on sale with a free color smart bulb
The clocks have fallen back an hour for many of us, and if it all feels a bit disorienting, you’re not alone. Thankfully, Amazon’s versatile Echo Spot is on sale to help you adjust. Normally $79.99, right now you can buy the smart alarm clock at Amazon in black, blue, or white with a free Globe Electric smart bulb for $49.99, which equates to $41.99 in savings. That’s just $5 shy of the all-time low we saw during Amazon’s recent Prime Day event.
Amazon’s speedy smart speaker can be set up so that it gently wakes you up with music instead of typical alarm clock sounds, which can be jarring. The Spot is also a lot more useful than your run-of-the-mill clock, as it offers a customizable 2.83-inch screen that displays helpful info (including the weather and music playback). However, unlike Amazon’s larger smart displays, the latest Spot doesn’t push on-screen ads and lacks a camera, so there’s less of a privacy concern.
What makes the Spot particularly useful, though, is that it functions as an inexpensive Alexa speaker. That means you can use it to perform all kinds of tasks, from setting reminders to playing podcasts, audiobooks, and music. You can also use it to control other smart home devices with just your voice, including lights and smart thermostats. That’ll come in handy as the days get colder and darker — after all, no one wants to leave the warmth of their bed just to hit the lights.
Technology
Updated Android malware can hijack calls you make to your bank
Do you remember those TV shows where the villain gets defeated in one season but comes back even stronger in the next? Think “Stranger Things” on Netflix. The malware we’re talking about here is just like that. It’s called FakeCalls, and every time researchers figure out how it infects devices, it evolves with new ways to hide.
Earlier this year, it was reported to be impersonating large financial institutions, and now security researchers have discovered that the malware has gone through another upgrade. It can even hijack the calls you make to your bank using your Android phone.
What you need to know
FakeCalls is a banking trojan that focuses on voice phishing, where victims are deceived through fraudulent calls impersonating banks and are asked to share sensitive information. Earlier versions did this by prompting users to call the bank from within an app that impersonated the financial institution, as reported by Bleeping Computer. However, the latest version, analyzed by Zimperium, sets itself as the default call handler.
The default call handler app manages incoming and outgoing calls, allowing users to answer, reject or initiate calls. Giving these permissions to a malicious app, as you can imagine, carries serious risks.
When a user gives the app permission to set itself as the default call handler, the malware gets the green light to intercept and mess with both outgoing and incoming calls. It even shows a fake call interface that looks just like the real Android dialer, complete with trusted contact info and names. This level of deception makes it really tough for victims to see what’s happening.
“When the compromised individual attempts to contact their financial institution, the malware redirects the call to a fraudulent number controlled by the attacker,” explains the new Zimperium report. “The malicious app will deceive the user, displaying a convincing fake UI that appears to be the legitimate Android’s call interface showing the real bank’s phone number.”
“The victim will be unaware of the manipulation, as the malware’s fake UI will mimic the actual banking experience, allowing the attacker to extract sensitive information or gain unauthorized access to the victim’s financial accounts,” the report added.
The malware can also steal your data
This malware not only hijacks your calls but can also steal your data. It gets access to Android’s Accessibility permissions, which basically gives it free rein to do whatever it wants. The developer of the malware has also added several new commands, including the ability to start livestreaming the device’s screen, take screenshots, unlock the device if it’s locked and temporarily turn off auto-lock. It can also use accessibility features to mimic pressing the home button, delete images specified by the command server, and access, compress and upload photos and thumbnails from storage, especially from the DCIM folder.
6 ways to protect yourself from FakeCalls malware
1) Have strong antivirus software: Android has its own built-in malware protection called Play Protect, but the FakeCalls malware proves it’s not enough. Historically, Play Protect hasn’t been 100% foolproof at removing all known malware from Android phones. Also, avoid clicking on any links in messages or emails that seem suspicious. The best way to protect yourself from clicking malicious links that install malware that may get access to your private information is to have antivirus protection installed on all your devices. This can also alert you to any phishing emails or ransomware scams.
Get my picks for the best 2024 antivirus protection winners for your Windows, Mac, Android and iOS devices.
2) Download apps from reliable sources: It’s important to download apps only from trusted sources, like the Google Play Store. The FakeCalls malware infects your phone when you download an app from an unknown link. As an Android user, you should only download apps from the Play Store, which has strict checks to prevent malware and other harmful software. Avoid downloading apps from unknown websites or unofficial stores, as they pose a higher risk to your personal data and device. Also, never trust download links that you receive through SMS.
3) Be cautious with app permissions: Always review the permissions requested by apps before installation. If an app requests access to features that seem unnecessary for its function, it could be a sign of malicious intent. Do not give any app Accessibility permissions unless you really need to. Avoid granting permissions that could compromise your personal data.
4) Regularly update your device’s operating system and apps: Keeping your software up to date is crucial, as updates often include security patches for newly discovered vulnerabilities that could be exploited by malware like FakeCalls.
5) Monitor financial activity regularly: Check your bank and credit card statements often for unauthorized transactions. Set up alerts for any account activity, which can notify you immediately if suspicious activity occurs.
6) Limit sensitive transactions on mobile: Whenever possible, avoid performing high-risk transactions (like large money transfers) on your mobile device, especially if you’re in public or connected to unsecured Wi-Fi. Use a secure computer or contact your bank directly from a verified number.
Kurt’s key takeaway
Hackers are constantly upgrading their tactics and finding clever ways to hack your devices and scam you out of your hard-earned money. I really think Android phone manufacturers and Google need to step up their game on security to help keep users from getting hacked so often. I don’t see the same level of malware affecting iPhones.
How comfortable are you using your mobile phone for financial transactions, and what would make you feel safer? Let us know by writing us at Cyberguy.com/Contact.
Copyright 2024 CyberGuy.com. All rights reserved.
Technology
Your favorite musician’s favorite TikTok show
Guess the artist, win five bucks. Whether you’re a random person on the streets of New York, an A-list celebrity, or the sitting Vice President of the United States, that’s the pitch behind one of the most fun music shows on social media. You show up, you get some headphones and a microphone, and you hope you know what song is playing.
The show is called Track Star, and it’s hosted by Jack Coyne. On this episode of The Vergecast, the first in our three-part miniseries about the future of music, Coyne joins the show to tell us the story of Track Star.
We talk about the show’s beginnings as a trivia show about New York called Public Opinion, how Coyne and his co-creators figured out the show’s structure and pace, how he thinks about his role as the host, and why a bunch of famous people started clamoring to be on the show. Coyne never expected Track Star to feature the likes of Ed Sheeran, Olivia Rodrigo, Jack Antonoff, Nelly Furtado, Kamala Harris, and Oprah, but it happened. And somewhat remarkably, it didn’t change the show at all.
We also dig into why a show like Track Star works, and why it matters, in the current music landscape. Coyne and his team have big plans for expanding the franchise, too, and he sees a place for Track Star even in an online world already overloaded with stuff to listen to. If you start with music, conversation, and a decent playlist, there are plenty of places you can go.