Technology

How to Run Your Own ChatGPT-Like LLM for Free (and in Private)

Published

on

The power of large language models (LLMs) such as ChatGPT, generally made possible by cloud computing, is obvious, but have you ever thought about running an AI chatbot on your own laptop or desktop? Depending on how modern your system is, you can likely run LLMs on your own hardware. But why would you want to?

Well, maybe you want to fine-tune a tool for your own data. Perhaps you want to keep your AI conversations private and offline. You may just want to see what AI models can do without the companies running cloud servers shutting down any conversation topics they deem unacceptable. With a ChatGPT-like LLM on your own hardware, all of these scenarios are possible.

And hardware is less of a hurdle than you might think. The latest LLMs are optimized to work with Nvidia graphics cards and with Macs using Apple M-series processors—even low-powered Raspberry Pi systems. And as new AI-focused hardware comes to market, like the integrated NPU of Intel’s “Meteor Lake” processors or AMD’s Ryzen AI, locally run chatbots will be more accessible than ever before.

Thanks to platforms like Hugging Face and communities like Reddit’s LocalLlaMA, the software models behind sensational tools like ChatGPT now have open-source equivalents—in fact, more than 200,000 different models are available at this writing. Plus, thanks to tools like Oobabooga’s Text Generation WebUI, you can access them in your browser using clean, simple interfaces similar to ChatGPT, Bing Chat, and Google Bard.


The software models behind sensational tools like ChatGPT now have open-source equivalents—in fact, more than 200,000 different models are available.

Advertisement

So, in short: Locally run AI tools are freely available, and anyone can use them. However, none of them is ready-made for non-technical users, and the category is new enough that you won’t find many easy-to-digest guides or instructions on how to download and run your own LLM. It’s also important to remember that a local LLM won’t be nearly as fast as a cloud-server platform because its resources are limited to your system alone.

Nevertheless, we’re here to help the curious with a step-by-step guide to setting up your own ChatGPT alternative on your own PC. Our guide uses a Windows machine, but the tools listed here are generally available for Mac and Linux systems as well, though some extra steps may be involved when using different operating systems.


Some Warnings About Running LLMs Locally

First, however, a few caveats—scratch that, a lot of caveats. As we said, these models are free, made available by the open-source community. They rely on a lot of other software, which is usually also free and open-source. That means everything is maintained by a hodgepodge of solo programmers and teams of volunteers, along with a few massive companies like Facebook and Microsoft. The point is that you’ll encounter a lot of moving parts, and if this is your first time working with open-source software, don’t expect it to be as simple as downloading an app on your phone. Instead, it’s more like installing a bunch of software before you can even think about downloading the final app you want—which then still may not work. And no matter how thorough and user-friendly we try to make this guide, you may run into obstacles that we can’t address in a single article.

Also, finding answers can be a real pain. The online communities devoted to these topics are usually helpful in solving problems. Often, someone’s solved the problem you’re encountering in a conversation you can find online with a little searching. But where is that conversation? It might be on Reddit, in an FAQ, on a GitHub page, in a user forum on HuggingFace, or somewhere else entirely. 


AI is quicksand. Everything moves whip-fast, and the environment undergoes massive shifts on a constant basis.

Advertisement

It’s worth repeating that open-source AI is moving fast. Every day new models are released, and the tools used to interact with them change almost as often, as do the underlying training methods and data, and all the software undergirding that. As a topic to write about or to dive into, AI is quicksand. Everything moves whip-fast, and the environment undergoes massive shifts on a constant basis. So much of the software discussed here may not last long before newer and better LLMs and clients are released.

Bottom line: Proceed at your own risk. There’s no Geek Squad to call for help with open-source software; it’s not all professionally maintained; and you’ll find no handy manual to read or customer service department to turn to—just a bunch of loosely organized online communities.

Finally, once you get it all running, these AI models have varying degrees of polish, but they all carry the same warnings: Don’t trust what they say at face value, because it’s often wrong. Never look to an AI chatbot to help make your health or financial decisions. The same goes for writing your school essays or your website articles. Also, if the AI says something offensive, try not to take it personally. It’s not a person passing judgment or spewing questionable opinions; it’s a statistical word generator made to spit out mostly legible sentences. If any of this sounds too scary or tedious, this may not be a project for you.


Select Your Hardware

Before you begin, you’ll need to know a few things about the machine on which you want to run an LLM. Is it a Windows PC, a Mac, or a Linux box? This guide, again, will focus on Windows, but most of the resources referenced offer additional options and instructions for other operating systems.

You also need to know whether your system has a discrete GPU or relies on its CPU’s integrated graphics. Plenty of open-source LLMs can run solely on your CPU and system memory, but most are made to leverage the processing power of a dedicated graphics chip and its extra video RAM. Gaming laptops, desktops, and workstations are better suited to these applications, since they have the powerful graphics hardware these models often rely on.

Advertisement

Gaming laptops and mobile workstations offer the best hardware for running LLMs at home. (Credit: Molly Flores)

In our case, we’re using a Lenovo Legion Pro 7i Gen 8 gaming notebook, which combines a potent Intel Core i9-13900HX CPU, 32GB of system RAM, and a powerful Nvidia GeForce RTX 4080 mobile GPU with 12GB of dedicated VRAM.

If you’re on a Mac or Linux system, are CPU-dependent, or are using AMD instead of Intel hardware, be aware that while the general steps in this guide are correct, you may need extra steps and additional or different software to install. And the performance you see could be markedly different from what we discuss here.


Set Up Your Environment and Required Dependencies

To start, you must download some necessary software: Microsoft Visual Studio 2019. Any updated version of Visual Studio 2019 will work (though not newer annualized releases), but we recommend getting the latest version directly from Microsoft.

(Credit: Brian Westover/Microsoft)

Advertisement

Personal users will be fine to skip the Enterprise and Professional versions and use just the BuildTools version of the software.

Find the latest version of Visual Studio 2019 and download the BuildTools version (Credit: Brian Westover/Microsoft)

After choosing that, be sure to select “Desktop Development with C++.” This step is essential in order for other pieces of software to work properly.

Be sure to select “Desktop development with C++.” (Credit: Brian Westover/Microsoft)

Begin your download and kick back: Depending on your internet connection, it could take several minutes before the software is ready to launch.

Advertisement

(Credit: Brian Westover/Microsoft)


Download Oobabooga’s Text Generation WebUI Installer

Next, you need to download the Text Generation WebUI tool from Oobabooga. (Yes, it’s a silly name, but the GitHub project makes an easy-to-install and easy-to-use interface for AI stuff, so don’t get hung up on the moniker.)

(Credit: Brian Westover/Oobabooga)

To download the tool, you can either navigate through the GitHub page or go directly to the collection of one-click installers Oobabooga has made available. We’ve installed the Windows version, but this is also where you’ll find installers for Linux and macOS. Download the zip file shown below.

(Credit: Brian Westover/Oobabooga)

Advertisement

Create a new file folder someplace on your PC that you’ll remember and name it AI_Tools or something similar. Do not use any spaces in the folder name, since that will mess up some of the automated download and install processes of the installer.

(Credit: Brian Westover/Microsoft)

Then, extract the contents of the zip file you just downloaded into your new AI_Tools folder.


Run the Text Generation WebUI Installer

Once the zip file has been extracted to your new folder, look through the contents. You should see several files, including one called start_windows.bat. Double-click it to begin installation.

Depending on your system settings, you might get a warning about Windows Defender or another security tool blocking this action, because it’s not from a recognized software vendor. (We haven’t experienced or seen anything reported online to indicate that there’s any problem with these files, but we’ll repeat that you do this at your own risk.) If you wish to proceed, select “More info” to confirm whether you want to run start_windows.bat. Click “Run Anyway” to continue the installation.

Advertisement

(Credit: Brian Westover/Microsoft)

Now, the installer will open up a command prompt (CMD) and begin installing the dozens of software pieces necessary to run the Text Generation WebUI tool. If you’re unfamiliar with the command-line interface, just sit back and watch.

(Credit: Brian Westover/Microsoft)

First, you’ll see a lot of text scroll by, followed by simple progress bars made up of hashtag or pound symbols, and then a text prompt will appear. It will ask you what your GPU is, giving you a chance to indicate whether you’re using Nvidia, AMD, or Apple M series silicon or just a CPU alone. You should already have figured this out before downloading anything. In our case, we select A, because our laptop has an Nvidia GPU.

(Credit: Brian Westover/Microsoft)

Advertisement

Once you’ve answered the question, the installer will handle the rest. You’ll see plenty of text scroll by, followed first by simple text progress bars and then by more graphically pleasing pink and green progress bars as the installer downloads and sets up everything it needs.

(Credit: Brian Westover/Microsoft)

At the end of this process (which may take up to an hour), you’ll be greeted by a warning message surrounded by asterisks. This warning will tell you that you haven’t downloaded any large language model yet. That’s good news! It means that Text Generation WebUI is just about done installing.

(Credit: Brian Westover/Microsoft)

At this point you’ll see some text in green that reads “Info: Loading the extension gallery.” Your installation is complete, but don’t close the command window yet.

Advertisement

(Credit: Brian Westover/Microsoft)


Copy and Paste the Local Address for WebUI 

Immediately below the green text, you’ll see another line that says “Running on local URL: http://127.0.01:7860.” Just click that URL text, and it will open your web browser, serving up the Text Generation WebUI—your interface for all things LLM.

(Credit: Brian Westover/Microsoft)

You can save this URL somewhere or bookmark it in your browser. Even though Text Generation WebUI is accessed through your browser, it runs locally, so it’ll work even if your Wi-Fi is turned off. Everything in this web interface is local, and the data generated should be private to you and your machine.

(Credit: Brian Westover/Oobabooga)

Advertisement

Close and Reopen WebUI

Once you’ve successfully accessed the WebUI to confirm it’s installed correctly, go ahead and close both the browser and your command window.

In your AI_Tools folder, open up the same start_windows batch file that we ran to install everything. It will reopen the CMD window but, instead of going through that whole installation process, will load up a small bit of text including the green text from before telling you that the extension gallery is loaded. That means the WebUI is ready to open again in your browser.

(Credit: Brian Westover/Oobabooga)

Use the same local URL you copied or bookmarked earlier, and you’ll be greeted once again by the WebUI interface. This is how you will open the tool in the future, leaving the CMD window open in the background.


Select and Download an LLM

Now that you have the WebUI installed and running, it’s time to find a model to load. As we said, you’ll find thousands of free LLMs you can download and use with WebUI, and the process of installing one is pretty straightforward.

Advertisement

If you want a curated list of the most recommended models, you can check out a community like Reddit’s /r/LocalLlaMA, which includes a community wiki page that lists several dozen models. It also includes information about what different models are built for, as well as data about which models are supported by different hardware. (Some LLMs specialize in coding tasks, while others are built for natural text chat.)

These lists will all end up sending you to Hugging Face, which has become a repository of LLMs and resources. If you came here from Reddit, you were probably directed straight to a model card, which is a dedicated information page about a specific downloadable model. These cards provide general information (like the datasets and training techniques that were used), a list of files to download, and a community page where people can leave feedback as well as request help and bug fixes.

At the top of each model card is a big, bold model name. In our case, we used the the WizardLM 7B Uncensored model made by Eric Hartford. He uses the screen name ehartford, so the model’s listed location is “ehartford/WizardLM-7B-Uncensored,” exactly how it’s listed at the top of the model card.

Next to the title is a little copy icon. Click it, and it will save the properly formatted model name to your clipboard.

(Credit: Brian Westover/Hugging Face)

Advertisement

Back in WebUI, go to the model tab and enter that model name into the field labeled “Download custom model or LoRA.” Paste in the model name, hit Download, and the software will start downloading the necessary files from Hugging Face.

(Credit: Brian Westover/Oobabooga)

If successful, you’ll see an orange progress bar pop up in the WebUI window and several progress bars will appear in the command window you left open in the background.

(Credit: Brian Westover/Oobabooga)

(Credit: Brian Westover/Oobabooga)

Advertisement

Once it’s finished (again, be patient), the WebUI progress bar will disappear and it will simply say “Done!” instead.


Load Your Model and Settings in WebUI

Once you’ve got a model downloaded, you need to load it up in WebUI. To do this, select it from the drop-down menu at the upper left of the model tab. (If you have multiple models downloaded, this is where you choose one to use.)

Before you can use the model, you need to allocate some system or graphics memory (or both) to running it. While you can tweak and fine-tune nearly anything you want in these models, including memory allocation, I’ve found that setting it at roughly two-thirds of both GPU and CPU memory works best. That leaves enough unused memory for your other PC functions while still giving the LLM enough memory to track and hold a longer conversation.

(Credit: Brian Westover/Oobabooga)

Once you’ve allocated memory, hit the Save Settings button to save your choice, and it will default to that memory allocation every time. If you ever want to change it, you can simply reset it and press Save Settings again.

Advertisement

Enjoy Your LLM!

With your model loaded up and ready to go, it’s time to start chatting with your ChatGPT alternative. Navigate within WebUI to the Text Generation tab. Here you’ll see the actual text interface for chatting with the AI. Enter text into the box, hit Enter to send it, and wait for the bot to respond.

(Credit: Brian Westover/Oobabooga)

Here, we’ll say again, is where you’ll experience a little disappointment: Unless you’re using a super-duper workstation with multiple high-end GPUs and massive amounts of memory, your local LLM won’t be anywhere near as quick as ChatGPT or Google Bard. The bot will spit out fragments of words (called tokens) one at a time, with a noticeable delay between each.

However, with a little patience, you can have full conversations with the model you’ve downloaded. You can ask it for information, play chat-based games, even give it one or more personalities. Plus, you can use the LLM with the assurance that your conversations and data are private, which gives peace of mind.

You’ll encounter a ton of content and concepts to explore while starting with local LLMs. As you use WebUI and different models more, you’ll learn more about how they work. If you don’t know your text from your tokens, or your GPTQ from a LoRA, these are ideal places to start immersing yourself in the world of machine learning.

Advertisement

Like What You’re Reading?

Sign up for Tips & Tricks newsletter for expert advice to get the most out of your technology.

This newsletter may contain advertising, deals, or affiliate links. Subscribing to a newsletter indicates your consent to our Terms of Use and Privacy Policy. You may unsubscribe from the newsletters at any time.

Advertisement

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending

Exit mobile version