Top 10 Open Source ChatGPT Alternatives & How to Use Them

Top 10 Open Source ChatGPT Alternatives: Bridging the Gap in Conversational AI

Chatbots have revolutionized the way businesses interact with their customers. The state-of-the-art GPT-4, developed by OpenAI, is a dominant player in this space. However, it is not open-source, which restricts developers from reproducing the results or developing their own chatbots similar to GPT-4.

To fill this void, open-source communities have started offering alternatives to GPT-4 that aim for comparable functionality while requiring less computational power. This article introduces the top 10 open-source ChatGPT alternatives that you can use in your next AI project.

1. ColossalChat

ColossalChat, developed by HPC AI Tech, is an open-source project designed to replicate ChatGPT-like models based on the LLaMa model and the PyTorch AI framework. Its developers describe it as the first practical open-source project to include a complete Reinforcement Learning from Human Feedback (RLHF) process, making it the closest project to the original technical route of ChatGPT.

ColossalChat leverages PyTorch's flexible and efficient deep learning framework, which allows for rapid prototyping, seamless integration with other libraries, and the delivery of a high-performance, user-friendly conversational AI experience.

One of the key features of ColossalChat is its bilingual dataset comprising approximately 100,000 Q&A pairs in both English and Chinese. This dataset was collected and cleaned from real-life question scenarios on social media platforms and serves as the seed dataset. It was expanded using self-instruct technology. This high-quality data allows ColossalChat to achieve better dialogue interactions and also support Chinese.

ColossalChat follows a three-stage RLHF algorithm replication process. The first stage is supervised instruction fine-tuning. The second stage trains a reward model. The third stage applies a reinforcement learning algorithm guided by that reward model. This replication process allows for greater consistency of the generated content with human values.
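As a rough illustration of the second stage, reward models in RLHF pipelines are commonly trained with a pairwise ranking loss: the model should score the human-preferred response above the rejected one. The sketch below is a generic, dependency-free version of that loss. It illustrates the general technique under that assumption and is not taken from ColossalChat's code; the function name is made up for this example.

```python
import math

def pairwise_ranking_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over response pairs."""
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) equals softplus(-m), written in a numerically stable form
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)

# The loss shrinks as the preferred response is scored further above the rejected one.
print(pairwise_ranking_loss([2.0], [0.0]) < pairwise_ranking_loss([0.5], [0.0]))
```

The reward scores produced by a model trained this way are what the third stage would then try to maximize.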

This project is supported by the AI large model development system Colossal-AI, which can efficiently and quickly deploy AI large model training and inference based on default PyTorch functionality. This infrastructure provides foundational support and significantly improves training speed.

Here's an example of how to train ColossalChat in each RLHF stage:

# Stage 1: supervised instruction fine-tuning on a 4-GPU server
colossalai run --nproc_per_node=4 \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 4 \
    --accimulation_steps 8 \
    --lr 2e-5

# Stage 2: reward model training on a 4-GPU server
colossalai run --nproc_per_node=4 \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --dataset /path/to/datasets

# Stage 3: reinforcement learning on an 8-GPU server
colossalai run --nproc_per_node=8 prompts.csv \
    --strategy colossalai_zero2 \
    --pretrain "/path/to/Coati-7B" \
    --model 'llama' \
    --pretrain_dataset /path/to/dataset
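To make the batch-related flags in the first command more concrete, here is a small sketch of how batch size, gradient accumulation, and GPU count usually combine. It assumes plain data parallelism and that the batch size flag is per GPU; neither assumption is confirmed by the commands themselves, and the helper name is hypothetical.

```python
def effective_batch_size(per_gpu_batch, accumulation_steps, num_gpus):
    """Samples contributing to one optimizer step under data parallelism."""
    return per_gpu_batch * accumulation_steps * num_gpus

# Values from the first command: batch size 4, 8 accumulation steps, 4 GPUs
print(effective_batch_size(4, 8, 4))  # 128
```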

The complete code for replicating ChatGPT based on the LLaMa model is open-sourced and can be accessed by developers and researchers alike.

2. Alpaca-LoRA

Alpaca-LoRA is an efficient tool for fine-tuning language models such as LLaMa, thanks to its use of the LoRA (low-rank adaptation) technique.

LoRA offers multiple benefits over other fine-tuning methods, including:

  • Greater speed and less memory consumption, making it compatible with consumer hardware.
  • Smaller output size (megabytes instead of gigabytes).
  • The ability to combine multiple fine-tuned models during runtime.

Alpaca-LoRA, which uses Hugging Face's PEFT (Parameter-Efficient Fine-Tuning) library, enables fine-tuning of transformer-based language models using LoRA. This leads to efficient and inexpensive model fine-tuning even on modest hardware, with potentially composable outputs.
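To see where the speed and size benefits come from, the sketch below shows the core LoRA idea in plain Python: the original weight matrix W stays frozen, and only two small matrices A and B are trained, so the layer computes W x + (alpha / r) * B (A x). The helper names and the 4096-wide layer are illustrative assumptions, and this is not how PEFT is implemented internally.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_in * d_out, rank * (d_in + d_out)

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)               # W has shape d_out x d_in
    update = matvec(B, matvec(A, x))  # A is rank x d_in, B is d_out x rank
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

Because only A and B need to be saved, an adapter for a layer like this is hundreds of times smaller than the full matrix, which is why LoRA outputs are measured in megabytes.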

The steps to fine-tune LLaMa using Alpaca-LoRA are as follows:


Before starting, ensure that you have access to a GPU machine. Even low-spec GPUs like an NVIDIA T4 or consumer GPUs like a 4090 are suitable due to the efficiency of LoRA. Also, you need the weights for LLaMa, which are not yet publicly available. You can apply for access through the Meta Research form.

Step 1: Clone the Alpaca-LoRA repo

Clone the Alpaca-LoRA repository that includes support for Cog (a tool used to package machine learning models in containers). Use the following commands:

git clone
cd alpaca-lora

Step 2: Install Cog

Next, install Cog with the following commands:

sudo curl -o /usr/local/bin/cog -L "$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog

Step 3: Get LLaMa weights

Place your downloaded weights in a folder named 'unconverted-weights'. The directory structure should look like this:

├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk

Convert the weights from a PyTorch checkpoint to a transformers-compatible format using the following command:

cog run python -m transformers.models.llama.convert_llama_weights_to_hf \
  --input_dir unconverted-weights \
  --model_size 7B \
  --output_dir weights

Your final directory structure should look like this:

├── llama-7b
└── tokenizermdki

Step 4: Fine-tune the model

If you have a GPU with more memory, you can increase MICRO_BATCH_SIZE to 32 or 64 in the fine-tuning script. If you have your own instruction tuning dataset, edit DATA_PATH in the same script to point to it. Make sure it has the same format as alpaca_data_cleaned.json.
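Before pointing DATA_PATH at your own file, it can help to check that every record has the expected shape. The sketch below assumes the Alpaca-style schema of a JSON list whose items carry string fields named instruction, input, and output; the function name and the sample records are made up for illustration.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}

def find_bad_records(records):
    """Return indices of records that do not match the Alpaca-style schema."""
    bad = []
    for i, record in enumerate(records):
        ok = (
            isinstance(record, dict)
            and REQUIRED_KEYS.issubset(record)
            and all(isinstance(record[k], str) for k in REQUIRED_KEYS)
            and record["instruction"].strip() != ""
        )
        if not ok:
            bad.append(i)
    return bad

# Made-up sample; in practice you would load your own dataset file with json.load
sample = json.loads("""[
  {"instruction": "Name an animal related to the llama.", "input": "", "output": "The alpaca."},
  {"instruction": "Summarize the text.", "output": "This record lacks the input key."}
]""")
print(find_bad_records(sample))  # [1]
```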

Run the fine-tuning script:

cog run python

Fine-tuning can take around 3.5 hours on a 40GB A100 GPU. It might take longer for GPUs with less processing power.

Step 5: Run the model with Cog

Finally, you can run the model using Cog. For example:

$ cog predict -i prompt="Tell me something about alpacas."

The response will be an informative output about alpacas, demonstrating the successful fine-tuning of your LLaMa model.

3. Vicuna-13B

Part of FastChat, Vicuna leverages a transformer-based architecture, similar to GPT models, and is fine-tuned on conversational datasets. According to its authors' evaluation, it delivers approximately 90% of ChatGPT's quality, providing an accessible and cost-effective alternative. Despite the lower performance, Vicuna stands out due to its customizability and adaptability to a broad range of tasks.

For more details about how to use it, refer to our detailed article on Vicuna-13B.


4. GPT4ALL

The Nomic AI Team's GPT4ALL offers a chatbot trained on extensive curated data, such as word problems, code, stories, illustrations, and multi-turn dialogues. It builds on LLaMa, and its strength lies in its diverse dataset and adaptability to various tasks.

For more details about how to use it, refer to our detailed article on GPT4ALL.

5. Raven RWKV

RWKV is a recurrent neural network (RNN) language model architecture, and Raven is its chat-tuned variant. Here's a general step-by-step guide on how to use it, along with some code snippets:

First, you'll want to install the necessary package. The RWKV package is hosted on PyPI, and you can install it using pip:

pip install rwkv

Then, you will need to import the model from the package:

from rwkv.model import RWKV

Next, you'll instantiate the model. This involves specifying the model path, and the strategy to be used:

model = RWKV(model='/path/to/your/model', strategy='cuda fp16')

This creates an instance of the model that can be used for inference.

Next, you will use the model's forward method to perform inference. This method takes two parameters: the input tokens, and the state. For the initial run, you can set the state to None:

out, state = model.forward([187, 510, 1563, 310, 247], None)

You can then print the output of the model:

print(out.detach().cpu().numpy())  # logits for the next token

Then, for subsequent runs, you can provide the state from the previous run:

out, state = model.forward([187, 510], None)
out, state = model.forward([1563], state)
out, state = model.forward([310, 247], state)
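The reason the chunked calls work is that a recurrent model carries everything it has seen in its state. The toy class below is a stand-in written for this article, not RWKV itself; it only mimics the forward(tokens, state) calling pattern to show that feeding tokens in pieces, while passing the state along, gives the same result as feeding them all at once.

```python
class ToyRecurrentModel:
    """Stand-in for an RNN-style model: not RWKV, just the same calling pattern."""

    def forward(self, tokens, state):
        # The state is a running summary of everything seen so far.
        total, count = state if state is not None else (0, 0)
        for token in tokens:
            total += token
            count += 1
        out = total / count  # depends on the whole history, not just this chunk
        return out, (total, count)

model = ToyRecurrentModel()
full_out, _ = model.forward([187, 510, 1563, 310, 247], None)

out, state = model.forward([187, 510], None)
out, state = model.forward([1563], state)
out, state = model.forward([310, 247], state)

print(out == full_out)  # True
```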

This step-by-step guide shows the basic usage of the RWKV model for inference. It is important to note that the specific steps might vary depending on the task, the specific model weights being used, and other factors. Please refer to the official documentation for the most accurate information.

Also, remember that this model is relatively new and still under active development. Always refer to the most recent codebase and documentation for up-to-date and accurate information.

6. OpenChatKit

OpenChatKit provides a complete toolkit for chatbot application development, positioning itself as an open-source ChatGPT alternative. While similar to GPT models in terms of structure, OpenChatKit enhances customization by enabling the training of instruction-tuned large language models and offering an extensible retrieval system for bot responses.

Step 1: Setup

Ensure you have the necessary system requirements and dependencies. You'll need Git LFS, Miniconda, and PyTorch, among others. The provided environment.yml file contains the specifications for the environment needed.

First, install Git LFS and Miniconda, then set up the environment as follows:

git lfs install
conda install mamba -n base -c conda-forge
mamba env create -f environment.yml 
conda activate OpenChatKit

Step 2: Chatting with Pythia-Chat-Base-7B

To interact with the model, you can use the script located in the inference directory:

python inference/ --model togethercomputer/Pythia-Chat-Base-7B

You can then chat with the model by entering text at the provided command line prompt.

Step 3: Reproducing Pythia-Chat-Base-7B

If you want to train the model yourself, you'll first need to download the training data and the base model:

python data/OIG/
python pretrained/Pythia-6.9B-deduped/

Then, you can fine-tune the model using the provided shell script:

bash training/

After training, convert the model to the Huggingface format using the conversion tool:

mkdir huggingface_models
python tools/ \
   --config-name EleutherAI/pythia-6.9b-deduped \
   --ckpt-path model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 \
   --save-path huggingface_models/Pythia-Chat-Base-7B \
   --n-stages 4 \
   --n-layer-per-stage 8

Replace the model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 with the path to your model checkpoint.

Step 4: Testing the new model

Once you've fine-tuned your model, you can chat with it using the script:

python inference/ --model ./huggingface_models/Pythia-Chat-Base-7B

Step 5: Monitoring

For monitoring training, OpenChatKit provides support for both loguru and Weights & Biases.

Step 6: Experimental Retrieval-Augmented Models

OpenChatKit also provides an experimental feature for retrieval-augmented models. This is implemented by querying a Faiss index of Wikipedia. You can run it by:

python data/wikipedia-3sentence-level-retrieval-index/
python inference/ --retrieval
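Conceptually, retrieval augmentation means finding passages related to the user's question and placing them in the prompt before the model answers. The sketch below illustrates that flow with a tiny bag-of-words similarity search in place of a Faiss index; the prompt layout, function names, and passages are assumptions for illustration and do not reflect OpenChatKit's implementation.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages, k=1):
    """Prepend the retrieved context to the question."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

passages = [
    "Alpacas are domesticated camelids from South America.",
    "Faiss is a library for efficient similarity search over dense vectors.",
]
print(build_prompt("What is Faiss used for?", passages))
```

A real system would replace the word-count vectors with learned embeddings and the sorted() call with an index lookup, but the prompt-building step stays much the same.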

Please refer to the official OpenChatKit documentation for more detailed and accurate information.

7. OPT

OPT (Open Pre-trained Transformer) language models show strong zero-shot and few-shot abilities and have been studied for stereotypical bias, although they do not match the quality of ChatGPT. These models are decoder-only transformers, meaning they generate text autoregressively from left to right, similar to the approach of GPT models.

Here's a more detailed step-by-step breakdown on how you can use OPT models for each of these tasks:

Step 1: Text Generation

To use an OPT model for text generation, you'll first need to load it into a pipeline. Here is an example using the transformers library by Hugging Face:

from transformers import pipeline
generator = pipeline('text-generation', model="facebook/opt-350m")

Once you've set up the pipeline, you can generate text like this:

print(generator("Hello, I am a", max_length=50)[0]['generated_text'])

This will print text that starts with "Hello, I am a" and continues until the whole sequence, prompt included, reaches at most 50 tokens.

Step 2: Zero-shot Learning

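Zero-shot learning involves applying the model to tasks it wasn't specifically trained on. With a decoder-only model like OPT, you do this by describing the task in the prompt, without any further training or worked examples. Here's how you can do it for a simple classification:

from transformers import pipeline
generator = pipeline("text-generation", model="facebook/opt-350m")
# Zero-shot: the prompt states the task and gives no worked examples
prompt = "Sentence: I love sunny days.\nIs the topic weather or emotion?\nTopic:"
print(generator(prompt, max_new_tokens=3)[0]["generated_text"])

This prints the prompt followed by the model's continuation, which you read as its answer. A model as small as opt-350m may answer unreliably.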

Step 3: Few-shot Learning

Few-shot learning involves providing a small number of examples to help the model understand the task. For instance, if you want the model to translate English to French, you can provide a few example translations:

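from transformers import pipeline
generator = pipeline('text-generation', model="facebook/opt-350m")
# Few-shot: the worked examples go directly into the prompt
examples = [
    {"English": "Hello", "French": "Bonjour"},
    {"English": "Goodbye", "French": "Au revoir"},
]
prompt = "".join(f"English: {e['English']}\nFrench: {e['French']}\n" for e in examples)
prompt += "English: Good morning\nFrench:"
print(generator(prompt, max_new_tokens=10)[0]['generated_text'])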

Please note that this example is oversimplified for the sake of illustration. The actual usage might be a bit more complicated and require more sophisticated setup.

Step 4: Stereotypical Bias Analysis

You can use the OPT model to analyze the stereotypical biases present in its generated text. Here's an example:

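from transformers import pipeline, set_seed
set_seed(32)  # make the sampled outputs reproducible
# Sampling must be enabled to return several different sequences per prompt
generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True)
female_prompt = "The woman worked as a"
male_prompt = "The man worked as a"
female_output = generator(female_prompt, num_return_sequences=5)
male_output = generator(male_prompt, num_return_sequences=5)
print("Female prompt outputs:")
for output in female_output:
    print(output['generated_text'])
print("Male prompt outputs:")
for output in male_output:
    print(output['generated_text'])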

This will print 5 generated sequences for each prompt, and you can analyze these for any potential biases. Be aware that such analysis can be a complex task and might require advanced natural language processing (NLP) techniques.
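One simple way to start such an analysis is to tally which word each generated sequence adds right after the prompt and compare the tallies across prompts. The sketch below shows the idea; the function name is made up, and the sample strings are placeholders written for this example, not real model outputs.

```python
from collections import Counter

def tally_continuations(prompt, generated_texts):
    """Count the first word each generated text adds after the prompt."""
    counts = Counter()
    for text in generated_texts:
        continuation = text[len(prompt):] if text.startswith(prompt) else text
        words = continuation.split()
        if words:
            counts[words[0].strip(".,!?").lower()] += 1
    return counts

# Placeholder strings for illustration only; these are NOT real model outputs.
prompt = "The person worked as a"
placeholder_outputs = [
    "The person worked as a teacher for ten years.",
    "The person worked as a teacher.",
    "The person worked as a driver in the city.",
]
print(tally_continuations(prompt, placeholder_outputs).most_common())
```

With real generations, you would pass the generated_text field of each pipeline result for one prompt, repeat for the other prompt, and compare the two tallies over many more than five samples.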

Remember, you might need to adjust the model names depending on the specific OPT models available on the Hugging Face Model Hub.

8. Flan-T5-XXL

Flan-T5-XXL is the largest of the Flan-T5 models, a collection of T5 models fine-tuned on a vast compilation of instructional datasets. Unlike decoder-only GPT models, these are encoder-decoder transformers. The same instruction fine-tuning approach has been reported to improve performance across various model classes, including PaLM, T5, and U-PaLM.

To use Flan-T5-XXL, you can follow the sample usage guide below:

# Assuming you have installed the transformers library and PyTorch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Initialize the Flan-T5-XXL model (about 11B parameters; google/flan-t5-small is a lighter stand-in)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")
# Example usage: Generate instructions for a task
task_input = "How to bake a cake"
inputs = tokenizer(task_input, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
instructions = tokenizer.decode(outputs[0], skip_special_tokens=True)

This example demonstrates how you can generate instructions for a given task using the Flan-T5-XXL model. The task_input variable contains the task description, and the generate() method produces the corresponding instructions, which the tokenizer decodes back into text.

Please note that the above code snippet assumes that you have already installed the required dependencies and have enough memory to load the model.

The Flan-T5 checkpoints were released through T5X, a modular and composable framework for training and evaluating sequence models, with a focus on language tasks. T5X is implemented using JAX and Flax, based on the T5 codebase. It offers a high level of configurability and self-service capabilities, allowing researchers to train and evaluate sequence models at different scales.

It's important to refer to the official documentation and examples provided by Flan-T5-XXL for a comprehensive understanding of the available functionalities and how to use them effectively.

A sample usage of Flan-T5-XXL could be as follows:

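from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Initialize model
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
flan_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")
# Generate response
input_ids = tokenizer("Translate to French: Good morning.", return_tensors="pt").input_ids
output_ids = flan_model.generate(input_ids, max_new_tokens=40)
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)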

9. Baize

Baize is an open-source chat model trained with LoRA. It is trained on about 100k dialogs that ChatGPT generated by chatting with itself, and it also uses Alpaca's data for improved performance. Models with different sizes, such as 7B, 13B, and 30B, have been released.

To interact with Baize using FastChat's CLI and API, follow these steps:

  1. Install FastChat:
pip install git+
pip install git+
  2. Merge Baize's LoRA weights (V1 models only):
python3 -m fastchat.model.apply_lora --base huggyllama/llama-7b --target ./model_weights/baize-7b --lora project-baize/baize-lora-7B
  3. Run the CLI:
python -m fastchat.serve.cli --model-path ./model_weights/baize-7b

Through FastChat, Baize can also be served behind an OpenAI-compatible or Hugging Face-compatible API.

For the Baize demo, you can run it locally by following these steps:

  1. Install required packages:
cd demo
pip install -r requirements.txt
  2. Host the model locally:
# For V1 models
python $base_model $lora_model
# For V2 models
python $base_model None

The Baize demo provides a user-friendly Gradio interface for chatting.

These are simplified sample codes. For more detailed instructions and options, please refer to the Baize project documentation.

10. Koala

Koala is an AI dialogue model trained by fine-tuning LLaMA on a dialogue dataset collected from the web. According to its authors' evaluation, it is often preferred over Alpaca and demonstrates comparable results to ChatGPT in a number of scenarios. One of Koala's key advantages is its extensive customization and adaptability, facilitated by the availability of training code, public weights, and a dialogue fine-tuner.

In the context of building a 100% free personal "ChatGPT" bot powered by Koala, you can utilize the provided Colab notebook. Here is an overview of the process:

Step 1: Access the Koala Colab notebook

A pre-configured notebook by a machine learning expert named Sam Witteveen is available for running the Koala model. You can find the notebook here. Copy the notebook to your own Google Drive.

Step 2: Run the notebook

Once you have the notebook in your Google Drive, you can execute it. The notebook starts by installing necessary modules and importing them. It then loads the pre-trained model, samwit/koala-7b, using the LlamaTokenizer and LlamaForCausalLM from the transformers library. The model is loaded in 8-bit mode, enabling compatibility with cost-effective GPUs.

from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig, pipeline
import torch
import textwrap
tokenizer = LlamaTokenizer.from_pretrained("samwit/koala-7b")
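base_model = LlamaForCausalLM.from_pretrained(
    "samwit/koala-7b",
    load_in_8bit=True,   # 8-bit loading, as described above; needs bitsandbytes
    device_map="auto",   # assumed setting: let accelerate place the layers
)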

Step 3: Set up the text generation pipeline

The notebook sets up a pipeline for text generation using the Hugging Face pipeline method. Parameters such as maximum length, temperature, and repetition penalty are defined. Additionally, a utility function named wrap_text_preserve_newlines() is provided to enhance the appearance of the generated text.

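pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
    # Illustrative values (assumed here, not taken from the notebook); tune them as needed
    max_length=512,
    do_sample=True,  # sampling must be on for temperature to take effect
    temperature=0.7,
    repetition_penalty=1.15,
)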
def wrap_text_preserve_newlines(text, width=110):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')
    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)
    return wrapped_text
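To build some intuition for two of the pipeline parameters mentioned above, the sketch below shows one common way temperature and repetition penalty reshape a model's raw scores (logits) before a token is sampled. It is a simplified, dependency-free illustration under that assumption, with made-up function names and toy numbers, and it is not the library's actual code.

```python
import math

def adjust_logits(logits, generated_ids, temperature=1.0, repetition_penalty=1.0):
    """Apply a repetition penalty, then temperature, to a list of logits."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        # Penalized tokens become less likely whether their logit is positive or negative.
        if adjusted[token_id] > 0:
            adjusted[token_id] /= repetition_penalty
        else:
            adjusted[token_id] *= repetition_penalty
    # Temperatures below 1 sharpen the distribution; above 1 flatten it.
    return [value / temperature for value in adjusted]

def softmax(logits):
    """Turn logits into probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]  # toy scores for a three-token vocabulary
plain = softmax(adjust_logits(logits, generated_ids=[]))
penalized = softmax(adjust_logits(logits, generated_ids=[0], repetition_penalty=2.0))
print(plain[0] > penalized[0])  # True
```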

Step 4: Engage in conversations

The notebook provides examples of prompt-response conversations using the pipe() method from the Hugging Face library. It's important to note that the success of the model heavily relies on using appropriate prompts at the beginning of each conversation. The notebook suggests using a prompt that starts with "BEGINNING OF CONVERSATION: USER:" to activate the desired logic. You are encouraged to experiment with different prompts and parameters to observe the model's responses.
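A small helper can keep the prompt format consistent across turns. The sketch below builds on the "BEGINNING OF CONVERSATION: USER:" prefix mentioned above; the trailing "GPT:" reply marker, the function name, and the history layout are assumptions for illustration, so check them against the notebook before relying on them.

```python
def build_koala_prompt(user_message, history=()):
    """Format a conversation as a single prompt string.

    history is a sequence of (user_text, model_text) pairs from earlier turns.
    """
    parts = ["BEGINNING OF CONVERSATION:"]
    for past_user, past_model in history:
        parts.append(f"USER: {past_user} GPT: {past_model}")
    # The trailing "GPT:" cue is an assumption about the expected reply marker.
    parts.append(f"USER: {user_message} GPT:")
    return " ".join(parts)

print(build_koala_prompt("What is an alpaca?"))
# BEGINNING OF CONVERSATION: USER: What is an alpaca? GPT:
```

The resulting string can be passed to pipe(), and each completed exchange appended to the history for the next turn.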

Overall, Koala proves to be a promising alternative to larger language models like GPT-3. By carefully curating the training data, even a smaller model can deliver impressive performance. The Koala team and community experts have made it convenient to access and experiment with the model through the online demo and the provided Google Colab notebook. Whether you aim to develop a chatbot or conduct LLM research without incurring model usage costs, Koala is an excellent choice.


The open-source landscape is rich with alternatives to ChatGPT, each offering unique capabilities. Whether you're an AI enthusiast, researcher, or developer, these tools can help you build and fine-tune your own conversational models. So go ahead, and dive into the world of open-source conversational AI.