Note: After this article was published, OpenAI released the ChatGPT API. The ChatGPT API maintains context across a conversation. However, the model behind it (gpt-3.5-turbo) does not support fine-tuning. Therefore, for fine-tuned models and advanced use cases, the instructions in this tutorial are still valid and helpful.
The Problem
GPT is a generative text model, which means that it produces new text by predicting what comes next based on the input it receives from the user. The model was trained on a large corpus of text (books, articles, and websites), and from this data it learned patterns and relationships between words and phrases.
By default, the model has no memory when you initiate a discussion with it: each input is treated independently, with no context or information carried over from previous user prompts. This is certainly not ideal for human-friendly interactions. While it seems like a limitation, it actually allows the model to generate more diverse and less repetitive text.
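To make this statelessness concrete, here is a toy illustration in plain Python (no API calls, no real model): it contrasts what the model actually receives each turn when no history is attached versus when earlier turns are prepended to the prompt.

```python
# Two turns of a conversation. With no memory, the model receives each
# prompt in isolation, so the second one is ambiguous on its own.
turn_1 = "What is the capital of France?"
turn_2 = "And what is its population?"

def stateless_request(prompt: str) -> str:
    # No history is attached: this string is all the model sees.
    return prompt

def contextual_request(history: list, prompt: str) -> str:
    # Carrying context over: earlier turns are prepended to the new prompt.
    return "\n".join(history + [prompt])

# "its" is unresolvable here -- the model has no idea what it refers to.
print(stateless_request(turn_2))

# With the first turn included, "its" clearly refers to France.
print(contextual_request([turn_1], turn_2))
```

This is the whole trick behind the approach we will use later: the model itself never remembers anything, so the application has to resend the relevant history with every request.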
In some cases, however, carrying over context is useful and even necessary. Techniques like fine-tuning on a specific topic can improve the quality of outputs, but the technique we are going to implement next is much easier to put in place.
No Context = Chaos of Randomness
Let’s start by building a simple chatbot. Initially, we will initiate a discussion, since our goal is to compare the model’s outputs now and later, once we add more context to the conversation.
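As a starting point, a minimal stateless version might look like the sketch below. It calls the OpenAI completions endpoint directly with the standard library (you could equally use the `openai` package); the model name and the `max_tokens`/`temperature` values are assumptions you can adjust, and an `OPENAI_API_KEY` environment variable is assumed to be set.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/completions"

def build_request(prompt: str, model: str = "text-davinci-003") -> dict:
    # Note: the payload carries only the current prompt -- no history,
    # so the model treats every call as a brand-new conversation.
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 150,
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """Send a single, context-free prompt and return the completion text."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["text"].strip()

if __name__ == "__main__":
    while True:
        user_input = input("You: ")
        # Each call sends only the latest prompt, so follow-up
        # questions lose whatever was said earlier.
        print("Bot:", ask(user_input))
```

Run it and ask a follow-up question ("And its population?") after a first one ("What is the capital of France?"): the model has no way to know what "its" refers to.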