Build Smart Custom AI Chatbots using OpenAI API and Python

With the introduction of OpenAI’s large language models, such as GPT-3.5 and GPT-4 (the models behind ChatGPT), it has become much easier to create your own custom AI chatbot using your favorite programming language.

In this blog, we will cover how to make an OpenAI chatbot that can be used to chat with documents using Python and PGVector (PostgreSQL).

Step 1 – Installing Required Packages

We have selected Python as our programming language because it has a wide range of packages that can help us simplify our code and develop a Python AI chatbot faster.

For our chatbot, the main package we will be using is LangChain. LangChain is an open-source framework that streamlines the process of creating generative AI applications. It contains many helper methods and classes that simplify our code considerably.

To work with OpenAI, PostgreSQL, and PDF files, LangChain also requires a few extra packages to be installed. We can run the following command on the terminal to install all the required packages:

pip install langchain langchain-community langchain_openai pypdf psycopg2-binary pgvector
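
As a quick sanity check after installation, a small script along these lines can confirm that each package is importable. This is only a sketch: the import names below (for example `psycopg2` for the `psycopg2-binary` distribution) are assumed to match the packages installed above, and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the pip packages installed above.
REQUIRED_MODULES = ["langchain", "langchain_openai", "pypdf", "psycopg2", "pgvector"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If the script lists any names, re-run the pip command for those packages before moving on.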

Step 2 – Storing Document Embeddings in PostgreSQL

To chat with documents, we first need to convert them into embeddings (numeric vectors that capture the meaning of the text) and store them in a database (PostgreSQL) for easy retrieval. To keep this tutorial simple, we will use just one PDF document that contains a weather report for Houston, Texas. Here is the content of the document:

“Houston, Texas Weather Report

Date: January 23, 2024

Good morning, Houston! Here’s your weather update for today:

Current Conditions:

The temperature in Houston is currently 65°F (18°C) with partly cloudy skies. Humidity is at 75%, and winds are calm at 5 mph. It’s a comfortable start to the day.

Today’s Forecast:

Expect a mix of sun and clouds throughout the day. The high temperature will reach around 78°F (26°C) in the afternoon, providing a pleasant and mild day. However, there is a slight chance of scattered showers later in the day, so it’s a good idea to keep an umbrella handy just in case.

Tonight:

As the evening approaches, the temperature will gradually drop to around 60°F (15°C). The chance of rain persists, so it’s recommended to stay prepared for a few scattered showers. Winds will remain light, contributing to a calm evening.

Extended Outlook:

Looking ahead, the weather pattern remains relatively mild over the next few days. Expect temperatures to hover around the mid-70s during the day and the low 60s at night. There’s a chance of intermittent clouds and isolated showers, so it’s advisable to stay updated on the forecast for any changes.

Stay Informed:

As always, stay tuned to local news and weather updates for any changes in the forecast. If you have outdoor plans, keep an eye on the sky and be prepared for potential rain showers.
That’s your Houston weather update for today. Have a great day and stay weather-aware!”
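
To build some intuition for why embeddings make retrieval easy, here is a toy sketch in plain Python. The three-dimensional vectors are made up for illustration (real OpenAI embeddings have many more dimensions), and `cosine_similarity` and `most_similar` are hypothetical helpers, not part of LangChain or PGVector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vector, stored):
    """Return the text whose vector is closest to the query vector."""
    return max(stored, key=lambda text: cosine_similarity(query_vector, stored[text]))


# Made-up vectors standing in for real embeddings of two passages.
stored_embeddings = {
    "Today's Forecast: expect a mix of sun and clouds.": [0.9, 0.1, 0.2],
    "Tonight: the temperature will gradually drop.": [0.1, 0.8, 0.3],
}

# Made-up vector standing in for the embedding of a question about today's weather.
query = [0.85, 0.15, 0.25]

print(most_similar(query, stored_embeddings))
```

The passage whose vector points in nearly the same direction as the query wins. A vector database applies the same idea at scale: it stores one vector per chunk of text and returns the closest ones for each question.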

Let’s look at how we load this document and create embeddings for it using the LangChain package.

To store embeddings with PGVector, the vector extension must be enabled in your PostgreSQL database. To do that, run the following command in psql:

CREATE EXTENSION vector;
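
If you would rather enable the extension from Python than from psql, something like the following sketch could work. It assumes a DB-API style connection (for example, one returned by `psycopg2.connect`) whose user has permission to create extensions; `enable_vector_extension` is a hypothetical helper name. Adding `IF NOT EXISTS` makes the statement safe to run more than once.

```python
ENABLE_VECTOR_SQL = "CREATE EXTENSION IF NOT EXISTS vector;"


def enable_vector_extension(connection):
    """Run the CREATE EXTENSION statement on an open DB-API connection."""
    cursor = connection.cursor()
    try:
        cursor.execute(ENABLE_VECTOR_SQL)
    finally:
        cursor.close()
    # Commit so the change is persisted when autocommit is off.
    connection.commit()
```

With psycopg2, this might be called as `enable_vector_extension(psycopg2.connect(dbname="chatbot", user="postgres", password="postgres", host="localhost"))`, substituting your own credentials.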

Then, we use the following Python code to load our PDF document, create its embeddings, and store them in a PGVector collection.

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores.pgvector import PGVector
from langchain_openai import OpenAIEmbeddings
import os

# set the openai api key
os.environ["OPENAI_API_KEY"] = ""

# load the document using langchain's pdf loader
loader = PyPDFLoader("Texas Weather Report.pdf")
pages = loader.load()


# create the db connection (replace credentials with your own)
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver="psycopg2",
    host="localhost",
    port="5432",
    database="chatbot",
    user="postgres",
    password="postgres",
)

# to create embeddings, we will use openai embeddings API
# langchain provides us a helper class to do this
embeddings = OpenAIEmbeddings()

# save embeddings to db
collection_store = PGVector.from_documents(
    embedding=embeddings,
    documents=pages,
    collection_name="chatbot",
    pre_delete_collection=False,
    connection_string=CONNECTION_STRING,
)

Here is what the above code does:

First, we set the OpenAI API Key as an environment variable so that the LangChain package can use it to send requests to the OpenAI API. Make sure to replace the placeholder with your own key.
Next, we load our PDF document using LangChain’s PyPDFLoader.
Then, we form a connection string with our PostgreSQL credentials using the LangChain PGVector helper method.
To create embeddings, we use the OpenAIEmbeddings class from LangChain, which calls OpenAI’s API for us behind the scenes.
Finally, we use LangChain’s PGVector.from_documents method to create the embeddings and store them in PostgreSQL under the collection name “chatbot”. This method returns an instance of a collection store, which we will use in the next step.
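
Hard-coding the API key and database password is fine for a quick experiment, but you may prefer to read them from the environment. The sketch below is one way to do that with the standard library only. The `PG*` variable names, the defaults, and the helpers `build_connection_string` and `require_api_key` are assumptions for illustration. The URL layout follows the usual SQLAlchemy form `postgresql+psycopg2://user:password@host:port/database`, which is the shape we would expect the PGVector helper above to produce.

```python
import os
from urllib.parse import quote


def build_connection_string(env=os.environ):
    """Assemble a SQLAlchemy-style PostgreSQL URL from environment variables."""
    user = env.get("PGUSER", "postgres")
    password = env.get("PGPASSWORD", "postgres")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "chatbot")
    # Percent-encode characters such as "@" or "/" that would break the URL.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def require_api_key(env=os.environ):
    """Return the OpenAI API key, or raise if it has not been set."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key
```

The string returned by `build_connection_string()` could then be passed as `connection_string` in place of `CONNECTION_STRING`.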

Step 3 – Create Chat Chain and Connect With Collection Store

The next step in our Python and OpenAI chatbot is to create a “chat chain” that connects the document collection store with OpenAI’s Chat API and returns a response for the given query.

Let’s look at the code for this:

from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

# create chat chain
chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=collection_store.as_retriever(),
    return_source_documents=True,
)


# helper function to chat with the chain
def chat(query):
    response = chain.invoke({"query": query})
    return response["result"]

Here’s the explanation for this block of code:

We use LangChain’s RetrievalQA.from_chain_type method to create a chat chain. We choose OpenAI as our LLM (Large Language Model).

This means LangChain will call an OpenAI model to generate the responses, and all of this happens behind the scenes. We also provide it with the collection store we created in the previous step, exposed as a retriever.

Next, we create a helper function, chat(), that will be used to interact with the chain. This function takes a query as input and returns the response as output. We do this to create a “chatbot” feel, abstracting away the logic of interacting with the chain.
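
To make the `chain_type="stuff"` setting less of a black box: the idea is that the retrieved passages are "stuffed" into a single prompt together with the question. The sketch below imitates that idea in plain Python. The prompt wording and the `build_stuff_prompt` helper are hypothetical and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, passages):
    """Combine all retrieved passages and the question into one prompt string."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


passages = [
    "The high temperature will reach around 78°F (26°C) in the afternoon.",
    "There is a slight chance of scattered showers later in the day.",
]
print(build_stuff_prompt("Will it rain today?", passages))
```

Because everything goes into one prompt, this approach suits short documents like our weather report. Very long documents may not fit in the model's context window.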

Step 4 – Chat Away!

That’s it! Our custom AI chatbot is ready. We can now ask it questions about the document. Let’s see how it performs:

# input
chat("What is the forecast for houston in the next few days?")

# output: " The forecast for Houston in the next few days is relatively mild, with temperatures in the mid-70s during the day and low 60s at night. There is a chance of intermittent clouds and isolated showers, so it's important to stay updated on the forecast for any changes."

Amazing! Let’s look at a final example with a slightly more complex question:

# input
chat("Is it safe to go outside today in Houston?")

# output: " It is generally safe to go outside today in Houston, but there is a chance of scattered showers in the afternoon and evening. It is recommended to stay prepared with an umbrella."

Awesome! We can see how smart our custom AI chatbot is, as this information was not directly available in the document, but it managed to derive it from the provided data.
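
Because the chain was created with `return_source_documents=True`, the response should also carry the retrieved pages under a `source_documents` key. A sketch of a helper that surfaces them is shown here. `chat_with_sources` is a hypothetical name, and the code assumes each document exposes `page_content` and `metadata` attributes, as LangChain documents do. The chain is passed in as an argument so the helper can be tried with any object that has an `invoke` method.

```python
def chat_with_sources(chain, query):
    """Return the answer plus a short summary of the pages it was drawn from."""
    response = chain.invoke({"query": query})
    sources = [
        {
            # Page number as recorded by the loader, if present.
            "page": doc.metadata.get("page"),
            # First 80 characters of the retrieved text.
            "preview": doc.page_content[:80],
        }
        for doc in response.get("source_documents", [])
    ]
    return {"answer": response["result"], "sources": sources}
```

Showing the sources next to the answer makes it easier to check whether a reply is actually grounded in the document.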

Conclusion

In this blog, we saw how we can create a smart custom AI chatbot using Python and the OpenAI API. Using the LangChain package, we saw how easy it is to create and store embeddings for documents in PostgreSQL and chat with those documents using the OpenAI API.

A chatbot like this can be easily connected to an application or website. You can even use it to smartly query important documents related to the website.

So why not try it out for yourself? Start creating your own custom AI chatbot today!

Posted by: Mustansir Muzaffar Hussain March 11, 2024
