With the introduction of OpenAI’s large language models, such as GPT-3.5 and GPT-4 (the models behind ChatGPT), it has become extremely easy to create your own custom AI chatbot using your favorite programming language.
In this blog, we will cover how to make an OpenAI chatbot that can be used to chat with documents using Python and PGVector (PostgreSQL).
We have selected Python as our programming language because it has a wide range of packages that can help us simplify our code and develop a Python AI chatbot faster.
For our chatbot, the main package we will be using is LangChain. LangChain is an open-source framework that streamlines the process of creating generative AI applications. It contains many helper methods and classes that will simplify our code considerably.
To work with OpenAI, PostgreSQL, and PDF files, LangChain also requires a few extra packages to be installed. We can run the following command on the terminal to install all the required packages:
pip install langchain langchain_openai pypdf psycopg2-binary pgvector
For chatting with documents, we first need to convert them into embeddings and store them in a database (PostgreSQL) for easy retrieval. To keep this tutorial simple, we will use just one PDF document that contains a weather report for Houston, Texas. Here is the content of the document:
“Houston, Texas Weather Report
Date: January 23, 2024
Good morning, Houston! Here’s your weather update for today:
The temperature in Houston is currently 65°F (18°C) with partly cloudy skies. Humidity is at 75%, and winds are calm at 5 mph. It’s a comfortable start to the day.
Expect a mix of sun and clouds throughout the day. The high temperature will reach around 78°F (26°C) in the afternoon, providing a pleasant and mild day. However, there is a slight chance of scattered showers later in the day, so it’s a good idea to keep an umbrella handy just in case.
As the evening approaches, the temperature will gradually drop to around 60°F (15°C). The chance of rain persists, so it’s recommended to stay prepared for a few scattered showers. Winds will remain light, contributing to a calm evening.
Looking ahead, the weather pattern remains relatively mild over the next few days. Expect temperatures to hover around the mid-70s during the day and the low 60s at night. There’s a chance of intermittent clouds and isolated showers, so it’s advisable to stay updated on the forecast for any changes.
As always, stay tuned to local news and weather updates for any changes in the forecast. If you have outdoor plans, keep an eye on the sky and be prepared for potential rain showers.
That’s your Houston weather update for today. Have a great day and stay weather-aware!”
Let’s look at how we load this document and create embeddings for it using the LangChain package.
To store embeddings with PGVector, the vector extension must be enabled in PostgreSQL. To do that, run the following command in psql:
CREATE EXTENSION vector;
Then, we use the following Python code to load our PDF document, create its embeddings, and store them in a PGVector collection.
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores.pgvector import PGVector
from langchain_openai import OpenAIEmbeddings
import os

# set the openai api key
os.environ["OPENAI_API_KEY"] = ""

# load the document using langchain's pdf loader
loader = PyPDFLoader("Texas Weather Report.pdf")
pages = loader.load()

# create the db connection (replace credentials with your own)
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver="psycopg2",
    host="localhost",
    port="5432",
    database="chatbot",
    user="postgres",
    password="postgres",
)

# to create embeddings, we will use openai embeddings API
# langchain provides us a helper class to do this
embeddings = OpenAIEmbeddings()

# save embeddings to db
collection_store = PGVector.from_documents(
    embedding=embeddings,
    documents=pages,
    collection_name="chatbot",
    pre_delete_collection=False,
    connection_string=CONNECTION_STRING,
)
Here is what the above code does:
- It sets the OpenAI API key as an environment variable (add your own key here); OpenAIEmbeddings reads it automatically.
- It loads the PDF with PyPDFLoader, which returns one document per page.
- It builds the PostgreSQL connection string from the database credentials (replace them with your own).
- It creates an OpenAIEmbeddings instance, which calls OpenAI’s embeddings API behind the scenes.
- Finally, PGVector.from_documents creates embeddings for the loaded pages and stores them in a collection named "chatbot" in PostgreSQL.
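If you want to confirm that the embeddings were stored correctly, you can run a quick similarity search against the collection. This is only an optional sanity check, and the query string here is just an example:

# optional sanity check: fetch the chunks most similar to a query
results = collection_store.similarity_search("weather forecast for Houston", k=2)
for doc in results:
    print(doc.page_content[:200])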
The next step in building our Python and OpenAI chatbot is to create a “chat chain” that connects the document collection store with OpenAI’s Chat API and returns a response for a given query.
Let’s look at the code for this:
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

# create chat chain
chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=collection_store.as_retriever(),
    return_source_documents=True,
)

# helper function to chat with the chain
def chat(query):
    response = chain.invoke({"query": query})
    return response["result"]
Here’s the explanation for this block of code:
We use LangChain’s RetrievalQA.from_chain_type method to create a chat chain, choosing OpenAI as our LLM (Large Language Model). This means LangChain will call OpenAI’s model to generate the responses, and all of this happens behind the scenes. The chain_type="stuff" option simply “stuffs” the retrieved document chunks into the prompt. We also pass the chain a retriever built from the collection store we created in the previous step.
Next, we create a helper method chat() that will be used to interact with the chain. This function will take in a query as input and return the response as output. We do this to create a “chatbot” feel, abstracting the logic of interacting with the chain.
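Since we passed return_source_documents=True when building the chain, the response also includes the document chunks that were retrieved to answer the query. If you want to see which part of the PDF an answer came from, a small variation of the helper can expose them (a quick sketch building on the chain above):

# optional helper that also returns the retrieved source chunks
def chat_with_sources(query):
    response = chain.invoke({"query": query})
    sources = [doc.page_content for doc in response["source_documents"]]
    return response["result"], sources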
That’s it! Our custom AI chatbot is ready, and we can chat with it to get information from the document. Let’s see how it performs:
# input chat("What is the forecast for houston in the next few days?") # output: " The forecast for Houston in the next few days is relatively mild, with temperatures in the mid-70s during the day and low 60s at night. There is a chance of intermittent clouds and isolated showers, so it's important to stay updated on the forecast for any changes."
Amazing! Let’s look at a final example with a question that is a bit complex:
# input chat("Is it safe to go outside today in Houston?") # output: “ It is generally safe to go outside today in Houston, but there is a chance of scattered showers in the afternoon and evening. It is recommended to stay prepared with an umbrella.”
Awesome! We can see how smart our custom AI chatbot is, as this information was not directly available in the document, but it managed to derive it from the provided data.
In this blog, we saw how we can create a smart custom AI chatbot using Python and the OpenAI API. Using the LangChain package, we saw how easy it is to create and store embeddings for documents in PostgreSQL and chat with those documents using the OpenAI API.
A chatbot like this can easily be connected to an application or website. You can even use it to intelligently query important documents related to your website.
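For example, here is a minimal sketch of how the chat() helper could be exposed as an HTTP endpoint using Flask. Flask is not part of this tutorial’s requirements, so treat this as a starting point and adapt it to your own stack:

# minimal sketch of a web endpoint around the chat() helper (assumes Flask is installed)
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat_endpoint():
    query = request.json.get("query", "")
    return jsonify({"answer": chat(query)})

if __name__ == "__main__":
    app.run(port=5000)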
So why not try it out for yourself? Start creating your own custom AI chatbot today!