Langchain — RAG — Retrieval Augmented Generation

Airton de Sousa Lira Junior
May 29, 2024


Image generated by DALL-E

In this tutorial, we will explore how to use the Retrieval-Augmented Generation (RAG) technique with the LangChain library to integrate proprietary data into large language models (LLMs). Let’s take it step by step, from initial configuration to practical implementation.

Integrating Proprietary Data into LLMs with LangChain

This technique lets you draw on external data, such as web pages or documents, to improve the language model's responses.

Initial Setup

Before we begin, make sure you have Python installed on your system. Then install the LangChain library and its dependencies using the following command:

pip install langchain openai faiss-cpu beautifulsoup4

Step 1: Creating the Main Tutorial File

Let’s start by creating the main file where the tutorial code will run. This file demonstrates the practical application of integrating LangChain with an LLM.

from langchain import OpenAI, PromptTemplate, LLMChain

# Initialize the OpenAI LLM with your API key
openai = OpenAI(api_key="YOUR_API_KEY")

# Define the prompt template
prompt_template = PromptTemplate(
    template="How many Oscars did the film {filme} win in {ano}?",
    input_variables=["filme", "ano"]
)

# Create the LLM chain
llm_chain = LLMChain(llm=openai, prompt=prompt_template)

# Run the chain with example parameters
resposta = llm_chain.run(filme="Oppenheimer", ano="2024")
print(resposta)
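Conceptually, the chain above does two things: fill the template with your inputs, then send the result to the LLM. The template-filling half is plain string substitution, which a few standard-library lines can illustrate (the LLM call itself is omitted; `fill_prompt` is a hypothetical helper, not a LangChain function):

```python
# Toy illustration of what PromptTemplate does: substitute named
# variables into a template string before it is sent to the model.
template = "How many Oscars did the film {filme} win in {ano}?"

def fill_prompt(template: str, **variables: str) -> str:
    """Replace each {name} placeholder with its keyword-argument value."""
    return template.format(**variables)

prompt = fill_prompt(template, filme="Oppenheimer", ano="2024")
print(prompt)  # How many Oscars did the film Oppenheimer win in 2024?
```

The real `PromptTemplate` adds validation of `input_variables` on top of this substitution, so a missing variable fails early instead of producing a malformed prompt.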

Step 2: Creating the File for Web Scraping and Vector Creation

Now, let’s create a file responsible for web scraping the Wikipedia page and transforming the content into embeddings for storage in the vector database.

from langchain.document_loaders import WebBaseLoader
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

# Load the documents from the Wikipedia page
loader = WebBaseLoader("https://pt.wikipedia.org/wiki/Filme_Openheimer")
documents = loader.load()

# Create embeddings for the documents
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)

# Create the retriever
retriever = vectorstore.as_retriever()
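One practical detail the snippet above glosses over: long pages are usually split into overlapping chunks before embedding, because embedding one huge document gives poor retrieval granularity. LangChain provides text splitters for this; the sketch below is only a minimal pure-Python illustration of the idea, with made-up sizes, not the library's implementation or defaults:

```python
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size character chunks that overlap,
    so sentences cut at a boundary still appear whole in one chunk."""
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# 50 characters with a 20-char window and 5-char overlap -> 3 chunks
chunks = chunk_text("a" * 50, chunk_size=20, overlap=5)
print(len(chunks))  # 3
```

Each chunk is then embedded and stored as its own vector, so a question matches the specific passage that answers it rather than the whole page.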

Step 3: Creating Utility Functions File

Let’s encapsulate the scraping, embedding, and data retrieval operations in a utility function.

from langchain.document_loaders import WebBaseLoader
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

def url_to_retriever(url):
    # Scrape the page and load its content as documents
    loader = WebBaseLoader(url)
    documents = loader.load()

    # Embed the documents and store them in a FAISS vector store
    embeddings = OpenAIEmbeddings()
    vectorstore = FAISS.from_documents(documents, embeddings)

    return vectorstore.as_retriever()
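What the retriever returned by this function does, conceptually, is nearest-neighbor search: embed the query, compare it against the stored document vectors, and return the closest matches. A toy pure-Python version using cosine similarity, where fake 3-dimensional vectors stand in for real OpenAI embeddings (which have on the order of a thousand dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Fake document embeddings; FAISS stores the real ones
docs = {
    "oscars": [0.9, 0.1, 0.0],
    "box office": [0.1, 0.9, 0.0],
    "cast": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k document keys most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

print(retrieve([0.8, 0.2, 0.1]))  # ['oscars']
```

FAISS does the same ranking, but with index structures that make the search fast over millions of vectors instead of three.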

Step 4: Creating the Main File for RAG Integration

We will now integrate all the components we created to implement the RAG technique with an LLM.

from langchain import OpenAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from your_module import url_to_retriever

# Initialize the OpenAI LLM with your API key
openai = OpenAI(api_key="YOUR_API_KEY")

# Create the retriever from the Wikipedia URL
retriever = url_to_retriever("https://pt.wikipedia.org/wiki/Filme_Openheimer")

# Define the prompt template; {context} receives the retrieved documents
# and {input} receives the user's question
prompt = ChatPromptTemplate.from_template(
    "Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {input}"
)

# Chain that stuffs the retrieved documents into the prompt
combine_docs_chain = create_stuff_documents_chain(openai, prompt)

# Retrieval chain: retrieve documents, then answer using them
chain = create_retrieval_chain(retriever, combine_docs_chain)

# Run the chain with an example question
resposta = chain.invoke({"input": "How many Oscars did the film Oppenheimer win in 2024?"})
print(resposta["answer"])
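Stripped of the library machinery, the retrieval chain's job is to paste the retrieved passages into the prompt before calling the model. A hand-rolled sketch of that "stuff the documents into the context" step — the prompt wording and helper name here are illustrative, not LangChain's:

```python
def build_rag_prompt(question: str, context_docs: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "How many Oscars did Oppenheimer win?",
    ["Oppenheimer won 7 Oscars at the 2024 ceremony."],
)
print(prompt)
```

This is why RAG works without retraining: the proprietary or fresh data travels inside the prompt, and the LLM only has to read it, not memorize it.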

Conclusion

In this tutorial, you learned how to use the RAG technique with the LangChain library to integrate proprietary data into large language models (LLMs). This process involves creating embeddings from external data, storing these embeddings in a vector database, and retrieving this information to improve language model responses.

With this knowledge, you can apply this technique to different types of data and create more intelligent and informed applications. If you have any questions or suggestions, feel free to comment below. Until the next tutorial!

Note: Replace “YOUR_API_KEY” with your OpenAI API key.

This tutorial should help you understand and implement the RAG technique with LangChain in your own applications. Good luck!

#llm #langchain #RAG

Written by Airton de Sousa Lira Junior

Involved with Information Technology since 2003: database administrator, data analyst, BI, data science. https://www.linkedin.com/in/airton-lira-junior-6b81a661/
