LangChain: From Simple Prompt to Local Knowledge

Mic
Mar 12

Large language models are remarkable tools. You can ask them to explain complex concepts, summarise research, or generate code, and they will produce surprisingly coherent answers. We already looked at how to use these models in Python via LangChain in an earlier post. But they have one important limitation: they only know what they were trained on.

If you want a model to use your own documents, such as PDFs, Markdown notes, research papers, or documentation, you need to give it access to that information.

This is exactly what Retrieval Augmented Generation (RAG) systems do. A RAG pipeline allows a language model to look up relevant pieces of text from a document collection and then use those pieces as context when generating an answer. Instead of relying purely on training data, the model can ground its response in external knowledge.

In this post, we will build a small but realistic LangChain project that does exactly that. We will start with the simplest possible LangChain example and gradually extend it until we have a complete pipeline that:

loads documents from a folder
splits them into manageable chunks
converts them into embeddings
stores them in a FAISS vector index
retrieves relevant passages for a topic
asks an LLM to explain the topic using those passages
writes the final result to a Markdown file

The progression is intentionally incremental. Each step introduces one new concept so that by the end, the entire architecture makes sense. If you want to have a look at the complete code, it can be found here.

Note that the packages used here need a few additional dependencies to be installed, including pypdf and faiss.

At least generative AI sees itself as a cute little robot.

The Simplest Possible LangChain Example

At its heart, LangChain is simply a system for building pipelines that connect prompts, models, and transformations.

A minimal example might look like this:

Even this small script already demonstrates the core LangChain idea. The | operator connects components into a chain, so the input flows through the prompt template and then into the model.

The pipeline looks like this:

topic → prompt template → LLM → text output

And this is quite literally our code as well.
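As a rough sketch of the chain described above: the version below assumes the langchain-core and langchain-google-genai packages, a GOOGLE_API_KEY in the environment, and a Gemini chat model whose name is picked only for illustration. The prompt wording, the helper name build_chain, and the added StrOutputParser (which turns the chat message into a plain string) are assumptions of this sketch, not necessarily what the original script used.

```python
# Prompt wording is an assumption made for this sketch.
PROMPT_TEXT = "Explain the following topic in simple terms: {topic}"


def build_chain(model_name: str = "gemini-2.0-flash"):
    """Return a prompt -> model -> string chain (hypothetical helper)."""
    # Imported lazily so this module still loads where the packages are missing.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_google_genai import ChatGoogleGenerativeAI

    prompt = ChatPromptTemplate.from_template(PROMPT_TEXT)
    # The model name is an assumption; use any chat model available to you.
    model = ChatGoogleGenerativeAI(model=model_name)
    # The | operator feeds each component's output into the next one.
    return prompt | model | StrOutputParser()


# Usage (needs network access and an API key):
#   chain = build_chain()
#   print(chain.invoke({"topic": "vector databases"}))
```

Swapping the provider should only require replacing the model line; the prompt and the parser stay the same.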

The result is just plain text generated by the model. While that is fine for quick experiments, it is not ideal for applications where we want a predictable structure in the output.

That brings us to the next improvement.

Adding Structured Output

Instead of asking the model for arbitrary text, we can define exactly what structure we expect.

LangChain integrates neatly with Pydantic, which allows us to define a schema for the output. We therefore add an Explanation schema that describes the fields we expect.

This model tells LangChain what the output should look like. We then attach the schema to the language model so that it knows this is the desired output format.

And we have to update our prompt template to make sure that the model knows what to provide.

When the chain runs, LangChain automatically parses the response and returns a populated Explanation object.

This small change removes a lot of potential parsing headaches and gives us reliable data structures to work with later.
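A sketch of how the schema and the structured model could fit together. The field names summary and key_points are invented for illustration, since the post only tells us the schema is called Explanation; the model name and prompt wording are assumptions as well. The with_structured_output method is what asks the chat model to return instances of the schema.

```python
from typing import List

from pydantic import BaseModel, Field


class Explanation(BaseModel):
    """Schema for the model's answer. The field names are assumptions."""

    summary: str = Field(description="Short explanation of the topic")
    key_points: List[str] = Field(description="Main takeaways as short statements")


# Prompt wording is an assumption; it simply asks for the fields defined above.
PROMPT_TEXT = (
    "Explain the following topic.\n"
    "Give a short summary and a list of key points.\n\n"
    "Topic: {topic}"
)


def build_structured_chain(model_name: str = "gemini-2.0-flash"):
    """Return a chain whose result is an Explanation object (hypothetical helper)."""
    # Imported lazily so the schema can be used without LangChain installed.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_google_genai import ChatGoogleGenerativeAI

    prompt = ChatPromptTemplate.from_template(PROMPT_TEXT)
    model = ChatGoogleGenerativeAI(model=model_name)
    # with_structured_output makes the model return Explanation instances.
    return prompt | model.with_structured_output(Explanation)


# Usage (needs network access and an API key):
#   explanation = build_structured_chain().invoke({"topic": "vector databases"})
#   print(explanation.summary)
```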

Introducing Our Own Documents

So far the model relies purely on its own knowledge. That is fine for questions that do not involve your own documents. If you want the model to read your documents and gather information from them, however, we have to provide them as external sources.

Imagine we have a directory containing documentation, notes, and PDFs.

docs/
    paper.pdf
    tutorial.md
    notes.txt

Our first task is simply to load these files into memory. LangChain provides convenient document loaders, which we import first.

Let's first implement the loader methods themselves. We focus here on only two types of files, first simply text files:

and then PDF files:

Note that the PDF loader returns a list of documents. Each one is a Document object containing the content of one page of the original PDF.

We can then walk through a directory and load all supported files.

At this stage we have a list of Document objects containing the full text of each file. In a final version we would also add a few checks to this, e.g. whether the folder exists or whether any documents were even found.
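The loading step could be sketched like this. TextLoader and PyPDFLoader come from langchain_community.document_loaders; the helper names load_text_file, load_pdf_file and find_supported_files are hypothetical, and treating .md files as plain text is an assumption. Only load_documents_from_folder is a name taken from this post.

```python
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".md"}  # assumption: Markdown is loaded as plain text
PDF_SUFFIXES = {".pdf"}


def load_text_file(path: Path) -> list:
    """Load one text file as a single-element list of Document objects."""
    from langchain_community.document_loaders import TextLoader

    return TextLoader(str(path), encoding="utf-8").load()


def load_pdf_file(path: Path) -> list:
    """Load one PDF; the result holds one Document per page (needs pypdf)."""
    from langchain_community.document_loaders import PyPDFLoader

    return PyPDFLoader(str(path)).load()


def find_supported_files(folder: Path) -> list:
    """Return all supported files below folder, sorted for reproducibility."""
    supported = TEXT_SUFFIXES | PDF_SUFFIXES
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in supported
    )


def load_documents_from_folder(folder: Path) -> list:
    """Walk the folder and load every supported file into Document objects."""
    docs = []
    for path in find_supported_files(folder):
        if path.suffix.lower() in PDF_SUFFIXES:
            docs.extend(load_pdf_file(path))
        else:
            docs.extend(load_text_file(path))
    return docs
```

Unsupported files are skipped silently here; the checks mentioned above (missing folder, no documents found) would sit at the top and bottom of load_documents_from_folder.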

However, feeding entire documents to a vector index is rarely ideal. Long texts reduce retrieval precision and can exceed token limits.

The usual solution is to break documents into smaller pieces.

Splitting Documents into Chunks

Chunking is an essential part of most RAG pipelines. Instead of embedding entire documents, we embed manageable fragments.

LangChain provides a variety of text splitters. A commonly used option is the RecursiveCharacterTextSplitter.

This splitter tries to break text along natural boundaries while respecting the chunk size. The overlap ensures that contextual information is not lost between adjacent fragments.

After this step, each document becomes a collection of smaller text segments that are much easier to embed and retrieve.
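To make chunk size and overlap concrete, here is a deliberately simplified sliding-window splitter next to the real LangChain call. The toy function only illustrates overlap; it does not try to find natural boundaries the way RecursiveCharacterTextSplitter does. The default values of 1000 and 200 characters are assumptions, not settings taken from this post.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Toy splitter: fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def split_documents(docs, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split Document objects with LangChain's recursive splitter."""
    # Imported lazily so the toy function above runs without LangChain.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(docs)


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each toy chunk repeats the last two characters of its predecessor, which is the same idea that keeps a sentence from being cut off without context at a chunk border.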

Converting Text into Embeddings

To enable semantic search we need to convert each text chunk into a vector representation. This process is called embedding.

We will use Google's Gemini embedding model through LangChain.

Each chunk of text is transformed into a numerical vector that captures its semantic meaning. Put simply, the relationship between chunks becomes something we can measure mathematically, for example with cosine similarity, and chunks discussing similar topics end up close together in the vector space.
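Cosine similarity itself is easy to compute by hand. The sketch below uses made-up three-dimensional vectors as stand-ins for embeddings; real embedding vectors have hundreds or thousands of dimensions, and the numbers here are invented purely to show the effect.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Invented toy "embeddings" for three chunks.
faiss_chunk = [0.9, 0.1, 0.0]
vector_search_chunk = [0.8, 0.2, 0.1]
cooking_chunk = [0.0, 0.1, 0.9]

# Chunks about related topics score close to 1, unrelated ones close to 0.
print(cosine_similarity(faiss_chunk, vector_search_chunk))
print(cosine_similarity(faiss_chunk, cooking_chunk))
```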

Once we have embeddings, we need somewhere to store them.

Building a FAISS Vector Store

FAISS is a popular open-source library for fast similarity search over vectors. LangChain integrates with FAISS through a simple interface.

Here, docs is what our load_documents_from_folder method returns and embeddings comes from the previous step. In the final version we will move all of this into its own method, build_or_load_vectorstore, and add an option to load the stored embeddings from disk. This matters because building embeddings takes time, which is unnecessary if your local documents have not changed.
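One way build_or_load_vectorstore could look, as a sketch. FAISS.from_documents, save_local and load_local are LangChain's FAISS wrapper methods; the helper index_exists, the function signature and the embedding model name in the usage comment are assumptions. The file names index.faiss and index.pkl are what save_local writes with its default index name.

```python
from pathlib import Path


def index_exists(index_dir: Path) -> bool:
    """Check for the two files FAISS.save_local writes by default."""
    return (index_dir / "index.faiss").is_file() and (index_dir / "index.pkl").is_file()


def build_or_load_vectorstore(chunks, embeddings, index_dir: Path):
    """Load a saved FAISS index if there is one, otherwise build and save it."""
    # Imported lazily; needs langchain-community and a faiss package.
    from langchain_community.vectorstores import FAISS

    if index_exists(index_dir):
        # The index is stored with pickle, so only load files you created yourself.
        return FAISS.load_local(
            str(index_dir), embeddings, allow_dangerous_deserialization=True
        )
    vectorstore = FAISS.from_documents(chunks, embeddings)
    vectorstore.save_local(str(index_dir))
    return vectorstore


# Usage (needs an API key; the embedding model name is an assumption):
#   from langchain_google_genai import GoogleGenerativeAIEmbeddings
#   embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
#   vectorstore = build_or_load_vectorstore(chunks, embeddings, Path("faiss_index"))
```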

Retrieving Relevant Context

Once the vector store exists, we can query it for relevant information. When a user provides a topic, we embed the topic as well and search for similar vectors. LangChain provides a convenient abstraction called a retriever.

When invoked with a query, the retriever returns the most relevant document chunks.

These retrieved passages become the contextual knowledge that will guide the model’s explanation.

Building the Retrieval Pipeline

LangChain pipelines are constructed using Runnable objects. We begin by writing a small function that retrieves context and prepares it for the prompt.

This function will be the first step in our pipeline, so it is the one that gets called with our desired topic. It retrieves the context from the vector store via our retriever and returns both topic and context in a dictionary. To use it in a pipeline, we convert this function into a runnable component with RunnableLambda.

The RunnableLambda wrapper creates an object with all necessary methods to be used in a pipeline. For our purposes that simply means it possesses an invoke(input) method. The input will be our topic and the output of the Runnable is the result of the add_context method, i.e. the dictionary containing topic and context.
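A sketch of the context step. It assumes the input is the {"topic": ...} dictionary that is passed to the final chain, and that retrieved chunks are joined with blank lines; both are choices of this sketch. The factory make_add_context is a hypothetical helper that keeps the retriever out of global state, and the stand-in retriever only exists so the example runs offline.

```python
from types import SimpleNamespace


def make_add_context(retriever):
    """Build an add_context function bound to the given retriever."""

    def add_context(inputs: dict) -> dict:
        topic = inputs["topic"]
        docs = retriever.invoke(topic)  # most relevant chunks for the topic
        context = "\n\n".join(doc.page_content for doc in docs)
        return {"topic": topic, "context": context}

    return add_context


# Offline demo with a stand-in retriever that mimics the .invoke interface.
fake_retriever = SimpleNamespace(
    invoke=lambda query: [
        SimpleNamespace(page_content="FAISS indexes vectors."),
        SimpleNamespace(page_content="Retrievers return chunks."),
    ]
)
add_context = make_add_context(fake_retriever)
result = add_context({"topic": "FAISS"})
print(result["context"])

# With LangChain installed, the real wiring would be along these lines
# (k=4 is an assumption):
#   from langchain_core.runnables import RunnableLambda
#   retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
#   context_step = RunnableLambda(make_add_context(retriever))
```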

Next we define the generation stage, which is where the actual work happens.

The input for this step of the pipeline is the topic-context dictionary from the previous step. It produces three parts in parallel and returns a dictionary containing the topic, the context, and the explanation as an Explanation object. Now our pipeline contains both retrieval and generation steps.

Formatting the Result as Markdown

Since the output schema is structured, converting it into Markdown is straightforward.

This step converts the structured data into a human-readable document. We also convert this function into a Runnable, just as before.
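A possible formatting function, as a sketch. It assumes the explanation object exposes summary and key_points attributes, which are illustrative field names and not confirmed by this post, and it uses a SimpleNamespace as a stand-in so the example runs without calling a model.

```python
from types import SimpleNamespace


def format_markdown(result: dict) -> str:
    """Turn the pipeline's result dictionary into a Markdown document."""
    explanation = result["explanation"]
    lines = [f"# {result['topic']}", "", explanation.summary, "", "## Key points", ""]
    lines += [f"- {point}" for point in explanation.key_points]
    return "\n".join(lines) + "\n"


# Stand-in for an Explanation object returned by the model.
demo = {
    "topic": "FAISS",
    "context": "(retrieved passages)",
    "explanation": SimpleNamespace(
        summary="FAISS performs fast similarity search over vectors.",
        key_points=["Open source", "Works on embeddings"],
    ),
}
print(format_markdown(demo))

# In the pipeline this function would be wrapped, for example:
#   from langchain_core.runnables import RunnableLambda
#   format_step = RunnableLambda(format_markdown)
```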

The Final Chain

With all components prepared, assembling the final pipeline is simple: we just pipe everything together.

We plug in our little {"topic": topic} dictionary. In the first step the pipeline adds context, in the second it adds an explanation, and in the last it converts everything into a single Markdown string. Running the chain is done via the invoke method.

Finally we save the result.
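Putting the stages together and saving the output might look like the sketch below. RunnableLambda and RunnableParallel are LangChain's classes; build_pipeline, save_markdown and the output file name are hypothetical. The explain_chain argument is assumed to be a prompt piped into a structured-output model that accepts the topic and context keys.

```python
from operator import itemgetter
from pathlib import Path


def build_pipeline(add_context, explain_chain, format_markdown):
    """Chain retrieval, generation and formatting into one runnable."""
    # Imported lazily so save_markdown can be used without LangChain.
    from langchain_core.runnables import RunnableLambda, RunnableParallel

    # Generation stage: pass topic and context through, add the explanation.
    generation = RunnableParallel(
        topic=itemgetter("topic"),
        context=itemgetter("context"),
        explanation=explain_chain,
    )
    return RunnableLambda(add_context) | generation | RunnableLambda(format_markdown)


def save_markdown(markdown: str, path: Path) -> Path:
    """Write the Markdown string to disk, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path


# Usage (needs the real components and an API key):
#   chain = build_pipeline(add_context, explain_chain, format_markdown)
#   markdown = chain.invoke({"topic": "vector databases"})
#   save_markdown(markdown, Path("output/explanation.md"))
```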

At this point we have a fully working RAG system that uses our local documents to generate explanations.

Adding a --rebuild-index Flag

When working with document collections, a common annoyance appears quickly.

Embedding documents takes time. Rebuilding the FAISS index every time the script runs is wasteful. However, sometimes we do want to rebuild it, for example when new documents are added.

A simple solution is to introduce a command-line flag that forces a rebuild. We can use argparse to look for a --rebuild-index flag and modify our build_or_load_vectorstore method accordingly.
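A sketch of the command-line handling with argparse. Only the --rebuild-index flag comes from this post; the positional topic argument, the helper needs_rebuild and the index file name check are assumptions about how the rest of the script is organised.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Explain a topic using local documents.")
    parser.add_argument("topic", help="Topic to explain (assumed positional argument)")
    parser.add_argument(
        "--rebuild-index",
        action="store_true",
        help="Rebuild the FAISS index even if a saved one exists",
    )
    return parser.parse_args(argv)


def needs_rebuild(index_dir: Path, rebuild_index: bool) -> bool:
    """Rebuild when forced by the flag or when no saved index is found."""
    return rebuild_index or not (index_dir / "index.faiss").is_file()


args = parse_args(["vector databases", "--rebuild-index"])
print(args.topic, args.rebuild_index)
```

Inside build_or_load_vectorstore, the result of needs_rebuild would then decide between loading the saved index and embedding the documents again.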

Final Thoughts

The architecture we built in this post is a typical pattern for LLM applications.

Documents are loaded, chunked, embedded, and stored in a vector database. When a query arrives, relevant fragments are retrieved and supplied to the model as context. The model then generates an answer that is grounded in those documents.

What makes LangChain interesting is not that it introduces new algorithms. Instead, it provides a composable framework that lets you assemble these systems in a modular way. Each component can be swapped independently, whether that means changing the embedding model, replacing FAISS with another vector store, or switching to a different LLM provider.

The real challenge is not asking the model questions. It is deciding what knowledge the model should have access to in the first place.