What do you use for local RAG?
Local RAG? #
I saw this question asked the other day and was excited to see what people use and recommend. To my surprise, the responses were few. After a quick look at several of the proposed solutions, none stood out. I thought about recommending Kernel Memory, which I had briefly used in the past, and experimented with it further as I considered a response.
Kernel Memory #
Kernel Memory is a way to perform RAG in .NET. It's easy to run locally and provides many configuration options as well as tools for interacting with it. There are several ways to access Kernel Memory; I'm only running it locally as a service.
Setup #
There are a few ways to get started. I went to the repo and cloned the project:
git clone --depth 1 git@github.com:microsoft/kernel-memory.git
To get going, change directory with cd service/Service/. This contains the Kernel Memory service we can run and then interact with. There's a helpful configuration tool that walks you through the setup: dotnet run setup. I've recorded my answers of note for the configuration. In particular, I selected Qdrant as my vector DB; the built-in "SimpleVectorDb" works fine for experimenting too, I only picked Qdrant to try it out. Also, I'm using OpenAI for my models, so have your OpenAI API key at the ready if you're going to use OpenAI as well.
- Web Service with Asynchronous Ingestion Handlers (better for retry logic and long operations)
- Which queue service will be used? SimpleQueues persistent (only for tests, data stored on disk)
- Where should the service store files? SimpleFileStorage persistent (only for tests, data stored on disk)
- Which service should be used to extract text from images? none
- When importing data, generate embeddings, or let the memory Db class take care of it? Yes, generate embeddings
- When searching for text and/or answers, which embedding generator should be used for vector search? OpenAI embedding model
- OpenAI <text/chat model name> [gpt-4o-mini]: gpt-4o
- OpenAI <embedding model name> text-embedding-ada-002
- When searching for answers, which memory DB service contains the records to search? Qdrant
- Qdrant <endpoint>: http://127.0.0.1:6333
- No Qdrant key
- When generating answers and synthetic data, which LLM text generator should be used? OpenAI text/chat model
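For reference, the setup tool writes these answers into the service's appsettings.Development.json. With my choices, the OpenAI section ends up looking roughly like the sketch below — I'm recalling the key names from the generated file, so treat this as approximate and verify against your own copy:

```json
{
  "KernelMemory": {
    "Services": {
      "OpenAI": {
        "APIKey": "sk-...",
        "TextModel": "gpt-4o",
        "EmbeddingModel": "text-embedding-ada-002"
      }
    }
  }
}
```

The nice part is that you never have to hand-edit this unless you want to; rerunning dotnet run setup regenerates it.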
Once configuration is finished, start up the Kernel Memory service: dotnet run
How to interact with Kernel Memory #
Now open a second terminal and navigate to the repo's tools directory: cd tools.
If you already have Docker installed and running, a bash script is provided to easily start up Qdrant: ./run-qdrant.sh
Once the DB has started, you can view it in the browser: http://localhost:6333/dashboard#/collections
Everything is now ready to begin using Kernel Memory. Another simple bash script is provided to upload documents. I have a PDF on Semantic Kernel lying around to upload:
./upload-file.sh -f ~/Downloads/semantic-kernel.pdf -i semantic-kernel -s http://127.0.0.1:9001
The document will now be in Qdrant, and can be seen in the default
collection.
Finally, there's another helpful script to query Kernel Memory. I'll ask a Semantic Kernel-related question:
./ask.sh -s http://127.0.0.1:9001 -q 'how do I configure a plugin for .NET'
I saved the response so I could parse it with jq:
jq -r '.question, .relevantSources[0].documentId, .relevantSources[0].sourceName, .text' answer.json
Here's the abbreviated response:
- how do I configure a plugin for .NET
- semantic-kernel
- semantic-kernel.pdf
- To configure a plugin for .NET in the context of Semantic Kernel, follow these steps: ...
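If you'd rather script against the response than eyeball it, here's the shape the jq filter above assumes, reconstructed from the output — the real payload contains more fields than this, so treat it as a minimal sketch:

```shell
# Write a minimal answer.json shaped like the abbreviated response
# above, then run the same jq filter against it.
cat > answer.json <<'EOF'
{
  "question": "how do I configure a plugin for .NET",
  "text": "To configure a plugin for .NET in the context of Semantic Kernel, follow these steps: ...",
  "relevantSources": [
    { "documentId": "semantic-kernel", "sourceName": "semantic-kernel.pdf" }
  ]
}
EOF

# Prints the question, the top source's id and name, and the answer text.
jq -r '.question, .relevantSources[0].documentId, .relevantSources[0].sourceName, .text' answer.json
```

Having the shape written down makes it easy to pull out other pieces later, like additional sources, without re-querying the service.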
Thoughts #
I found this easy to get up and running. I don't know how practical it is to use, at least not with the method I illustrated here. It provides a base level of functionality that can be built upon. So if you want a lot of control, this may be a great solution; if you want something that works out of the box, this may not be it. I'm a little surprised no one has taken these tools and created an open source product from them. All of the pieces are here, they just need a good UI on top.
Khoj: the nice UI for RAG? #
The one solution I saw come up a couple of times in response to the original question about local RAG was Khoj. I had seen this product months prior, but it is a commercial product, and I don't want another subscription at the moment. I believe I saw that it could be run locally for free, but at the time it seemed like more effort than I wanted to spend. Now, however, I had renewed motivation, so I looked at the documentation again. It seemed simple.
Setup #
Download the docker compose file, add your OpenAI API key and start it!
wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml
Edit the docker-compose.yml
and add your OpenAI API Key to the configuration value
"OPENAI_API_KEY". Then start it up:
docker-compose up
I actually had several issues when I tried this; I couldn't access the images. But once that was sorted out, it started up. Point a browser to http://localhost:42110/ and you're greeted by a decent UI that lets you perform a variety of actions, including uploading a document and then asking questions.
Obsidian #
What's extra nice about this solution is that there is a plugin for Obsidian. It's as easy as installing the plugin in Obsidian. One plugin setting needs to be updated: the Khoj server value must point to the locally running instance, http://127.0.0.1:42110. With that in place it began syncing my Obsidian notes. After it finished I could ask questions about my documents. Nice!
Thoughts #
Even though both of these solutions were pretty simple to run locally, I haven't found myself using them. Some things stick and some things do not. This has not, most likely because I haven't felt a real need yet. Maybe that day will come. If so, I'm prepared, I think.