Figuring \w+ out

What do you use for local RAG?

ai, rag, kernel-memory, khoj

Local RAG? #

I saw this question asked the other day and was excited to see what people use and recommend. To my surprise, the responses were few, and after a quick look at several of the proposed solutions, none stood out. I thought about recommending Kernel Memory, which I had briefly used in the past, and experimented with it further as I considered a response.

Kernel Memory #

Kernel Memory is a way to perform RAG in .NET. It's easy to run locally and provides many configuration options as well as tools to interact with it. There are many ways to access Kernel Memory; here I'm only running it locally as a service.

Setup #

There are a few ways to get started. I went to the repo and cloned the project: git clone --depth 1 git@github.com:microsoft/kernel-memory.git

To get going, change into the service directory: cd service/Service/. This contains the Kernel Memory service we can run and then interact with. There's a helpful configuration tool that walks through getting set up: dotnet run setup. I've recorded my notable configuration answers below. In particular, I selected Qdrant as my vector DB; the built-in "SimpleVectorDb" works fine for experimenting too, I only picked Qdrant to try it out. Also, I'm using OpenAI for my models, so have your OpenAI API key at the ready if you are going to use OpenAI as well.

- Web Service with Asynchronous Ingestion Handlers (better for retry logic and long operations)
- Which queue service will be used? SimpleQueues persistent (only for tests, data stored on disk)
- Where should the service store files? SimpleFileStorage persistent (only for tests, data stored on disk)
- Which service should be used to extract text from images? none
- When importing data, generate embeddings, or let the memory Db class take care of it? Yes, generate embeddings
- When searching for text and/or answers, which embedding generator should be used for vector search? OpenAI embedding model
- OpenAI <text/chat model name> [gpt-4o-mini]: gpt-4o
- OpenAI <embedding model name> text-embedding-ada-002
- When searching for answers, which memory DB service contains the records to search? Qdrant
- Qdrant <endpoint>: http://127.0.0.1:6333
- No Qdrant key
- When generating answers and synthetic data, which LLM text generator should be used? OpenAI text/chat model

Once configuration is finished, start up the Kernel Memory service: dotnet run

How to interact with Kernel Memory #

Now open a second terminal and navigate to the repo's tools directory: cd tools.

If you already have Docker installed and running, a bash script is provided to easily start up Qdrant: ./run-qdrant.sh.

Once the DB has been started you can view it in the browser: http://localhost:6333/dashboard#/collections.

Everything is now ready to begin using Kernel Memory. Another simple bash script is provided to upload documents to Kernel Memory. I have a PDF on Semantic Kernel lying around to upload.

./upload-file.sh -f ~/Downloads/semantic-kernel.pdf -i semantic-kernel -s http://127.0.0.1:9001
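Under the hood, upload-file.sh just POSTs a multipart form to the service. Here's a rough Python sketch of that request using only the standard library; note the /upload path and the "file"/"index" field names are my assumptions based on reading the Kernel Memory repo, not verified API documentation:

```python
import io
import uuid
from urllib import request

def build_multipart(file_name: str, data: bytes, index: str) -> tuple[bytes, str]:
    """Build a multipart/form-data body with an 'index' field and a 'file' part.
    (Field names are assumptions about what upload-file.sh sends.)"""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # The index form field.
    buf.write(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="index"\r\n\r\n'
        f'{index}\r\n'.encode()
    )
    # The file part.
    buf.write(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    buf.write(data)
    buf.write(f'\r\n--{boundary}--\r\n'.encode())
    return buf.getvalue(), boundary

body, boundary = build_multipart("semantic-kernel.pdf", b"%PDF-...", "semantic-kernel")
req = request.Request(
    "http://127.0.0.1:9001/upload",
    data=body,
    headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
)
# request.urlopen(req)  # uncomment with the service running
```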

The document will now be in Qdrant, and can be seen in the default collection.

Finally, there's another helpful script to query Kernel Memory. I'll ask a Semantic Kernel related question:

./ask.sh -s http://127.0.0.1:9001 -q 'how do I configure a plugin for .NET'

I saved the response to answer.json so I could parse it with jq.

jq -r '.question, .relevantSources[0].documentId, .relevantSources[0].sourceName, .text' answer.json

Here's the abbreviated response:

- how do I configure a plugin for .NET
- semantic-kernel
- semantic-kernel.pdf
- To configure a plugin for .NET in the context of Semantic Kernel, follow these steps: ...
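If you'd rather skip jq, the same extraction can be done with Python's json module. The snippet below uses a trimmed-down stand-in for answer.json containing just the fields the jq query references:

```python
import json

# Trimmed stand-in for answer.json; real responses contain more fields.
raw = """{
  "question": "how do I configure a plugin for .NET",
  "text": "To configure a plugin for .NET in the context of Semantic Kernel, follow these steps: ...",
  "relevantSources": [
    {"documentId": "semantic-kernel", "sourceName": "semantic-kernel.pdf"}
  ]
}"""

answer = json.loads(raw)
source = answer["relevantSources"][0]
# Print the same four values the jq filter selects.
for field in (answer["question"], source["documentId"], source["sourceName"], answer["text"]):
    print(field)
```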

Thoughts #

I found this easy to get up and running. I don't know how practical it is to use, at least not via the method I illustrated here. It provides a base level of functionality that can be built upon, so if you want a lot of control, this may be a great solution; if you want something that works out of the box, this may not be it. I'm a little surprised no one has taken these tools and created an open source product from them. All of the pieces are here; they just need a good UI on top.

Khoj: the nice UI for RAG? #

The one solution I saw come up a couple of times in response to the original question about local RAG was Khoj. I had seen this product months prior, but it is a commercial product, and I don't want another subscription at the moment. I believed it could be run locally for free, but at the time that seemed like more effort than I wanted to spend. Now, with renewed motivation, I looked at the documentation again. It seemed simple.

Setup #

Download the Docker Compose file, add your OpenAI API key, and start it!

wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml

Edit docker-compose.yml and add your OpenAI API key as the "OPENAI_API_KEY" configuration value. Then start it up:

docker-compose up

I actually had several issues when I tried this; I couldn't access the images. But once that was sorted out, it started up. Point a browser to http://localhost:42110/ and you're greeted by a decent UI that lets you perform a variety of actions, including uploading a document and then asking questions about it.

Obsidian #

What's extra nice about this solution is that there's a plugin for Obsidian, and it's as easy as installing the plugin in Obsidian. Only one plugin configuration value needs to be updated: the Khoj server value must point to the locally running instance, http://127.0.0.1:42110. With that in place it began syncing my Obsidian notes, and after it finished I could ask questions about my documents. Nice!

Thoughts #

Even though both of these solutions were pretty simple to run locally, I haven't found myself using them. Some things stick and some things do not; this has not, most likely because I haven't felt a real need yet. Maybe that day will come. If so, I'm prepared, I think.