Retrieval-Augmented Generation

Back in February I learned about Open WebUI, and I have wanted to try it with Retrieval-Augmented Generation (RAG) ever since. I'll follow the tutorial available on Open WebUI's site.

The goal of RAG is to improve the output of Large Language Models (LLMs) by supplying them with specific, relevant information at query time. My main interest in RAG is combining LLMs with domain-specific information, so I can have a model tailored to a specific task.

The tutorial uses Open WebUI's documentation as the knowledge base, so let's download it and prepare it for uploading.

wget https://github.com/open-webui/docs/archive/refs/heads/main.zip
unzip main.zip
mkdir markdown
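# cp warns when files in different directories share a name; the warnings can be ignored, but only one copy of each name ends up in markdown/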
find docs-main -type f -name "*.md*" -exec cp -t markdown {} +
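The find command above flattens everything into one directory, so files that share a base name (several index.md files, for example) collide. If you would rather keep them all, here is a minimal sketch that encodes each file's relative path in its name. The function name flatten_copy and the "__" separator are my own choices, not part of the tutorial, and it assumes file names without newlines.

```shell
# Copy every Markdown/MDX file under SRC into DEST, turning the relative
# path into the file name so same-named files do not overwrite each other.
# The body runs in a subshell, so its variables do not leak.
flatten_copy() (
    src="${1%/}"
    dest="$2"
    mkdir -p "$dest"
    find "$src" -type f -name '*.md*' | while IFS= read -r file; do
        rel="${file#"$src"/}"                        # path relative to SRC
        flat=$(printf '%s' "$rel" | sed 's|/|__|g')  # a/b.md becomes a__b.md
        cp -- "$file" "$dest/$flat"
    done
)
```

Calling flatten_copy docs-main markdown would replace the mkdir and find lines above.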

Start Open WebUI using Docker. It listens on port 8080 by default, and because the container uses the host network it can reach the Ollama instance I already have running on my server on port 11434. There is also a Docker image that bundles Ollama, if you prefer.

docker run --rm -d --network=host \
  -v /data/open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
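The container can take a while to start answering, so before opening the browser it may help to check that both services respond. This is a rough sketch, not something from the tutorial: wait_for_url is a hypothetical helper of mine, and it assumes curl is installed and that the ports match the command above.

```shell
# Poll URL until curl gets a successful HTTP response, or give up after
# TRIES attempts (default 30), two seconds apart.
wait_for_url() (
    url="$1"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        if curl -fs -o /dev/null "$url"; then
            echo "up: $url"
            exit 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 2
        fi
    done
    echo "no response: $url" >&2
    exit 1
)

# Usage (ports taken from the docker command above):
#   wait_for_url http://127.0.0.1:11434/api/tags   # Ollama: lists local models
#   wait_for_url http://127.0.0.1:8080/            # Open WebUI front page
```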

Now to create the knowledge base!

  • Navigate to Workspace -> Knowledge -> + to Create a Knowledge Base.
  • Name it: Open WebUI Documentation
  • What are you trying to achieve?: Documentation assistance
  • Click Create Knowledge.

After creating the knowledge base, open it, click the + button, and upload the markdown directory.

Next we will create a custom model with the knowledge base.

  • Go to Workspace -> Models -> + to add a new model.
  • Model name will be Open WebUI
  • I'll use phi4:14b as the base model
  • Knowledge bases have to exist under Workspace -> Knowledge before they can be attached here. Click on Select Knowledge and select Open WebUI Documentation
  • Finally, click on Save and Create

To test it out, start a new chat and select Open WebUI as the model. Test with the query: How do I configure environment variables? (This took several minutes because I'm running it on an old and slow computer that I re-purposed into one of my servers.)

It produced an answer, but I have no idea whether it is correct.

The official documentation on RAG says little about how the relevant passages are found, but it does describe what happens once they are retrieved:

The retrieved text is then combined with a predefined RAG template and prefixed to the user's prompt, providing a more informed and contextually relevant response.
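To make that step concrete, here is a toy sketch of prefixing retrieved text to a user's question. The template wording and the build_prompt function are made up for illustration; this is not Open WebUI's actual RAG template or code.

```shell
# Assemble a prompt: a fixed template wraps the retrieved text and is placed
# in front of the user's question. The template here is hypothetical.
build_prompt() (
    context="$1"
    query="$2"
    printf 'Use the following context to answer the question.\n\n'
    printf '<context>\n%s\n</context>\n\n' "$context"
    printf 'Question: %s\n' "$query"
)

# Usage:
#   build_prompt "$(cat markdown/some-page.md)" "How do I configure environment variables?"
```

The model only ever sees the assembled text, which is why the quality of the retrieved passages matters so much for the answer.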

Since I can't verify this output, I'll write another blog post soon using material I can check myself.




This work is licensed under a Creative Commons Attribution 4.0 International License.
