Back in February I learned about Open WebUI and have been wanting to test it out with Retrieval-Augmented Generation (RAG). I'll follow the tutorial available on Open WebUI's site.
The goal of RAG is to provide Large Language Models (LLMs) with specific information with the goal of improving the output. My main interest in RAG is to combine LLMs with domain-specific information, so I can have a model for a specific task.
The tutorial uses Open WebUI's documentation as the knowledge base, so let's download it and prepare it for uploading.
wget https://github.com/open-webui/docs/archive/refs/heads/main.zip
unzip main.zip
mkdir markdown
# ignore the bunch of overwrite messages
find docs-main -type f -name "*.md*" -exec cp -t markdown {} +
Start Open WebUI using Docker on port 8080 (default); I already have Ollama running on my server on port 11434. There is another Docker image with Ollama bundled with it if you prefer.
docker run --rm -d --network=host -v /data/open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui ghcr.io/open-webui/open-webui:main
Now to create the knowledge base!
- Navigate to
Workspace->Knowledge->+to Create a Knowledge Base. - Name it:
Open WebUI Documentation - What are you trying to achieve?: Documentation assistance
- Click
Create Knowledge.
After creating the knowledge base click on it and click on the + button and then upload the markdown directory.
Next we will create a custom model with the knowledge base.
- Go to
Workspace->Models->+to add a new model. - Model name will be
Open WebUI - I'll use
phi4:14bas the base model - To attach knowledge base here, add them to the "Knowledge" workspace first. Click on
Select Knowledgeand selectOpen WebUI Documentation - Finally click on save and create
To test it out, start a new chat and select Open WebUI as the model. Test with the query: How do I configure environment variables? (This took several minutes because I'm running it on an old and slow computer that I re-purposed into one of my servers.)
And the output, which I have no idea on whether it is correct or not.
The official documentation on RAG doesn't really provide much information on how it finds the relevant information but after it finds it, it will perform the following:
The retrieved text is then combined with a predefined RAG template and prefixed to the user's prompt, providing a more informed and contextually relevant response.
Since I can't verify whether the output is correct or not, I'll write another blog post soon on something I can verify.

This work is licensed under a Creative Commons
Attribution 4.0 International License.

