
From zero to a RAG system: successes and failures


An engineer spun up an internal chat assistant using a local LLaMA model served via Ollama, fronted by a Python Flask API and a Streamlit UI.
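A minimal sketch of the glue between that Flask API and a local Ollama server. The model name "llama3" and the request/response shape follow Ollama's documented /api/generate endpoint; the helper names and default URL are illustrative assumptions, not the author's code.

```python
import json

# Ollama listens on port 11434 by default; this URL is an assumption.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "llama3") -> bytes:
    """JSON body for Ollama's /api/generate endpoint (streaming mode)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()


def collect_stream(lines) -> str:
    """Ollama streams one JSON object per line; concatenate the text chunks."""
    out = []
    for line in lines:
        chunk = json.loads(line)
        out.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(out)
```

A Flask route would POST `build_generate_request(...)` to `OLLAMA_URL` and feed the response lines through `collect_stream` before returning the answer to the Streamlit frontend.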

They moved from an in-memory LlamaIndex setup to batch ingestion into ChromaDB (backed by SQLite), adding checkpoints and tolerant parsing to prevent runaway RAM usage.
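The checkpoint-and-tolerant-parsing pattern can be sketched as below. This is a hedged illustration, not the author's implementation: `parse` and `store_batch` stand in for the real document parser and the ChromaDB collection insert, and the checkpoint file name and batch size are assumptions.

```python
import json
from pathlib import Path

CHECKPOINT = Path("ingest_checkpoint.json")  # assumed file name
BATCH_SIZE = 100  # assumed batch size


def load_checkpoint() -> set:
    """Paths already ingested in a previous run."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()


def save_checkpoint(done: set) -> None:
    CHECKPOINT.write_text(json.dumps(sorted(done)))


def ingest(paths, parse, store_batch):
    """Ingest documents in batches, skipping completed files and bad parses."""
    done = load_checkpoint()
    batch = []
    for path in paths:
        if path in done:
            continue  # resume: already ingested before a crash/restart
        try:
            batch.append((path, parse(path)))
        except Exception:
            continue  # tolerant parsing: skip unreadable documents
        if len(batch) >= BATCH_SIZE:
            store_batch(batch)
            done.update(p for p, _ in batch)
            save_checkpoint(done)  # durable resume point
            batch = []
    if batch:  # flush the final partial batch
        store_batch(batch)
        done.update(p for p, _ in batch)
        save_checkpoint(done)
    return done
```

Because each batch is flushed and checkpointed before the next begins, only one batch of parsed documents is held in memory at a time, and a crash costs at most one batch of work.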

Indexing produced 738,470 vectors (~54 GB). They rented an NVIDIA RTX 4000 VM to compute the embeddings and pushed the original documents to Azure Blob Storage via SAS links.
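A SAS link carries its own authorization in the query string, so uploading an original document reduces to a plain HTTP PUT against Azure's Put Blob operation with the `x-ms-blob-type` header set. The sketch below builds that request with the standard library; the URL shown is a placeholder, and this is an assumed approach rather than the author's exact code.

```python
import urllib.request


def build_upload(sas_url: str, data: bytes) -> urllib.request.Request:
    """PUT request for Azure Blob Storage's Put Blob operation over a SAS URL."""
    return urllib.request.Request(
        sas_url,
        data=data,
        method="PUT",
        headers={"x-ms-blob-type": "BlockBlob"},  # required by Put Blob
    )


# To actually send the upload:
#   urllib.request.urlopen(build_upload(sas_url, payload))
```

The same pre-signed URL, minus write permissions, can later be handed to the chat frontend so users can download the source document behind a retrieved chunk.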




Kala #GenAI (@kala), FAUN.dev(): Generative AI Weekly Newsletter. Curated GenAI news, tutorials, tools and more!