An engineer spun up an internal chat service built on a local LLaMA model served through Ollama, with a Python Flask API in the middle and a Streamlit frontend.
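A minimal sketch of what that Flask layer might look like: it forwards a prompt to Ollama's local `/api/generate` endpoint and returns the model's reply. The route name, port, and model tag (`llama3`) are illustrative assumptions, not taken from the original write-up.

```python
# Hypothetical sketch of the Flask chat endpoint in front of Ollama.
# Assumes Ollama is running locally on its default port (11434) and
# serves a model tagged "llama3"; route and field names are illustrative.
import json
import urllib.request

from flask import Flask, jsonify, request

app = Flask(__name__)

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


@app.route("/chat", methods=["POST"])
def chat():
    prompt = request.get_json().get("prompt", "")
    # stream=False asks Ollama for a single JSON response instead of chunks.
    payload = json.dumps(
        {"model": "llama3", "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Ollama's non-streaming reply carries the generated text in "response".
    return jsonify({"answer": body["response"]})
```

The Streamlit frontend would then just POST the user's message to `/chat` and render the `answer` field.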
They moved from an in-memory LlamaIndex store to batched ingestion into ChromaDB (backed by SQLite), adding checkpoints and error-tolerant parsing so that a single bad document or an out-of-memory kill could no longer wipe out a whole ingestion run.
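The checkpoint-plus-tolerant-parsing pattern can be sketched as below. This is a simplified stand-in, not the author's code: the `store` argument represents any sink with an `add()` method (in the real pipeline, a ChromaDB collection), and the checkpoint file name and batch size are made up for illustration.

```python
# Sketch of checkpointed, fault-tolerant batch ingestion (illustrative).
import json
from pathlib import Path

CHECKPOINT = Path("ingest_checkpoint.json")  # hypothetical checkpoint file
BATCH_SIZE = 256                             # illustrative batch size


def load_checkpoint() -> int:
    # Resume from the last fully ingested batch after a crash or OOM kill.
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())["next_index"]
    return 0


def save_checkpoint(next_index: int) -> None:
    CHECKPOINT.write_text(json.dumps({"next_index": next_index}))


def tolerant_parse(raw: bytes):
    # Skip malformed documents instead of aborting the whole run.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return text.strip() or None


def ingest(docs, store) -> None:
    start = load_checkpoint()
    for i in range(start, len(docs), BATCH_SIZE):
        parsed = (tolerant_parse(d) for d in docs[i:i + BATCH_SIZE])
        batch = [t for t in parsed if t]
        if batch:
            store.add(batch)  # in ChromaDB this would be collection.add(...)
        save_checkpoint(i + BATCH_SIZE)  # persist progress after each batch
```

Because progress is persisted only after a batch lands in the store, restarting the process re-runs at most one batch rather than the whole corpus.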
Indexing produced 738,470 vectors (~54 GB on disk). They rented an NVIDIA RTX 4000 VM to run the embedding jobs and uploaded the original documents to Azure Blob Storage, handing out access via SAS links.
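Uploading through a SAS link needs no Azure SDK at all: a SAS URL embeds its own authorization, so a plain HTTP `PUT` with the `x-ms-blob-type: BlockBlob` header is enough for a single-shot upload. The sketch below assumes a SAS URL with write permission was generated elsewhere; the function names are illustrative.

```python
# Sketch: push an original file to Azure Blob Storage through a SAS URL,
# using the Blob REST API directly. Assumes the SAS URL grants write access.
import urllib.request


def build_put_request(sas_url: str, data: bytes) -> urllib.request.Request:
    # The SAS token lives in the URL's query string, so no auth header is needed.
    return urllib.request.Request(
        sas_url,
        data=data,
        method="PUT",
        headers={"x-ms-blob-type": "BlockBlob"},  # required for a single-shot Put Blob
    )


def upload_via_sas(sas_url: str, data: bytes) -> int:
    with urllib.request.urlopen(build_put_request(sas_url, data)) as resp:
        return resp.status  # Azure returns 201 Created on success
```

Keeping originals in Blob Storage and only vectors in ChromaDB also means the chat UI can link straight to the source document via its SAS URL.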










