← Back

How it works

Velqua is a transparent proxy. It sits between your apps and your LLM provider. Your apps don't change. The port number changes.

Architecture

Before: your app → localhost:11434 (Ollama)
After: your app → localhost:11435 (Velqua) → localhost:11434 (Ollama)

The request flow

Intercept

Your app sends a chat request to :11435. Velqua intercepts it before it reaches your model.

Retrieve

Velqua searches your personal memory graph for facts relevant to the current conversation topic.

Inject

Relevant facts are injected into the system prompt. Your model now has context it didn't have before.

Forward

The enriched request is forwarded to your provider. The model responds with full context awareness.

Learn

After the response, Velqua scans for new facts. Quality scored. Automatically stored. Your AI gets smarter every conversation.

Memory extraction pipeline

Velqua uses a pipeline called Anamnesis to extract facts from your conversation history:

Import — Drag in JSON exports from ChatGPT, Claude, or any compatible format
Extract — Anamnesis identifies personal facts: who you are, what you do, preferences
Score — Each fact is quality scored. Low-confidence facts are filtered out
Deduplicate — Facts compared against existing knowledge. Duplicates merged, contradictions flagged
Store — Clean, scored facts are stored in a local vector index for fast retrieval

Privacy

Everything runs local. API keys are encrypted at rest. The proxy binds to 127.0.0.1 by default. Nothing phones home. No telemetry. No cloud dependency.