A local SQL database your agent can actually query — with chDB.
Auxten Wang · Technical Director @ ClickHouse · Creator of chDB
@auxten · auxten.com
It hit a constraint and found the fix that worked.
Same task. It reads the prompt — not yesterday's fix.
Repeats the failed plan. Burns tool calls on retries.
Context engineering just means choosing what the agent sees. Four kinds of data:
Rules, constraints, preferences.
Files, tables, API results, logs.
Decisions, facts, prior solutions.
What worked, what failed, what changed.
Great for tiny state. No columnar scan, no vectors, no remote files.
Great in memory. Dies past RAM. Can't query files you never loaded.
Closest match — but fewer formats and no mature cloud.
Similarity only. Can't filter + join + aggregate + audit in one query.
Open-source. Fast SQL over huge tables: logs, events, metrics.
Stores by column, so scans over millions of rows stay fast.
A real engine: SQL, JSON, arrays, text search, vector distance.
The ClickHouse engine, embedded in your Python process — SQLite-simple, 100× faster for analytics.
# pip install chdb
from chdb import session as chs
sess = chs.Session("./agent")
sess.query("SELECT count() FROM file('log.pq')")
runs in the same process
Parquet, CSV, JSON, Arrow, ORC — read and write, no converters.
Query a DataFrame or Arrow table in place — no copy, no load.
Prefer DataFrames to raw SQL? chDB speaks that too.
SELECT city,
AVG(salary) AS mean,
SUM(salary) AS sum,
COUNT(salary) AS count
FROM file('employee_data.csv',
'CSVWithNames')
WHERE age > 25
AND salary > 50000
GROUP BY city
ORDER BY mean DESC
LIMIT 10
Three sources, one query — no load step.
Memory the agent can filter, rank, and grow — not a prompt blob.
Filter, rank, and audit them with plain SQL.
Meaning and rules in one query.
Same SQL from laptop to ClickHouse.
explicit decision
similarity + filter
similar but different
better belief wins
"Use Poetry for this repo."
"This repo now standardizes on uv." (old row → conflicted)
Rank memories by meaning, keep the active ones, pinned first.
how close two meanings are, as a number (0 = identical)
Need project, tags, or task-history? Add WHERE / JOIN lines — same query.
SELECT
content,
1 - cosineDistance(embedding, query_vec) AS meaning
FROM memories
WHERE status = 'active'
ORDER BY pinned DESC, meaning DESC
LIMIT 8
import chdb.datastore as pd
m = pd.DataStore(
table="memories", database="./agent_context")
m = m.assign(
meaning=1 - pd.F.cosineDistance(
m.embedding, pd.F.array(*query_vec)))
m[m.status == "active"].sort_values(
["pinned", "meaning"], ascending=False).head(8)
Recall runs many times per turn — jitter becomes think-time jitter.
A timed-out call is retried — each retry re-sends the whole prompt.
In-process recall is orders of magnitude faster. No tail, no retries.
chDB session on a laptop or agent runtime
ClickHouse Server or Cloud when memory is shared or large
# Day 1 — query local memory
sess.query("SELECT content FROM memories WHERE project = 'checkout'")
# Day 100 — repoint the same name at the cluster (SELECT unchanged)
sess.query("""CREATE OR REPLACE VIEW memories AS
SELECT * FROM remote('cloud:9440', db, memories)""")
Honest note: remote reads pay the network cost again — keep the hot path local.
That whole pattern — lifecycle, recall, local-first — packaged as an open-source project on chDB. Build the core yourself:
memories table
explicit decisions
similarity + filter
which memory helped?
sess = chs.Session("./agent_context")
sess.query("CREATE TABLE memories (...) ENGINE = MergeTree ORDER BY id")
sess.query("INSERT INTO memories VALUES (...)")
sess.query("""
SELECT content FROM memories
WHERE created_at > now() - INTERVAL 30 DAY -- filter
ORDER BY cosineDistance(embedding, ?) LIMIT 8 -- similarity
""")
Embeddings: Qwen/Qwen3-Embedding-0.6B via sentence-transformers, runs locally.
Filter it, rank it, join it, evaluate it.
Semantic meaning + simple rules, in one SQL.
chDB now; ClickHouse when shared or large.