Why Agent Observability?
When your AI agent runs 20 steps behind the scenes, you can't see what happened. Which step burned the most tokens? Which tool call was slowest? Where did the error occur? You end up guessing or grepping through logs.
For indie developers, existing solutions are either too heavy (a self-hosted ELK stack needs 16GB+ of RAM) or too expensive (Datadog starts at $15/host per month, Sentry at $26/month).
What I needed: zero cost, 5-minute setup, one dashboard for all agent behavior.
Architecture
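Besides the agent itself, there are three components: Quack (HTTP ingestion), DuckDB (storage in a single obs.db file), and a Streamlit dashboard: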
Hermes Agent → auto-record each step → Quack (HTTP) → DuckDB (obs.db) → Streamlit Dashboard
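To make the "Quack (HTTP)" hop concrete, here is a minimal stand-in built on Python's standard library. It is not Quack's API: the `/trace` path, the JSON wire format (one object per request, keyed by `agent_traces` column names), and the `make_server`/`sink` names are all assumptions for illustration. It validates each POSTed trace and hands the row to a `sink` callable, which in a real deployment would run the DuckDB `INSERT`.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

# Hypothetical wire format: one JSON object per request, keyed by agent_traces column names
REQUIRED = ("session_id", "step_id", "action")
OPTIONAL = ("tool_name", "content", "token_cost", "latency_ms", "model")


def make_server(sink: Callable[[dict], None], host: str = "127.0.0.1",
                port: int = 0) -> ThreadingHTTPServer:
    """Build an HTTP server that passes each valid POST /trace body to `sink`."""
    lock = threading.Lock()  # handler threads share one sink, so serialize the writes

    class TraceHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/trace":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length))
            except ValueError:  # covers JSONDecodeError and undecodable bytes
                self.send_error(400, "invalid JSON")
                return
            if not isinstance(payload, dict) or any(k not in payload for k in REQUIRED):
                self.send_error(400, "missing required fields")
                return
            # Keep only known columns; omitted ones are left out so column defaults apply
            row = {k: payload[k] for k in REQUIRED + OPTIONAL if k in payload}
            with lock:
                sink(row)
            self.send_response(204)  # accepted, no response body
            self.end_headers()

        def log_message(self, format, *args):  # silence per-request logging
            pass

    return ThreadingHTTPServer((host, port), TraceHandler)
```

Run it with `make_server(my_sink, port=8080).serve_forever()`, where `my_sink` is your own function that inserts the row. Routing every write through one process like this also sidesteps DuckDB's single-writer file lock.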
Deployment
1. Install DuckDB
# Linux
curl -fsSL https://install.duckdb.org | sh
# macOS
brew install duckdb
# Windows
winget install DuckDB.cli
2. Create Database and Table
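# Create the data directory (/var/lib needs root) and make it writable for your user
sudo mkdir -p /var/lib/hermes-obs && sudo chown "$USER" /var/lib/hermes-obs
# Open (and create) the database file, then run the CREATE TABLE below
duckdb /var/lib/hermes-obs/obs.db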
CREATE TABLE agent_traces (
session_id UUID,
step_id INTEGER,
action VARCHAR,
tool_name VARCHAR,
content VARCHAR,
token_cost INTEGER DEFAULT 0,
latency_ms INTEGER DEFAULT 0,
model VARCHAR,
created_at TIMESTAMP DEFAULT now()
);
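On the Python side it can help to mirror this table in one typed record, so the column order used by `INSERT` lives in a single place. A minimal sketch; `TraceRecord`, `COLUMNS`, `INSERT_SQL` and `insert_trace` are illustrative names, not part of DuckDB or Hermes Agent. `created_at` is deliberately left out so the column default fills it.

```python
from dataclasses import astuple, dataclass, fields
from typing import Optional


@dataclass
class TraceRecord:
    """One agent_traces row; created_at is filled by the column default."""
    session_id: str                 # UUID as a string
    step_id: int
    action: str                     # e.g. 'tool_call'
    tool_name: Optional[str] = None
    content: Optional[str] = None
    token_cost: int = 0
    latency_ms: int = 0
    model: Optional[str] = None


# Column list and placeholders are derived from the dataclass, so they always match
COLUMNS = [f.name for f in fields(TraceRecord)]
INSERT_SQL = (
    f"INSERT INTO agent_traces ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join('?' for _ in COLUMNS)})"
)


def insert_trace(conn, record: TraceRecord) -> None:
    """Insert one record via any connection exposing execute(sql, params)."""
    conn.execute(INSERT_SQL, astuple(record))
```

Adding a column later then means changing the table and the dataclass, with no hand-maintained placeholder list to fall out of sync.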
3. Instrument Your Agent
Install the Python client (`pip install duckdb`), then write a row after each tool call:
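import uuid
import duckdb

session_id = str(uuid.uuid4())  # one ID per agent run
step_id = 1                     # increment for every step

# Short-lived connection: the file lock is released right after the write
with duckdb.connect('/var/lib/hermes-obs/obs.db') as conn:
    conn.execute("""
        INSERT INTO agent_traces (session_id, step_id, action, tool_name, content, token_cost, latency_ms, model)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    """, (session_id, step_id, 'tool_call', 'search_content', 'Searching user request', 45, 320, 'deepseek-v4-flash'))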
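Filling in `latency_ms` by hand is error-prone. One option is to wrap each tool call in a context manager that measures wall-clock time and writes the row when the call finishes, even if it raises. A sketch under assumptions: `trace_step` is a hypothetical helper, the token count is whatever your agent reports (set it on the yielded dict), and failures are recorded with `action='error'`, a value this article doesn't define.

```python
import time
from contextlib import contextmanager


@contextmanager
def trace_step(conn, session_id, step_id, tool_name, model=None, content=None):
    """Time the enclosed block and insert one agent_traces row when it exits."""
    info = {"token_cost": 0, "action": "tool_call", "content": content}
    start = time.perf_counter()
    try:
        yield info  # the caller may set info["token_cost"]
    except Exception as exc:
        info["action"] = "error"  # assumption: failures get their own action value
        info["content"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        latency_ms = round((time.perf_counter() - start) * 1000)
        conn.execute(
            "INSERT INTO agent_traces (session_id, step_id, action, tool_name, content, "
            "token_cost, latency_ms, model) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (session_id, step_id, info["action"], tool_name, info["content"],
             info["token_cost"], latency_ms, model),
        )
```

Usage: `with trace_step(conn, session_id, 2, "search_content", model="deepseek-v4-flash") as step:` followed by the tool call and `step["token_cost"] = tokens_used`. The exception is re-raised after logging, so the agent's own error handling is unchanged.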
4. Start the Dashboard
pip install streamlit streamlit-autorefresh
streamlit run dashboard.py --server.port 5803
dashboard.py features: 30-second auto-refresh, a manual refresh button, multi-color bar charts, pie charts, and token breakdowns by model and action type.
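dashboard.py itself isn't shown in this post. A minimal sketch covering auto-refresh, manual refresh, and token breakdowns by model and action might look like this; it is not the author's file, and pie charts are omitted for brevity. Assumptions: `st_autorefresh` from streamlit-autorefresh takes its interval in milliseconds, and the database is opened read-only on each rerun and closed immediately so the lock is held only briefly (it will still fail while another process has the file open for writing).

```python
import sys

DB_PATH = "/var/lib/hermes-obs/obs.db"

# One query per breakdown; both return (label, tokens) pairs
BREAKDOWN_SQL = {
    "model": """SELECT model, SUM(token_cost) AS tokens FROM agent_traces
                WHERE model IS NOT NULL GROUP BY model ORDER BY tokens DESC""",
    "action": """SELECT action, SUM(token_cost) AS tokens FROM agent_traces
                 GROUP BY action ORDER BY tokens DESC""",
}


def token_breakdown(conn, by: str) -> list:
    """Return [(label, tokens), ...] for by='model' or by='action'."""
    return conn.execute(BREAKDOWN_SQL[by]).fetchall()


def render() -> None:
    import duckdb
    import pandas as pd
    import streamlit as st
    from streamlit_autorefresh import st_autorefresh

    st.title("Hermes-Obs")
    st_autorefresh(interval=30_000, key="obs_refresh")  # rerun every 30 s (milliseconds)
    st.button("Refresh now")  # clicking any widget reruns the script

    # Open read-only per rerun and close right away to hold the file lock briefly
    conn = duckdb.connect(DB_PATH, read_only=True)
    try:
        for by in ("model", "action"):
            df = pd.DataFrame(token_breakdown(conn, by), columns=[by, "tokens"])
            st.subheader(f"Tokens by {by}")
            st.bar_chart(df, x=by, y="tokens")
    finally:
        conn.close()


# `streamlit run dashboard.py` has already imported streamlit; plain `python dashboard.py` has not
if __name__ == "__main__" and "streamlit" in sys.modules:
    render()
```

Keeping the SQL in `token_breakdown` separate from the Streamlit calls makes the queries testable without a browser session.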
5. Auto-Recording via Cron
cronjob action=create \
name="hermes-obs-heartbeat" \
schedule="every 10m" \
prompt="Write a heartbeat record to Hermes-Obs DB"
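The `cronjob` call above asks the agent itself to write the heartbeat, which spends tokens on every run. If you only need a liveness signal, a plain script run by the system crontab can write the row directly. A sketch; `write_heartbeat`, the `'heartbeat'` action value, `step_id = 0`, and the script name are assumptions.

```python
import sys
import uuid


def write_heartbeat(conn, session_id: str) -> None:
    """Insert one heartbeat row; token_cost, latency_ms and created_at use column defaults."""
    conn.execute(
        "INSERT INTO agent_traces (session_id, step_id, action, content) VALUES (?, ?, ?, ?)",
        (session_id, 0, "heartbeat", "scheduled heartbeat"),
    )


if __name__ == "__main__" and len(sys.argv) == 2:
    import duckdb  # the Python client: pip install duckdb

    # Usage: python3 heartbeat.py /var/lib/hermes-obs/obs.db
    with duckdb.connect(sys.argv[1]) as conn:
        write_heartbeat(conn, str(uuid.uuid4()))
```

An every-10-minutes crontab entry would be `*/10 * * * * python3 /path/to/heartbeat.py /var/lib/hermes-obs/obs.db` (adjust the path). Heartbeat rows are counted by the daily summary query below unless you add `AND action <> 'heartbeat'`.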
Useful Queries
Daily summary:
SELECT COUNT(*) AS steps, SUM(token_cost) AS tokens,
ROUND(AVG(latency_ms), 1) AS avg_latency
FROM agent_traces WHERE created_at >= CURRENT_DATE;
Most expensive tools:
SELECT tool_name, SUM(token_cost) AS tokens, COUNT(*) AS calls
FROM agent_traces WHERE action = 'tool_call'
GROUP BY tool_name ORDER BY tokens DESC LIMIT 10;
Model cost comparison:
SELECT model, SUM(token_cost) AS tokens, COUNT(*) AS calls,
ROUND(AVG(latency_ms), 1) AS avg_latency
FROM agent_traces WHERE model IS NOT NULL
GROUP BY model ORDER BY tokens DESC;
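The same queries can be run from Python, for example in a scheduled report. A sketch; `fetch_dicts` and `top_tools` are hypothetical helpers. Results are keyed by column name via the DB-API `description` attribute, and the limit is formatted into the SQL after an `int()` cast rather than bound as a parameter.

```python
TOP_TOOLS_SQL = """
    SELECT tool_name, SUM(token_cost) AS tokens, COUNT(*) AS calls
    FROM agent_traces WHERE action = 'tool_call'
    GROUP BY tool_name ORDER BY tokens DESC LIMIT {limit}
"""


def fetch_dicts(cursor) -> list[dict]:
    """Turn a DB-API result into a list of {column_name: value} dicts."""
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


def top_tools(conn, limit: int = 10) -> list[dict]:
    """Most expensive tools by total tokens; int() keeps the formatted LIMIT safe."""
    return fetch_dicts(conn.execute(TOP_TOOLS_SQL.format(limit=int(limit))))
```

With DuckDB, pass the object from `duckdb.connect(...)`: its `execute` returns the connection, which exposes `description` and `fetchall()`.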
Cost Comparison
| Solution | Monthly Cost | RAM | Setup Time |
|---|---|---|---|
| Hermes-Obs | $0 | < 100 MB | 5 min |
| ELK Stack | $0 (self-hosted) / $200+ (cloud) | 16 GB+ | 1-2 days |
| Datadog APM | $15+ per host | N/A (SaaS) | 30 min |
| Sentry Performance | $26+ | N/A (SaaS) | 20 min |
FAQ
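Q: DuckDB lock conflicts? Only one process can open a DuckDB file for writing, and while it holds the file, other processes can't open it, not even read-only. Stop Streamlit before writing and restart it afterward, or keep every connection short-lived (open, read or write, close) so the lock is held only briefly. Unlike SQLite, DuckDB has no WAL mode to switch on, so that option doesn't apply.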
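Instead of stopping Streamlit by hand, you can open a connection only for the duration of each read or write and retry with backoff when the file is locked. A sketch; `with_retry` is a hypothetical helper, `connect` is any zero-argument callable that opens the database, and the exception type that signals a lock is passed in. In recent DuckDB versions a lock conflict appears to surface as `duckdb.IOException`, but check this against your version.

```python
import time
from typing import Any, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    connect: Callable[[], Any],
    work: Callable[[Any], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 0.1,
) -> T:
    """Open a connection, run work(conn), always close it; back off and retry on lock errors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            conn = connect()
            try:
                return work(conn)
            finally:
                conn.close()  # release the file lock as soon as the work is done
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)  # backoff: 0.1 s, 0.2 s, 0.4 s, ...
```

For a write: `with_retry(lambda: duckdb.connect(DB_PATH), lambda c: c.execute(sql, params), retry_on=(duckdb.IOException,))`. For a read, call `.fetchall()` inside `work` so the rows are materialized before the connection closes.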
Q: Scaling concerns? DuckDB handles hundreds of millions of rows. Clean old data:
DELETE FROM agent_traces WHERE created_at < now() - INTERVAL '30 days';
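The cleanup above uses a fixed 30-day window. To make the window configurable (for a scheduled job, say), compute the cutoff in Python and bind it as a parameter. A sketch; `purge_older_than` is a hypothetical helper, and it assumes `created_at` holds naive timestamps in the same local time as the machine running the job.

```python
from datetime import datetime, timedelta
from typing import Optional


def purge_older_than(conn, days: int = 30, now: Optional[datetime] = None) -> datetime:
    """Delete agent_traces rows older than `days` days and return the cutoff used."""
    if days < 0:
        raise ValueError("days must be non-negative")
    # Naive local time; assumes created_at is stored the same way
    cutoff = (now or datetime.now()) - timedelta(days=days)
    conn.execute("DELETE FROM agent_traces WHERE created_at < ?", (cutoff,))
    return cutoff
```

The optional `now` argument makes the cutoff reproducible in tests; in production you would just call `purge_older_than(conn, days=30)`.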
Q: Dashboard not updating? Check: ① is hermes-obs-record still writing new rows? ② is auto-refresh enabled (default interval: 30s)?
Real-world Data
While I wrote this article, Hermes Agent ran 36 steps and consumed 12,595 tokens:
- deepseek-v4-flash handled most tool calls
- The Pro model was used only for key decisions: it accounted for 92% of tokens but only 21% of calls
- Average tool latency: ~200ms for patching, ~350ms for terminal commands
- Total cost: under $0.01
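Percentages like "92% of tokens but 21% of calls" can be computed in one query by dividing each model's aggregates by window totals. A sketch; `model_shares` is a hypothetical helper, and like the model cost comparison query it counts only rows with a model set. `NULLIF` avoids a division by zero when no tokens have been recorded yet.

```python
SHARE_SQL = """
    SELECT model,
           ROUND(100.0 * SUM(token_cost) / NULLIF(SUM(SUM(token_cost)) OVER (), 0), 1) AS token_pct,
           ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS call_pct
    FROM agent_traces
    WHERE model IS NOT NULL
    GROUP BY model
    ORDER BY token_pct DESC
"""


def model_shares(conn) -> list:
    """Return [(model, token_pct, call_pct), ...]: each model's share of tokens and of calls."""
    return conn.execute(SHARE_SQL).fetchall()
```

`SUM(SUM(token_cost)) OVER ()` is a window over the grouped result, so each model's row sees the grand total without a second query.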
Next Steps
Quack remote deployment, token budget alerts, and anomaly detection are coming in future posts.
