## Introduction to Agent Debugging
Debugging an AI agent is different from debugging traditional code. You have no breakpoints, but you do have **logs**, **behavior traces**, and **response patterns**. The key is knowing what to observe and how to interpret the signals.
As a rule of thumb, agent debugging is about 70% pattern observation and 30% technical knowledge. A bug in an agent often shows up as inconsistent behavior rather than as a fatal error.
## Common Problem Types
### 1. Behavior Problems
- **Symptom**: The agent ignores specific instructions
- **Common cause**: Prompt overload, conflicting priorities
- **Diagnosis**: Review SOUL.md, simplify instructions
### 2. Memory Problems
- **Symptom**: The agent forgets important context
- **Common cause**: Memory files not loaded, fragmented sessions
- **Diagnosis**: Check the structure of memory/ and INDEX.md
### 3. Tool Problems
- **Symptom**: APIs fail, commands do not work
- **Common cause**: Expired tokens, rate limits, permissions
- **Diagnosis**: System logs, manual testing
### 4. Performance Problems
- **Symptom**: Slow responses, timeouts
- **Common cause**: Wrong model choice, context too large
- **Diagnosis**: Session timing, model metrics
## Logging System
### Log Levels
```markdown
## Log Structure
### TRACE (Maximum detail)
- Function calls con parámetros
- API requests/responses completos
- Memory loading/saving
- Token usage por request
### DEBUG (Development)
- Decision points importantes
- Variable states
- Tool invocations
- Error recovery attempts
### INFO (Normal production)
- Actions completadas exitosamente
- State changes importantes
- User interactions
- Session events
### WARN (Attention required)
- Rate limits approached
- Fallback behaviors activated
- Memory cleanup events
- Performance degradation
### ERROR (Critical problems)
- Tool failures
- Memory corruption
- Authentication failures
- Unhandled exceptions
```
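The levels above only help if every log line carries its level in a machine-readable form. The following is a minimal sketch of a level-aware logger that writes one JSON object per line, using the `level`, `message`, and `timestamp` fields that the analysis scripts in this guide read. The function names `level_rank` and `log_event` and the `LOG_LEVEL` variable are hypothetical and not part of any agent runtime.

```bash
#!/bin/bash
# log-event.sh (hypothetical helper, shown for illustration only)

# Map each level name to a numeric rank so levels can be compared
level_rank() {
  case "$1" in
    TRACE) echo 0 ;;
    DEBUG) echo 1 ;;
    INFO)  echo 2 ;;
    WARN)  echo 3 ;;
    ERROR) echo 4 ;;
    *)     echo 2 ;;  # unknown levels are treated as INFO
  esac
}

# log_event LEVEL MESSAGE
# Prints one JSON line if LEVEL is at or above LOG_LEVEL (default: INFO).
# Assumes MESSAGE contains no newlines or control characters.
log_event() {
  local level="$1" message="$2"
  local threshold="${LOG_LEVEL:-INFO}"
  if (( $(level_rank "$level") < $(level_rank "$threshold") )); then
    return 0
  fi
  # Escape backslashes and double quotes so the output stays valid JSON
  message=${message//\\/\\\\}
  message=${message//\"/\\\"}
  printf '{"timestamp":"%s","level":"%s","message":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$message"
}
```

For example, `LOG_LEVEL=WARN log_event DEBUG "loaded SOUL.md"` prints nothing, while `log_event ERROR "GitHub token rejected"` prints a line that the error-counting scripts can match.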
### Log Analysis Scripts
```bash
#!/bin/bash
# analyze-logs.sh
# Analyze patterns in the agent's logs
# Use $HOME: a tilde inside double quotes is not expanded
LOG_DIR="$HOME/.clawdbot/agents/main/sessions"
echo "=== Agent Log Analysis ==="
# Most frequent errors (grep -h drops the filename prefix so jq receives pure JSON)
echo "## Top Errors (last 24h):"
find "$LOG_DIR" -name "*.jsonl" -mtime -1 -print0 \
  | xargs -0 grep -h '"level":"ERROR"' \
  | jq -r '.message' \
  | sort | uniq -c | sort -nr | head -10
# Performance issues: a duration with five or more digits is >= 10000 ms
echo "## Slow Responses (>=10s):"
find "$LOG_DIR" -name "*.jsonl" -mtime -1 -print0 \
  | xargs -0 grep -h '"duration":[0-9][0-9][0-9][0-9][0-9]' \
  | jq -c '{time: .timestamp, duration: .duration, action: .action}'
# Tool usage stats
echo "## Tool Usage:"
find "$LOG_DIR" -name "*.jsonl" -mtime -1 -print0 \
  | xargs -0 grep -h '"tool_name":' \
  | jq -r '.tool_name' \
  | sort | uniq -c | sort -nr
# Memory loading patterns
echo "## Memory Files Loaded:"
find "$LOG_DIR" -name "*.jsonl" -mtime -1 -print0 \
  | xargs -0 grep -h 'memory.*loaded' \
  | jq -r '.message' | sort | uniq -c
```
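The slow-response check above relies on a digit-count pattern, which fixes the threshold at 10 seconds. A numeric comparison in `jq` makes the threshold explicit and adjustable. This is a sketch that assumes, as the script above does, that each log line is a JSON object with a numeric `duration` in milliseconds plus `timestamp` and `action` fields. The function name `slow_responses` is hypothetical.

```bash
#!/bin/bash
# slow_responses THRESHOLD_MS FILE...
# Prints a compact summary of every log entry whose duration exceeds
# THRESHOLD_MS. Entries without a numeric duration are skipped.
slow_responses() {
  local threshold_ms="$1"
  shift
  jq -c --argjson t "$threshold_ms" \
    'select((.duration | type) == "number" and .duration > $t)
     | {time: .timestamp, duration, action}' "$@"
}
```

A call such as `slow_responses 30000 "$LOG_DIR"/*.jsonl` would then list only the responses slower than 30 seconds.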
## Behavior Traces
### Behavior Tracing
```markdown
# Behavior trace template
## Session: abc123-def456
**Timestamp**: 2025-02-08 14:30:15
**Model**: anthropic/claude-opus-4-5
**Context Size**: 15,420 tokens
### Input
User: "Deploy the app to production"
### Processing Trace
1. **14:30:15.123** - Load SOUL.md (242 tokens)
2. **14:30:15.156** - Load memory/2025-02-08.md (1,120 tokens)
3. **14:30:15.189** - Load TOOLS.md deployment section (890 tokens)
4. **14:30:15.234** - Parse user intent: deployment request
5. **14:30:15.267** - Check prerequisites: environment=production
6. **14:30:15.298** - Safety check: requires confirmation (SOUL.md rule)
7. **14:30:15.331** - Generate response: ask for confirmation
### Output
Agent: "I can deploy to production, but I need confirmation since this affects live users. Which version should I deploy? Current staging is v2.3.1."
### Analysis
- ✅ Correctly loaded context files
- ✅ Followed safety rules from SOUL.md
- ✅ Identified need for confirmation
- ❌ Could have specified exact deployment steps
```
### Manual Tracing
```bash
#!/bin/bash
# trace-behavior.sh
# Manually trace a specific behavior
echo "=== Behavior Trace ==="
echo "Session: $1"
echo "Timestamp: $(date)"
# Context loading
echo "## Context Loaded:"
ls -la memory/$(date +%Y-%m-%d).md 2>/dev/null && echo "✅ Today's memory" || echo "❌ Missing today's memory"
ls -la SOUL.md 2>/dev/null && echo "✅ SOUL.md" || echo "❌ Missing SOUL.md"
ls -la TOOLS.md 2>/dev/null && echo "✅ TOOLS.md" || echo "❌ Missing TOOLS.md"
# Tool availability
echo "## Tools Status:"
command -v git >/dev/null && echo "✅ Git" || echo "❌ Git"
command -v docker >/dev/null && echo "✅ Docker" || echo "❌ Docker"
curl -s https://api.github.com >/dev/null && echo "✅ GitHub API" || echo "❌ GitHub API"
# Memory health
echo "## Memory Health:"
wc -l memory/*.md | tail -1
find memory/ -name "*.md" -mtime +30 | wc -l | xargs echo "Old files:"
```
## Debugging by Category
### Memory Issues
#### Symptoms
- The agent repeats errors that were already resolved
- It does not remember important decisions
- It loses project context
#### Diagnosis Tools
```bash
# Memory file integrity
echo "=== Memory Diagnostics ==="
# Check if files exist and are readable
for file in MEMORY.md "memory/$(date +%Y-%m-%d).md" memory/INDEX.md; do
  if [[ -r "$file" ]]; then
    echo "✅ $file ($(wc -l < "$file") lines)"
  else
    echo "❌ $file (missing/unreadable)"
  fi
done
# Check memory cross-references
grep -n "\[.*\]" memory/INDEX.md | head -5
echo "Cross-references found: $(grep -c "\[.*\]" memory/INDEX.md)"
# Check for orphaned files: daily files that no memory file references
# (-h drops the filename prefix; comm needs both lists sorted the same way)
grep -oh "memory/[0-9-]*\.md" memory/*.md | sort -u > /tmp/referenced
ls memory/20*.md | sort > /tmp/existing
echo "Orphaned files: $(comm -23 /tmp/existing /tmp/referenced | wc -l)"
```
#### Common Fixes
```markdown
## Memory Repair Checklist
### Missing Files
- [ ] Create today's memory file: `memory/$(date +%Y-%m-%d).md`
- [ ] Check MEMORY.md exists and is recent
- [ ] Verify INDEX.md has recent entries
### Broken References
- [ ] Fix broken links in INDEX.md
- [ ] Update session references in daily files
- [ ] Clean up orphaned memory files
### Content Issues
- [ ] Remove duplicate entries
- [ ] Merge related decisions
- [ ] Archive old unimportant entries
```
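The "Fix broken links in INDEX.md" item can be partly automated. The sketch below assumes that INDEX.md uses standard Markdown links of the form `[text](path)` with paths relative to the directory that contains INDEX.md. That link format is an assumption, since this guide does not specify it. The function name `check_index_links` is hypothetical.

```bash
#!/bin/bash
# check_index_links INDEX_FILE
# Prints every link target in INDEX_FILE that does not exist on disk.
# Targets are resolved relative to the directory containing INDEX_FILE.
# Returns 1 if at least one broken link was found, 0 otherwise.
check_index_links() {
  local index="$1" base target missing=0
  base=$(dirname "$index")
  while IFS= read -r target; do
    case "$target" in
      http://*|https://*|"#"*) continue ;;  # skip URLs and in-page anchors
    esac
    if [[ ! -e "$base/$target" ]]; then
      echo "broken: $target"
      missing=$((missing + 1))
    fi
  done < <(grep -o '\]([^)]*)' "$index" | sed 's/^](//; s/)$//')
  return $(( missing > 0 ))
}
```

Running `check_index_links memory/INDEX.md` before and after a cleanup gives a quick way to confirm that no references were left dangling.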
### Tool Integration Issues
#### Symptoms
- APIs return 401/403 errors
- Commands fail silently
- Rate limits are exceeded constantly
#### Diagnosis Tools
```bash
#!/bin/bash
# diagnose-tools.sh
# Test all tool integrations
echo "=== Tool Integration Diagnostics ==="
# API tokens (uses the macOS keychain via the `security` command)
echo "## API Authentication:"
for service in github_token slack_bot_token hetzner_token; do
  if security find-generic-password -a clawdbot -s "$service" >/dev/null 2>&1; then
    echo "✅ $service exists in keychain"
    # Test token validity (GitHub example)
    if [[ "$service" == "github_token" ]]; then
      TOKEN=$(security find-generic-password -a clawdbot -s github_token -w)
      # -f makes curl exit non-zero on HTTP errors such as 401
      if curl -sf -H "Authorization: Bearer $TOKEN" https://api.github.com/user >/dev/null; then
        echo " → Token is valid"
      else
        echo " → ❌ Token is invalid/expired"
      fi
    fi
  else
    echo "❌ $service missing from keychain"
  fi
done
# Command availability
echo "## Command Availability:"
for cmd in git docker npm curl jq; do
  if command -v "$cmd" >/dev/null; then
    echo "✅ $cmd ($(command -v "$cmd"))"
  else
    echo "❌ $cmd not found"
  fi
done
# Network connectivity
echo "## Network Tests:"
for host in api.github.com api.hetzner.cloud slack.com; do
  if curl -s --max-time 5 "https://$host" >/dev/null; then
    echo "✅ $host reachable"
  else
    echo "❌ $host unreachable"
  fi
done
```
#### Common Fixes
```markdown
## Tool Repair Checklist
### Authentication Issues
- [ ] Regenerate expired tokens
- [ ] Update keychain with new tokens
- [ ] Test token validity manually
- [ ] Check token scopes/permissions
### Network Issues
- [ ] Check VPN connection status
- [ ] Verify firewall rules
- [ ] Test DNS resolution
- [ ] Check rate limit headers
### Command Issues
- [ ] Verify command installation
- [ ] Check PATH variables
- [ ] Test command manually
- [ ] Review command permissions
```
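For the "Check rate limit headers" item, the remaining quota can be read straight from the response headers. The sketch below parses the `x-ratelimit-remaining` header, which the GitHub REST API returns. Other services may use different header names, so treat the name as an assumption for anything other than GitHub. The function name `rate_limit_remaining` is hypothetical.

```bash
#!/bin/bash
# rate_limit_remaining
# Reads HTTP response headers on stdin and prints the value of the
# x-ratelimit-remaining header. Header names are matched case-insensitively.
rate_limit_remaining() {
  tr -d '\r' \
    | awk -F': *' 'tolower($1) == "x-ratelimit-remaining" { print $2; exit }'
}

# Example (requires network access):
#   curl -sI https://api.github.com/rate_limit | rate_limit_remaining
```

A value close to zero suggests that the failures come from throttling and not from an expired token.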
### Behavior Issues
#### Symptoms
- The agent ignores specific instructions
- Inconsistent responses
- It does not follow established workflows
#### Diagnosis Tools
```bash
#!/bin/bash
# diagnose-behavior.sh
# Analyze behavior patterns
echo "=== Behavior Analysis ==="
# Check SOUL.md conflicts
echo "## SOUL.md Analysis:"
echo "Words in SOUL.md: $(wc -w < SOUL.md)"
echo "Prohibitions: $(grep -cE 'NUNCA|NEVER|PROHIBIDO' SOUL.md)"
echo "Mandates: $(grep -cE 'SIEMPRE|ALWAYS|OBLIGATORIO' SOUL.md)"
# Check for contradictions
echo "## Potential Conflicts:"
grep -nE -B2 -A2 'except|unless|pero|sin embargo' SOUL.md
# Recent behavior patterns
echo "## Recent Patterns:"
# GNU date uses -d, BSD/macOS date uses -v; try both
YESTERDAY=$(date -d yesterday +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d)
grep -hE 'DECISIÓN|DECISION' "memory/$(date +%Y-%m-%d).md" "memory/$YESTERDAY.md" 2>/dev/null | head -5
```
#### Common Fixes
```markdown
## Behavior Repair Checklist
### Instruction Conflicts
- [ ] Simplify competing priorities in SOUL.md
- [ ] Make rules more specific and unambiguous
- [ ] Remove outdated/conflicting instructions
- [ ] Add clear precedence rules
### Context Overload
- [ ] Reduce SOUL.md length (keep under 2000 words)
- [ ] Archive non-essential memory
- [ ] Split complex workflows into steps
- [ ] Use clear section headers
### Consistency Issues
- [ ] Document decision rationale
- [ ] Create templates for common responses
- [ ] Establish clear workflows
- [ ] Add behavior validation tests
```
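The 2000-word limit from the checklist above is easy to enforce with a small check that can run before each session or in a commit hook. This is a minimal sketch. The function name `check_word_budget` is hypothetical, and the default limit simply mirrors the checklist.

```bash
#!/bin/bash
# check_word_budget FILE [MAX_WORDS]
# Reports whether FILE stays within MAX_WORDS (default: 2000).
# Returns 0 if within budget, 1 if over budget, 2 if FILE cannot be read.
check_word_budget() {
  local file="$1" max="${2:-2000}" words
  words=$(wc -w < "$file") || return 2
  words=$((words))  # strip the padding that some wc implementations add
  if (( words > max )); then
    echo "❌ $file: $words words (limit $max)"
    return 1
  fi
  echo "✅ $file: $words words (limit $max)"
}
```

For example, `check_word_budget SOUL.md` uses the default limit, and `check_word_budget SOUL.md 1500` applies a stricter one.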
## Advanced Debugging Techniques
### Session Replay
```bash
#!/bin/bash
# replay-session.sh
# Replay a problematic session to understand what happened
SESSION_ID="$1"
if [[ -z "$SESSION_ID" ]]; then
  echo "Usage: $0 <session-id>"
  exit 1
fi
echo "=== Replaying Session $SESSION_ID ==="
# Find session file (take the first match if several files share the ID)
SESSION_FILE=$(find ~/.clawdbot/agents/main/sessions -name "*$SESSION_ID*" -type f | head -n 1)
if [[ -z "$SESSION_FILE" ]]; then
  echo "Session file not found"
  exit 1
fi
echo "Session file: $SESSION_FILE"
echo "Size: $(wc -l < "$SESSION_FILE") lines"
echo ""
# Extract key events
echo "## Key Events:"
jq -r 'select(.level == "INFO" or .level == "ERROR" or .level == "WARN") | "\(.timestamp) [\(.level)] \(.message)"' "$SESSION_FILE" | head -20
echo ""
echo "## Tool Calls:"
jq -r 'select(.tool_name) | "\(.timestamp) \(.tool_name): \(.tool_params // {})"' "$SESSION_FILE"
echo ""
echo "## Errors:"
jq -r 'select(.level == "ERROR") | "\(.timestamp) ERROR: \(.message)"' "$SESSION_FILE"
```
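A replay is often easier to read next to a session that worked. The sketch below compares the order of tool calls in two session files, assuming the same `tool_name` field that the replay script above reads. The function names `tool_sequence` and `compare_sessions` are hypothetical.

```bash
#!/bin/bash
# tool_sequence FILE
# Prints the tool names used in a session, one per line, in call order.
tool_sequence() {
  jq -r 'select(.tool_name) | .tool_name' "$1"
}

# compare_sessions GOOD_FILE BAD_FILE
# Shows a unified diff of the two tool-call sequences.
# Returns 0 if the sequences are identical, 1 if they differ.
compare_sessions() {
  diff -u <(tool_sequence "$1") <(tool_sequence "$2")
}
```

The first line where the diff diverges is usually a good place to start reading the full log of the problematic session.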
### Performance Profiling
```bash
#!/bin/bash
# profile-performance.sh
# Profile agent performance metrics
echo "=== Performance Profile ==="
# Model usage stats (grep -h keeps filenames out of the JSON passed to jq)
echo "## Model Usage (last 24h):"
find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mtime -1 \
  | xargs grep -h '"model":' \
  | jq -r '.model' | sort | uniq -c | sort -nr
# Response time distribution
echo "## Response Times:"
find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mtime -1 \
  | xargs jq -r 'select(.duration) | .duration' \
  | sort -n \
  | awk '
    {times[NR]=$1; sum+=$1}
    END {
      # Guard against empty input to avoid a division by zero
      if (NR == 0) { print "No requests found"; exit }
      print "Min: " times[1] "ms"
      print "Median: " times[int((NR+1)/2)] "ms"
      print "Max: " times[NR] "ms"
      print "Average: " sum/NR "ms"
      print "Total requests: " NR
    }'
# Token usage
echo "## Token Usage:"
find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mtime -1 \
  | xargs grep -h '"tokens":' \
  | jq -r '.tokens.total' \
  | awk '{sum+=$1; count++} END {if (count > 0) print "Total tokens: " sum ", Average: " sum/count; else print "No token data found"}'
```
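Averages hide the occasional very slow response, so a tail percentile is a useful addition to the profile. The sketch below computes a percentile with the nearest-rank method from a stream of durations, one number per line. The function name `percentile` is hypothetical.

```bash
#!/bin/bash
# percentile P
# Reads one number per line on stdin and prints the P-th percentile
# (P is an integer from 1 to 100) using the nearest-rank method.
# Prints nothing when the input is empty.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

It can be fed by the same pipeline that the profile uses, for example `jq -r 'select(.duration) | .duration' session.jsonl | percentile 95`, where `session.jsonl` stands for any session log.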
### A/B Testing Behaviors
```bash
#!/bin/bash
# test-behavior-variants.sh
# Test different SOUL.md configurations
ORIGINAL_SOUL="SOUL.md"
TEST_SOUL="SOUL-test.md"
TEST_PROMPT="Deploy app to production"
echo "=== Behavior A/B Test ==="
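# Backup original and restore it on exit, even if a step below fails
cp "$ORIGINAL_SOUL" "$ORIGINAL_SOUL.backup"
trap 'cp "$ORIGINAL_SOUL.backup" "$ORIGINAL_SOUL"' EXIT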
echo "## Testing Original Configuration:"
echo "$TEST_PROMPT" | clawd-test-prompt > /tmp/response_a.txt
echo "Response length: $(wc -w < /tmp/response_a.txt) words"
echo "## Testing Modified Configuration:"
cp "$TEST_SOUL" "$ORIGINAL_SOUL"
echo "$TEST_PROMPT" | clawd-test-prompt > /tmp/response_b.txt
echo "Response length: $(wc -w < /tmp/response_b.txt) words"
# Restore original
cp "$ORIGINAL_SOUL.backup" "$ORIGINAL_SOUL"
echo "## Comparison:"
echo "A (original): $(grep -c 'confirmation\|confirm' /tmp/response_a.txt) confirmations requested"
echo "B (modified): $(grep -c 'confirmation\|confirm' /tmp/response_b.txt) confirmations requested"
diff -u /tmp/response_a.txt /tmp/response_b.txt | head -20
```
Be careful when A/B testing behaviors. Always back up your original configuration and test in a safe environment.
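One way to get a safe environment is to never touch the live SOUL.md at all and run the variant in a throwaway copy of the workspace. The following is a sketch under the assumption that the agent reads SOUL.md from its working directory, which this guide does not confirm. The function name `make_variant_workspace` is hypothetical.

```bash
#!/bin/bash
# make_variant_workspace SRC_DIR VARIANT_SOUL
# Copies SRC_DIR into a new temporary directory, installs VARIANT_SOUL as
# SOUL.md inside the copy, and prints the path of the copy.
# The original workspace is left untouched.
make_variant_workspace() {
  local src="$1" variant="$2" work
  work=$(mktemp -d) || return 1
  cp -R "$src"/. "$work"/ || return 1
  cp "$variant" "$work/SOUL.md" || return 1
  echo "$work"
}
```

The variant run would then happen inside the copy, for example `work=$(make_variant_workspace . SOUL-test.md)` followed by running the test prompt from `"$work"`, and the directory can be deleted afterwards.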
## Monitoring Tools
### Health Dashboard
```bash
#!/bin/bash
# agent-health-dashboard.sh
# Generate comprehensive health report
cat << 'EOF'
╭─────────────────────────────────────╮
│ AGENT HEALTH │
╰─────────────────────────────────────╯
EOF
# System status
echo "📊 System Status:"
echo " • Uptime: $(uptime | awk '{print $3,$4}')"
echo " • Load: $(uptime | awk -F'load average:' '{print $2}')"
echo " • Memory: $(free | awk 'NR==2{printf "%.1f%%", $3*100/$2 }')"
# Agent status
echo ""
echo "🤖 Agent Status:"
echo " • Sessions today: $(find ~/.clawdbot/agents/main/sessions -name "*$(date +%Y%m%d)*" | wc -l)"
echo " • Errors (last 24h): $(find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mtime -1 -print0 | xargs -0 grep -h '"level":"ERROR"' 2>/dev/null | wc -l)"
echo " • Memory health: $(ls memory/$(date +%Y-%m-%d).md >/dev/null 2>&1 && echo "✅ Current" || echo "❌ Missing")"
# Tools status
echo ""
echo "🛠️ Tools Status:"
TOOLS_OK=0
TOOLS_TOTAL=0
for tool in git docker curl; do
TOOLS_TOTAL=$((TOOLS_TOTAL + 1))
if command -v "$tool" >/dev/null; then
echo " • $tool: ✅"
TOOLS_OK=$((TOOLS_OK + 1))
else
echo " • $tool: ❌"
fi
done
echo " • Tools health: $TOOLS_OK/$TOOLS_TOTAL"
# Memory status
echo ""
echo "🧠 Memory Status:"
echo " • Files today: $(ls memory/$(date +%Y-%m-%d).md 2>/dev/null | wc -l)"
echo " • Files total: $(ls memory/*.md 2>/dev/null | wc -l)"
echo " • Index entries: $(grep '^\[' memory/INDEX.md 2>/dev/null | wc -l)"
echo " • Long-term size: $(cat MEMORY.md 2>/dev/null | wc -w) words"
```
### Alerting System
```bash
#!/bin/bash
# agent-alerting.sh
# Simple alerting for critical issues
ALERT_EMAIL="alerts@example.com" # placeholder address; replace with your own
ALERT_THRESHOLD_ERRORS=10
ALERT_THRESHOLD_RESPONSE_TIME=30000 # 30 seconds
# Count errors in last hour (matching lines across all files; grep -c would print one count per file)
ERRORS_LAST_HOUR=$(find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mmin -60 -print0 \
  | xargs -0 grep -h '"level":"ERROR"' 2>/dev/null | wc -l)
# Check average response time
AVG_RESPONSE_TIME=$(find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mmin -60 \
| xargs jq -r 'select(.duration) | .duration' 2>/dev/null \
| awk '{sum+=$1; count++} END {if(count>0) print sum/count; else print 0}')
# Alert conditions
if (( ERRORS_LAST_HOUR > ALERT_THRESHOLD_ERRORS )); then
echo "🚨 HIGH ERROR RATE: $ERRORS_LAST_HOUR errors in last hour" | mail -s "Agent Alert: High Error Rate" "$ALERT_EMAIL"
fi
if (( $(echo "$AVG_RESPONSE_TIME > $ALERT_THRESHOLD_RESPONSE_TIME" | bc -l) )); then
echo "🐌 SLOW RESPONSES: Average ${AVG_RESPONSE_TIME}ms in last hour" | mail -s "Agent Alert: Slow Responses" "$ALERT_EMAIL"
fi
```
## Debugging Checklist
### When something goes wrong:
1. **🔍 Gather Information**
   - [ ] What was the agent trying to do?
   - [ ] What was the exact user input?
   - [ ] What outputs or errors were produced?
   - [ ] When did the problem start?
2. **📊 Check Basics**
   - [ ] Are the context files loaded?
   - [ ] Are the tools available?
   - [ ] Is there enough disk space and memory?
   - [ ] Is network connectivity good?
3. **🔬 Deep Diagnosis**
   - [ ] Review the logs of the problematic session
   - [ ] Reproduce the problem manually
   - [ ] Compare with sessions that worked
   - [ ] Check recent configuration changes
4. **🛠️ Fix and Verify**
   - [ ] Apply a fix specific to the problem
   - [ ] Document the solution in memory/
   - [ ] Test that the fix works
   - [ ] Add prevention if possible
5. **📝 Document and Learn**
   - [ ] Update MEMORY.md with what you learned
   - [ ] Add an entry to memory/INDEX.md
   - [ ] Consider whether TOOLS.md or SOUL.md needs improvement
   - [ ] Share insights with other agents if relevant
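The documentation steps in the checklist are the ones most often skipped, so a small helper can lower the effort. The sketch below appends a fix note to today's memory file and adds a matching line to INDEX.md. The note layout and the index line format are assumptions, since this guide does not define them, and the function name `record_fix` is hypothetical.

```bash
#!/bin/bash
# record_fix MEMORY_DIR TITLE SUMMARY
# Appends a fix note to MEMORY_DIR/<today>.md and an index line to
# MEMORY_DIR/INDEX.md. Creates the directory and files if needed.
record_fix() {
  local dir="$1" title="$2" summary="$3" day file
  day=$(date +%Y-%m-%d)
  file="$dir/$day.md"
  mkdir -p "$dir" || return 1
  {
    echo ""
    echo "## FIX: $title"
    echo "- Time: $(date +%H:%M)"
    echo "- Summary: $summary"
  } >> "$file"
  # Index line format is an assumption; adapt it to your INDEX.md
  echo "- [$day: $title]($day.md)" >> "$dir/INDEX.md"
}
```

For example, `record_fix memory "Expired GitHub token" "Regenerated the token and updated the keychain"` covers both the memory entry and the index entry in one step.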
Debugging Your Agent: Logs, Traces, and Troubleshooting
A complete guide to diagnosing and resolving problems in your AI agent, from basic logs to advanced debugging techniques.