Debugging Your Agent: Logs, Traces, and Troubleshooting

## Introduction to Agent Debugging

Debugging an AI agent is different from debugging traditional code. You have no breakpoints, but you do have **logs**, **behavior traces**, and **response patterns**. The key is knowing what to observe and how to interpret the signals.

As a rule of thumb, agent debugging is 70% pattern observation and 30% technical knowledge. A bug in an agent often shows up as inconsistent behavior, not as a fatal error.

## Common Problem Types

### 1. Behavior Problems

- **Symptom**: The agent ignores specific instructions
- **Common cause**: Prompt overload, conflicting priorities
- **Diagnosis**: Review SOUL.md, simplify instructions

### 2. Memory Problems

- **Symptom**: The agent forgets important context
- **Common cause**: Memory files not loaded, fragmented sessions
- **Diagnosis**: Verify the structure of memory/ and INDEX.md

### 3. Tool Problems

- **Symptom**: APIs fail, commands do not work
- **Common cause**: Expired tokens, rate limits, permissions
- **Diagnosis**: System logs, manual testing

### 4. Performance Problems

- **Symptom**: Slow responses, timeouts
- **Common cause**: Wrong model choice, context too large
- **Diagnosis**: Session timing, model metrics

## Logging System

### Log Levels

```markdown
## Log Structure

### TRACE (Maximum detail)
- Function calls with parameters
- Complete API requests/responses
- Memory loading/saving
- Token usage per request

### DEBUG (Development)
- Important decision points
- Variable states
- Tool invocations
- Error recovery attempts

### INFO (Normal production)
- Actions completed successfully
- Important state changes
- User interactions
- Session events

### WARN (Attention required)
- Rate limits approached
- Fallback behaviors activated
- Memory cleanup events
- Performance degradation

### ERROR (Critical problems)
- Tool failures
- Memory corruption
- Authentication failures
- Unhandled exceptions
```

### Log Analysis Scripts

```bash
#!/bin/bash
# analyze-logs.sh
# Analyze patterns in the agent's logs

# Use $HOME: "~" does not expand inside double quotes
LOG_DIR="$HOME/.clawdbot/agents/main/sessions"

# Run a command over every session log modified in the last 24h.
# "-exec ... {} +" does not run the command at all when no files match.
recent_logs() {
  find "$LOG_DIR" -name "*.jsonl" -mtime -1 -exec "$@" {} +
}

echo "=== Agent Log Analysis ==="

# Frequent errors (grep -h drops the filename prefix so jq receives plain JSON)
echo "## Top Errors (last 24h):"
recent_logs grep -h '"level":"ERROR"' \
  | jq -r '.message' \
  | sort | uniq -c | sort -nr | head -10

# Performance issues (duration is in milliseconds)
echo "## Slow Responses (>10s):"
recent_logs jq -c 'select(.duration > 10000) | {time: .timestamp, duration: .duration, action: .action}'

# Tool usage stats
echo "## Tool Usage:"
recent_logs grep -h '"tool_name":' \
  | jq -r '.tool_name' \
  | sort | uniq -c | sort -nr

# Memory loading patterns
echo "## Memory Files Loaded:"
recent_logs grep -h 'memory.*loaded' \
  | jq -r '.message' | sort | uniq -c
```

## Behavior Traces

### Behavior Tracing

```markdown
# Behavior trace template

## Session: abc123-def456
**Timestamp**: 2025-02-08 14:30:15
**Model**: anthropic/claude-opus-4-5
**Context Size**: 15,420 tokens

### Input
User: "Deploy the app to production"

### Processing Trace
1. **14:30:15.123** - Load SOUL.md (242 tokens)
2. **14:30:15.156** - Load memory/2025-02-08.md (1,120 tokens)
3. **14:30:15.189** - Load TOOLS.md deployment section (890 tokens)
4. **14:30:15.234** - Parse user intent: deployment request
5. **14:30:15.267** - Check prerequisites: environment=production
6. **14:30:15.298** - Safety check: requires confirmation (SOUL.md rule)
7. **14:30:15.331** - Generate response: ask for confirmation

### Output
Agent: "I can deploy to production, but I need confirmation since this affects live users. Which version should I deploy? Current staging is v2.3.1."

### Analysis
- ✅ Correctly loaded context files
- ✅ Followed safety rules from SOUL.md
- ✅ Identified need for confirmation
- ❌ Could have specified exact deployment steps
```

### Manual Tracing

```bash
#!/bin/bash
# trace-behavior.sh
# Manually trace a specific behavior

echo "=== Behavior Trace ==="
echo "Session: ${1:-unknown}"
echo "Timestamp: $(date)"

# Context loading
echo "## Context Loaded:"
ls -la "memory/$(date +%Y-%m-%d).md" 2>/dev/null && echo "✅ Today's memory" || echo "❌ Missing today's memory"
ls -la SOUL.md 2>/dev/null && echo "✅ SOUL.md" || echo "❌ Missing SOUL.md"
ls -la TOOLS.md 2>/dev/null && echo "✅ TOOLS.md" || echo "❌ Missing TOOLS.md"

# Tool availability
echo "## Tools Status:"
command -v git >/dev/null && echo "✅ Git" || echo "❌ Git"
command -v docker >/dev/null && echo "✅ Docker" || echo "❌ Docker"
curl -s --max-time 5 https://api.github.com >/dev/null && echo "✅ GitHub API" || echo "❌ GitHub API"

# Memory health
echo "## Memory Health:"
wc -l memory/*.md | tail -1
find memory/ -name "*.md" -mtime +30 | wc -l | xargs echo "Old files:"
```

## Debugging by Category

### Memory Issues

#### Symptoms

- The agent repeats errors that were already resolved
- It does not remember important decisions
- It loses project context

#### Diagnosis Tools

```bash
# Memory file integrity
echo "=== Memory Diagnostics ==="

# Check that the files exist and are readable
for file in MEMORY.md "memory/$(date +%Y-%m-%d).md" memory/INDEX.md; do
  if [[ -r "$file" ]]; then
    echo "✅ $file ($(wc -l < "$file") lines)"
  else
    echo "❌ $file (missing/unreadable)"
  fi
done

# Check memory cross-references
grep -n "\[.*\]" memory/INDEX.md | head -5
echo "Cross-references found: $(grep -c "\[.*\]" memory/INDEX.md)"

# Check for orphaned files: daily files that exist but are never referenced.
# grep -h drops the "file:" prefix; both lists are sorted because comm requires it.
grep -oh "memory/[0-9-]*\.md" memory/*.md | sort -u > /tmp/referenced
ls memory/20*.md | sort > /tmp/existing
echo "Orphaned files: $(comm -23 /tmp/existing /tmp/referenced | wc -l)"
```

#### Common Fixes

```markdown
## Memory Repair Checklist

### Missing Files
- [ ] Create today's memory file: `memory/$(date +%Y-%m-%d).md`
- [ ] Check MEMORY.md exists and is recent
- [ ] Verify INDEX.md has recent entries

### Broken References
- [ ] Fix broken links in INDEX.md
- [ ] Update session references in daily files
- [ ] Clean up orphaned memory files

### Content Issues
- [ ] Remove duplicate entries
- [ ] Merge related decisions
- [ ] Archive old unimportant entries
```

### Tool Integration Issues

#### Symptoms

- APIs return 401/403 errors
- Commands fail silently
- Rate limits are exceeded constantly

#### Diagnosis Tools

```bash
#!/bin/bash
# diagnose-tools.sh
# Test all tool integrations

echo "=== Tool Integration Diagnostics ==="

# API tokens (the `security` command reads the macOS keychain)
echo "## API Authentication:"
for service in github_token slack_bot_token hetzner_token; do
  if security find-generic-password -a clawdbot -s "$service" >/dev/null 2>&1; then
    echo "✅ $service exists in keychain"

    # Test token validity (GitHub example)
    if [[ "$service" == "github_token" ]]; then
      TOKEN=$(security find-generic-password -a clawdbot -s github_token -w)
      # -f makes curl exit non-zero on HTTP errors such as 401
      if curl -sf -H "Authorization: Bearer $TOKEN" https://api.github.com/user >/dev/null; then
        echo "  → Token is valid"
      else
        echo "  → ❌ Token is invalid/expired"
      fi
    fi
  else
    echo "❌ $service missing from keychain"
  fi
done

# Command availability
echo "## Command Availability:"
for cmd in git docker npm curl jq; do
  if command -v "$cmd" >/dev/null; then
    echo "✅ $cmd ($(command -v "$cmd"))"
  else
    echo "❌ $cmd not found"
  fi
done

# Network connectivity (any HTTP response counts as reachable)
echo "## Network Tests:"
for host in api.github.com api.hetzner.cloud slack.com; do
  if curl -s --max-time 5 "https://$host" >/dev/null; then
    echo "✅ $host reachable"
  else
    echo "❌ $host unreachable"
  fi
done
```

#### Common Fixes

```markdown
## Tool Repair Checklist

### Authentication Issues
- [ ] Regenerate expired tokens
- [ ] Update keychain with new tokens
- [ ] Test token validity manually
- [ ] Check token scopes/permissions

### Network Issues
- [ ] Check VPN connection status
- [ ] Verify firewall rules
- [ ] Test DNS resolution
- [ ] Check rate limit headers

### Command Issues
- [ ] Verify command installation
- [ ] Check PATH variables
- [ ] Test command manually
- [ ] Review command permissions
```

### Behavior Issues

#### Symptoms

- The agent ignores specific instructions
- Responses are inconsistent
- It does not follow established workflows

#### Diagnosis Tools

```bash
#!/bin/bash
# diagnose-behavior.sh
# Analyze behavior patterns

echo "=== Behavior Analysis ==="

# Check SOUL.md for conflicts (patterns cover Spanish and English keywords)
echo "## SOUL.md Analysis:"
wc -w < SOUL.md | xargs echo "Words in SOUL.md:"
grep -c "NUNCA\|NEVER\|PROHIBIDO" SOUL.md | xargs echo "Prohibitions:"
grep -c "SIEMPRE\|ALWAYS\|OBLIGATORIO" SOUL.md | xargs echo "Mandates:"

# Check for contradictions
echo "## Potential Conflicts:"
grep -n -B2 -A2 "except\|unless\|pero\|sin embargo" SOUL.md

# Recent behavior patterns
# `date -d` is GNU-only; fall back to the BSD/macOS form
YESTERDAY=$(date -d yesterday +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d)
echo "## Recent Patterns:"
grep -h "DECISIÓN\|DECISION" "memory/$(date +%Y-%m-%d).md" "memory/$YESTERDAY.md" 2>/dev/null | head -5
```

#### Common Fixes

```markdown
## Behavior Repair Checklist

### Instruction Conflicts
- [ ] Simplify competing priorities in SOUL.md
- [ ] Make rules more specific and unambiguous
- [ ] Remove outdated/conflicting instructions
- [ ] Add clear precedence rules

### Context Overload
- [ ] Reduce SOUL.md length (keep under 2000 words)
- [ ] Archive non-essential memory
- [ ] Split complex workflows into steps
- [ ] Use clear section headers

### Consistency Issues
- [ ] Document decision rationale
- [ ] Create templates for common responses
- [ ] Establish clear workflows
- [ ] Add behavior validation tests
```

## Advanced Debugging Techniques

### Session Replay

```bash
#!/bin/bash
# replay-session.sh
# Replay a problematic session to understand what happened

SESSION_ID="$1"
if [[ -z "$SESSION_ID" ]]; then
  echo "Usage: $0 <session-id>"
  exit 1
fi

echo "=== Replaying Session $SESSION_ID ==="

# Find the session file (take the first match if several files share the ID)
SESSION_FILE=$(find ~/.clawdbot/agents/main/sessions -name "*$SESSION_ID*" -type f | head -1)
if [[ -z "$SESSION_FILE" ]]; then
  echo "Session file not found"
  exit 1
fi

echo "Session file: $SESSION_FILE"
echo "Size: $(wc -l < "$SESSION_FILE") lines"
echo ""

# Extract key events
echo "## Key Events:"
jq -r 'select(.level == "INFO" or .level == "ERROR" or .level == "WARN") | "\(.timestamp) [\(.level)] \(.message)"' "$SESSION_FILE" | head -20

echo ""
echo "## Tool Calls:"
jq -r 'select(.tool_name) | "\(.timestamp) \(.tool_name): \(.tool_params // {})"' "$SESSION_FILE"

echo ""
echo "## Errors:"
jq -r 'select(.level == "ERROR") | "\(.timestamp) ERROR: \(.message)"' "$SESSION_FILE"
```

### Performance Profiling

```bash
#!/bin/bash
# profile-performance.sh
# Profile agent performance metrics

LOG_DIR="$HOME/.clawdbot/agents/main/sessions"

echo "=== Performance Profile ==="

# Model usage stats (grep -h keeps filenames out of the JSON passed to jq)
echo "## Model Usage (last 24h):"
find "$LOG_DIR" -name "*.jsonl" -mtime -1 -exec grep -h '"model":' {} + \
  | jq -r '.model' | sort | uniq -c | sort -nr

# Response time distribution
echo "## Response Times:"
find "$LOG_DIR" -name "*.jsonl" -mtime -1 -exec jq -r 'select(.duration) | .duration' {} + \
  | sort -n \
  | awk '
    {times[NR]=$1; sum+=$1}
    END {
      if (NR == 0) { print "No requests found"; exit }
      print "Min: " times[1] "ms"
      print "Median: " times[int((NR+1)/2)] "ms"
      print "Max: " times[NR] "ms"
      print "Average: " sum/NR "ms"
      print "Total requests: " NR
    }'

# Token usage
echo "## Token Usage:"
find "$LOG_DIR" -name "*.jsonl" -mtime -1 -exec grep -h '"tokens":' {} + \
  | jq -r '.tokens.total' \
  | awk '{sum+=$1; count++} END {if (count > 0) print "Total tokens: " sum ", Average: " sum/count; else print "No token data found"}'
```

### A/B Testing Behaviors

```bash
#!/bin/bash
# test-behavior-variants.sh
# Test different SOUL.md configurations

ORIGINAL_SOUL="SOUL.md"
TEST_SOUL="SOUL-test.md"
TEST_PROMPT="Deploy app to production"

echo "=== Behavior A/B Test ==="

# Back up the original; stop if the backup cannot be created
cp "$ORIGINAL_SOUL" "$ORIGINAL_SOUL.backup" || exit 1

echo "## Testing Original Configuration:"
echo "$TEST_PROMPT" | clawd-test-prompt > /tmp/response_a.txt
echo "Response length: $(wc -w < /tmp/response_a.txt) words"

echo "## Testing Modified Configuration:"
cp "$TEST_SOUL" "$ORIGINAL_SOUL"
echo "$TEST_PROMPT" | clawd-test-prompt > /tmp/response_b.txt
echo "Response length: $(wc -w < /tmp/response_b.txt) words"

# Restore original
cp "$ORIGINAL_SOUL.backup" "$ORIGINAL_SOUL"

echo "## Comparison:"
echo "A (original): $(grep -c 'confirmation\|confirm' /tmp/response_a.txt) confirmations requested"
echo "B (modified): $(grep -c 'confirmation\|confirm' /tmp/response_b.txt) confirmations requested"

diff -u /tmp/response_a.txt /tmp/response_b.txt | head -20
```

Be careful when A/B testing behaviors. Always back up your original configuration and test in a safe environment.
## Monitoring Tools

### Health Dashboard

```bash
#!/bin/bash
# agent-health-dashboard.sh
# Generate comprehensive health report

cat << 'EOF'
╭─────────────────────────────────────╮
│            AGENT HEALTH             │
╰─────────────────────────────────────╯
EOF

# System status (`free` is Linux-only)
echo "📊 System Status:"
echo "   • Uptime: $(uptime | awk '{print $3,$4}')"
echo "   • Load: $(uptime | awk -F'load average:' '{print $2}')"
echo "   • Memory: $(free | awk 'NR==2{printf "%.1f%%", $3*100/$2 }')"

# Agent status
# grep -h prints one line per error, so wc -l gives a single total
echo ""
echo "🤖 Agent Status:"
echo "   • Sessions today: $(find ~/.clawdbot/agents/main/sessions -name "*$(date +%Y%m%d)*" | wc -l)"
echo "   • Errors today: $(find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mtime -1 -exec grep -h '"level":"ERROR"' {} + 2>/dev/null | wc -l)"
echo "   • Memory health: $(ls "memory/$(date +%Y-%m-%d).md" >/dev/null 2>&1 && echo "✅ Current" || echo "❌ Missing")"

# Tools status
echo ""
echo "🛠️ Tools Status:"
TOOLS_OK=0
TOOLS_TOTAL=0
for tool in git docker curl; do
  TOOLS_TOTAL=$((TOOLS_TOTAL + 1))
  if command -v "$tool" >/dev/null; then
    echo "   • $tool: ✅"
    TOOLS_OK=$((TOOLS_OK + 1))
  else
    echo "   • $tool: ❌"
  fi
done
echo "   • Tools health: $TOOLS_OK/$TOOLS_TOTAL"

# Memory status
# grep -c already prints 0 when nothing matches; default to 0 only if the file is missing
INDEX_ENTRIES=$(grep -c '^\[' memory/INDEX.md 2>/dev/null)
echo ""
echo "🧠 Memory Status:"
echo "   • Files today: $(ls "memory/$(date +%Y-%m-%d).md" 2>/dev/null | wc -l)"
echo "   • Files total: $(ls memory/*.md 2>/dev/null | wc -l)"
echo "   • Index entries: ${INDEX_ENTRIES:-0}"
echo "   • Long-term size: $( { wc -w < MEMORY.md; } 2>/dev/null || echo "0") words"
```

### Alerting System

```bash
#!/bin/bash
# agent-alerting.sh
# Simple alerting for critical issues

LOG_DIR="$HOME/.clawdbot/agents/main/sessions"
ALERT_EMAIL="[email protected]"
ALERT_THRESHOLD_ERRORS=10
ALERT_THRESHOLD_RESPONSE_TIME=30000  # 30 seconds

# Count errors in the last hour (one matching log line = one error)
ERRORS_LAST_HOUR=$(find "$LOG_DIR" -name "*.jsonl" -mmin -60 \
  -exec grep -h '"level":"ERROR"' {} + 2>/dev/null | wc -l)

# Check average response time (prints 0 when there is no data)
AVG_RESPONSE_TIME=$(find "$LOG_DIR" -name "*.jsonl" -mmin -60 \
  -exec jq -r 'select(.duration) | .duration' {} + 2>/dev/null \
  | awk '{sum+=$1; count++} END {if(count>0) print sum/count; else print 0}')

# Alert conditions
if (( ERRORS_LAST_HOUR > ALERT_THRESHOLD_ERRORS )); then
  echo "🚨 HIGH ERROR RATE: $ERRORS_LAST_HOUR errors in last hour" | mail -s "Agent Alert: High Error Rate" "$ALERT_EMAIL"
fi

# bc handles the comparison because the average may be a decimal
if (( $(echo "$AVG_RESPONSE_TIME > $ALERT_THRESHOLD_RESPONSE_TIME" | bc -l) )); then
  echo "🐌 SLOW RESPONSES: Average ${AVG_RESPONSE_TIME}ms in last hour" | mail -s "Agent Alert: Slow Responses" "$ALERT_EMAIL"
fi
```

## Debugging Checklist

### When something goes wrong:

1. **🔍 Gather Information**
   - [ ] What was the agent trying to do?
   - [ ] What was the exact user input?
   - [ ] What outputs/errors were produced?
   - [ ] When did the problem start?

2. **📊 Check Basics**
   - [ ] Are the context files loaded?
   - [ ] Are the tools available?
   - [ ] Is there enough disk space/memory?
   - [ ] Is network connectivity good?

3. **🔬 Deep Diagnosis**
   - [ ] Review the logs of the problematic session
   - [ ] Reproduce the problem manually
   - [ ] Compare with sessions that worked
   - [ ] Check recent configuration changes

4. **🛠️ Fix and Verify**
   - [ ] Apply a fix specific to the problem
   - [ ] Document the solution in memory/
   - [ ] Test that the fix works
   - [ ] Add prevention if possible

5. **📝 Document and Learn**
   - [ ] Update MEMORY.md with the learning
   - [ ] Add an entry to memory/INDEX.md
   - [ ] Consider whether TOOLS.md or SOUL.md needs improvement
   - [ ] Share insights with other agents if relevant
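
The "document the solution in memory/" step is the one most likely to be skipped, so it can help to make it a one-line command. The following is a minimal sketch under stated assumptions: the function name `record_fix` is hypothetical, and the entry layout (a heading plus three bullets) is an illustrative choice, not a format the agent requires. It only appends to the daily file named `memory/YYYY-MM-DD.md`, the naming used throughout this guide.

```bash
#!/bin/bash
# record-fix.sh (hypothetical helper; the entry layout is an assumption)
# Append a short debugging note to today's memory file.

# Usage: record_fix MEMORY_DIR SYMPTOM CAUSE FIX
record_fix() {
  local dir="$1" symptom="$2" cause="$3" fix="$4"
  local file
  file="$dir/$(date +%Y-%m-%d).md"

  mkdir -p "$dir" || return 1

  # Append rather than overwrite, so earlier notes from today are kept
  {
    printf '\n### Debug note (%s)\n' "$(date +%H:%M)"
    printf -- '- **Symptom**: %s\n' "$symptom"
    printf -- '- **Cause**: %s\n' "$cause"
    printf -- '- **Fix**: %s\n' "$fix"
  } >> "$file" || return 1

  # Print the path so the caller can open or index the file
  printf '%s\n' "$file"
}
```

For example, `record_fix memory "GitHub API returns 401" "expired token" "regenerated token and updated keychain"` appends one note to today's file and prints its path. Adding the matching line to memory/INDEX.md is left manual here, since the index format depends on your setup.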