Production Checklist
Ensure your agent is ready for the real world. Don't deploy without checking these boxes.
Security
- API Keys are loaded from environment variables (never hardcoded).
- Guardrails are active for input (PII detection) and output (topic validation).
- Rate limiting is configured for your API endpoints.
- User permissions are checked inside Tools (don't trust the agent to check auth).
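
The first and last items above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the variable name `LLM_API_KEY`, the `USER_ROLES` table, and the `delete_record` tool are all hypothetical, and a real system would query its own auth service instead of a dictionary.

```python
import os


class PermissionDenied(Exception):
    """Raised when the calling user may not run a tool."""


def load_api_key(name: str = "LLM_API_KEY") -> str:
    # Read the secret from the environment and fail fast if it is missing.
    # "LLM_API_KEY" is an assumed variable name, not one defined by any library.
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"environment variable {name} is not set")
    return key


# Hypothetical permission table standing in for a real auth lookup.
USER_ROLES = {"alice": {"admin"}, "bob": {"viewer"}}


def delete_record(user_id: str, record_id: int) -> str:
    """Example tool: the authorization check lives inside the tool itself."""
    # The user_id must come from the authenticated session, never from
    # text the model generated.
    if "admin" not in USER_ROLES.get(user_id, set()):
        raise PermissionDenied(f"{user_id} may not delete records")
    return f"record {record_id} deleted"
```

Because the check sits inside the tool, a prompt-injected or confused agent that calls `delete_record` on behalf of a viewer still gets a `PermissionDenied` error.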
Reliability
- Retry logic is implemented for LLM API calls (exponential backoff).
- Fallback models are configured (e.g., if GPT-4 is down, try Claude 3).
- Timeout limits are set for all Tool executions.
- Structured logging is enabled for observability.
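
Retry with exponential backoff and model fallback could look something like the sketch below. It assumes each model is wrapped as a plain callable that takes a prompt and returns text. `call_with_retry` and `call_with_fallback` are illustrative names, not functions from any SDK, and production code would normally retry only on transient errors (timeouts, rate limits) rather than on every exception.

```python
import random
import time
from typing import Callable, Optional, Sequence


def call_with_retry(
    fn: Callable[[str], str],
    prompt: str,
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn(prompt), retrying failures with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn(prompt)
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted, surface the last error
            # The delay doubles each attempt (base, 2*base, 4*base, ...);
            # random jitter keeps many clients from retrying in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")


def call_with_fallback(
    models: Sequence[Callable[[str], str]], prompt: str, **retry_kwargs
) -> str:
    """Try each model in order, moving on once its retries are exhausted."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return call_with_retry(model, prompt, **retry_kwargs)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all models failed") from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Tool timeouts are a separate concern and are not covered by this sketch.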
Performance
- System Prompts are optimized (short, clear, no fluff).
- Max tokens are capped to prevent runaway costs.
- Caching is enabled for frequent queries (semantic cache).
- Dependencies are minimized in the Docker image.
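
To make the semantic cache item concrete, here is a toy sketch. It swaps in a bag-of-words vector where a real system would use an embedding model, and a linear scan where a real system would use a vector index. `SemanticCache`, `embed`, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a lowercase bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored response when a new query is similar enough to an old one."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (query vector, response) pairs

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        # Linear scan for the most similar stored query.
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: the caller should invoke the LLM and then put()
```

The threshold trades hit rate against the risk of serving an answer to a question that only looks similar, so it is worth tuning against real traffic.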
UX
- Streaming is enabled for long-running responses.
- Clear error messages are displayed to the user (not raw stack traces).
- Citation links are provided for RAG responses.
- User feedback mechanism (Thumbs Up/Down) is in place.
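
One way to keep raw stack traces away from users is to wrap the agent call, log the full exception server-side, and return a generic message with a reference ID. The sketch below assumes the agent is a callable that takes the user message and returns text. `safe_reply` and the response shape are hypothetical.

```python
import logging
import uuid
from typing import Callable

logger = logging.getLogger("agent")


def safe_reply(handler: Callable[[str], str], user_message: str) -> dict:
    """Run the agent handler and never leak exception details to the user."""
    try:
        return {"ok": True, "text": handler(user_message)}
    except Exception:
        # Short ID the user can quote to support; the full traceback
        # goes only to the server-side log.
        incident_id = uuid.uuid4().hex[:8]
        logger.exception("agent failure, incident %s", incident_id)
        return {
            "ok": False,
            "text": f"Something went wrong on our side. Please try again. (ref: {incident_id})",
        }
```

The incident ID links what the user sees to the structured log entry, so support can find the trace without it ever appearing in the UI.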