Version: 0.3.0

Fine-Tuning Guide

Fine-tuning is like adjusting a recipe until it tastes right: you refine your AI agent’s instructions based on how it performs in real use. If the agent acts oddly, takes too long, or misses the mark, fine-tuning helps it deliver better results.


Why Fine-Tune?

You should fine-tune your agent when you notice:

  • Unexpected Behavior - Wrong choices, asking for help too often, skipping steps, inconsistent results.
  • Performance Issues - Slow tasks, frequent failures, too much human intervention, uneven quality.
  • Changing Needs - Business rules evolve, new scenarios pop up, or quality standards shift.

The Fine-Tuning Process

Follow these 5 steps to improve your AI agent.

1. Check Performance

Start by evaluating how well your agent is operating:

  • Review task history or the analytics dashboard.
  • Focus on key metrics:
    • Success rate (aim for >90%)
    • Average task duration
    • Human intervention rate
    • Error frequency
    • Output quality
  • Look for recurring patterns such as repeated failures or inconsistent results.
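
If your task history can be exported, the metrics above can be computed with a few lines of code. The following is a minimal sketch, not a product API: the `TaskRecord` fields and the `summarize` helper are hypothetical names chosen for illustration, and your own export will likely use a different schema.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One finished task. Field names are illustrative, not a real schema."""
    succeeded: bool
    duration_s: float
    needed_human: bool


def summarize(tasks: list[TaskRecord]) -> dict[str, float]:
    """Compute success rate, average duration, and human intervention rate."""
    if not tasks:
        raise ValueError("no tasks to summarize")
    n = len(tasks)
    return {
        "success_rate": sum(t.succeeded for t in tasks) / n,
        "avg_duration_s": sum(t.duration_s for t in tasks) / n,
        "human_intervention_rate": sum(t.needed_human for t in tasks) / n,
    }


history = [
    TaskRecord(True, 12.0, False),
    TaskRecord(True, 18.0, True),
    TaskRecord(False, 40.0, True),
    TaskRecord(True, 10.0, False),
]
metrics = summarize(history)
# The guide's target is a success rate above 90%.
below_target = metrics["success_rate"] <= 0.90
```

Here three of four tasks succeeded, so the sample history falls below the 90% target and would be a candidate for fine-tuning.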


2. Find the Root Cause

Ask: Why isn’t this working? Common reasons:

  • Unclear Instructions → Vague, ambiguous, or conflicting rules.
  • Unhandled Scenarios → Edge cases, unexpected data, or missing error handling.
  • Agent Limits → Overly complex tasks, missing permissions, or resource constraints.

3. Refine Instructions

Make instructions clear, specific, and thorough.

Before (Vague):

"Process the customer order and send confirmation."

After (Specific):

Example

Process the customer order by following these steps:

  1. VALIDATE ORDER:
    • Check email format, required fields, and inventory
    • If validation fails → create support ticket, stop processing
  2. PROCESS PAYMENT:
    • Use Stripe integration
    • If payment fails → send "payment_failed" email, stop
    • If payment succeeds → update order to "paid"
  3. SEND CONFIRMATION:
    • Use "order_confirmation" template
    • Include order details + delivery estimate
    • Log confirmation in order notes

Notice how this example breaks down the task into clear, sequential steps, includes validation and error handling, and ensures nothing is left ambiguous.

tip

Don’t just refine the steps; add context, examples, and error handling.

  • Context helps the agent understand business rules and priorities.
  • Examples show what “good” output looks like.
  • Error handling ensures smooth recovery when things go wrong.

4. Test and Validate

Treat this like a quality check before deployment:

  • Design test cases that cover both everyday scenarios and tricky edge cases.
  • Run A/B comparisons between the old and updated instructions.
  • Measure improvements in accuracy, speed, and reliability.
  • Use a simple validation checklist:
    • Clarity → Are the instructions easy to follow, with no room for confusion?
    • Coverage → Do they account for both normal situations and rare edge cases?
    • Impact → Do test results show real improvements (fewer errors, faster results, higher success rate)?

tip

Don’t just test in perfect conditions; try to break the system on purpose.
Unexpected inputs, missing data, or unusual workflows often reveal weaknesses you’d miss in “happy path” testing.
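
An A/B comparison can be as simple as running the same test cases against both versions of the instructions and comparing pass rates. The sketch below assumes you can wrap each version in a callable; `old_agent` and `new_agent` are hypothetical stand-ins, not real agent calls, and exact string matching is only one possible pass criterion.

```python
from typing import Callable

# A test case pairs an input with the outcome we expect from the agent.
TestCase = tuple[str, str]


def pass_rate(run_agent: Callable[[str], str], cases: list[TestCase]) -> float:
    """Fraction of cases where the agent's output matches the expectation."""
    passed = sum(1 for given, expected in cases if run_agent(given) == expected)
    return passed / len(cases)


# Hypothetical stand-ins for the agent under the old and new instructions.
def old_agent(email: str) -> str:
    return "confirmed"  # never validates anything


def new_agent(email: str) -> str:
    return "confirmed" if "@" in email else "support_ticket"


cases = [
    ("alice@example.com", "confirmed"),  # everyday scenario
    ("not-an-email", "support_ticket"),  # edge case: invalid email
    ("", "support_ticket"),              # edge case: missing data
]
old_rate = pass_rate(old_agent, cases)
new_rate = pass_rate(new_agent, cases)
```

Note that two of the three cases are deliberately broken inputs, which is where the old instructions fail.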


5. Keep Improving

Optimization isn’t a one-time task; it’s a continuous feedback loop:

  • Monitor performance → Set up alerts to quickly detect dips or anomalies.
  • Review regularly → Analyze results on a schedule (weekly, monthly) to spot trends.
  • Iterate in small steps → Apply incremental tweaks rather than massive overhauls.
  • Document & share → Record changes, lessons, and outcomes so the whole team benefits.

Consistent monitoring and refinement ensure long-term reliability and performance.
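
One way to detect dips is a rolling success-rate check. This is a sketch under stated assumptions: `SuccessRateMonitor` is a hypothetical helper, the window size and threshold are placeholders to tune, and a real setup would forward the alert to your notification channel.

```python
from collections import deque


class SuccessRateMonitor:
    """Tracks the last `window` task outcomes and flags dips below `threshold`."""

    def __init__(self, window: int = 50, threshold: float = 0.90) -> None:
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, succeeded: bool) -> bool:
        """Store one outcome; return True when an alert should fire."""
        self.outcomes.append(succeeded)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate < self.threshold


monitor = SuccessRateMonitor(window=4, threshold=0.75)
alerts = [monitor.record(ok) for ok in [True, True, True, False, False]]
```

The monitor stays silent until the window is full, which avoids alerts based on one or two early failures.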


Common Scenarios + Fixes

Agent Asks for Help Too Often

Fix: Add decision criteria.

Example
Decision criteria
  • <$100 → Auto-approve
  • $100–$1000 → Check history, approve if good standing
  • >$1000 → Require human approval
  • New customer >$500 → Human approval
  • Existing customer good history → Auto-approve up to $2000
Ask for help only if:
  • Disputed charges in last 6 months
  • Restricted items
  • Shipping ≠ billing address
  • Payment fails multiple times
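
Writing the criteria as code is a quick way to check that they are complete and do not conflict. The sketch below is one possible reading, not the definitive one. It assumes an evaluation order of escalation conditions first, then customer-specific rules, then amount bands, and it assumes that a $100–$1000 order without good standing goes to a human. The `Order` fields and `decide` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Order:
    amount: float
    new_customer: bool
    good_standing: bool
    # True if any "ask for help" condition applies (dispute, restricted
    # item, address mismatch, repeated payment failure).
    has_risk_flag: bool = False


def decide(order: Order) -> str:
    """Return "auto_approve" or "human_approval".

    Assumed precedence: risk flags, then customer rules, then amount bands.
    """
    if order.has_risk_flag:
        return "human_approval"
    if order.new_customer and order.amount > 500:
        return "human_approval"
    if not order.new_customer and order.good_standing and order.amount <= 2000:
        return "auto_approve"
    if order.amount < 100:
        return "auto_approve"
    if order.amount <= 1000 and order.good_standing:
        return "auto_approve"
    # Assumption: anything not covered above is escalated.
    return "human_approval"
```

For example, under this ordering an existing customer in good standing with a $1,500 order is auto-approved, because the customer rule is checked before the ">$1000" band. If you intend the opposite, state the precedence explicitly in the instructions.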

Inconsistent Output Quality

Fix: Define a quality checklist.

Example
OUTPUT REQUIREMENTS
  • Subject line ≤ 50 chars, include order number
  • Body: professional tone, all required info
  • Must include: order details, shipping info, tracking number, support contact

QUALITY CHECKLIST:
  • [ ] Customer info accurate
  • [ ] Address formatted correctly
  • [ ] Tracking number valid
  • [ ] Personalized greeting
  • [ ] No typos or grammar errors
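
Some of these requirements can be checked mechanically before an email goes out. The helpers below are a hypothetical sketch: `check_subject` and `missing_sections` are illustrative names, and the keyword match on the body is deliberately naive. Items such as tone and grammar still need human or model review.

```python
REQUIRED_SECTIONS = (
    "order details",
    "shipping info",
    "tracking number",
    "support contact",
)


def check_subject(subject: str, order_number: str) -> list[str]:
    """Return a list of problems; an empty list means the subject passes."""
    problems = []
    if len(subject) > 50:
        problems.append("subject longer than 50 characters")
    if order_number not in subject:
        problems.append("subject missing order number")
    return problems


def missing_sections(body: str) -> list[str]:
    """Naive check: which required section names never appear in the body."""
    lowered = body.lower()
    return [name for name in REQUIRED_SECTIONS if name not in lowered]


subject_ok = check_subject("Your order #A1001 has shipped", "A1001")
subject_bad = check_subject("x" * 51, "A1001")
gaps = missing_sections("Order details: 2 mugs. Shipping info: UPS. Support contact: help desk.")
```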

Poor Error Handling

Fix: Add error-handling flows.

Error Handling Flows
PAYMENT FAILURES:
  1. Retry after 5 min
  2. If still fails → email "payment_failed", create support ticket
INVENTORY ISSUES:
  1. Suggest alternatives
  2. If none → email "out_of_stock", alert procurement
SYSTEM ERRORS:
  1. Retry after 2 min
  2. If still down → manual task, notify admin, email "processing_delay"
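
The payment and system-error flows share one shape: try, wait, try once more, then fall back. A minimal sketch of that shape follows. `run_with_retry` is a hypothetical helper, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable


def run_with_retry(
    action: Callable[[], None],
    fallback: Callable[[], None],
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `action`, wait `delay_s`, try once more, then run `fallback`.

    Returns True if the action eventually succeeded.
    """
    for attempt in range(2):
        try:
            action()
            return True
        except Exception:
            if attempt == 0:
                # e.g. 300 s for payment failures, 120 s for system errors
                sleep(delay_s)
    fallback()
    return False
```

For the payment flow, `fallback` would send the "payment_failed" email and create the support ticket. Catching every exception is a simplification; in practice you would retry only on errors that are likely to be transient.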

Best Practices

  1. Write Clearly → Use simple, precise language
  2. Give Examples → Show good vs bad outputs
  3. Test Thoroughly → Include edge cases
  4. Measure Results → Always track metrics
  5. Collaborate → Share learnings, use version control

Troubleshooting Tips

Instructions Ignored?

This usually happens if instructions are too complex or ambiguous. Try to:

  • Add more context so the system understands intent better.
  • Simplify language to remove unnecessary complexity.
  • Run small test cases to check comprehension before scaling up.

Slow Performance?

Performance can degrade if prompts are too long or if the system is resource-limited. Consider:

  • Breaking down large tasks into smaller, more efficient steps.
  • Checking the complexity of the workflow, since some processes may be unnecessarily heavy.
  • Ensuring system resources are sufficient (CPU, memory, API limits).

Inconsistent Results?

Inconsistency often comes from unstructured inputs or lack of checks. Best practices:

  • Standardize inputs (consistent formats, terminology, units).
  • Add validation checks to catch errors before execution.
  • Use guardrails like schema validation or strict parsing rules.
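
A guardrail of this kind can be a small strict parser that normalizes input and rejects anything unexpected before the agent sees it. The sketch below uses only the Python standard library. The `OrderInput` fields and the `parse_order` function are hypothetical, and the email check is intentionally basic.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class OrderInput:
    order_id: str
    email: str
    amount: Decimal


def parse_order(raw: dict) -> OrderInput:
    """Strictly parse a raw payload; raise ValueError on anything unexpected."""
    expected = {"order_id", "email", "amount"}
    if set(raw) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(raw)}")
    # Standardize the format so downstream steps always see the same shape.
    email = str(raw["email"]).strip().lower()
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        raise ValueError(f"invalid email: {email!r}")
    try:
        amount = Decimal(str(raw["amount"]))
    except InvalidOperation:
        raise ValueError(f"invalid amount: {raw['amount']!r}") from None
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive number")
    return OrderInput(str(raw["order_id"]), email, amount)


order = parse_order({"order_id": "A1", "email": " Bob@Example.com ", "amount": "19.99"})
```

Rejecting unknown or missing keys outright, rather than guessing, keeps malformed input from turning into inconsistent agent behavior later in the workflow.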

Next Steps