# Manual Annotation

> Human-in-the-loop evaluation through manual conversation review

Manual Annotations enable human reviewers to mark, classify, and provide context on turns within GenerativeAgent conversations. When automated evaluation identifies potential issues or when you discover conversations that aren't being handled as expected, manual annotations allow QA teams, supervisors, and AI specialists to apply human judgment, validate decisions, and deliver actionable insights for model tuning and quality processes.

Human judgment complements automated evaluation, enabling accurate diagnosis of model behavior, evaluator training, and continuous GenerativeAgent improvements.

Manual annotation includes the following capabilities:

<AccordionGroup>
  <Accordion title="Flagging a GenerativeAgent Turn">
    Reviewers can flag specific GenerativeAgent turns and add comments that explain the reason for the flag or note whether further action is needed.

    They can also update the Conversation Monitoring rationale to add context for review and to fine-tune the classification.
  </Accordion>

  <Accordion title="Issue Type Classification">
    Assign a tag to a Conversation Monitoring flag or a manual flag to classify the type of issue identified in the flagged turn. Tags make it easier to categorize issues for analysis and resolution.
  </Accordion>

  <Accordion title="Flag Refinement">
    When a turn is flagged, reviewers can:

    * Confirm the flag if they agree with the automated assessment
    * Recategorize the flag as needed
    * Dismiss the flag if they believe it is a false positive
    * Add nuance or rationale to the flag for better context
  </Accordion>
</AccordionGroup>

## When to Use Manual Annotations

Use manual annotations to:

* **Control conversation quality**
  * Catch issues automated systems miss: Flag incorrect, incomplete, or risky model responses that automated evaluation didn't detect
  * Ensure compliance: Highlight policy, tone, or guideline violations that require human judgment

* **Improve automated evaluators**
  * Validate automated flags: Confirm or override automated evaluator flags to improve evaluator accuracy and reduce false positives
  * Train evaluators: Use confirmed or corrected flags as labeled training data to improve your automated evaluation systems

* **Enhance your GenerativeAgent system**
  * Identify knowledge gaps: Discover missing information in knowledge bases, APIs, or domain-specific instructions
  * Improve orchestration: Use annotated examples to refine conversation flows and routing logic
  * Enhance test suites: Add real-world scenarios from annotations to verify GenerativeAgent behavior

* **Manage review workflows**
  * Track follow-ups: Flag conversations requiring investigation, escalation, or additional review

## Adding Manual Annotations

<Steps>
  <Step title="Navigate to a conversation">
    Navigate to the [Conversations](/generativeagent/configuring/conversations) interface and select a conversation to review.
  </Step>

  <Step title="Identify the turn to annotate">
    Identify the GenerativeAgent turn you want to annotate.
  </Step>

  <Step title="Click Add Flag">
    Hover over the turn and click the **Add Flag** button.
  </Step>

  <Step title="Add comments and rationale">
    Add any supporting **comments** or **rationale** for the flag. If automated systems already flagged the turn, you can also update the Conversation Monitoring rationale.
  </Step>

  <Step title="Categorize the issue (optional)">
    Select a category for the issue from the dropdown menu. This helps classify the type of issue for better organization and analysis.
  </Step>

  <Step title="Add tags (optional)">
    Assign tags to classify the issue type. Common classifications include:

    * Policy violations
    * Tone or guideline issues
    * Incorrect information
    * Incomplete responses
    * Escalation required

    Tags help organize and filter issues across all your conversations for systematic analysis and resolution.
  </Step>

  <Step title="Save the annotation">
    Click **Save** to apply the annotation.

    <Frame>
      <img src="https://mintcdn.com/asapp/ICz__wGhBuXDmBSc/images/generativeagent/manual-annotation-add.gif?s=fd3ca18fe0c6eed5015282803c43b329" alt="Add Manual Annotation" style={{maxWidth: "800px", maxHeight: "600px"}} width="1038" height="1550" data-path="images/generativeagent/manual-annotation-add.gif" />
    </Frame>
  </Step>
</Steps>

<Note>
  You can edit existing flags at any time.

  For turns already flagged by automated Conversation Monitoring or other reviewers, you can confirm, recategorize, dismiss, or update the rationale to refine the annotation and improve evaluator training data.
</Note>

## Reviewing and Filtering Annotations

The **Quality** tab in the [Conversations](/generativeagent/configuring/conversations) interface provides a consolidated view of all annotations. For each conversation, you can see:

* All flags (both manual and automated) with their categories and severity levels
* Reviewer comments and rationale for each flag
* The associated GenerativeAgent transcripts

Use the filters to narrow down conversations by:

* Flag type (Manual vs. Automated)
* Tags assigned to flags
* Annotation severity level (critical or major)

<Frame>
  <img src="https://mintcdn.com/asapp/bSPknm73NAzIX3Ak/images/generativeagent/reporting/ce-flags-filter.png?fit=max&auto=format&n=bSPknm73NAzIX3Ak&q=85&s=a35cbde943efe15ff8d38394f91c43ad" alt="Flags filter" width="368" height="190" data-path="images/generativeagent/reporting/ce-flags-filter.png" />
</Frame>
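
The filters combine as "match every criterion you set". As a rough illustration only, that logic can be sketched in Python. The `Flag` record, its field names, and the `filter_flags` helper are hypothetical; they do not represent an ASAPP API or export format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Flag:
    # Hypothetical record; field names are assumptions for illustration.
    conversation_id: str
    source: str    # "manual" or "automated"
    severity: str  # "critical" or "major"
    tags: List[str] = field(default_factory=list)


def filter_flags(
    flags: List[Flag],
    source: Optional[str] = None,
    severity: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[Flag]:
    """Return the flags that match every criterion that is not None."""
    return [
        f
        for f in flags
        if (source is None or f.source == source)
        and (severity is None or f.severity == severity)
        and (tag is None or tag in f.tags)
    ]


flags = [
    Flag("conv-1", "manual", "critical", ["Policy violations"]),
    Flag("conv-2", "automated", "major", ["Incomplete responses"]),
    Flag("conv-3", "manual", "major", ["Incorrect information"]),
]

# Narrow to manual flags with major severity.
manual_major = filter_flags(flags, source="manual", severity="major")
```

Leaving a criterion unset keeps it from narrowing the result, which mirrors leaving a filter empty in the interface.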

## Best Practices

When creating manual annotations, follow these best practices:

* Write specific rationale that describes what went wrong and why it matters
* Apply consistent categorization to support downstream training and evaluation
* Clearly state your reasoning when overriding Conversation Monitoring flags
* Create concise but actionable annotations

## Next Steps

The insights gained from manual annotations can be used to improve GenerativeAgent's performance in several ways:

* **Evaluator training**: Train automated evaluators using confirmed or corrected flags as labeled data to reduce false positives and negatives
* **Configuration improvements**: Use annotated examples to detect knowledge gaps and improve orchestration flows
* **Knowledge updates**: Update knowledge bases, APIs, and domain-specific instructions based on feedback that identifies missing or incorrect information
* **Test suite enhancement**: Update test suites with this feedback to verify specific GenerativeAgent behavior when deploying new tasks and configurations
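
To make the evaluator training idea concrete, the sketch below shows one possible way to turn review decisions into labeled examples. It is an assumption-laden illustration, not an ASAPP export format or training pipeline: the `ReviewedFlag` record, the decision names, and the output dictionary keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReviewedFlag:
    # Hypothetical record; field names are assumptions for illustration.
    turn_id: str
    automated_category: Optional[str]  # None if automation raised no flag
    decision: str  # "confirm", "recategorize", "dismiss", or "manual"
    reviewer_category: Optional[str] = None


def to_labeled_example(flag: ReviewedFlag) -> Dict[str, object]:
    """Map a reviewer decision to a label an evaluator could learn from."""
    if flag.decision == "dismiss":
        # Reviewer judged the automated flag a false positive.
        return {"turn_id": flag.turn_id, "has_issue": False, "category": None}
    if flag.decision == "confirm":
        # Reviewer agreed with the automated assessment.
        return {
            "turn_id": flag.turn_id,
            "has_issue": True,
            "category": flag.automated_category,
        }
    if flag.decision in ("recategorize", "manual"):
        # Reviewer supplied the category, either correcting automation
        # or flagging a turn that automation missed.
        return {
            "turn_id": flag.turn_id,
            "has_issue": True,
            "category": flag.reviewer_category,
        }
    raise ValueError(f"Unknown decision: {flag.decision}")


reviewed = [
    ReviewedFlag("turn-1", "Incorrect information", "confirm"),
    ReviewedFlag("turn-2", "Policy violations", "dismiss"),
    ReviewedFlag("turn-3", "Incomplete responses", "recategorize",
                 "Escalation required"),
    ReviewedFlag("turn-4", None, "manual", "Tone or guideline issues"),
]
examples = [to_labeled_example(f) for f in reviewed]
```

In this sketch, dismissed flags become negative examples (false positives), while manual flags on turns that automation missed become positive examples (false negatives).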

Consider exploring the following evaluators for more in-depth analysis:

<CardGroup>
  <Card title="Goal Completion Evaluator" href="/generativeagent/observe/evaluators/goal-completion">
    Spot and review customer goals not being met during GenerativeAgent interactions.
  </Card>

  <Card title="Conversation Monitoring" href="/generativeagent/observe/evaluators/conversation-monitoring">
    Monitor and review conversations for compliance and quality assurance using GenerativeAgent.
  </Card>
</CardGroup>
