> ## Documentation Index
> Fetch the complete documentation index at: https://docs.asapp.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Manual Annotation

> Human-in-the-loop evaluation through manual conversation review

Manual Annotations enable human reviewers to mark, classify, and provide context on turns within GenerativeAgent conversations.

When automated evaluation identifies potential issues or when you discover conversations that aren't being handled as expected, manual annotations allow QA teams, supervisors, and AI specialists to apply human judgment, validate decisions, and deliver actionable insights for model tuning and quality processes.

Human judgment complements automated evaluation, enabling accurate diagnosis of model behavior, evaluator training, and continuous GenerativeAgent improvements.

Manual flagging includes:

* Reviewers can flag specific GenerativeAgent turns and add comments explaining the reason for the flag or whether further action is needed. They can also update the Conversation Monitoring rationale to add context for review and to fine-tune the classification.
* Reviewers can assign a tag to Conversation Monitoring flags or manual flags to classify and investigate the type of issue identified in the flagged turn. This helps categorize issues for analysis and resolution.

When a turn is flagged, reviewers can:

* Confirm the flag if they agree with the automated assessment
* Recategorize the flag as needed
* Dismiss the flag if they believe the turn is a false positive
* Add nuance or rationale to the flag for better context

## When to Use Manual Annotations

Use manual annotations to:

* **Control conversation quality**
  * Catch issues automated systems miss: Flag incorrect, incomplete, or risky model responses that automated evaluation didn't detect
  * Ensure compliance: Highlight policy, tone, or guideline violations that require human judgment
* **Improve automated evaluators**
  * Validate automated flags: Confirm or override automated evaluator flags to improve evaluator accuracy and reduce false positives
  * Train evaluators: Use confirmed or corrected flags as labeled training data to improve your automated evaluation systems
* **Enhance your GenerativeAgent system**
  * Identify knowledge gaps: Discover missing information in knowledge bases, APIs, or domain-specific instructions
  * Improve orchestration: Use annotated examples to refine conversation flows and routing logic
  * Enhance test suites: Add real-world scenarios from annotations to verify GenerativeAgent behavior
* **Manage review workflows**
  * Track follow-ups: Flag conversations requiring investigation, escalation, or additional review

## Adding Manual Annotations

1. Navigate to the [Conversations](/generativeagent/configuring/conversations) interface and select a conversation to review.
2. Identify the GenerativeAgent turn you want to annotate. Hover over the turn and click the **Add Flag** button.
3. Add any supporting **comments** or **rationale** for the flag. You can also update Conversation Monitoring rationale if the turn was already flagged by automated systems.
4. Select a category for the issue from the dropdown menu. This helps classify the type of issue for better organization and analysis.
5. Assign tags to classify the issue type. Common classifications include:
   * Policy violations
   * Tone or guideline issues
   * Incorrect information
   * Incomplete responses
   * Escalation required

   Tags help organize and filter issues across all your conversations for systematic analysis and resolution.
6. Click **Save** to apply the annotation.

*Image: Add Manual Annotation*

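For teams that track annotations outside the interface, the information captured in the steps above can be pictured as a simple record. The sketch below is illustrative only: `ManualFlag`, its field names, and `missing_fields` are hypothetical and do not represent an ASAPP schema or API. It shows one way to check that a flag carries a category, a comment, and at least one tag before it is shared for analysis.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical record for one manual flag. The field names are
# illustrative assumptions, not an ASAPP data model.
@dataclass
class ManualFlag:
    conversation_id: str
    turn_index: int
    category: str
    comment: str = ""
    tags: List[str] = field(default_factory=list)


def missing_fields(flag: ManualFlag) -> List[str]:
    """Return the names of fields the reviewer left empty."""
    missing = []
    if not flag.category.strip():
        missing.append("category")
    if not flag.comment.strip():
        missing.append("comment")
    if not flag.tags:
        missing.append("tags")
    return missing


# Example values are placeholders invented for this sketch.
flag = ManualFlag(
    conversation_id="conv-001",
    turn_index=4,
    category="Incorrect information",
    comment="Quoted a refund window that does not match the policy.",
    tags=["Incorrect information", "Escalation required"],
)
print(missing_fields(flag))  # []
```

A check like this is optional; it simply makes the "specific rationale, consistent categorization" habit easy to verify in bulk.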
You can edit existing flags at any time. For turns already flagged by automated Conversation Monitoring or other reviewers, you can confirm, recategorize, dismiss, or update the rationale to refine the annotation and improve evaluator training data.

## Reviewing and Filtering Annotations

The **Quality** tab in the [Conversations](/generativeagent/configuring/conversations) interface provides a consolidated view of all annotations.

For each conversation, you can see:

* All flags (both manual and automated) with their categories and severity levels
* Reviewer comments and rationale for each flag
* The associated GenerativeAgent transcripts

Use the filters to narrow down conversations by:

* Flag type (Manual vs. Automated)
* Tags assigned to flags
* Annotation severity level (critical or major)

*Image: Flags filter*

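The filter dimensions above (flag type, tags, severity) combine as a logical AND. The following minimal sketch illustrates that behavior on an in-memory list. The `Flag` class, its fields, and `filter_flags` are assumptions made for illustration; they are not part of the Conversations interface or any ASAPP API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical flag record; names and values are illustrative only.
@dataclass
class Flag:
    conversation_id: str
    source: str    # "manual" or "automated"
    severity: str  # "critical" or "major"
    tags: List[str] = field(default_factory=list)


def filter_flags(
    flags: List[Flag],
    source: Optional[str] = None,
    tag: Optional[str] = None,
    severity: Optional[str] = None,
) -> List[Flag]:
    """Keep flags that match every filter that is set; None means 'any'."""
    return [
        f
        for f in flags
        if (source is None or f.source == source)
        and (tag is None or tag in f.tags)
        and (severity is None or f.severity == severity)
    ]


flags = [
    Flag("conv-001", "manual", "critical", ["Policy violations"]),
    Flag("conv-002", "automated", "major", ["Incomplete responses"]),
    Flag("conv-003", "manual", "major", ["Policy violations"]),
]

hits = filter_flags(flags, source="manual", tag="Policy violations")
print([f.conversation_id for f in hits])  # ['conv-001', 'conv-003']
```

Leaving a filter unset widens the result, while setting more filters narrows it.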
## Best Practices

When creating manual annotations, follow these best practices:

* Write specific rationale that describes what went wrong and why it matters
* Apply consistent categorization to support downstream training and evaluation
* Clearly state your reasoning when overriding Conversation Monitoring flags
* Create concise but actionable annotations

## Next Steps

The insights gained from manual annotations can be used to improve GenerativeAgent's performance in several ways:

* **Evaluator training**: Train automated evaluators using confirmed or corrected flags as labeled data to reduce false positives and negatives
* **Configuration improvements**: Use annotated examples to detect knowledge gaps and improve orchestration flows
* **Knowledge updates**: Update knowledge bases, APIs, and domain-specific instructions based on feedback that identifies missing or incorrect information
* **Test suite enhancement**: Update test suites with this feedback to verify specific GenerativeAgent behavior when deploying new tasks and configurations

Consider exploring the following evaluators for more in-depth analysis:

* Spot and review customer goals not being met during GenerativeAgent interactions.
* Monitor and review conversations for compliance and quality assurance using GenerativeAgent.
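To make the evaluator-training idea above concrete, the sketch below turns review outcomes into labels and counts where the automated evaluator agreed or disagreed with the human reviewer. Everything here is an assumption for illustration: the outcome names (`confirmed`, `recategorized`, `dismissed`, `manual_flag`), the tuple format, and both functions are hypothetical and are not an ASAPP export format or API.

```python
from typing import Dict, List, Tuple

# Hypothetical reviewer outcomes that mean "this turn has a real issue".
# A recategorized flag still counts as an issue: the reviewer changed
# the category, not the verdict.
ISSUE_OUTCOMES = {"confirmed", "recategorized", "manual_flag"}


def to_label(outcome: str) -> bool:
    """True when the reviewer judged the turn to contain a real issue."""
    return outcome in ISSUE_OUTCOMES


def error_counts(reviews: List[Tuple[bool, str]]) -> Dict[str, int]:
    """Compare automated flags with reviewer labels.

    Each review is (automated_flag_raised, reviewer_outcome).
    """
    counts = {"true_positive": 0, "false_positive": 0, "false_negative": 0}
    for automated_flag, outcome in reviews:
        is_issue = to_label(outcome)
        if automated_flag and is_issue:
            counts["true_positive"] += 1
        elif automated_flag and not is_issue:
            counts["false_positive"] += 1
        elif is_issue:
            # The reviewer flagged a turn the automated evaluator missed.
            counts["false_negative"] += 1
    return counts


reviews = [
    (True, "confirmed"),
    (True, "dismissed"),
    (True, "recategorized"),
    (False, "manual_flag"),
]
print(error_counts(reviews))
# {'true_positive': 2, 'false_positive': 1, 'false_negative': 1}
```

Dismissed flags surface false positives and manual flags surface false negatives, which is why consistent confirm and dismiss decisions matter for downstream training.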