Flagging a GenerativeAgent Turn
Reviewers can flag specific GenerativeAgent turns and add comments explaining the reason for the flag or noting whether further action is needed. They can also update the Conversation Monitoring rationale to add context for review and to fine-tune the classification.
Issue Type Classification
Assign a tag to Conversation Monitoring or manual flags to classify and investigate the type of issue identified in the flagged turn. This helps in categorizing issues for analysis and resolution.
Flag Refinement
When a turn is flagged, reviewers can:
- Confirm the flag if they agree with the automated assessment
- Recategorize the flag as needed
- Dismiss the flag if they believe the turn is a false positive
- Add nuance or rationale to the flag for better context
When to Use Manual Annotations
Use manual annotations to:

Control conversation quality
- Catch issues automated systems miss: Flag incorrect, incomplete, or risky model responses that automated evaluation didn’t detect
- Ensure compliance: Highlight policy, tone, or guideline violations that require human judgment

Improve automated evaluators
- Validate automated flags: Confirm or override automated evaluator flags to improve evaluator accuracy and reduce false positives
- Train evaluators: Use confirmed or corrected flags as labeled training data to improve your automated evaluation systems

Enhance your GenerativeAgent system
- Identify knowledge gaps: Discover missing information in knowledge bases, APIs, or domain-specific instructions
- Improve orchestration: Use annotated examples to refine conversation flows and routing logic
- Enhance test suites: Add real-world scenarios from annotations to verify GenerativeAgent behavior

Manage review workflows
- Track follow-ups: Flag conversations requiring investigation, escalation, or additional review
Adding Manual Annotations
Navigate to a conversation
Navigate to the Conversations interface and select a conversation to review.
Add comments and rationale
Add any supporting comments or rationale for the flag. You can also update Conversation Monitoring rationale if the turn was already flagged by automated systems.
Categorize the issue (optional)
Select a category for the issue from the dropdown menu. This helps classify the type of issue for better organization and analysis.
Add tags (optional)
Assign tags to classify the issue type. Common classifications include:
- Policy violations
- Tone or guideline issues
- Incorrect information
- Incomplete responses
- Escalation required
You can edit existing flags at any time. For turns already flagged by automated Conversation Monitoring or other reviewers, you can confirm, recategorize, dismiss, or update the rationale to refine the annotation and improve evaluator training data.
Reviewing and Filtering Annotations
The Quality tab in the Conversations interface provides a consolidated view of all annotations. For each conversation, you can see:
- All flags (both manual and automated) with their categories and severity levels
- Reviewer comments and rationale for each flag
- The associated GenerativeAgent transcripts
- Flag type (Manual vs. Automated)
- Tags assigned to flags
- Annotation severity level (critical or major)
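As a hedged illustration of the kind of filtering this consolidated view supports (the record fields and function below are hypothetical, not part of any GenerativeAgent API):

```python
# Hypothetical flag records mirroring the Quality tab fields.
flags = [
    {"type": "Manual", "severity": "critical", "tags": ["Policy violations"]},
    {"type": "Automated", "severity": "major", "tags": ["Incorrect information"]},
    {"type": "Manual", "severity": "major", "tags": ["Escalation required"]},
]

def filter_flags(flags, flag_type=None, severity=None):
    """Return flags matching the given type and/or severity level."""
    return [
        f for f in flags
        if (flag_type is None or f["type"] == flag_type)
        and (severity is None or f["severity"] == severity)
    ]

# e.g. all manual flags with "major" severity
manual_major = filter_flags(flags, flag_type="Manual", severity="major")
```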

Best Practices
When creating manual annotations, follow these best practices:
- Write specific rationale that describes what went wrong and why it matters
- Apply consistent categorization to support downstream training and evaluation
- Clearly state your reasoning when overriding conversation monitoring flags
- Create concise but actionable annotations
Next Steps
The insights gained from manual annotations can be used to improve GenerativeAgent’s performance in several ways:
- Evaluator training: Train automated evaluators using confirmed or corrected flags as labeled data to reduce false positives and negatives
- Configuration improvements: Use annotated examples to detect knowledge gaps and improve orchestration flows
- Knowledge updates: Update knowledge bases, APIs, and domain-specific instructions based on feedback that identifies missing or incorrect information
- Test suite enhancement: Update test suites with this feedback to verify specific GenerativeAgent behavior when deploying new tasks and configurations
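One way to picture the evaluator-training step above: reviewer decisions on flags can be mapped to labeled examples, where confirmed or recategorized flags become positives and dismissed flags become negatives. This is a minimal sketch under assumed data shapes; the `Flag` type and field names are hypothetical, not an actual GenerativeAgent export format.

```python
from dataclasses import dataclass

# Hypothetical record type -- not part of any GenerativeAgent API.
@dataclass
class Flag:
    turn_id: str
    source: str             # "automated" or "manual"
    category: str           # e.g. "incorrect-information"
    reviewer_decision: str  # "confirmed", "recategorized", or "dismissed"
    rationale: str

def to_training_example(flag: Flag) -> dict:
    """Map a reviewed flag to a labeled example for evaluator training.

    Confirmed or recategorized flags become positive labels; dismissed
    flags become negatives, which is what drives down false positives.
    """
    return {
        "turn_id": flag.turn_id,
        "label": flag.reviewer_decision != "dismissed",
        "category": flag.category,
        "rationale": flag.rationale,
    }

flags = [
    Flag("t1", "automated", "incorrect-information", "confirmed",
         "Quoted an outdated fee"),
    Flag("t2", "automated", "tone", "dismissed",
         "Tone was acceptable in context"),
]
examples = [to_training_example(f) for f in flags]
```

The reviewer rationale travels with each example, so a training pipeline can use it for auditing or for rationale-aware evaluation.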
