Learn how to test GenerativeAgent to ensure it handles customer scenarios correctly before production launch.
Functional Testing is a critical step in evaluating GenerativeAgent after setting requirements for Tasks and Functions. Given the dynamic nature of Large Language Models (LLMs), it’s essential to validate that GenerativeAgent works as expected in various scenarios. Testing is the best strategy to ensure reliability and performance before launching any task into production.
This testing phase is a crucial part of your integration process. We strongly recommend completing thorough functional testing, with assistance from the ASAPP team, before deploying GenerativeAgent in a live environment. This process involves verifying, validating, and confirming that GenerativeAgent functions as expected across a wide range of potential user interactions.
It’s helpful to have a high-level overview of how GenerativeAgent works while planning your testing. GenerativeAgent assumes it is engaging with a customer who has a problem it can help resolve. GenerativeAgent uses a combination of:
Task Instructions
API Response Data
Retrieved Knowledge Base Articles
If GenerativeAgent cannot help the customer or is unsure about what to do, it will offer to escalate to a live agent.
Functional Testing is performed after your ASAPP Team has configured GenerativeAgent Tasks and Functions. You can fully integrate GenerativeAgent into your apps once the tests pass.
In the pretesting phase, keep the following preparation steps in mind:
Read a sample of production scenarios for this task
Read summaries of 100 sample conversations to understand what is typical for this use case, covering both conversations handled by the virtual agent and those that escalate to a live agent
Have clear should/must-dos for each task
Have a clear idea of the things that GenerativeAgent should do vs. must do within each task
Keep in mind the common scenarios you expect users to go through based on the sample of real conversations
Have clearly defined test users for the testing
Consider the permutations of test data that are important to cover. For example:
Someone with a flight canceled a few minutes ago
Someone with two flights, one of which is canceled and one of which is not
Someone with elite status vs. someone with no status
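One way to make sure the important permutations are covered is to enumerate them before creating test users. The following is a minimal sketch, not an ASAPP tool: the dimension names and values are hypothetical, based only on the airline examples above, and the `is_valid` rule is an assumption about which combinations make sense.

```python
from itertools import product

# Hypothetical test-data dimensions, drawn from the airline examples above.
# Replace them with the attributes that matter for your own task.
DIMENSIONS = {
    "booking": ["one_flight", "two_flights"],
    "cancellation": ["none", "canceled_minutes_ago", "one_of_two_canceled"],
    "status": ["elite", "no_status"],
}


def is_valid(profile):
    """Drop combinations that cannot occur in real data."""
    # "One of two canceled" only makes sense for a customer with two flights.
    if profile["cancellation"] == "one_of_two_canceled":
        return profile["booking"] == "two_flights"
    return True


def build_profiles(dimensions):
    """Return every valid combination of dimension values as a dict."""
    keys = list(dimensions)
    combos = (dict(zip(keys, values)) for values in product(*dimensions.values()))
    return [profile for profile in combos if is_valid(profile)]


profiles = build_profiles(DIMENSIONS)
for profile in profiles:
    print(profile)
```

Each printed profile describes one test user to set up. If the list grows too long to test by hand, prioritize the combinations that appear most often in your sample of real conversations.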
Once you’ve completed the pretesting phase, you’re ready to start testing GenerativeAgent itself. This phase involves simulating real-world scenarios and interactions to ensure GenerativeAgent performs as expected. Here are some key points to keep in mind:
Aim to test approximately 100 conversations per use case
Go through the expected conversation scenarios, as relevant, for each of the test users
Make sure to operate in a manner that is consistent with the data in the test account you are using
Formulate questions, based on the sample of conversations, that aim to test the knowledge articles available to GenerativeAgent
Plan to repeat some scenarios with slight variation to ensure GenerativeAgent responses are consistent (though no response is likely to ever be exactly the same due to its generative nature)
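Because generated responses are rarely word-for-word identical, comparing repeated runs by exact text is not useful. A rough alternative is to check that every response contains the facts that matter. The sketch below assumes you have copied the responses out of your test conversations by hand; the sample responses and the required facts are made up for illustration, and simple substring matching is only a first pass that does not replace human review.

```python
def missing_facts(response, required_facts):
    """Return the required facts that do not appear in a response (case-insensitive)."""
    text = response.lower()
    return [fact for fact in required_facts if fact.lower() not in text]


def check_consistency(responses, required_facts):
    """Map the index of each failing response to the facts it is missing."""
    failures = {}
    for index, response in enumerate(responses):
        missing = missing_facts(response, required_facts)
        if missing:
            failures[index] = missing
    return failures


# Made-up responses to the same scenario, asked three slightly different ways.
responses = [
    "Your flight is delayed and now departs at 6:45 PM from gate B12.",
    "It looks like the flight is delayed. The new departure time is 6:45 PM, gate B12.",
    "Your flight is on time and leaves from gate B12.",
]
required_facts = ["delayed", "6:45 PM", "B12"]

print(check_consistency(responses, required_facts))
```

Here the third response would be flagged because it omits the delay and the new departure time, even though the first two are worded differently and both pass.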
For example, to test a flight status scenario, work through checks like these:
Confirm that GenerativeAgent correctly invokes the flight_status task
Verify that GenerativeAgent identifies the necessary information from the customer to verify the flight
Ensure that GenerativeAgent requests the required information (confirmation number and last name)
Check that the appropriate API is called
Validate the information provided by the customer through the API
Ensure GenerativeAgent gathers the necessary flight status information
Confirm GenerativeAgent accurately communicates the flight status to the customer
This example illustrates the “happy path,” but other scenarios need testing too. What if the customer provides only a confirmation number? Can they provide alternative information? What if the customer doesn’t have a confirmation number? Consider other potential scenarios and instructions to test against.
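To keep track of the happy path and its variations, it can help to record each scenario as a checklist that a tester fills in while reviewing a conversation. This is a minimal sketch under stated assumptions: the check names are invented shorthand for the steps above, and the expected checks for the two variant scenarios are placeholders, since the correct behavior there depends on your own task instructions.

```python
from dataclasses import dataclass, field

# Checks taken from the happy-path steps above; the short keys are made up here.
HAPPY_PATH_CHECKS = [
    "invokes_flight_status_task",
    "requests_confirmation_number_and_last_name",
    "calls_appropriate_api",
    "validates_customer_information",
    "communicates_flight_status",
]


@dataclass
class Scenario:
    name: str
    expected_checks: list
    observed: dict = field(default_factory=dict)  # check -> True/False, set by the tester

    def record(self, check, passed):
        """Store the tester's verdict for one check."""
        if check not in self.expected_checks:
            raise ValueError(f"unknown check for {self.name!r}: {check}")
        self.observed[check] = passed

    def result(self):
        """Return 'pass', 'fail', or 'incomplete' for this scenario."""
        if any(self.observed.get(c) is False for c in self.expected_checks):
            return "fail"
        if all(self.observed.get(c) is True for c in self.expected_checks):
            return "pass"
        return "incomplete"


# Variant checks are placeholders (assumptions); define them from your task instructions.
scenarios = [
    Scenario("happy path", HAPPY_PATH_CHECKS),
    Scenario("confirmation number only",
             ["invokes_flight_status_task", "asks_for_missing_last_name"]),
    Scenario("no confirmation number",
             ["invokes_flight_status_task", "handles_missing_confirmation_number"]),
]

# Example of a tester recording results after reviewing conversations.
for check in HAPPY_PATH_CHECKS:
    scenarios[0].record(check, True)
scenarios[1].record("invokes_flight_status_task", True)
scenarios[1].record("asks_for_missing_last_name", False)

for scenario in scenarios:
    print(f"{scenario.name}: {scenario.result()}")
```

A scenario stays "incomplete" until every expected check has a verdict, which makes untested variations easy to spot.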