AutoTranscribe Websocket
Integrate AutoTranscribe for real-time speech-to-text transcription
Your organization can use AutoTranscribe to transcribe voice interactions between contact center agents and their customers, supporting various use cases including analysis, coaching, and quality management.
ASAPP AutoTranscribe is a streaming speech-to-text transcription service that works with both live streams and audio recordings of completed calls. Integrating your voice system with GenerativeAgent using the AutoTranscribe Websocket enables real-time communication, allowing for seamless interaction between your voice platform and GenerativeAgent’s services.
AutoTranscribe is powered by a speech recognition model that converts speech to written text in real time, including punctuation and capitalization. The model can be customized for domain-specific needs by training on historical call audio and by adding custom vocabulary to further improve recognition accuracy.
How it works
- Create SSE Stream: The Event Handler (which may exist on the IVR or be a dedicated service) creates a Server-Sent Events (SSE) stream with GenerativeAgent.
- Audio Stream: The IVR sends the audio stream from the end user to AutoTranscribe.
- Create Conversation: The IVR creates a conversation and adds messages to the Conversation Data.
- Request Analysis: The IVR requests GenerativeAgent to analyze the conversation.
The Event Handler then handles events sent via SSE, including GenerativeAgent’s reply, which is sent back to the user through the IVR.
Benefits of Using WebSocket to Stream Events
- Persistent connection between your voice system and the GenerativeAgent server
- API streaming for audio, call signaling, and returned transcripts
- Real-time data exchange for quick responses and efficient handling of user queries
- Bi-directional communication for smooth and responsive interaction
Before you Begin
Before you start integrating with GenerativeAgent, you need to:
- Get your API Key Id and Secret
- Ensure your API key has been configured to access AutoTranscribe and GenerativeAgent APIs. Reach out to your ASAPP team if you are unsure.
- Configure Tasks and Functions.
Implementation Steps
- Create AutoTranscribe Streaming URL
- Listen and Handle GenerativeAgent Events
- Open a Connection
- Start an Audio Stream
- Send the Audio Stream
- Analyze the conversation with GenerativeAgent
- Stop the Audio Stream
Step 1: Create AutoTranscribe Streaming URL
First, create a streaming URL. You will use it to open the WebSocket connection to AutoTranscribe.
A successful response returns HTTP 200 and a short-lived, secure WebSocket access URL (TTL: 5 minutes):
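As a rough illustration of handling that response, the sketch below checks the status code, extracts the URL, and records when it expires. The `streamingUrl` field name is an assumption made for illustration, as are the helper names; the 5-minute TTL comes from the text above and is measured here from when the response was received, which is only an approximation.

```python
import json
from dataclasses import dataclass

URL_TTL_SECONDS = 5 * 60  # TTL stated in the text above


@dataclass
class StreamingUrl:
    url: str
    expires_at: float  # Unix timestamp after which the URL should not be used

    def is_expired(self, now: float) -> bool:
        return now >= self.expires_at


def parse_streaming_url_response(status: int, body: str, received_at: float) -> StreamingUrl:
    """Validate a streaming-URL response and compute its approximate expiry."""
    if status != 200:
        raise RuntimeError(f"streaming URL request failed with HTTP {status}")
    url = json.loads(body)["streamingUrl"]  # assumed field name, not confirmed by the source
    if not url.startswith("wss://"):
        raise ValueError("expected a secure WebSocket (wss://) URL")
    return StreamingUrl(url=url, expires_at=received_at + URL_TTL_SECONDS)
```

Because the URL is short-lived, it is usually simplest to request it immediately before opening the connection and to request a new one if `is_expired` returns true.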
Step 2: Listen and Handle GenerativeAgent Events
GenerativeAgent sends events for all conversations through a single Server-Sent Events (SSE) stream. Listen for and handle these events to enable GenerativeAgent to interact with your users.
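Since one stream carries events for every conversation, the Event Handler has to parse the stream and route each event to the right call. A minimal sketch of that idea follows, using only the standard SSE framing rules (events end at a blank line, `data:` lines carry the payload). The JSON payload shape and the `conversationId` field are hypothetical, as are the function names.

```python
import json
from typing import Callable, Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of decoded lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                event_type = value


def route_by_conversation(lines: Iterable[str], handlers: Dict[str, Callable[[dict], None]]) -> None:
    """Decode each event's JSON payload and call the handler for its conversation."""
    for event in iter_sse_events(lines):
        payload = json.loads(event["data"])
        handler = handlers.get(payload.get("conversationId"))  # assumed field name
        if handler:
            handler(payload)
```

In practice `lines` would be the decoded lines of the long-lived HTTP response, and each handler would pass GenerativeAgent's reply back to the user through the IVR.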
Step 3: Open a Connection
Create the WebSocket connection using the access URL:
wss://<internal-voice-gateway-ingress>?token=<short_lived_access_token>
Step 4: Start an Audio Stream
Start streaming audio into the AutoTranscribe Websocket using this message sequence:
Your Stream Request | ASAPP Response |
---|---|
startStream message | startResponse message |
Stream audio - audio-in | transcript message |
finishStream message | finalResponse message |
Format WebSocket protocol request messages as text (UTF-8 encoded string data); only the audio stream should be in binary format. All response messages will be formatted as text.
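The text-versus-binary rule can be enforced in one place before anything is sent. The sketch below is one way to do that; the request and response names come from the table above, while the `"message"` key used as a type discriminator and the helper names are assumptions for illustration.

```python
import json
from typing import Union

# Request -> response pairs, taken from the message sequence table.
EXPECTED_RESPONSE = {
    "startStream": "startResponse",
    "audio-in": "transcript",
    "finishStream": "finalResponse",
}


def control_message(name: str, **fields) -> dict:
    """Build a control message; the "message" key is an assumed discriminator."""
    if name not in ("startStream", "finishStream"):
        raise ValueError(f"unknown control message: {name}")
    return {"message": name, **fields}


def to_frame(payload: Union[dict, bytes, bytearray]) -> Union[str, bytes]:
    """Return str for control messages (text frames) and bytes for audio (binary frames)."""
    if isinstance(payload, (bytes, bytearray)):
        return bytes(payload)
    return json.dumps(payload)
```

Some Python WebSocket clients, such as the `websockets` package, choose the frame type from the Python type passed to `send`, so returning `str` or `bytes` here is enough to get the framing right with those clients.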
Send a startStream message:
You’ll receive a startResponse:
Step 5: Send the Audio Stream
Stream audio as binary data:
ws.send(<binary_blob>)
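Audio is normally sent as a series of small binary messages rather than one large blob. A minimal sketch of the chunking arithmetic, assuming raw mono PCM at 8 kHz with 16-bit samples and 20 ms per message (all three values are illustrative defaults, not requirements stated here):

```python
from typing import Iterator


def chunk_pcm(audio: bytes, sample_rate: int = 8000, sample_width: int = 2,
              frame_ms: int = 20) -> Iterator[bytes]:
    """Split raw mono PCM into fixed-duration chunks, one per binary message."""
    # bytes per chunk = samples per second * bytes per sample * seconds per chunk
    chunk_size = sample_rate * sample_width * frame_ms // 1000
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

With these defaults each chunk is 320 bytes, and the final chunk may be shorter. Each chunk would then be passed to `ws.send(...)` as binary data, where `ws` is whatever WebSocket client object your integration uses.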
You’ll receive transcript messages:
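On the receiving side, every response arrives as a text frame, so a single handler can decode it and branch on the message type. The sketch below collects transcript messages until the stream is finalized. The three response names come from this page; the `"message"` discriminator key and the class itself are hypothetical.

```python
import json
from typing import List


class ResponseHandler:
    """Track stream state and collect transcript messages."""

    def __init__(self) -> None:
        self.transcripts: List[dict] = []
        self.started = False
        self.finished = False

    def handle(self, frame: str) -> None:
        msg = json.loads(frame)
        kind = msg.get("message")  # assumed discriminator key
        if kind == "startResponse":
            self.started = True
        elif kind == "transcript":
            self.transcripts.append(msg)
        elif kind == "finalResponse":
            self.finished = True
        else:
            raise ValueError(f"unexpected response type: {kind!r}")
```

The collected transcripts are what the IVR would add to the conversation as messages before requesting analysis.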
Step 6: Analyze the Conversation with GenerativeAgent
Call the /analyze endpoint to evaluate the conversation:
You can also include a message when calling analyze:
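To make the two variants concrete, the sketch below builds a JSON request body with and without a message. Every field name in it (`conversationId`, `message`, `text`) is a placeholder chosen for illustration, and the full path and HTTP method of the `/analyze` call are not shown; consult the API reference for the actual request schema.

```python
import json
from typing import Optional


def build_analyze_body(conversation_id: str, message_text: Optional[str] = None) -> str:
    """Build a hypothetical /analyze request body, optionally carrying a message."""
    body = {"conversationId": conversation_id}  # placeholder field name
    if message_text is not None:
        # Placeholder shape for the optional message sent along with the analyze call.
        body["message"] = {"text": message_text}
    return json.dumps(body)
```

GenerativeAgent's reply does not come back in the response to this call; as described in the How it works section, it arrives through the SSE stream handled by the Event Handler.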
Step 7: Stop the Audio Stream
Send a finishStream message:
You’ll receive a finalResponse:
Next Steps
With your system integrated with GenerativeAgent, you’re ready to use it.