AutoTranscribe Websocket

ASAPP AutoTranscribe is a streaming speech-to-text transcription service that works with both live streams and audio recordings of completed calls. Your organization can use it to transcribe voice interactions between contact center agents and their customers, supporting use cases such as analysis, coaching, and quality management.

Integrating your voice system with GenerativeAgent through the AutoTranscribe Websocket enables real-time communication between your voice platform and GenerativeAgent’s services.

AutoTranscribe is powered by a speech recognition model that converts speech to written text in real time, including punctuation and capitalization. The model can be customized for domain-specific needs by training on historical call audio and adding custom vocabulary to further improve recognition accuracy.

How it works

Create SSE Stream: The Event Handler (which may exist on the IVR or be a dedicated service) creates a Server-Sent Events (SSE) stream with GenerativeAgent.
Audio Stream: The IVR sends the audio stream from the end user to AutoTranscribe.
Create Conversation: The IVR creates a conversation and adds messages to the Conversation Data.
Request Analysis: The IVR requests GenerativeAgent to analyze the conversation.

The Event Handler then handles events sent via SSE, including GenerativeAgent’s reply, which is sent back to the user through the IVR.

Benefits of using Websocket to Stream events

Persistent connection between your voice system and the GenerativeAgent server
API streaming for audio, call signaling, and returned transcripts
Real-time data exchange for quick responses and efficient handling of user queries
Bi-directional communication for smooth and responsive interaction

Before you Begin

Before you start integrating with GenerativeAgent, you need to:

Get your API Key Id and Secret.
Ensure your API key has been configured to access the AutoTranscribe and GenerativeAgent APIs. Reach out to your ASAPP team if you are unsure.
Configure Tasks and Functions.

Implementation Steps

Create AutoTranscribe Streaming URL
Listen and Handle GenerativeAgent Events
Open a Connection
Start an Audio Stream
Send the Audio Stream
Analyze the conversation with GenerativeAgent
Stop the Audio Stream

Step 1: Create AutoTranscribe Streaming URL

First, create a streaming URL, which you will use to open the WebSocket connection to AutoTranscribe.

curl -X GET 'https://api.sandbox.asapp.com/autotranscribe/v1/streaming-url' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
    "externalId": "<unique conversation id>"
}'

A successful request returns a 200 response containing a short-lived, secure WebSocket access URL (TTL: 5 minutes):

{
    "streamingUrl": "<short-lived access URL>"
}
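
The same request can be sketched in Python using only the standard library. This is a minimal sketch that mirrors the curl example above, including sending a JSON body with a GET request; the function names are hypothetical and not part of any ASAPP SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
STREAMING_URL_ENDPOINT = "https://api.sandbox.asapp.com/autotranscribe/v1/streaming-url"


def build_streaming_url_request(api_id: str, api_secret: str, external_id: str) -> urllib.request.Request:
    """Build (but do not send) the request shown in the curl example."""
    body = json.dumps({"externalId": external_id}).encode("utf-8")
    return urllib.request.Request(
        STREAMING_URL_ENDPOINT,
        data=body,
        method="GET",  # mirrors `curl -X GET` with a JSON body
        headers={
            "asapp-api-id": api_id,
            "asapp-api-secret": api_secret,
            "Content-Type": "application/json",
        },
    )


def extract_streaming_url(response_body: bytes) -> str:
    """Pull `streamingUrl` out of the JSON response body."""
    payload = json.loads(response_body)
    return payload["streamingUrl"]
```

To send the request, pass the result of build_streaming_url_request to urllib.request.urlopen and give the response bytes to extract_streaming_url. Because the URL expires after 5 minutes, request it shortly before you open the connection.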

Step 2: Listen and Handle GenerativeAgent Events

GenerativeAgent sends events for all conversations through a single Server-Sent Events (SSE) stream. Listen for and handle these events to enable GenerativeAgent to interact with your users.
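
As an illustration of the listening side, the sketch below parses the generic SSE wire format (`event:` and `data:` fields, with a blank line ending each event) from an iterable of text lines. It does not assume any particular GenerativeAgent event names. Treating the `data` field as JSON is an assumption, and both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of decoded text lines."""
    event_type = "message"  # default event type in the SSE format
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)


def decode_event_payload(event: dict) -> dict:
    """Decode the event payload, assuming (not confirmed here) that it is JSON."""
    return json.loads(event["data"])
```

In practice the lines would come from a long-lived HTTP response; since one stream carries events for all conversations, your handler needs to route each decoded event to the right call.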

Step 3: Open a Connection

Create the WebSocket connection using the access URL: wss://<internal-voice-gateway-ingress>?token=<short_lived_access_token>
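
Before handing the URL to your WebSocket client, it can help to check that it has the expected shape and has not outlived its 5-minute TTL. The following is a minimal sketch using the Python standard library; the helper names are hypothetical, and the caller is assumed to record when the URL was issued.

```python
from urllib.parse import parse_qs, urlsplit

URL_TTL_SECONDS = 5 * 60  # TTL stated for the streaming URL


def extract_token(streaming_url: str) -> str:
    """Return the `token` query parameter, or raise if the URL looks wrong."""
    parts = urlsplit(streaming_url)
    if parts.scheme != "wss":
        raise ValueError(f"expected a wss:// URL, got scheme {parts.scheme!r}")
    tokens = parse_qs(parts.query).get("token", [])
    if not tokens:
        raise ValueError("streaming URL has no token query parameter")
    return tokens[0]


def url_is_fresh(issued_at: float, now: float) -> bool:
    """True while the URL is younger than its 5-minute TTL (times in seconds)."""
    return (now - issued_at) < URL_TTL_SECONDS
```

A caller might record time.monotonic() when the URL is received and request a new URL if url_is_fresh returns False at connect time.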

Step 4: Start a stream audio message

Start streaming audio into the AutoTranscribe Websocket using this message sequence:

Your Stream Request	ASAPP Response
`startStream` message	`startResponse` message
Stream audio - audio-in	`transcript` message
`finishStream` message	`finalResponse` message

Format WebSocket protocol request messages as text (UTF-8 encoded string data); only the audio stream should be in binary format. All response messages will be formatted as text.
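
Since every response arrives as a text frame with a `message` field, one way to handle the sequence above is to route each frame by that field. The sketch below is one possible shape for such a router; the function name and the handler mapping are hypothetical.

```python
import json
from typing import Callable, Dict


def dispatch_response(text_frame: str, handlers: Dict[str, Callable[[dict], None]]) -> str:
    """Decode a text frame and call the handler registered for its message type.

    Returns the message type so the caller can track where it is in the
    startResponse -> transcript -> finalResponse sequence.
    """
    msg = json.loads(text_frame)
    kind = msg.get("message", "")
    handler = handlers.get(kind)
    if handler is not None:
        handler(msg)
    return kind
```

For example, a handlers dictionary might map "transcript" to a function that forwards the text to your conversation store and "finalResponse" to one that closes the connection.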

Send a startStream message:

{
   "message":"startStream",
   "sender": {
          "role": "customer",
          "externalId": "JD232442"
   }
}

You’ll receive a startResponse:

{
   "message": "startResponse",
   "streamID": "128342213",
   "status": {
          "code": "1000",
          "description": "OK"
   }
}
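
The start handshake can be sketched as a pair of small helpers: one builds the startStream text frame and the other checks the startResponse. The field names come from the samples above. Treating any status code other than "1000" as a failure is an assumption, and the function names are hypothetical.

```python
import json

OK_CODE = "1000"  # status code shown in the sample startResponse


def build_start_stream(role: str, external_id: str) -> str:
    """Return the startStream message as a UTF-8 text payload."""
    return json.dumps({
        "message": "startStream",
        "sender": {"role": role, "externalId": external_id},
    })


def parse_start_response(text_frame: str) -> str:
    """Validate a startResponse and return its stream ID."""
    msg = json.loads(text_frame)
    if msg.get("message") != "startResponse":
        raise ValueError(f"unexpected message type: {msg.get('message')!r}")
    status = msg.get("status", {})
    if status.get("code") != OK_CODE:  # assumption: non-1000 means the stream did not start
        raise RuntimeError(f"stream not started: {status}")
    return msg["streamID"]
```

The string returned by build_start_stream should be sent as a text frame, not a binary one, and audio should only be sent after parse_start_response succeeds.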

Step 5: Send the audio stream

Stream audio as binary data: ws.send(<binary_blob>)

You’ll receive transcript messages:

{
   "message": "transcript",
   "start": 0,
   "end": 1000,
   "utterance":
   [
      {"text": "Hi, my ID is 123."}
   ]
}
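
To illustrate both directions of this step, the sketch below splits raw audio into binary chunks for sending and pulls the text out of a transcript message. The chunk size of 3200 bytes is an arbitrary illustrative value, not a documented requirement, and the function names are hypothetical.

```python
import json
from typing import Iterator


def iter_audio_chunks(audio: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Split raw audio bytes into chunks to send as binary frames.

    chunk_size is an illustrative default, not a value taken from the API.
    """
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


def transcript_text(text_frame: str) -> str:
    """Join the utterance texts of a transcript message; '' for other messages."""
    msg = json.loads(text_frame)
    if msg.get("message") != "transcript":
        return ""
    return " ".join(part["text"] for part in msg.get("utterance", []))
```

Each chunk would be passed to your WebSocket client's binary send call, while transcript_text is applied to the text frames that come back.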

Step 6: Analyze conversations with GenerativeAgent

Call the /analyze endpoint to evaluate the conversation:

curl -X POST 'https://api.sandbox.asapp.com/generativeagent/v1/analyze' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
    "conversationId": "01HNE48VMKNZ0B0SG3CEFV24WM"
}'

You can also include a message when calling analyze:

curl -X POST 'https://api.sandbox.asapp.com/generativeagent/v1/analyze' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
    "conversationId": "01HNE48VMKNZ0B0SG3CEFV24WM",
    "message": {
        "text": "hello, can I see my bill?",
        "sender": {
            "externalId": "321",
            "role": "customer"
        },
        "timestamp": "2024-01-23T11:50:50Z"
    }
}'

As the conversation progresses, you can give GenerativeAgent more context by using the taskName and inputVariables attributes. You can also simulate Tasks and Input Variables in the Previewer.

curl --request POST \
  --url https://api.sandbox.asapp.com/generativeagent/v1/analyze \
  --header 'Content-Type: application/json' \
  --header 'asapp-api-id: <API KEY ID>' \
  --header 'asapp-api-secret: <API TOKEN>' \
  --data '{
  "conversationId": "01BX5ZZKBKACTAV9WEVGEMMVS0",
  "message": {
    "text": "Hello, I would like to upgrade my internet plan to GOLD.",
    "sender": {
      "role": "agent",
      "externalId": 123
    },
    "timestamp": "2021-11-23T12:13:14.555Z"
  },
  "taskName": "UpgradePlan",
  "inputVariables": {
    "context": "Customer called to upgrade their current plan to GOLD",
    "customer_info": {
      "current_plan": "SILVER",
      "customer_since": "2020-01-01"
    }
  }
}'
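
The three request bodies above differ only in which optional attributes they include. A small builder makes that explicit. This is a sketch: the JSON field names come from the examples above, while the function and parameter names are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_analyze_payload(
    conversation_id: str,
    message: Optional[Dict[str, Any]] = None,
    task_name: Optional[str] = None,
    input_variables: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the JSON body for /analyze, omitting attributes that are not set."""
    payload: Dict[str, Any] = {"conversationId": conversation_id}
    if message is not None:
        payload["message"] = message
    if task_name is not None:
        payload["taskName"] = task_name
    if input_variables is not None:
        payload["inputVariables"] = input_variables
    return json.dumps(payload)
```

Calling build_analyze_payload with only a conversation ID reproduces the first example; adding message, task_name, and input_variables reproduces the later ones.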

Step 7: Stop the streaming audio message

Send a finishStream message:

{
   "message": "finishStream"
}

You’ll receive a finalResponse:

{
   "message": "finalResponse",
   "streamId": "128342213",
   "status": {
       "code": "1000",
       "description": "OK"
   },
   "summary": {
       "totalAudioBytes": 300,
       "audioDurationMs": 6000,
       "streamingSeconds": 6,
       "transcripts": 10
   }
}
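
When the finalResponse arrives, you may want to record the summary for monitoring or billing checks. The sketch below extracts a few fields from the sample above. The output keys and the function name are hypothetical, and treating code "1000" as success is an assumption based on the sample.

```python
import json


def summarize_final_response(text_frame: str) -> dict:
    """Extract status and summary figures from a finalResponse message."""
    msg = json.loads(text_frame)
    if msg.get("message") != "finalResponse":
        raise ValueError(f"unexpected message type: {msg.get('message')!r}")
    summary = msg.get("summary", {})
    return {
        "stream_id": msg.get("streamId"),
        "ok": msg.get("status", {}).get("code") == "1000",  # assumed success code
        "audio_seconds": summary.get("audioDurationMs", 0) / 1000,
        "transcripts": summary.get("transcripts", 0),
    }
```

After this message is processed, the WebSocket connection can be closed.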

Next Steps

Now that your system is integrated with GenerativeAgent, you’re ready to use it. You may find these other pages helpful:

Configuring GenerativeAgent

Safety and Troubleshooting

Going Live
