Transmitting Data via S3
S3 is the supported mechanism for ongoing data transmissions, though it can also be used for one-time transfers where needed. ASAPP customers can transmit the following types of data to S3:
- Call center data attributes
- Conversation transcripts from messaging or voice interactions
- Recorded call audio files
- Sales records with attribution metadata
Getting Started
Your Target S3 Buckets
ASAPP will provide you with a set of S3 buckets to which you may securely upload your data files, as well as a dedicated set of credentials authorized to write to those buckets. See the next section for more on those credentials.
For clarity, ASAPP names buckets using the following convention:
s3://asapp-{env}-{company_name}-imports-{aws-region}
Key | Description |
---|---|
env | Environment (prod, pre_prod, test) |
company_name | The company name: acme, duff, stark_industries, etc. Note: company name should not have spaces within. |
aws-region | us-east-1 Note: this is the current region supported for your ASAPP instance. |
So, for example, an S3 bucket set up to receive pre-production data from ACME would be named:
s3://asapp-pre_prod-acme-imports-us-east-1
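The naming convention above can be sketched as a small helper. This is a minimal illustration, not ASAPP tooling: the function name `import_bucket_name` and its validation rules are assumptions drawn from the table above.

```python
VALID_ENVS = {"prod", "pre_prod", "test"}


def import_bucket_name(env, company_name, aws_region="us-east-1"):
    """Build a bucket name of the form asapp-{env}-{company_name}-imports-{aws-region}."""
    if env not in VALID_ENVS:
        raise ValueError("env must be one of: " + ", ".join(sorted(VALID_ENVS)))
    # The company name must not contain spaces (or any other whitespace).
    if not company_name or any(ch.isspace() for ch in company_name):
        raise ValueError("company_name must be non-empty and contain no spaces")
    return "asapp-{}-{}-imports-{}".format(env, company_name, aws_region)


print("s3://" + import_bucket_name("pre_prod", "acme"))
# prints: s3://asapp-pre_prod-acme-imports-us-east-1
```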
S3 Target for Historical Transcripts
ASAPP has a distinct target location for sending historical transcripts for AI Services and will provide an exclusive access folder to which transcripts should be uploaded. The S3 bucket location follows this naming convention:
asapp-customers-sftp-{env}-{aws-region}
Values for env and aws-region are set in the same way as above. As an example, an S3 bucket to receive transcripts for use in production is named:
asapp-customers-sftp-prod-us-east-1
See the Historical Transcript File Structure section for more information on how to format transcript files for transmission.
Encryption
ASAPP ensures that the data you write to your dedicated S3 buckets is encrypted in transit using TLS/SSL and encrypted at rest using AES256.
Your Dedicated Export AWS Credentials
ASAPP will provide you with a set of AWS credentials that allow you to securely upload data to your designated S3 buckets. (Since you need write access in order to upload data to S3, you’ll need to use a different set of credentials than the read-only credentials you might already have.)
In order for ASAPP to securely send credentials to you, you must provide ASAPP with a public GPG key that we can use to encrypt a file containing those credentials.
GitHub provides one of many good available tutorials on GPG key generation here: https://help.github.com/en/articles/generating-a-new-gpg-key .
It’s safe to send your public GPG key to ASAPP using any available channel. Please do NOT provide ASAPP with your private key.
Once you’ve provided ASAPP with your public GPG key, we’ll send you an expiring HTTPS link to an S3-hosted file containing credentials that have permission to write to your dedicated S3 target buckets.
The file itself will be encrypted using your public GPG key. Once you decrypt the provided file using your private GPG key, your credentials will be contained within a tab-delimited file with the following structure:
id secret bucket sub-folder (if any)
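Once decrypted, a line of that file can be split into its fields as sketched below. This assumes one credential set per line with no header row and the fields in the order shown above; `parse_credentials_line` and the sample values are hypothetical.

```python
from collections import namedtuple

Credentials = namedtuple("Credentials", ["id", "secret", "bucket", "sub_folder"])


def parse_credentials_line(line):
    """Split one tab-delimited credentials line into its fields."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) < 3:
        raise ValueError("expected at least id, secret and bucket")
    # The sub-folder column is optional; treat a missing or empty value as None.
    sub_folder = parts[3] if len(parts) > 3 and parts[3] else None
    return Credentials(parts[0], parts[1], parts[2], sub_folder)


# Placeholder values only; real credentials come from the decrypted file.
creds = parse_credentials_line("EXAMPLE_ID\tEXAMPLE_SECRET\tasapp-prod-acme-imports-us-east-1\n")
print(creds.bucket, creds.sub_folder)
```

Keep the decrypted file and the parsed secret out of source control and logs.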
Data File Formatting and Preparation
General Requirements:
- Files should be UTF-8 encoded.
- Control characters should be escaped.
- You may provide files as CSV or JSONL format, but we strongly recommend JSONL where possible. (CSV files are just too fragile.)
- If you send a CSV file, ASAPP recommends that you include a header. Otherwise, your CSV must provide columns in the exact order listed below.
- When providing a CSV file, you must provide an explicit null value (as the unquoted string: NULL) for missing or empty values.
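The requirements above can be illustrated with a minimal Python 3 sketch that serializes the same records as JSONL or as CSV with a header row and unquoted NULL for missing values. The helper names `to_jsonl` and `to_csv` are hypothetical, and the record values are examples only.

```python
import csv
import io
import json


def to_jsonl(records):
    """Serialize dicts as JSON Lines: one JSON object per line."""
    # json.dumps escapes control characters, e.g. a newline inside a string becomes \n.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_csv(records, columns):
    """Serialize dicts as CSV with a header row and NULL for missing or empty values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        row = []
        for column in columns:
            value = record.get(column)
            row.append("NULL" if value is None or value == "" else value)
        writer.writerow(row)
    return buf.getvalue()


records = [{"customer_id": "347bdddb-d3a1-45fc-bbcd-dbd3a175fc1c", "customer_type": None}]
print(to_csv(records, ["customer_id", "customer_type"]), end="")
print(to_jsonl(records), end="")
```

When writing either string to disk, open the file with `encoding="utf-8"` so the UTF-8 requirement is met regardless of the platform default.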
Call Center Data File Structure
The table below lists the fields to include in your uploaded call center data and indicates which are required.
FIELD NAME | REQUIRED? | FORMAT | EXAMPLE | NOTES |
---|---|---|---|---|
customer_id | Yes | String | 347bdddb-d3a1-45fc-bbcd-dbd3a175fc1c | External User ID. This is a hashed version of the client ID. |
conversation_id | No | String | 21352352 | If filled in, should map to ASAPP’s system. May be empty, if the customer has not had a conversation with ASAPP. |
call_start | Yes | Timestamp | 2020-01-03T20:02:13Z | ISO 8601 formatted UTC timestamp. Time/date call is received by the system. |
call_end | Yes | Timestamp | 2020-01-03T20:02:13Z | ISO 8601 formatted UTC timestamp. Time/date call ends. Note: duration of call should be Call End - Call Start. |
call_assigned_to_agent | No | Timestamp | 2020-01-03T20:02:13Z | ISO 8601 formatted UTC timestamp. The date/time the call was answered by the agent. |
customer_type | No | String | Wireless Premier | Customer account classification by client. |
survey_offered | No | Bool | true/false | Whether a survey was offered or not. |
survey_taken | No | Bool | true/false | When a survey was offered, whether it was completed or not. |
survey_answer | No | String | Survey answer | |
toll_free_number | No | String | 888-929-1467 | Client phone number (toll free number) used to call in that allows for tracking different numbers, particularly ones referred directly by SRS. If websource or click to call, the web campaign is passed instead of TFN. |
ivr_intent | No | String | Power Outage | Phone pathing logic for routing to the appropriate agent group or providing self-service resolution. Could be multiple values. |
ivr_resolved | No | Bool | true/false | Caller triggered a self-service response from the IVR and then disconnected. |
ivr_abandoned | No | Bool | true/false | Caller disconnected without receiving a self-service response from the IVR or being placed in a live agent queue. |
agent_queue_assigned | No | String | Wireless Sales | Agent group/agent skill group (aka queue name) |
time_in_queue | No | Integer | 600 | Seconds caller waits in queue to be assigned to an agent. |
queue_abandoned | No | Bool | true/false | Caller disconnected after being assigned to a live agent queue but before being assigned to an agent. |
call_handle_time | No | Integer | 650 | Call duration in seconds from call assignment event to call disconnect event. |
call_wrap_time | No | Integer | 30 | Duration in seconds from call disconnect event to end of agent wrap event. |
transfer | No | String | Sales Group | Agent queue name if call was transferred. NA or Null value for calls not transferred. |
disposition_category | No | String | Change plan | Categorical outcome selection from agent. Alternatively, could be category like ‘Resolved’, ‘Unresolved’, ‘Transferred’, ‘Referred’. |
disposition_notes | No | String | | Notes from agent regarding the disposition of the call. |
transaction_completed | No | String | Upgrade Completed, Payment Processed | Name of transaction type completed by call agent on behalf of customer. Could contain multiple delimited values. May not be available for all agents. |
caller_account_value | No | Decimal | 129.45 | Current account value of customer. |
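Before uploading, a record can be checked against the table above. The sketch below confirms the required fields are present and derives the call duration as call_end minus call_start. It assumes whole-second UTC timestamps in the form shown in the examples; `check_call_record` is a hypothetical helper, and the sample call_end value is invented for illustration.

```python
from datetime import datetime, timezone

CALL_REQUIRED_FIELDS = ("customer_id", "call_start", "call_end")


def parse_utc(ts):
    """Parse a timestamp such as 2020-01-03T20:02:13Z into an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def check_call_record(record):
    """Return the call duration in seconds, raising ValueError on invalid input."""
    missing = [f for f in CALL_REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    delta = parse_utc(record["call_end"]) - parse_utc(record["call_start"])
    if delta.total_seconds() < 0:
        raise ValueError("call_end is earlier than call_start")
    return int(delta.total_seconds())


record = {
    "customer_id": "347bdddb-d3a1-45fc-bbcd-dbd3a175fc1c",
    "call_start": "2020-01-03T20:02:13Z",
    "call_end": "2020-01-03T20:13:33Z",
}
print(check_call_record(record))  # prints: 680
```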
Historical Transcript File Structure
ASAPP accepts uploads for historical conversation transcripts for both voice calls and chats.
The fields described below must be the columns in your uploaded .CSV table.
Each row in the uploaded .CSV table should correspond to one sent message.
FIELD NAME | REQUIRED? | FORMAT | EXAMPLE | NOTES |
---|---|---|---|---|
conversation_externalId | Yes | String | 3245556677 | Unique identifier for the conversation |
sender_externalId | Yes | String | 6433421 | Unique identifier for the sender of the message |
sender_role | Yes | String | agent | Supported values are ‘agent’, ‘customer’ or ‘bot’ |
text | Yes | String | Happy to help, one moment please | Message from sender |
timestamp | Yes | Timestamp | 2022-03-16T18:42:24.488424Z | ISO 8601 formatted UTC timestamp |
Proper transcript formatting and sampling ensure data is usable for model training. Please ensure transcripts conform to the following:
Formatting
- Each utterance is clearly demarcated and sent by one identified sender
- Utterances are in chronological order and complete, from beginning to very end of the conversation
- Where possible, transcripts include the full content of the conversation rather than an abbreviated version. For example, in a digital messaging conversation:
Full | Abbreviated |
---|---|
Agent: Choose an option from the list below Agent: (A) 1-way ticket (B) 2-way ticket (C) None of the above Customer: (A) 1-way ticket | Agent: Choose an option from the list below Customer: (A) |
Sampling
- Transcripts are from a wide range of dates to avoid seasonality effects; random sampling over a 12-month period is recommended
- Transcripts mimic the production conversations on which models will be used - same types of participants, same channel (voice, messaging), same business unit
- There are no duplicate transcripts
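Some of the formatting and sampling points above can be checked mechanically before upload. The sketch below flags unsupported sender_role values, conversations whose rows are out of chronological order, and conversations whose content exactly duplicates an earlier one. `check_transcripts` is a hypothetical helper. It assumes rows are dicts keyed by the column names above, and that all timestamps use the same UTC format and precision, so that string order matches time order.

```python
from collections import defaultdict

SUPPORTED_ROLES = {"agent", "customer", "bot"}


def check_transcripts(rows):
    """Return a list of problems found in transcript rows (one row per message)."""
    problems = []
    by_conversation = defaultdict(list)
    for row in rows:
        if row["sender_role"] not in SUPPORTED_ROLES:
            problems.append("unsupported sender_role: {!r}".format(row["sender_role"]))
        by_conversation[row["conversation_externalId"]].append(row)

    seen = {}
    for conv_id, messages in by_conversation.items():
        stamps = [m["timestamp"] for m in messages]
        # Same-format ISO 8601 UTC timestamps sort lexicographically.
        if stamps != sorted(stamps):
            problems.append("conversation {} is not in chronological order".format(conv_id))
        # Two conversations with identical (role, text) sequences count as duplicates.
        signature = tuple((m["sender_role"], m["text"]) for m in messages)
        if signature in seen:
            problems.append("conversation {} duplicates {}".format(conv_id, seen[signature]))
        else:
            seen[signature] = conv_id
    return problems
```

An empty list means none of these three checks failed. It does not cover the sampling guidance on date range or channel mix, which needs human review.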
Transmitting Transcripts to S3
Historical transcripts are sent to a distinct S3 target separate from other data imports.
Please refer to the S3 Target for Historical Transcripts section for details.
Sales Methods & Attribution Data File Structure
The table below lists the fields to include in your uploaded sales methods and attribution data and indicates which are required.
FIELD NAME | REQUIRED? | FORMAT | EXAMPLE | NOTES |
---|---|---|---|---|
transaction_id | Yes | String | 1d71dce2-a50c-11ea-bb37-0242ac130002 | An identifier which is unique within the customer system to track this transaction. |
transaction_time | Yes | Timestamp | 2007-04-05T14:30:05.123Z | ISO 8601 formatted UTC timestamp. Used to detect potential duplicates and to attribute the transaction to the right period of time. |
transaction_value_one_time | No | Float | 65.25 | Single value of initial purchase. |
transaction_value_recurring | No | Float | 7.95 | Recurring value of subscription purchase. |
customer_category | No | String | US | Custom category value per client. |
customer_subcategory | No | String | wireless | Custom subcategory value per client. |
external_customer_id | No | String | 34762720001 | External User ID. This is a hashed version of the client ID. In order to attribute to ASAPP metadata, one of these will be required (Customer ID or Conversation ID). |
issue_id | No | String | 1E10412200CC60EEABBF32 | If filled in, should map to ASAPP’s system. May be empty if the customer has not had a conversation with ASAPP. In order to attribute to ASAPP metadata, one of these will be required (Customer ID or Conversation ID). |
external_session_id | Yes | String | 1a09ff6d-3d07-45dc-8fa9-4936bfc4e3e5 | External session id so we can track a customer |
product_category | No | String | Wireless Internet | Category of product purchased. |
product_subcategory | No | String | Broadband | Subcategory of product purchased. |
product_name | No | String | Broadband Gold Package | The name of the product. |
product_id | No | String | WI-BBGP | The identifier of the product. |
product_quantity | Yes | Integer | 1 | A number indicating the quantity of the product purchased. |
product_value_one_time | No | Float | 60.00 | Value of the product for one time purchase. |
product_value_recurring | No | Float | 55.00 | Value of the product for recurring purchase. |
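A sales record can be pre-checked against the table above with a sketch like the following. It reads the "one of these will be required" notes as meaning that at least one of external_customer_id or issue_id must be present; that reading, and the helper name `check_sales_record`, are assumptions.

```python
SALES_REQUIRED_FIELDS = (
    "transaction_id",
    "transaction_time",
    "external_session_id",
    "product_quantity",
)
# Assumed mapping: external_customer_id is the Customer ID, issue_id the Conversation ID.
ATTRIBUTION_FIELDS = ("external_customer_id", "issue_id")


def check_sales_record(record):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for field in SALES_REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append("missing required field: " + field)
    if not any(record.get(field) for field in ATTRIBUTION_FIELDS):
        problems.append("provide external_customer_id or issue_id for attribution")
    return problems


record = {
    "transaction_id": "1d71dce2-a50c-11ea-bb37-0242ac130002",
    "transaction_time": "2007-04-05T14:30:05.123Z",
    "external_session_id": "1a09ff6d-3d07-45dc-8fa9-4936bfc4e3e5",
    "product_quantity": 1,
    "external_customer_id": "34762720001",
}
print(check_sales_record(record))  # prints: []
```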
Uploading Data to S3
At a high level, uploading your data is a three-step process:
- Build and format your files for upload, as detailed above.
- Construct a “target path” for those files following the convention in the section “Constructing your Target Path” below.
- Signal the completion of your upload by writing an empty _SUCCESS file to your “target path”, as described in the section “Signaling that your upload is complete” below.
Constructing your target path
ASAPP’s automation will use the S3 filename of your upload when deciding how to process your data file, where the filename is formatted as follows:
s3://BUCKET_NAME/FEED_NAME/version=VERSION_NUMBER/format=FORMAT_NAME/dt=DATE/hr=HOUR/mi=MINUTE/DATAFILE_NAME(S)
The following table details the convention that ASAPP follows when handling uploads:
Signaling that Your Upload Is Complete
Upon completing a data upload, you must upload an EMPTY file named _SUCCESS to the same path as your uploaded file, as a flag that indicates your data upload is complete. Until this file is uploaded, ASAPP will assume that the upload is in progress and will not import the associated data file.
As an example, let’s say you’re uploading one day of call center data in a set of files.
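As a hedged sketch of such an upload, the snippet below lists the object names for one day of data, with the empty _SUCCESS marker written last. The feed name, version, format name, data file names and the two-digit hr/mi values are hypothetical placeholders; use the values that ASAPP specifies for your feed.

```python
def target_prefix(bucket, feed_name, version, format_name, dt, hour, minute):
    """Build the S3 prefix shared by every file in one upload."""
    return "s3://{}/{}/version={}/format={}/dt={}/hr={}/mi={}/".format(
        bucket, feed_name, version, format_name, dt, hour, minute
    )


def upload_plan(prefix, datafile_names):
    """Return the object names to write, with the empty _SUCCESS marker last."""
    return [prefix + name for name in datafile_names] + [prefix + "_SUCCESS"]


# FEED_NAME, VERSION_NUMBER, FORMAT_NAME and the file names are placeholders.
prefix = target_prefix(
    bucket="asapp-pre_prod-acme-imports-us-east-1",
    feed_name="FEED_NAME",
    version="VERSION_NUMBER",
    format_name="FORMAT_NAME",
    dt="2018-09-02",
    hour="00",
    minute="00",
)
for name in upload_plan(prefix, ["part-0001.jsonl", "part-0002.jsonl"]):
    print(name)
```

Writing the marker only after every data file has finished uploading keeps an import from starting on a partial set of files.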
Incremental and Snapshot Modes
You may provide data to ASAPP as either Incremental or Snapshot data. The value you provide in the format field discussed above tells ASAPP whether to treat that data as Incremental or Snapshot.
When importing data using Incremental mode, ASAPP will append the given data to the existing data imported for that FEED_NAME. When you specify Incremental mode, you are telling ASAPP that the uploaded data covers the given date only. If you use the value dt=2018-09-02 in your constructed filename, you are indicating that the file contains records from 2018-09-02 00:00:00 UTC → 2018-09-02 23:59:59 UTC.
When importing data using Snapshot mode, ASAPP will replace any existing data for the indicated FEED_NAME with the contents of the uploaded file. When you specify Snapshot mode, ASAPP treats the uploaded data as a complete record from “the time history started” until the end of that particular day. A date of 2018-09-02 means the data includes, effectively, everything from 1970-01-01 00:00:00 UTC → 2018-09-02 23:59:59 UTC.
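The difference between the two modes comes down to the time window that a given dt value is taken to cover. A minimal sketch, in which the function name `covered_window` and the mode strings are hypothetical labels rather than ASAPP format values:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def covered_window(dt, mode):
    """Return (start, end) of the UTC period that an upload dated dt covers."""
    day_start = datetime.strptime(dt, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    day_end = day_start + timedelta(days=1) - timedelta(seconds=1)
    if mode == "incremental":
        # Incremental: records for that single day only.
        return day_start, day_end
    if mode == "snapshot":
        # Snapshot: everything up to the end of that day.
        return EPOCH, day_end
    raise ValueError("mode must be 'incremental' or 'snapshot'")


for mode in ("incremental", "snapshot"):
    start, end = covered_window("2018-09-02", mode)
    print(mode, start.isoformat(), end.isoformat())
```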
Other Upload Notes and Tips
- Make sure the structure of the imported file (whether columnar or JSON-formatted) matches the current import standards described in the file structure sections above.
- Data imports are scheduled daily, 4 hours after UTC midnight (for the previous day’s data)
- In the event that you upload historical data (i.e., from older dates than are currently in the system), please inform your ASAPP team so a complete re-import can be scheduled.
- Snapshot data must go into a format=snapshot_{type} folder.
- Providing a Snapshot allows you to provide all historical data at once. In effect, this reloads the entire table rather than appending data as in the non-snapshot case.
Upload Example
The example below assumes a shell terminal with Python 2.7+ installed.
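A minimal sketch of the upload flow in Python, assuming the third-party boto3 library is installed (`pip install boto3`). The helper names `make_s3_client` and `upload_with_success_marker` are hypothetical. The key prefix is the target path without the `s3://BUCKET_NAME/` part, and the credentials are the id and secret from your decrypted credentials file.

```python
import os


def make_s3_client(access_key_id, secret_access_key, region="us-east-1"):
    """Create an S3 client from the write credentials provided by ASAPP."""
    import boto3  # third-party dependency, imported here so the rest runs without it

    return boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name=region,
    )


def upload_with_success_marker(s3_client, bucket, key_prefix, local_paths):
    """Upload each local file under key_prefix, then write the empty _SUCCESS marker."""
    keys = []
    for path in local_paths:
        key = key_prefix + os.path.basename(path)
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    # The marker is written last so that a partial upload is never imported.
    marker = key_prefix + "_SUCCESS"
    s3_client.put_object(Bucket=bucket, Key=marker, Body=b"")
    keys.append(marker)
    return keys


# Usage (placeholder values; not executed here):
# client = make_s3_client("EXAMPLE_ID", "EXAMPLE_SECRET")
# upload_with_success_marker(
#     client,
#     "asapp-pre_prod-acme-imports-us-east-1",
#     "FEED_NAME/version=VERSION_NUMBER/format=FORMAT_NAME/dt=2018-09-02/hr=00/mi=00/",
#     ["part-0001.jsonl"],
# )
```

Passing the client in as an argument keeps the upload logic testable with a stand-in object, and makes it easy to swap in a differently configured client.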