Retrieving Data via S3
ASAPP customers can securely retrieve application data for their internal consumption from ASAPP via S3, using credentials dedicated to this download function.
If you’re retrieving data from ASAPP’s AI Services, visit this guide instead: Retrieving Data from AI Services
The following provides details on how you can retrieve data from ASAPP.
- Data Retrieval Prerequisites: Describes prerequisites for secure data exchange. This includes key exchange and authentication, as well as ASAPP’s standards for bucket naming and S3 folder structure.
- Downloading Data from S3: Describes ASAPP’s conventions for constructing an S3 path to exported data and some general data formatting conventions for ASAPP exports. Additionally, provides a sample download script.
Data Retrieval Prerequisites
ASAPP will share data with you via a set of S3 buckets. These S3 buckets will be encrypted at rest using AES256 and access-protected behind AWS authentication.
At the customer’s option, the individual download files can be encrypted using a scheme mutually agreed upon between ASAPP and the customer. This is not usually required, but in the event that your enterprise requires ASAPP export files to be doubly encrypted, contact your customer success manager to work out the details.
Authentication Credentials
Each run environment (typically preprod and prod) will be allocated a dedicated set of AWS credentials that are authorized to read from that environment’s respective S3 endpoint. These credentials will be different from other AWS credentials ASAPP might have provided you.
ASAPP will provide credentials to you within a GPG encrypted file. To initiate the credential exchange, you must provide ASAPP with a public GPG key that we may use to encrypt the files containing your credentials. It’s safe to send your public GPG key to ASAPP using any available channel. Please do NOT provide ASAPP with your private GPG key.
GitHub provides one of many good tutorials on GPG key generation: https://help.github.com/en/articles/generating-a-new-gpg-key
Once you’ve provided ASAPP with your public GPG key, we will provide your credentials in a file hosted on S3 that you will retrieve using an expiring HTTPS link. ASAPP’s standard practice is to have those links expire after 24 hours.
The file will be encrypted using your public GPG key. Once you decrypt the provided file using your private GPG key, your credentials will be contained within a tab-delimited file with the following structure:
id secret bucket sub-folder (if any)
Please let us know when you’ve successfully retrieved and decrypted your credentials file.
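Once decrypted, you can load these credentials into your S3 client of choice. Below is a minimal sketch in Python with boto3; the local file name `credentials.tsv` is hypothetical.

```python
import boto3

# Parse the tab-delimited credentials file (local file name is hypothetical).
with open("credentials.tsv", encoding="utf-8") as f:
    fields = f.readline().rstrip("\n").split("\t")

access_key_id, secret_access_key, bucket = fields[0], fields[1], fields[2]
subfolder = fields[3] if len(fields) > 3 else ""  # sub-folder is optional

# Create an S3 client using the dedicated download credentials.
s3 = boto3.client(
    "s3",
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
)
```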
Your Dedicated S3 Buckets
Once you’ve securely received credentials that allow you to read from your dedicated S3 buckets, ASAPP will provide you with URIs representing the set of S3 buckets from which you may securely download data files.
ASAPP can make a broad range of data available to you. You’ll work with your Customer Success and Implementation teams to schedule specific feeds for export.
Each separate feed will have a unique S3 endpoint and, potentially, a unique strategy for data partitioning. While the specific details vary according to the feed at hand, ASAPP follows a consistent set of conventions for naming S3 buckets, paths, and partitions.
The following provides details about the conventions.
Bucket Naming
For clarity, ASAPP names buckets using the following convention:
s3://asapp-{env}-{company_name}-exports-{aws-region}
Key | Description |
---|---|
env | Environment (e.g., prod, preprod, test) |
company_name | Your company name, without spaces |
aws-region | The AWS region. Note: us-east-1 is currently the only region supported for ASAPP implementations. |
For example, an S3 bucket set up to host exports of pre-production data from ACME would be named:
s3://asapp-preprod-acme-exports-us-east-1
Folder Structure
Within the named bucket, ASAPP uses the following conventions to construct an S3 path that corresponds to the name of the exported feed and to partitions of the exported feed:
s3://BUCKET_NAME/FEED_NAME/version=VERSION_NUMBER/format=FORMAT_NAME/dt=DATE/hr=HOUR/mi=MINUTE/DATAFILE(S)
s3 PATH ELEMENT | Description |
---|---|
BUCKET_NAME | (Described above) |
FEED_NAME | The name of the table, extract, feed, etc. |
version | The version of the feed at hand. Changes whenever the schema, meaning of a column, etc., changes in a way that could break existing integrations. |
format | The format of the exported data. Almost always, this will be JSON Lines.* |
dt | The YYYY-MM-DD formatted date corresponding to the exported data. |
hr | (optional) The HH formatted hour corresponding to the exported data. |
mi | (optional) The MM formatted minute corresponding to the exported data. Note: This might be either the exact minute for a minutely partitioned data export, or the first minute of a bucket representing a range of minutes in the exported data. For example, 00, 15, 30, 45 would be the mi values for four partitions across a single hour. The 00 partition would contain all data from the start of the hour up to but not including the 15th minute of the hour. |
DATAFILE(s) | The filename or filenames of the exported feed partition. See File Naming below for additional information about file naming. |
*By default, and by very strong preference, ASAPP will output files in JSON Lines format. JSON Lines is robust, unambiguous, and will save us all from the heartbreak of CSV. The format is described at http://jsonlines.org.
Note that `hr` and `mi` partitions will not always apply. In the event that a particular feed is not partitioned on hour or minute, the `hr` and/or `mi` subfolders will not be present for that feed.
File Naming
File names that correspond to an exported feed partition will have names in the following form:
{FEED_NAME}{FORMAT}{SPLIT_NUMBER}.{COMPRESSION}.{ENCRYPTION}
File name element | Description |
---|---|
FEED_NAME | The feed name from which this partition is exported. |
FORMAT | .jsonl |
SPLIT_NUMBER | (optional) In the event that a particular partition’s export needs to be split across multiple physical files in order to accommodate file size constraints, each split file will be suffixed with a dot followed by a two-digit incrementing sequence. If the whole partition can fit in a single file, no SPLIT_NUMBER will be present in the file name. |
COMPRESSION | (optional) .gz will be appended to the file name if the file is gzip compressed. |
ENCRYPTION | (optional) In the atypical case where a file written to the S3 store is doubly encrypted, the filename will have a .enc extension. |
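Putting these conventions together, a hypothetical gzip-compressed export of the convos_metrics feed for the dt=2024-01-15, hr=13, mi=00 partition at ACME would be located at a path like:

```
s3://asapp-preprod-acme-exports-us-east-1/convos_metrics/version=1/format=jsonl/dt=2024-01-15/hr=13/mi=00/convos_metrics.jsonl.gz
```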
Downloading Data from S3
General Data Formatting Notes
- ASAPP export files are UTF-8 encoded.
- Control characters are escaped.
- Files are formatted with Unix-style line endings.
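Given these conventions, reading an export file is straightforward. A minimal sketch in Python (the local file name is hypothetical):

```python
import gzip
import json

# Read a gzip-compressed JSON Lines export file (local file name is hypothetical).
with gzip.open("convos_metrics.jsonl.gz", mode="rt", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)  # one JSON object per line
        print(record)
```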
Verifying the Data Export is Complete
Upon completing the upload for a particular partition, ASAPP will upload an EMPTY file named `_SUCCESS` to the same path as the uploaded file or files. This `_SUCCESS` file acts as a flag indicating that the upload for the associated partition is complete. A `_SUCCESS` file will be written even if there is no available data selected for export for the partition at hand.
Until the `_SUCCESS` file is uploaded, ASAPP’s export/upload is in progress and you should not import the associated data file. You should check for this file before downloading any data partition.
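For example, a minimal boto3 sketch that checks for the `_SUCCESS` flag before proceeding (the bucket name and prefix are hypothetical):

```python
import boto3

s3 = boto3.client("s3")  # assumes the dedicated export credentials are configured

def partition_is_complete(bucket, prefix):
    """Return True once the partition's empty _SUCCESS flag file exists."""
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix + "_SUCCESS", MaxKeys=1)
    return response.get("KeyCount", 0) > 0

ready = partition_is_complete(
    "asapp-preprod-acme-exports-us-east-1",
    "convos_metrics/version=1/format=jsonl/dt=2024-01-15/hr=13/mi=00/",
)
```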
Download Example
The example below assumes a shell environment with Python 3 and the boto3 library installed.
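A minimal sketch that verifies the partition’s `_SUCCESS` flag and then downloads every data file in the partition (the bucket name and prefix are hypothetical):

```python
import os
import boto3

s3 = boto3.client("s3")  # assumes the dedicated export credentials are configured

def download_partition(bucket, prefix, dest_dir="."):
    """Download all data files in a partition once its _SUCCESS flag exists."""
    # Note: list_objects_v2 returns up to 1,000 keys per call; a single
    # partition is normally well under that limit.
    contents = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", [])
    keys = [obj["Key"] for obj in contents]
    if not any(key.endswith("_SUCCESS") for key in keys):
        raise RuntimeError("Export still in progress: no _SUCCESS flag under " + prefix)
    for key in keys:
        if key.endswith("_SUCCESS"):
            continue  # the flag file is empty; no need to download it
        s3.download_file(bucket, key, os.path.join(dest_dir, os.path.basename(key)))

download_partition(
    "asapp-preprod-acme-exports-us-east-1",
    "convos_metrics/version=1/format=jsonl/dt=2024-01-15/hr=13/mi=00/",
)
```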
Deduping S3 Exports Data
All tables need to be deduped in order to keep only the most recent data and discard older rows that are duplicated as part of the export process.
To dedupe a table, take the latest instance and the latest run per partition key, with some exceptions noted in the table below.
Example
To dedupe the `convos_metrics` table, use a query like the one below.
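A sketch of that query in standard SQL (adapt to your warehouse’s dialect):

```sql
SELECT *
FROM (
    SELECT
        cm.*,
        ROW_NUMBER() OVER (
            PARTITION BY issue_id
            ORDER BY logical_timestamp DESC
        ) AS row_idx
    FROM convos_metrics cm
) deduped
WHERE row_idx = 1;
```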
We partition by the primary key for that table and get the latest data using `ORDER BY logical_timestamp DESC` in the subquery. Then we select only rows where `row_idx = 1` to pull only the latest information we have for each `issue_id`.
Deduping Guide by Table
The table below is the full reference for deduping by table; it lists the fields to partition by and order by, respectively.
Deduping Reference
Table | Partition By | Order By |
---|---|---|
admin_activity | company_id, rep_id, status_description, status_start_ts | status_end_ts DESC |
agent_journey_rep_event_frequency | rep_id, event_type, company_marker, instance_ts | instance_ts DESC |
autopilot_flow | company_id, issue_id, form_start_ts | instance_ts DESC |
cobrowsing | issue_id, cob_start_ts | instance_ts DESC |
convos_intents | issue_id | first_utterance_ts DESC |
convos_intents_ended | issue_id | first_utterance_ts DESC |
convos_metadata | issue_id | last_event_ts DESC |
convos_metadata_ended | issue_id | conversation_end_ts DESC |
convos_metrics | issue_id | first_utterance_ts DESC |
convos_metrics_ended | issue_id | first_utterance_ts DESC |
convos_summary_tags | issue_id, company_id, summary_tag_presented | instance_ts DESC |
csid_containment | company_id, csid | instance_ts DESC |
csid_containment_1d | company_id, csid | instance_ts DESC |
customer_feedback | issue_id, company_subdivision, company_segments, last_rep_id, question, instance_ts | instance_ts DESC |
customer_params | event_id, param_key | instance_ts DESC |
dim_queues | queue_key | — |
export_row_counts | export_date, export_job, export_interval | — |
fact_convos_missed_ends | company_id, issue_id, conversation_end_ts | instance_ts DESC |
flow_completions | issue_id | negation_event_ts, success_event_ts DESC |
flow_detail | issue_id, event_ts | event_ts DESC |
intents | code, name, intent_type | — |
issue_callback_3d | issue_id | instance_ts DESC |
issue_omnichannel | company_id, issue_id, third_party_customer_id | instance_ts DESC |
issue_queues | issue_id, queue_id, enter_queue_ts | instance_ts DESC |
issue_sentiment | issue_id, created_ts | instance_ts DESC |
issue_session_merge | company_id, issue_id, session_id | instance_ts DESC |
issue_type | company_id, customer_id, issue_id | instance_ts DESC |
knowledge_base | issue_id, company_id, article_id, event_ts | event_ts DESC |
live_agent_opportunities | issue_id, customer_id, opportunity_ts | instance_ts DESC |
queue_check_ins | issue_id, customer_id, check_in_ts | instance_ts DESC |
queue_membership_wfm | instance_ts, company_id, rep_id, queue_id | instance_ts DESC |
queue_wfm | instance_ts, queue_name, rep_id | instance_ts DESC |
quick_reply_buttons (deprecated) | final_intent_code, quick_reply_button_text, escalated_to_chat, escalation_requested, quick_reply_button_index | instance_ts DESC |
reps | rep_id | created_ts DESC |
rep_activity | company_id, instance_ts, rep_id, status_id, in_status_starting_ts | instance_ts DESC |
rep_assignment_disposition | company_id, issue_id, rep_id, rep_assigned_ts | rep_assigned_ts DESC |
rep_attributes | rep_attribute_id, rep_id, created_ts | created_ts DESC |
rep_augmentation | issue_id | instance_ts DESC |
rep_convos | issue_id, rep_id, issue_assigned_ts | instance_ts DESC |
rep_hierarchy | subordinate_agent_id, superior_agent_id | — |
rep_utilized | instance_ts, rep_id, desk_mode | instance_ts DESC |
sms_events | sms_flow_id | — |
transfers | issue_id, rep_id, timestamp_req | instance_ts DESC |
utterances | issue_id, sender_id, created_ts | instance_ts DESC |
voice_intents | issue_id, company_id | instance_ts DESC |
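The same window-function pattern generalizes to every table above: substitute each table’s Partition By and Order By fields into a query like this hedged template (angle-bracket placeholders are to be filled in from the table):

```sql
SELECT *
FROM (
    SELECT
        t.*,
        ROW_NUMBER() OVER (
            PARTITION BY <partition_by_columns>
            ORDER BY <order_by_columns>
        ) AS row_idx
    FROM <table_name> t
) deduped
WHERE row_idx = 1;
```

For tables with no Order By listed (e.g., dim_queues or intents), rows are presumably full duplicates, and a plain SELECT DISTINCT should suffice.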
Schema Adjustments
You should enable automated schema-evolution detection and identify any changes using `export_docs.yaml`, which is generated each day and delivered via the S3 feed. By incorporating this check into your workflows, you can stay ahead of schema adjustments and keep your data imports running without interruption.
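As an illustration, a small Python sketch that flags a change by comparing today’s `export_docs.yaml` against yesterday’s copy (local file names are hypothetical, and the file’s exact structure is an assumption to verify against your feed):

```python
import yaml  # PyYAML

# Load two daily snapshots of export_docs.yaml (local file names are hypothetical).
with open("export_docs_yesterday.yaml", encoding="utf-8") as f:
    previous = yaml.safe_load(f)
with open("export_docs_today.yaml", encoding="utf-8") as f:
    current = yaml.safe_load(f)

if previous != current:
    # Any difference may indicate a schema adjustment; review before running imports.
    print("export_docs.yaml changed; review for schema adjustments.")
```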
Export Schema
The following provides detailed information about the primary schema for exported data files that ASAPP can make available to customers.
Visit Retrieving Data from AI Services to learn more about how to export AI Services data using the File Exporter API.
The schema is available as a stand-alone webpage.
Workforce Management (WFM) Data
In addition to the primary data schema (above), ASAPP also offers separate workforce management data for use in intraday workforce management applications.
The data is structured in two tables and published every 20-30 minutes. This refresh cadence is intended to support continuous monitoring use cases rather than historical reporting.
Each table’s metrics are calculated for a 15-minute window.
For example, the tables published at 1:35AM would reflect specific queue or rep activity from 1:00AM - 1:15AM.
queue_wfm
For each queue in your ASAPP implementation, this table contains queue-related metrics during a 15-minute period:
- Total assignments
- Customer queue abandons
- Average handle time
- Average wrap time
- Average customer wait time in queue
- Customer wait times under different SLA thresholds
- Reps in the queue
queue_membership_wfm
For each queue in your ASAPP implementation, this table contains rep-level information about the availability and status of the rep during a 15-minute period (or the portion of that 15-minute window during which the rep was logged in):
- Linear logged-in time
- Linear available time
- Linear busy time
- Total linear utilized time
- Linear utilized time while available
- Linear utilized time while busy
WFM data is not available by default. Reach out to your ASAPP account team to request these additional data tables and to discuss your use case.