Retrieving Data via S3
Learn how to retrieve data from ASAPP Messaging via S3.
ASAPP customers can securely retrieve application data for their internal consumption from ASAPP via S3, using credentials dedicated to this download function.
If you’re retrieving data from ASAPP’s AI Services, use File Exporter instead.
The following provides details on how you can retrieve data from ASAPP.
- Data Retrieval Prerequisites: Describes prerequisites for secure data exchange. This includes key exchange and authentication, as well as ASAPP’s standards for bucket naming and S3 folder structure.
- Downloading Data from S3: Describes ASAPP’s conventions for constructing an S3 path to exported data and some general data formatting conventions for ASAPP exports. Additionally, provides a sample download script.
Data Retrieval Prerequisites
ASAPP will share data with you via a set of S3 buckets. These S3 buckets will be encrypted at rest using AES256 and access-protected behind AWS authentication.
At the customer’s option, the individual download files can be encrypted using a scheme mutually agreed upon between ASAPP and the customer. This is not usually required, but in the event that your enterprise requires ASAPP export files to be doubly encrypted, contact your customer success manager to work out the details.
Authentication Credentials
Each run environment (typically preprod and prod) will be allocated a dedicated set of AWS credentials that are authorized to read from that environment’s respective S3 endpoint. These credentials will be different from other AWS credentials ASAPP might have provided you.
ASAPP will provide credentials to you within a GPG encrypted file. To initiate the credential exchange, you must provide ASAPP with a public GPG key that we may use to encrypt the files containing your credentials. It’s safe to send your public GPG key to ASAPP using any available channel. Please do NOT provide ASAPP with your private GPG key.
GitHub provides one of many good available tutorials on GPG key generation here: https://help.github.com/en/articles/generating-a-new-gpg-key .
Once you’ve provided ASAPP with your public GPG key, we will provide your credentials in a file hosted on S3 that you will retrieve using an expiring https link. ASAPPs standard practice is to have those links expire after 24 hours.
The file will be encrypted using your public GPG key. Once you decrypt the provided file using your private GPG key, your credentials will be contained within a tab delimited file with the following structure:
id secret bucket sub-folder (if any)
Please let us know when you’ve successfully retrieved and decrypted your credentials file.
Your Dedicated S3 Buckets
Once you’ve securely received credentials that allow you to read from your dedicated S3 buckets, ASAPP will provide you with URIs representing the set of S3 buckets from which you may securely download data files.
ASAPP can make a broad range of data available to you. You’ll work with your Customer Success and Implementation teams to schedule specific feeds for export.
Each separate feed will have a unique S3 endpoint and, potentially, a unique strategy for data partitioning. While the specific details vary according to the feed at hand, ASAPP follows a consistent set of conventions for naming S3 buckets, paths, and partitions.
The following provides details about the conventions.
Bucket Naming
For clarity, ASAPP names buckets using the following convention:
s3://asapp-\{env\}-\{company_name\}-exports-\{aws-region\}
Key | Description |
---|---|
env | Environment (e.g., prod, preprod, test) |
company_name | Your company name, without spaces |
aws-region | us-east-1 Note: this is the region currently supported for your ASAPP implementations. |
For example, an S3 bucket set up to host exports of pre-production data from ACME would be named:
s3://asapp-preprod-acme-exports-us-east-1
Folder Structure
Within the named bucket, ASAPP uses the following conventions to construct an S3 path that corresponds to the name of the exported feed and to partitions of the exported feed:
s3://BUCKET_NAME/FEED_NAME/version=VERSION_NUMBER/format=FORMAT_NAME/ dt=DATE/hr=HOUR/mi=MINUTE/DATAFILE(S)
s3 PATH ELEMENT | Description |
---|---|
BUCKET_NAME | (Described above) |
FEED_NAME | The name of the table, extract, feed, etc. |
version | The version of the feed at hand. Changes whenever the schema, meaning of a column, etc., changes in a way that could break existing integrations. |
format | The format of the exported data. Almost always, this will be JSON Lines.* |
dt | The YYYY-MM-DD formatted date corresponding to the exported data. |
hr | (optional) The HH formatted hour corresponding to the exported data. |
mi | (optional) The MM formatted minute corresponding to the exported data. Note: This might be either the exact minute for a minutely partitioned data export, or the first minute of a bucket representing a range of minutes in the exported data. For example, 00, 15, 30, 45 would be the mi values for four partitions across a single hour. The 00 partition would contain all data from the start of the hour up to but not including the 15^th^ minute of the hour. |
DATAFILE(s) | The filename or filenames of the exported feed partition. See the File Names below for additional information about file naming. |
*By default, and by very strong preference, ASAPP will output files in JSON Lines format. JSON Lines is robust, unambiguous, and will save us all from the heartbreak CSV. The format is described at http://jsonlines.org. | |
Note that hr
and mi
partitions will not always apply. In the event that a particular feed is not partitioned on hour or minute, the hr
and/or mi
subfolders will not be present for that feed.
File Naming
File names that correspond to an exported feed partition will have names in the following form:
\{FEED_NAME\}\{FORMAT\}\{SPLIT_NUMBER\}.\{COMPRESSION\}.\{ENCRYPTION\}
File name element | Description |
---|---|
FEED_NAME | The feed name from which this partition is exported. |
FORMAT | .jsonl |
SPLIT_NUMBER | (optional) In the event that a particular partition’s export needs to be split across multiple physical files in order to accommodate file size constraints, each split file will be suffixed with a dot followed by a two-digit incrementing sequence. If the whole partition can fit in a single file, no SPLIT_NUMBER will be present in the file name. |
COMPRESSION | (optional) .gz will be appended to the file name if the file is gzip compressed. |
ENCRYPTION | (optional) In the atypical case where a file written to the s3 store is doubly encrypted, the filename will have a .enc extension. |
Downloading Data from S3
General Data Formatting Notes
- ASAPP export files are UTF-8 encoded.
- Control characters are escaped.
- Files are formatted with Unix-style line endings.
Verifying the Data Export is Complete
Upon completing the upload for a particular partition, ASAPP will upload an EMPTY file named _SUCCESS
to the same path as the uploaded file or files. This _SUCCESS
file acts as a flag indicating that the upload for the associated partition is complete. A _SUCCESS
file will be written even if there is no available data selected for export for the partition at hand.
Until the _SUCCESS
file is uploaded, ASAPP’s export/upload is in progress and you should not import the associated data file. You should check for this file before downloading any data partition.
Download Example
The example below assumes a shell terminal with python 2.7+ installed.
Deduping S3 Exports Data
All tables need to be deduped in order to get the most recent data and discard any past data that is being duplicated as part of this process.
To dedupe tables, get the latest instance and the latest run per partition key for each table, with some exceptions which are noted in the table below.
Example
In order to dedupe the table convos_metrics
,use this query:
We partition by the primary_key
for that table and get the latest data using order by logical_timestamp
DESC in the subquery. Then we only select where row_idx
= 1 to only pull the latest information we have for each issue_id
.
Deduping Guide by Table
Open the expandable table to see the full reference to deduping by table, which includes the fields to partition and order by respectively.
Deduping Reference
Table | Partition By | Order By |
---|---|---|
admin_activity | company_id, rep_id, status_description, status_start_ts | status_end_ts DESC |
agent_journey_rep_event_frequency | rep_id, event_type, company_marker, instance_ts | instance_ts DESC |
autopilot_flow | company_id, issue_id, form_start_ts | instance_ts DESC |
cobrowsing | issue_id, cob_start_ts | instance_ts DESC |
convos_intents | issue_id | first_utterance_ts DESC |
convos_intents_ended | issue_id | first_utterance_ts DESC |
convos_metadata | issue_id | last_event_ts DESC |
convos_metadata_ended | issue_id | conversation_end_ts DESC |
convos_metrics | issue_id | first_utterance_ts DESC |
convos_metrics_ended | issue_id | first_utterance_ts DESC |
convos_summary_tags | issue_id, company_id, summary_tag_presented | instance_ts DESC |
csid_containment | company_id, csid | instance_ts DESC |
csid_containment_1d | company_id, csid | instance_ts DESC |
customer_feedback | issue_id, company_subdivision, company_segments, last_rep_id, question, instance_ts | instance_ts DESC |
customer_params | event_id, param_key | instance_ts DESC |
dim_queues | queue_key | — |
export_row_counts | export_date, export_job, export_interval | — |
fact_convos_missed_ends | company_id, issue_id, conversation_end_ts | instance_ts DESC |
flow_completions | issue_id | negation_event_ts,success_event_ts DESC |
flow_detail | issue_id, event_ts | event_ts DESC |
intents | code, name, intent_type | — |
issue_callback_3d | issue_id | instance_ts DESC |
issue_omnichannel | company_id, issue_id, third_party_customer_id | instance_ts DESC |
issue_queues | issue_id, queue_id, enter_queue_ts | instance_ts DESC |
issue_sentiment | issue_id, created_ts | instance_ts DESC |
issue_session_merge | company_id, issue_id, session_id | instance_ts DESC |
issue_type | company_id, customer_id, issue_id | instance_ts DESC |
knowledge_base | issue_id, company_id, article_id, event_ts | event_ts DESC |
live_agent_opportunities | issue_id, customer_id, opportunity_ts | instance_ts DESC |
queue_check_ins | issue_id, customer_id, check_in_ts | instance_ts DESC |
queue_membership_wfm | instance_ts, company_id, rep_id, queue_id | instance_ts DESC |
queue_wfm | instance_ts, queue_name, rep_id | instance_ts DESC |
quick_reply_buttons (deprecated) | final_intent_code, quick_reply_button_text, escalated_to_chat, escalation_requested, quick_reply_button_index | instance_ts DESC |
reps | rep_id | created_ts DESC |
rep_activity | company_id, instance_ts, rep_id, status_id, in_status_starting_ts | instance_ts DESC |
rep_assignment_disposition | company_id, issue_id, rep_id, rep_assigned_ts | rep_assigned_ts DESC |
rep_attributes | rep_attribute_id, rep_id, created_ts | created_ts DESC |
rep_augmentation | issue_id | instance_ts DESC |
rep_convos | issue_id, rep_id, issue_assigned_ts | instance_ts DESC |
rep_hierarchy | subordinate_agent_id,superior_agent_id, | — |
rep_utilized | instance_ts, rep_id, desk_mode | instance_ts DESC |
sms_events | sms_flow_id | — |
transfers | issue_id, rep_id, timestamp_req | instance_ts DESC |
utterances | issue_id, sender_id, created_ts | instance_ts DESC |
voice_intents | issue_id, company_id | instance_ts DESC |
Schema Adjustments
You should enable automated schema evolution detection and identify any changes using export_docs.yaml
, which is generated each day and sent via the S3 feed. By incorporating this into the workflows, you can maintain a proactive stance, ensuring uninterrupted service and a smooth transition in the event of schema adjustments.
Export Schema
Click here to view the full schema for each feed table.
If you are retrieving data from Standalone Services, you need to use File Exporter.
Workforce Management (WFM) Data
In addition to the primary data schema (above), ASAPP also offers separate workforce management data for use in intraday workforce management applications.
The data is structured in two tables and published every 20-30 minutes. This data refresh cadence is intended to support continuous monitoring use cases rather than for historical reporting.
Each table’s metrics are calculated for a 15-minute window.
For example, the tables published at 1:35AM would reflect specific queue or rep activity from 1:00AM - 1:15AM.
queue_wfm
For each queue in your ASAPP implementation, this table contains queue-related metrics during a 15-minute period:
- Total assignments
- Customer queue abandons
- Average handle time
- Average wrap time
- Average customer wait time in queue
- Customer wait times under different SLA thresholds
- Reps in the queue
queue_membership_wfm
For each queue in your ASAPP implementation, this table contains rep-level information about availability and status of the rep during a 15-minute period (or the period of the 15 minutes during which the rep was logged in):
- Linear logged-in time
- Linear available time
- Linear busy time
- Total linear utilized time
- Linear utilized time while available
- Linear utilized time while busy
WFM data is not available by default. Reach out to your ASAPP account team to request these additional data tables and to discuss your use case.
Was this page helpful?