Data Import

The Data Import feature lets you send event data stored in AWS S3 or GCP GCS to Hackle. Data is imported once a day.

Supported Cloud Storage

Complete the following tasks before data extraction:

  • Create a storage bucket for the event data (AWS S3, GCP GCS, etc.).
  • Create a key that can access that storage and grant it the required permissions.
  • Process the event data into the standardized format described below and store it in daily partitions (e.g. 2023-01-01, 2023-01-02, etc.).
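Storing data "by day" means each day's files live under their own path. The sketch below builds one partition path per day. It assumes a Hive-style `dt=YYYY-MM-DD` directory name (matching the example path format used when requesting an import); the bucket and prefix names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator


def partition_path(scheme: str, bucket: str, prefix: str, day: date) -> str:
    """Return the storage path for one day's partition, e.g. s3://bucket/prefix/dt=2023-01-01/."""
    parts = [bucket]
    clean_prefix = prefix.strip("/")
    if clean_prefix:
        parts.append(clean_prefix)
    # Hive-style partition directory; the 'dt=' key is an assumption based on the request example
    parts.append(f"dt={day.isoformat()}")
    return f"{scheme}://" + "/".join(parts) + "/"


def daily_partitions(scheme: str, bucket: str, prefix: str, start: date, end: date) -> Iterator[str]:
    """Yield one partition path per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield partition_path(scheme, bucket, prefix, day)
        day += timedelta(days=1)


if __name__ == "__main__":
    # Hypothetical bucket and prefix names
    for path in daily_partitions("s3", "my-event-bucket", "hackle/events", date(2023, 1, 1), date(2023, 1, 3)):
        print(path)
```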

Generating Keys and Granting Permissions: GCP GCS

For GCP GCS, generate a service account key by following GCP IAM > Generating and Managing Service Account Keys.

The key used to access GCS requires the following permissions:


Generating Keys and Granting Permissions: AWS S3

For AWS S3, use the following AWS documents to create a key and grant it the necessary permissions.

  1. Create an AWS IAM user by following AWS Docs: Create an IAM User.
  2. Create an IAM policy by following AWS Docs: Creating an IAM Policy, using the policy JSON below. Then attach the policy to the IAM user created in step 1.
  3. Create an access key for that user by following AWS Docs: Creating an IAM Key.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    ...
                ],
                "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    ...
                ],
                "Resource": "arn:aws:s3:::<bucket>",
                "Condition": {
                    "StringLike": {
                        "s3:prefix": [
                            ...
                        ]
                    }
                }
            }
        ]
    }
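To avoid mistakes when replacing the `<bucket>` and `<prefix>` placeholders by hand, the policy can be generated from the bucket and prefix. This is a sketch, and the action lists in it are an assumption: `s3:GetObject` on objects under the prefix and `s3:ListBucket` (limited by the `s3:prefix` condition) are the minimum S3 actions for reading a prefix, so confirm the exact actions with the Hackle team before using it.

```python
import json


def build_import_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy document granting read access to <bucket>/<prefix>/."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level read access to everything under the prefix (assumed action set)
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Bucket-level listing, restricted to keys under the prefix (assumed action set)
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix; paste the output into the IAM policy editor
    print(json.dumps(build_import_policy("my-event-bucket", "hackle/events"), indent=4))
```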

Data Import Format

Data Import currently supports the Apache Parquet format. The Parquet data you provide must follow the schema described in the table below.

| Column Category | Column Name | Column Type | Example Value | Description |
|---|---|---|---|---|
| Insert ID | insert_id | STRING | 8fb8e088-9245-4fce-bb87-7e09d9917ed6 | UUID used to detect duplicate events |
| Event Key | event_key | STRING | purchase | Name of the event |
| Client Timestamp | ts | TIMESTAMP | 2023-01-01 00:01:02.333 (UTC) | UTC timestamp, truncated to millisecond precision |
| Metric Value | metric_value | DECIMAL(24, 6) | 0.0 | Value used for computations in analyses and experiments (store 0.0 if not needed) |
| Identifiers | identifiers | Map<String, String> | { "id": "8fb8e088-9245-4fce-bb87-7e09d9917ed6", "device_id": "89ABCDEF-01234567-89ABCDEF", "user_id": "49591", "session_id": "1659710029.4.1.1659710504.0" } | Map of user identifiers (see the identifier keys below). Keys are stored in lowercase. |
| Event Properties | event_properties | Map<String, String> | { "product_id": "33537", "product_category": "LEISURE", "order_id": "291994100" } | Properties describing the event. Keys are stored in lowercase. |
| User Properties | user_properties | Map<String, String> | { "grade": "GOLD", "date_signed": "2022-07-01", "date_recent": "2023-01-17" } | Properties describing the user. Keys are stored in lowercase. |
| Platform Properties | platform_properties | Map<String, String> | See the platform examples below | Properties describing the platform (see the platform keys below). Keys are stored in lowercase. |

Identifier keys:

  • (Optional) user_id: login user identifier (the value passed as userId in the Hackle SDK)
  • (Required) id: device identifier (the value passed as id in the Hackle SDK)
  • (Required) device_id: device identifier (the value passed as deviceId in the Hackle SDK)
  • (Optional, loaded when using GA) ga_session_id, ga_device_id

Platform property keys:

  • (Required) osname (Android, iOS)
  • (Required) version

Platform property examples:

    # Android example
    "appversion": "6.9.0",

    # iOS example
    "appversion": "6.9.3",
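The column rules in the table (lowercase map keys, UTC timestamps truncated to milliseconds, a six-decimal metric that defaults to 0.0, required `id` and `device_id` identifiers) can be applied in one normalization step before writing Parquet. This is a minimal sketch: the shape of the raw input dict is hypothetical, values are converted with `str()`, and `Decimal.quantize` uses Python's default half-even rounding, which is a choice rather than a documented Hackle rule.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional

REQUIRED_IDENTIFIERS = ("id", "device_id")
METRIC_SCALE = Decimal("0.000001")  # DECIMAL(24, 6) keeps six digits after the decimal point


def lowercase_keys(mapping: Optional[dict]) -> dict:
    """Copy a string map, lowercasing every key (values are left as-is apart from str())."""
    return {str(key).lower(): str(value) for key, value in (mapping or {}).items()}


def truncate_to_millis(ts: datetime) -> datetime:
    """Convert an aware datetime to UTC and drop precision below milliseconds."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware so it can be converted to UTC")
    ts = ts.astimezone(timezone.utc)
    return ts.replace(microsecond=ts.microsecond // 1000 * 1000)


def normalize_event(raw: dict) -> dict:
    """Map one raw event (hypothetical input shape) onto the Data Import columns."""
    identifiers = lowercase_keys(raw.get("identifiers"))
    missing = [key for key in REQUIRED_IDENTIFIERS if not identifiers.get(key)]
    if missing:
        raise ValueError(f"missing required identifiers: {missing}")
    metric = raw.get("metric_value")
    return {
        "insert_id": str(raw["insert_id"]),
        "event_key": str(raw["event_key"]),
        "ts": truncate_to_millis(raw["ts"]),
        # Store 0.0 when the event has no metric value; keep six decimal places
        "metric_value": Decimal(str(metric if metric is not None else 0)).quantize(METRIC_SCALE),
        "identifiers": identifiers,
        "event_properties": lowercase_keys(raw.get("event_properties")),
        "user_properties": lowercase_keys(raw.get("user_properties")),
        "platform_properties": lowercase_keys(raw.get("platform_properties")),
    }
```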

Below is a summary of the data formats described in the table above.

 |-- ts: timestamp (nullable = false)
 |-- event_key: string (nullable = false)
 |-- identifiers: map<string, string> (nullable = false)
 |-- insert_id: string (nullable = false)
 |-- metric_value: decimal(24,6) (nullable = false)
 |-- user_properties: map<string, string> (nullable = false)
 |-- event_properties: map<string, string> (nullable = false)
 |-- platform_properties: map<string, string> (nullable = false)
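Because every column is `nullable = false`, a quick pre-upload check can reject records with missing values, wrong types, or extra columns. The sketch below uses Python types as stand-ins for the Parquet column types (an assumption for illustration) and treats `identifiers` as a string map, as the column table defines it.

```python
from datetime import datetime
from decimal import Decimal
from typing import List

# Python stand-ins for the Parquet column types (assumed mapping)
EXPECTED_COLUMNS = {
    "ts": datetime,
    "event_key": str,
    "identifiers": dict,
    "insert_id": str,
    "metric_value": Decimal,
    "user_properties": dict,
    "event_properties": dict,
    "platform_properties": dict,
}


def schema_errors(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    errors = []
    for column, expected in EXPECTED_COLUMNS.items():
        value = record.get(column)
        if value is None:
            errors.append(f"{column}: null or missing (nullable = false)")
        elif not isinstance(value, expected):
            errors.append(f"{column}: expected {expected.__name__}, got {type(value).__name__}")
        elif expected is dict and not all(isinstance(k, str) and isinstance(v, str) for k, v in value.items()):
            errors.append(f"{column}: keys and values must be strings")
        elif expected is Decimal and (not value.is_finite() or value.as_tuple().exponent < -6):
            errors.append(f"{column}: must be finite with at most 6 decimal places")
    for column in sorted(set(record) - set(EXPECTED_COLUMNS)):
        errors.append(f"{column}: unexpected column")
    return errors
```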

Processing for Data Import

Process the data into the Parquet format described above and store it in the bucket, one partition per day.

  • When data processing is complete, create an empty (0-byte) '_SUCCESS' signal file.
  • Each daily run processes D-1 data. For example, the import that runs on January 2 processes the data for January 1.
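The two rules above can be sketched as a pair of helpers: one computes the D-1 date for a given run date, and one writes the 0-byte `_SUCCESS` marker once a day's files are in place. This sketch writes to a local staging directory as a stand-in for the bucket (an assumption); uploading to S3 or GCS is left to your own tooling, and the marker should be written only after all Parquet files for the day.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def target_day(run_day: date) -> date:
    """The run on run_day processes the previous day's (D-1) data."""
    return run_day - timedelta(days=1)


def mark_partition_complete(partition_dir: Path) -> Path:
    """Write the empty _SUCCESS signal file after all Parquet files for the day are in place."""
    partition_dir.mkdir(parents=True, exist_ok=True)
    marker = partition_dir / "_SUCCESS"
    marker.write_bytes(b"")  # 0-byte signal file
    return marker


if __name__ == "__main__":
    day = target_day(date(2023, 1, 2))  # -> 2023-01-01
    with tempfile.TemporaryDirectory() as staging:
        # Local staging directory standing in for the bucket prefix (assumption)
        partition = Path(staging) / f"dt={day.isoformat()}"
        # ... write the day's Parquet files into `partition` here ...
        marker = mark_partition_complete(partition)
        print(marker.name, marker.stat().st_size)
```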

Below is an example of the saved partitions and files.

# 2023-01-01 data

# 2023-01-02 data

How to Request Data Import

To request a data import, contact the Hackle team with the following information:

  • The key authorized to access the storage
  • The AWS S3 or GCS bucket name and the partition path where data in the bucket will be loaded (e.g. 'gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-01')
  • The time by which data loading is complete each day (e.g. loading completes before 13:00 KST)
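If your pipeline runs on UTC, it helps to convert the agreed loading deadline before scheduling jobs. The sketch assumes the deadline is expressed in KST (UTC+9, which has no daylight saving time) and uses the 13:00 KST example as the default; the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

KST = timezone(timedelta(hours=9), "KST")  # Korea Standard Time, fixed UTC+9


def loading_deadline_utc(run_day: date, deadline_kst: time = time(13, 0)) -> datetime:
    """Return the UTC instant by which the load for run_day (D-1 data) should be complete."""
    return datetime.combine(run_day, deadline_kst, tzinfo=KST).astimezone(timezone.utc)


def meets_deadline(completed_at: datetime, run_day: date, deadline_kst: time = time(13, 0)) -> bool:
    """completed_at must be timezone-aware; naive datetimes cannot be compared here."""
    return completed_at <= loading_deadline_utc(run_day, deadline_kst)


if __name__ == "__main__":
    print(loading_deadline_utc(date(2023, 1, 2)).isoformat())
```

With the 13:00 KST default, the deadline falls at 04:00 UTC on the same calendar day.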