Data Import
The Data Import feature lets you send event data stored in AWS S3 or GCP GCS to Hackle. Data is imported on a daily basis.
Supported Cloud Storage
Cloud | Storage | Supported? |
---|---|---|
AWS | S3 | Yes |
AWS | Redshift | Not yet |
GCP | GCS | Yes |
GCP | BigQuery | Not yet |
Preparation
The following tasks are required before data can be imported:
- Create a storage location for the event data (AWS S3, GCP GCS, etc.).
- Create a key that can access that storage and grant it the required permissions.
- Process the event data into the standardized format and store it by day (e.g. 2023-01-01, 2023-01-02, etc.).
Generating keys and authorizing : GCP GCS
For GCP GCS, you can generate a key by referring to GCP IAM > Generating and Managing Service Account Keys.
The key used to access GCS requires the following permissions:
storage.buckets.get
storage.objects.get
storage.objects.create
storage.objects.delete
storage.objects.list
Generating keys and authorizing : AWS S3
For AWS S3, you can refer to the following documents to create a key and grant the necessary permissions.
- Create an AWS IAM User by following the documentation in AWS Docs: Create an IAM User.
- Follow AWS Docs: Creating an IAM Policy to create a policy that contains the IAM policy shown below. Then attach the policy to the IAM user created in the previous step.
- Follow AWS Docs: Creating an IAM Key to create a key.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket>",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "<prefix>/*",
            "<prefix>/",
            "<prefix>"
          ]
        }
      }
    }
  ]
}
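The `<bucket>` and `<prefix>` placeholders have to be replaced in several places, which is easy to get wrong by hand. As a minimal sketch, the policy above could be generated from Python. The function name `build_import_policy` is hypothetical, and the bucket and prefix values are taken from the example paths later on this page.

```python
import json


def build_import_policy(bucket: str, prefix: str) -> dict:
    """Fill the policy template from this page with one bucket and prefix."""
    prefix = prefix.strip("/")  # avoid doubled slashes in the ARN
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access, limited to the prefix.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:PutObject",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Bucket-level listing, limited to the same prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {
                    "StringLike": {
                        "s3:prefix": [f"{prefix}/*", f"{prefix}/", prefix]
                    }
                },
            },
        ],
    }


# Example values only; substitute your own bucket and prefix.
policy = build_import_policy("customer-data-hackle", "test/prefix-custom")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the policy editor when creating the IAM policy.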
Data Import Format
Data import currently supports the Apache Parquet format. The data must be processed and stored according to the Parquet schema described in the table below.
Column Category | Column Name | Column Type | Column Value (Example) | Description |
---|---|---|---|---|
Insert ID | insert_id | STRING | 8fb8e088-9245-4fce-bb87-7e09d9917ed6 | UUID value used to detect duplicate events. |
Event Key | event_key | STRING | purchase | Name of the event. |
Client Timestamp | ts | TIMESTAMP | 2023-01-01 00:01:02.333 (UTC) | Timestamp in UTC, truncated to milliseconds. |
Metric Value | metric_value | DECIMAL(24, 6) | 0.0 | Value used for computation in analysis and experiments. (Store '0.0' if not needed.) |
Identifiers | identifiers | Map<String, String> | { "id": "8fb8e088-9245-4fce-bb87-7e09d9917ed6", "device_id": "89ABCDEF-01234567-89ABCDEF", "user_id": "49591", "session_id": "1659710029.4.1.1659710504.0" } | Map containing user identifiers. - (Optional) 'user_id': logged-in user identifier (corresponds to userId sent by the Hackle SDK) - (Required) 'id': device identifier (corresponds to id sent by the Hackle SDK) - (Required) 'device_id': device identifier (corresponds to deviceId sent by the Hackle SDK) - (Optional, loaded when using GA) 'ga_session_id', 'ga_device_id' Identifier keys are stored in lowercase. |
Event Properties | event_properties | Map<String, String> | { "product_id": "33537", "product_category": "LEISURE", "order_id": "291994100" } | Properties that contain event information. Property keys are stored in lowercase. |
User Properties | user_properties | Map<String, String> | { "grade": "GOLD", "date_signed": "2022-07-01", "date_recent": "2023-01-17" } | Properties that contain user information. Property keys are stored in lowercase. |
Platform Properties | platform_properties | Map<String, String> | `# Android example { "osname":"Android", "appversion": "6.9.0", "language":"ko", "osversion":"12", "devicevendor":"samsung", "versionname":"6.77.0-DEBUG", "platform":"Mobile", "devicemodel":"SM-S908N" }` `# iOS example { "osname":"iOS", "appversion": "6.9.3", "language":"ko-KR", "osversion":"16.0.2", "devicevendor":"Apple", "versionname":"6.77.0", "platform":"Mobile", "devicemodel":"iPhone14,2" }` | Properties that contain platform information. - (Required) osname (Android, iOS) - (Required) version Property keys are stored in lowercase. |
Below is a summary of the data formats described in the table above.
root
|-- ts: timestamp (nullable = false)
|-- event_key: string (nullable = false)
|-- identifiers: map<string, string> (nullable = false)
|-- insert_id: string (nullable = false)
|-- metric_value: decimal(24,6) (nullable = false)
|-- user_properties: map<string, string> (nullable = false)
|-- event_properties: map<string, string> (nullable = false)
|-- platform_properties: map<string, string> (nullable = false)
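The rules in the table (required identifiers, lowercase keys, millisecond timestamps, six decimal places) can be checked before the Parquet files are written. The following is a minimal sketch using only the Python standard library. The helper names `lowercase_keys` and `normalize_event` are hypothetical, the input event layout is an assumption, and reading "stored in lowercase" as "keys are lowercased" is an interpretation based on the examples in the table. Writing the resulting rows to Parquet is left to whichever Parquet library you use.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Identifier keys the table marks as required.
REQUIRED_IDENTIFIERS = ("id", "device_id")


def lowercase_keys(props: dict) -> dict:
    """Return a string-to-string map with lowercase keys."""
    return {str(key).lower(): str(value) for key, value in props.items()}


def normalize_event(event: dict) -> dict:
    """Hypothetical helper: shape one raw event like a row of the schema."""
    identifiers = lowercase_keys(event["identifiers"])
    missing = [key for key in REQUIRED_IDENTIFIERS if key not in identifiers]
    if missing:
        raise ValueError(f"missing required identifiers: {missing}")

    ts = event["ts"]
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    # Drop everything below milliseconds.
    ts = ts.replace(microsecond=ts.microsecond // 1000 * 1000)

    return {
        "insert_id": str(event["insert_id"]),
        "event_key": str(event["event_key"]),
        "ts": ts,
        # DECIMAL(24, 6): six digits after the decimal point, 0.0 if absent.
        "metric_value": Decimal(str(event.get("metric_value", "0.0"))).quantize(
            Decimal("0.000001")
        ),
        "identifiers": identifiers,
        "event_properties": lowercase_keys(event.get("event_properties", {})),
        "user_properties": lowercase_keys(event.get("user_properties", {})),
        "platform_properties": lowercase_keys(event.get("platform_properties", {})),
    }


# Example input; the mixed-case keys are deliberate.
row = normalize_event({
    "insert_id": "8fb8e088-9245-4fce-bb87-7e09d9917ed6",
    "event_key": "purchase",
    "ts": datetime(2023, 1, 1, 0, 1, 2, 333999, tzinfo=timezone.utc),
    "identifiers": {
        "ID": "8fb8e088-9245-4fce-bb87-7e09d9917ed6",
        "Device_Id": "89ABCDEF-01234567-89ABCDEF",
    },
    "platform_properties": {"osName": "Android"},
})
```

Rejecting a bad event before it reaches storage is a design choice made for this sketch; logging and skipping such events would work equally well.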
Processing for Data Import
Process the data according to the Parquet format described above and store it daily in the bucket.
- When data processing is complete, create a 0-byte '_SUCCESS' (signal) file.
- Each import processes D-1 data. For example, a data import that runs on January 2nd processes the data for January 1st.
The following is an example of a saved partition and file.
# 2023-01-01 data
gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-01/_SUCCESS
gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-01/000000000000.parquet
gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-01/000000000001.parquet
# 2023-01-02 data
gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-02/_SUCCESS
gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-02/000000000000.parquet
gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-02/000000000001.parquet
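The layout above can be derived from the run date. The sketch below computes the D-1 partition date and the object paths for one partition. The function names are hypothetical, and the 12-digit zero-padded file names simply mirror the example listing; the source does not state that this naming is required.

```python
from datetime import date, timedelta


def target_partition_date(run_date: date) -> date:
    """Return the D-1 date processed by an import that runs on run_date."""
    return run_date - timedelta(days=1)


def partition_paths(base: str, day: date, file_count: int) -> list[str]:
    """List the object paths for one daily partition.

    The _SUCCESS marker comes last because it is created only after
    data processing is complete.
    """
    partition = f"{base.rstrip('/')}/dt={day.isoformat()}"
    files = [f"{partition}/{index:012d}.parquet" for index in range(file_count)]
    return files + [f"{partition}/_SUCCESS"]


# An import that runs on 2023-01-02 targets the 2023-01-01 partition.
day = target_partition_date(date(2023, 1, 2))
paths = partition_paths("gcs://customer-data-hackle/test/prefix-custom", day, 2)
for path in paths:
    print(path)
```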
How to Request Data Import
Please contact the Hackle team to request a data import. The following information is required to import the data:
- Key authorized for access
- AWS S3 or GCS bucket name and the partition path in the bucket where the data will be loaded (e.g. 'gcs://customer-data-hackle/test/prefix-custom/dt=2023-01-01')
- Data loading time (e.g. loading completed before 13:00 KST)