- Transfer files from S3 buckets to GCS
- Create or start a Compute Engine (CE) instance
- Mount extra storage as needed to hold the large Adobe zip files
- Copy the files from GCS to the compute instance (instance-1). Copy a single file first to confirm that everything works, then copy the rest.
- Unzip and untar the Adobe files
- Create a JSON schema file whose field names match the columns of "hit_data.tsv", taking the names from "column_header.tsv". Follow the GBQ schema docs for the schema file format.
- Load hit data with command: bq load -F "\t" <GBQ DataSet.TableName> <Source Data> <Schema File>
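
The schema step above can be scripted instead of written by hand. The following is a minimal Python sketch, not a definitive implementation. It assumes that "column_header.tsv" holds all column names on a single tab-separated line, and that loading every field as a nullable STRING is acceptable for a first pass (types can be refined afterwards). The function names and the output file name "schema.json" are hypothetical, chosen only for illustration.

```python
import json
import re


def sanitize(name):
    """Make a header usable as a BigQuery column name.

    Keeps letters, digits, and underscores, and replaces anything else
    with an underscore.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    # Column names must start with a letter or an underscore.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


def build_schema(header_line):
    """Turn one tab-separated header line into a list of BigQuery field dicts."""
    names = header_line.rstrip("\r\n").split("\t")
    # Assumption: every column is loaded as a nullable STRING.
    return [{"name": sanitize(n), "type": "STRING", "mode": "NULLABLE"} for n in names]


def write_schema(header_path, schema_path):
    """Read the header file and write the JSON schema file used by bq load."""
    with open(header_path, encoding="utf-8") as src:
        schema = build_schema(src.readline())
    with open(schema_path, "w", encoding="utf-8") as dst:
        json.dump(schema, dst, indent=2)
    return schema
```

Calling write_schema("column_header.tsv", "schema.json") would produce a file that can be passed as the <Schema File> argument of the bq load command above. Starting with STRING for every field avoids load failures on unexpected values, at the cost of casting columns later in queries.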