Dynamic partitioning in Google Cloud Dataflow?
I'm using Dataflow to process files stored in GCS and write to BigQuery tables. Below are the requirements:
- The input files contain event records; each record pertains to one eventType.
- The records need to be partitioned by eventType.
- For each eventType, the records should be output/written to a corresponding BigQuery table, one table per eventType.
- The event types appearing in each batch of input files vary.
I'm thinking of applying transforms such as GroupByKey and Partition, but it seems I would have to know the number (and types) of events at development time in order to determine the partitions; see the sketch below.
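For illustration, here is a minimal sketch of that static approach using the Apache Beam Java SDK. The bucket path, the event-type list, and the extractEventType helper are all hypothetical placeholders; the point is that Partition.of requires the partition count when the pipeline graph is constructed:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Partition;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class StaticPartitionPipeline {

  // Assumption: the full set of event types is fixed at development time.
  private static final List<String> EVENT_TYPES =
      Arrays.asList("click", "view", "purchase");

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Hypothetical input location; one record per line.
    PCollection<String> records =
        p.apply("ReadFromGCS", TextIO.read().from("gs://my-bucket/input/*"));

    // Partition needs the number of outputs at graph-construction time,
    // which is exactly the limitation described above.
    PCollectionList<String> byType =
        records.apply(Partition.of(EVENT_TYPES.size(),
            new Partition.PartitionFn<String>() {
              @Override
              public int partitionFor(String record, int numPartitions) {
                // A record with an eventType not in the list would map to
                // index -1 and fail: another weakness of the static approach.
                return EVENT_TYPES.indexOf(extractEventType(record));
              }
            }));

    // One BigQuery sink per event type known in advance.
    for (int i = 0; i < EVENT_TYPES.size(); i++) {
      PCollection<String> oneType = byType.get(i);
      // oneType.apply("WriteTo_" + EVENT_TYPES.get(i), /* BigQuery sink */ ...);
    }

    p.run();
  }

  // Hypothetical helper: assumes CSV records whose first field is the eventType.
  private static String extractEventType(String record) {
    return record.split(",", 2)[0];
  }
}
```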
Do you guys have any ideas on doing the partitioning dynamically, meaning the partitions can be determined at run time?
Why not load everything into a single "raw" BigQuery table, then use the BigQuery API to determine the different event types and export each event type to its own table (e.g., via https://cloud.google.com/bigquery/bq-command-line-tool#createtablequery or an API call)?

If the input format is simple, you can do this without using Dataflow at all, and it may be more cost-efficient.
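For example, a rough sketch of that flow with the bq command-line tool, where the dataset, table, bucket, and field names are placeholders:

```sh
# Load the raw batch from GCS into one staging table
# (assumes newline-delimited JSON; pick the source_format matching your files).
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.raw_events "gs://my-bucket/input/*.json"

# Discover which event types this batch actually contains.
bq query --use_legacy_sql=false \
    'SELECT DISTINCT eventType FROM mydataset.raw_events'

# For each event type returned above, materialize its own table.
bq query --use_legacy_sql=false \
    --destination_table=mydataset.events_click \
    'SELECT * FROM mydataset.raw_events WHERE eventType = "click"'
```

The DISTINCT query can drive a small script that issues one destination-table query per event type, so a batch containing new event types requires no pipeline changes.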