Dynamic partitioning in Google Cloud Dataflow?


I'm using Dataflow to process files stored in GCS and write to BigQuery tables. Below are the requirements:

  1. The input files contain event records; each record pertains to one eventType.
  2. I need to partition the records by eventType.
  3. For each eventType, output/write the records to the corresponding BigQuery table, one table per eventType.
  4. The event types in each batch of input files vary.

I'm thinking of applying transforms such as GroupByKey and Partition, but it seems I would have to know the number (and types) of events at development time, which is needed to determine the partitions.

Do you guys have any idea how to do the partitioning dynamically? Meaning the partitions can be determined at run time?
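For illustration, the kind of per-element routing I have in mind would look roughly like the sketch below, written against the Apache Beam Java SDK's BigQueryIO (the successor of the Dataflow SDK). The bucket, project, dataset, table prefix, and the eventType/payload fields are hypothetical placeholders, not my actual pipeline:

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class DynamicEventTables {

  // Parses one input line of the assumed form "<eventType>,<payload>" into a TableRow.
  static class ParseLine extends DoFn<String, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      String[] parts = c.element().split(",", 2);
      c.output(new TableRow().set("eventType", parts[0]).set("payload", parts[1]));
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Hypothetical schema shared by all per-eventType tables.
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("eventType").setType("STRING"),
        new TableFieldSchema().setName("payload").setType("STRING")));

    PCollection<TableRow> rows =
        p.apply("ReadFromGcs", TextIO.read().from("gs://my_bucket/input/*"))
         .apply("ParseLines", ParDo.of(new ParseLine()));

    rows.apply("WritePerEventType",
        BigQueryIO.writeTableRows()
            // Choose the destination table per element, at run time.
            .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
              @Override
              public TableDestination apply(ValueInSingleWindow<TableRow> input) {
                String eventType = (String) input.getValue().get("eventType");
                return new TableDestination(
                    "my_project:my_dataset.events_" + eventType,
                    "Events of type " + eventType);
              }
            })
            .withSchema(schema)
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```

With CREATE_IF_NEEDED, tables for previously unseen event types would be created on the fly, so the set of event types would not have to be known at development time.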

Why not load into a single "raw" BigQuery table, then use BigQuery to determine the different event types and export each event type to its own table (e.g., via https://cloud.google.com/bigquery/bq-command-line-tool#createtablequery) or an API call?

If the input format is simple, you can do this without using Dataflow at all, and it would be more cost efficient.
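As a rough sketch of that second step with the bq command-line tool linked above, you could materialize each event type into its own table with a destination-table query; the dataset, table names, and the eventType column are hypothetical placeholders:

```sh
# Hypothetical example: copy one event type out of the raw table into its own table.
bq query \
  --destination_table=my_dataset.events_click \
  --append_table \
  --use_legacy_sql=false \
  'SELECT * FROM my_dataset.raw_events WHERE eventType = "click"'
```

You would first run a `SELECT DISTINCT eventType` query against the raw table to enumerate the event types present in the batch, then issue one such query per type.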

