Dynamic partitioning in Google Cloud Dataflow?


I'm using Dataflow to process files stored in GCS and write to BigQuery tables. Below are the requirements:

  1. The input files contain event records, and each record pertains to one eventType;
  2. I need to partition the records by eventType;
  3. For each eventType, the records must be written to the corresponding BigQuery table, one table per eventType;
  4. The event types in each batch of input files vary.

I'm thinking of applying transforms such as GroupByKey and Partition, but it seems I have to know the number (and types) of events at development time in order to determine the partitions, roughly as in the sketch below.
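A minimal sketch of that static approach (written against the Apache Beam Java SDK, the successor to the original Dataflow SDK; the bucket path, table names, EVENT_TYPES list, and the "eventType,payload" record layout are placeholder assumptions):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.Partition;
    import org.apache.beam.sdk.transforms.SimpleFunction;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionList;
    import java.util.Arrays;
    import java.util.List;

    public class StaticPartitionPipeline {

      // The complete set of event types has to be known when the pipeline is built.
      private static final List<String> EVENT_TYPES = Arrays.asList("click", "view", "purchase");

      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        PCollection<TableRow> events = p
            .apply("ReadFromGCS", TextIO.read().from("gs://my-bucket/events/*.csv"))
            .apply("ParseRecords", MapElements.via(new SimpleFunction<String, TableRow>() {
              @Override
              public TableRow apply(String line) {
                // Hypothetical record layout: "eventType,payload"
                String[] parts = line.split(",", 2);
                return new TableRow().set("eventType", parts[0]).set("payload", parts[1]);
              }
            }));

        // Partition needs the number of partitions as a constant at graph-construction time.
        // A record whose eventType is not in EVENT_TYPES would fail here (indexOf returns -1),
        // which is exactly why this static approach is brittle when the types vary per batch.
        PCollectionList<TableRow> byType = events.apply(
            Partition.of(EVENT_TYPES.size(),
                (Partition.PartitionFn<TableRow>)
                    (row, numPartitions) -> EVENT_TYPES.indexOf((String) row.get("eventType"))));

        // One hard-coded BigQuery sink per known event type.
        for (int i = 0; i < EVENT_TYPES.size(); i++) {
          byType.get(i).apply("Write_" + EVENT_TYPES.get(i),
              BigQueryIO.writeTableRows()
                  .to("my-project:my_dataset.events_" + EVENT_TYPES.get(i))
                  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
        }

        p.run();
      }
    }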

Do you guys have any idea how to do the partitioning dynamically, meaning the partitions can be determined at run time?

Why not load everything into a single "raw" BigQuery table and then use the BigQuery API to determine the different event types and export each event type to its own table (e.g., via https://cloud.google.com/bigquery/bq-command-line-tool#createtablequery or an API call)?

If the input format is simple, you can do this without using Dataflow at all, and it will be more cost efficient.
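As a rough sketch of the API-call route (using the google-cloud-bigquery Java client; the project, dataset, table, and column names are assumptions), after the raw load you could do something like:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.FieldValueList;
    import com.google.cloud.bigquery.JobInfo;
    import com.google.cloud.bigquery.QueryJobConfiguration;
    import com.google.cloud.bigquery.QueryParameterValue;
    import com.google.cloud.bigquery.TableId;
    import com.google.cloud.bigquery.TableResult;

    public class SplitRawEventsTable {
      public static void main(String[] args) throws InterruptedException {
        BigQuery bq = BigQueryOptions.getDefaultInstance().getService();

        // 1. Discover which event types actually landed in the raw table.
        TableResult types = bq.query(QueryJobConfiguration.of(
            "SELECT DISTINCT eventType FROM `my-project.my_dataset.raw_events`"));

        // 2. For each event type, materialize the matching rows into their own table.
        for (FieldValueList row : types.iterateAll()) {
          String type = row.get("eventType").getStringValue();
          QueryJobConfiguration perType = QueryJobConfiguration.newBuilder(
                  "SELECT * FROM `my-project.my_dataset.raw_events` WHERE eventType = @type")
              .addNamedParameter("type", QueryParameterValue.string(type))
              .setDestinationTable(TableId.of("my_dataset", "events_" + type))
              .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
              .build();
          bq.query(perType); // blocks until the query job finishes
        }
      }
    }

The bq command-line link above achieves the same per-type export with bq query and its --destination_table flag.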

