Dynamic partitioning in Google Cloud Dataflow?


I'm using Dataflow to process files stored in GCS and write to BigQuery tables. Below are the requirements:

  1. The input files contain event records, and each record pertains to one eventType;
  2. I need to partition the records by eventType;
  3. For each eventType, the records must be written to the corresponding BigQuery table, one table per eventType;
  4. The event types in each batch of input files vary.

I'm thinking of applying transforms such as GroupByKey and Partition, but it seems I have to know the number (and types) of events at development time in order to determine the partitions, roughly as in the sketch below.
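A minimal sketch of that static approach (written against the Apache Beam Java SDK, the successor to the original Dataflow SDK; the bucket path, table names, EVENT_TYPES list, and the "eventType,payload" record layout are placeholder assumptions):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.Partition;
    import org.apache.beam.sdk.transforms.SimpleFunction;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionList;
    import java.util.Arrays;
    import java.util.List;

    public class StaticPartitionPipeline {

      // The complete set of event types has to be known when the pipeline is built.
      private static final List<String> EVENT_TYPES = Arrays.asList("click", "view", "purchase");

      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        PCollection<TableRow> events = p
            .apply("ReadFromGCS", TextIO.read().from("gs://my-bucket/events/*.csv"))
            .apply("ParseRecords", MapElements.via(new SimpleFunction<String, TableRow>() {
              @Override
              public TableRow apply(String line) {
                // Hypothetical record layout: "eventType,payload"
                String[] parts = line.split(",", 2);
                return new TableRow().set("eventType", parts[0]).set("payload", parts[1]);
              }
            }));

        // Partition needs the number of partitions as a constant at graph-construction time.
        // A record whose eventType is not in EVENT_TYPES would fail here (indexOf returns -1),
        // which is exactly why this static approach is brittle when the types vary per batch.
        PCollectionList<TableRow> byType = events.apply(
            Partition.of(EVENT_TYPES.size(),
                (Partition.PartitionFn<TableRow>)
                    (row, numPartitions) -> EVENT_TYPES.indexOf((String) row.get("eventType"))));

        // One hard-coded BigQuery sink per known event type.
        for (int i = 0; i < EVENT_TYPES.size(); i++) {
          byType.get(i).apply("Write_" + EVENT_TYPES.get(i),
              BigQueryIO.writeTableRows()
                  .to("my-project:my_dataset.events_" + EVENT_TYPES.get(i))
                  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
        }

        p.run();
      }
    }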

Do you guys have any idea how to do the partitioning dynamically, meaning the partitions can be determined at run time?

Why not load everything into a single "raw" BigQuery table and then use the BigQuery API to determine the different event types and export each event type to its own table (e.g., via https://cloud.google.com/bigquery/bq-command-line-tool#createtablequery or an API call)?

If the input format is simple, you can do this without using Dataflow at all, and it will be more cost efficient.
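As a rough sketch of the API-call route (using the google-cloud-bigquery Java client; the project, dataset, table, and column names are assumptions), after the raw load you could do something like:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.FieldValueList;
    import com.google.cloud.bigquery.JobInfo;
    import com.google.cloud.bigquery.QueryJobConfiguration;
    import com.google.cloud.bigquery.QueryParameterValue;
    import com.google.cloud.bigquery.TableId;
    import com.google.cloud.bigquery.TableResult;

    public class SplitRawEventsTable {
      public static void main(String[] args) throws InterruptedException {
        BigQuery bq = BigQueryOptions.getDefaultInstance().getService();

        // 1. Discover which event types actually landed in the raw table.
        TableResult types = bq.query(QueryJobConfiguration.of(
            "SELECT DISTINCT eventType FROM `my-project.my_dataset.raw_events`"));

        // 2. For each event type, materialize the matching rows into their own table.
        for (FieldValueList row : types.iterateAll()) {
          String type = row.get("eventType").getStringValue();
          QueryJobConfiguration perType = QueryJobConfiguration.newBuilder(
                  "SELECT * FROM `my-project.my_dataset.raw_events` WHERE eventType = @type")
              .addNamedParameter("type", QueryParameterValue.string(type))
              .setDestinationTable(TableId.of("my_dataset", "events_" + type))
              .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
              .build();
          bq.query(perType); // blocks until the query job finishes
        }
      }
    }

The bq command-line link above achieves the same per-type export with bq query and its --destination_table flag.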

