+1 to reducing the amount of boilerplate for dealing with side inputs.

I prefer the "NewDoFn" style of side inputs for consistency. The
primary drawback seems to be lambda's incompatibility with
annotations. This is solved in Python by letting the first
argument of the process method be the main input, and
subsequent ones be the side inputs. For example:

main_pcoll | beam.Map(
    lambda main_input_elem, side_input_value: main_input_elem + side_input_value,
    side_input_pvalue)

For multiple side inputs they are mapped positionally (though Python
has the advantage that arguments can be passed by keyword as well to
enhance readability when there are many of them, and we allow that for
side inputs). Note that side_input_pvalue is not referenced anywhere
else, so we don't even have to store it and pass it around (one
typically writes pvalue.AsList(some_pcoll) inline here). By contrast,
when the concrete PCollectionView is used to access the value, it
must be passed separately to both the ParDo and the callback
(unless we can infer it, which I don't think we can do in all (many?)
cases).
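
To make the positional/keyword mapping concrete, here is a rough
stdlib-only sketch of the calling convention. It is not Beam's
implementation: map_with_side_inputs is a hypothetical stand-in for
beam.Map, and plain Python values stand in for materialized side inputs.

```python
# Stdlib-only illustration of the calling convention described above.
# ASSUMPTION: map_with_side_inputs is a hypothetical helper, not a Beam API;
# plain Python values play the role of already-materialized side inputs.

def map_with_side_inputs(fn, main_elements, *side_args, **side_kwargs):
    """Call fn once per main element; side values follow the element."""
    return [fn(elem, *side_args, **side_kwargs) for elem in main_elements]


main = [1, 2, 3]

# Positional: the lambda's second argument receives the side input value.
shifted = map_with_side_inputs(lambda elem, offset: elem + offset, main, 10)

# Keyword: names keep the mapping readable when there are several side inputs.
scaled = map_with_side_inputs(
    lambda elem, offset, factor: (elem + offset) * factor,
    main,
    factor=2,
    offset=10,
)
```

The point is only that the callback never needs a handle to the side
input itself, just a parameter slot for its value.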

There's no reason we couldn't do this, or something very similar, in
Java as well.

On Wed, Sep 13, 2017 at 10:55 AM, Reuven Lax <[EMAIL PROTECTED]> wrote: