I've been playing around with the LongCombiner on a table that's summing up
the counts of output of a MapReduce job, very similar to the WordCount
example from the user manual.
I started out encoding the values using LongCombiner.FIXED_LEN_ENCODER, but
have noticed that this can lead to some confusion later on downstream. For
example, a co-worker was scanning using the shell and was caught off guard
by the encoded values. Also, out of the box, the StatsCombiner example
works using String values, not Long values so we built a custom piece to
essentially do the same thing with Long values instead.
It looks to me like most of the examples I've seen just store things are
String values, rather than encoding them. What are the tradeoffs? We're
at a point where we could pretty easily switch things to just use strings -
it seems like that might make things more convenient from a maintenance
perspective (human readable values) and would allow us to re-use some
existing components (e.g. StatsCombiner). Any thoughts?