Mike Hugo 2013-05-13, 22:09
Christopher 2013-05-14, 00:47
Mike Hugo 2013-05-14, 02:04
Jared Winick 2013-05-14, 06:09
If it's just for the value, it's really up to your preference. Since it
sounds like encoded values are causing trouble for your shell users, you
can switch to String representations. A possible alternative is using the
views we have in the shell (transformations? I don't remember the name, and
I don't know much about them).
Another concern is that if you have iterators/combiners running on the
values, they need to be aware of the format. Ultimately, though, the format
itself doesn't really matter; what matters is that you stay consistent with
it from then on.
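To make the "consistent format" point concrete, here is a minimal standalone sketch that does not use Accumulo at all. It assumes the fixed-length form is an 8-byte big-endian long (as LongCombiner.FIXED_LEN_ENCODER produces) and that the string form is the decimal digits. The class and method names (CountEncodingSketch, encodeFixed, sumString, and so on) are hypothetical, for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Accumulo's classes: two ways a long count might be
// stored as a value, and a combiner-style sum that must match the writer's format.
class CountEncodingSketch {

    // Assumed fixed-length form: 8 bytes, big-endian. Compact, but unreadable in a scan.
    static byte[] encodeFixed(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decodeFixed(byte[] b) {
        if (b.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    // String form: decimal digits, readable as-is by a human scanning the table.
    static byte[] encodeString(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    static long decodeString(byte[] b) {
        return Long.parseLong(new String(b, StandardCharsets.UTF_8));
    }

    // Combiner-style sums: each one only works on values written in its own format.
    static long sumFixed(List<byte[]> values) {
        long total = 0;
        for (byte[] v : values) total += decodeFixed(v);
        return total;
    }

    static long sumString(List<byte[]> values) {
        long total = 0;
        for (byte[] v : values) total += decodeString(v);
        return total;
    }

    // Shows printable ASCII as-is and escapes everything else as \xNN.
    static String printable(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            int u = x & 0xff;
            if (u >= 0x20 && u < 0x7f) sb.append((char) u);
            else sb.append(String.format("\\x%02X", u));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("fixed:  " + printable(encodeFixed(42)));  // \x00\x00\x00\x00\x00\x00\x00*
        System.out.println("string: " + printable(encodeString(42))); // 42

        List<byte[]> written = Arrays.asList(encodeString(3), encodeString(39));
        System.out.println("sum with string decoder: " + sumString(written)); // 42
        try {
            sumFixed(written); // wrong decoder for these values
        } catch (IllegalArgumentException e) {
            System.out.println("fixed decoder rejects string values: " + e.getMessage());
        }
    }
}
```

In Accumulo itself, I believe the encoding a LongCombiner (e.g. SummingCombiner) uses is chosen on its IteratorSetting with `LongCombiner.setEncodingType(setting, LongCombiner.Type.STRING)` (or `FIXEDLEN` / `VARLEN`). The writer and the combiner have to agree on the same type, just as the two sum functions above only work on their own format.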
On Mon, May 13, 2013 at 6:09 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
> I've been playing around with the LongCombiner on a table that's summing
> up the counts of output of a MapReduce job, very similar to the WordCount
> example from the user manual.
> I started out encoding the values using LongCombiner.FIXED_LEN_ENCODER,
> but have noticed that this can lead to some confusion later on downstream.
> For example, a co-worker was scanning using the shell and was caught off
> guard by the encoded values. Also, out of the box, the StatsCombiner
> example works using String values, not Long values so we built a custom
> piece to essentially do the same thing with Long values instead.
> It looks to me like most of the examples I've seen just store things as
> String values, rather than encoding them. What are the tradeoffs? We're
> at a point where we could pretty easily switch things to just use strings -
> it seems like that might make things more convenient from a maintenance
> perspective (human readable values) and would allow us to re-use some
> existing components (e.g. StatsCombiner). Any thoughts?