Vasco Visser 2012-09-07, 13:32
Re: POCollectedGroup and LoadFunc indicator interface
Alan Gates 2012-09-13, 02:57
You are correct; this would be better named OrderedCollectableLoadFunc. I suspect this happened because the interface is usually used on the output of MapReduce jobs. In that case (at least in MR1) the keys are sorted within each part file, and all instances of a given key are guaranteed to be in a single part file.
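
To illustrate why MapReduce output tends to satisfy both properties at once, here is a minimal sketch. It is not code from Pig or Hadoop: PartFileSketch and writePartFiles are hypothetical names, and the partitioning rule is only assumed to mirror a hash partitioner. Each key is routed to exactly one simulated part file, and each part file is then sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the reduce side of a MapReduce job; not Hadoop code.
class PartFileSketch {

    // Routes every key to one "part file" by hash, then sorts each part file.
    static List<List<String>> writePartFiles(List<String> keys, int numReducers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            // Assumed hash partitioning: equal keys always map to the same index.
            int partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            parts.get(partition).add(key);
        }
        for (List<String> part : parts) {
            // Each reducer sees its keys in sorted order, so its output is sorted.
            Collections.sort(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("d", "a", "c", "a", "b", "d");
        // Every key lands in a single part file, and each part file is sorted.
        System.out.println(writePartFiles(keys, 3));
    }
}
```

A loader reading such part files one per split would meet both the "all instances of a key in one split" and the "sorted within the split" requirements, which may be why the two were never separated in the interface name.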
On Sep 7, 2012, at 6:32 AM, Vasco Visser wrote:
> Hi, I am new to the list. I've been working on the Pig code base,
> adding my own blocking map side POs (e.g., map side join, map side
> grouping) for when assertions can be made with regard to fragmentation
> of input relations. This is partly inspired by the new block placement
> policy possibilities in hadoop-2.
> Anyway, my question to the list is the following. Whilst looking at
> the code for POCollectedGroup I noticed that this PO expects split
> content to be sorted. On the other hand the Collectable loader
> interface only seems to indicate that keys are unique across splits.
> Why is there this discrepancy? Is there a good reason not to have an
> indicator interface that captures all input requirements, e.g., something
> like OrderedCollectableLoadFunc?
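
The discrepancy raised in the question can be seen in a minimal sketch of a streaming, map-side group. This is not Pig's POCollectedGroup: CollectedGroupSketch and groupConsecutive are hypothetical names, and keys stand in for whole records. The sketch closes a group as soon as the key changes, so it is only correct when equal keys are adjacent within the split, which sorted input guarantees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a map-side collected group; not Pig code.
class CollectedGroupSketch {

    // Emits one group per run of equal adjacent keys, buffering only the current run.
    static List<List<String>> groupConsecutive(List<String> keys) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : keys) {
            // A key change closes the current group; safe only if equal keys are adjacent.
            if (!current.isEmpty() && !current.get(0).equals(key)) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sorted split: prints [[a, a], [b, b]]
        System.out.println(groupConsecutive(Arrays.asList("a", "a", "b", "b")));
        // Unsorted split with the same keys: prints [[a], [b], [a], [b]]
        System.out.println(groupConsecutive(Arrays.asList("a", "b", "a", "b")));
    }
}
```

In the second call every key is confined to the one split, yet the grouping is still wrong. A guarantee that keys do not span splits is therefore not enough on its own for this style of operator; the ordering requirement has to be stated as well.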