Re: Hadoop sampler related query!
Rahul Bhattacharjee 2013-04-23, 10:42
+ mapred dev
On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <
[EMAIL PROTECTED]> wrote:
> I have a question about Hadoop's input sampler, which is used to
> investigate a data set beforehand (random selection, sampling, etc.).
> It is mainly used for total sort, and in Pig's skewed join implementation
> as well.
> The question here is -
> K and V are the input key and value of the mapper, essentially coming in from
> the input format. OK and OV are the output key and value emitted from the
> mapper.
> Looking at the input sampler's code, it looks like it is creating the
> partitions based on the input key of the mapper.
> I think the partitions should be created considering the output key (OK)
> and the output key sort comparator should be used for sorting the samples.
> If partitioning is done based on the input key and the mapper emits a
> different key, then the total sort would no longer hold.
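
To make the concern above concrete, here is a toy simulation in plain Python. It is not Hadoop code and uses no Hadoop API. The mapper (`toy_mapper`), the key ranges, the four partitions, and the use of the full key set as the "sample" are all assumptions chosen for illustration. It compares cut points derived from the input keys with cut points derived from the mapper's output keys, in both cases applied to the output keys.

```python
import bisect

def split_points(sample, num_partitions):
    """Return num_partitions - 1 evenly spaced cut points from the sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]

def partition_sizes(keys, cuts):
    """Count how many keys land in each range partition defined by cuts."""
    sizes = [0] * (len(cuts) + 1)
    for key in keys:
        sizes[bisect.bisect_right(cuts, key)] += 1
    return sizes

def toy_mapper(k):
    """Hypothetical mapper whose output key differs from its input key."""
    return k // 100

# Assumed data: input keys 0..999; the whole set stands in for the sample.
input_keys = list(range(1000))
output_keys = [toy_mapper(k) for k in input_keys]

cuts_from_input = split_points(input_keys, 4)    # sampled on K
cuts_from_output = split_points(output_keys, 4)  # sampled on OK

# Both sets of cut points are applied to the keys the mapper actually emits.
sizes_input_cuts = partition_sizes(output_keys, cuts_from_input)
sizes_output_cuts = partition_sizes(output_keys, cuts_from_output)

print("cuts from input keys: ", cuts_from_input, "->", sizes_input_cuts)
print("cuts from output keys:", cuts_from_output, "->", sizes_output_cuts)
```

In this sketch the cut points taken from the input keys send every record to the first partition, while the cut points taken from the output keys spread the records across all four. If the output key were of a different type than the input key, the two could not even be compared against each other.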
> Is there any condition that the input sampler is only to be used for