Re: Hadoop sampler related query!
Rahul Bhattacharjee 2013-04-16, 15:45
Mighty users@hadoop

Does anyone have any thoughts on this?
On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <

> Hi,
> I have a question about Hadoop's input sampler, which is used to inspect a
> data set beforehand by random selection, sampling, etc. It is mainly used
> for total sort, and in Pig's skewed join implementation as well.
> The question is this. Given
> Mapper<K,V,OK,OV>
> K and V are the input key and value of the mapper, essentially coming in
> from the input format. OK and OV are the output key and value emitted by
> the mapper.
> Looking at the input sampler's code, it appears to create the partitions
> based on the input key of the mapper.
> I think the partitions should be created from the output key (OK), and the
> output key sort comparator should be used for sorting the samples.
> If partitioning is based on the input key and the mapper emits a different
> key, then the total sort would not hold up.
> Is there a condition that the input sampler is to be used only with
> Mapper<K,V,K,V1>?
> Thanks,
> Rahul
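
To make the concern above concrete, here is a rough, Hadoop-free sketch in plain Java. Everything in it is an assumption made for illustration: the class name, the `k % 10` key transform standing in for a mapper that emits a different key, and the simple evenly spaced split-point selection. It is not the actual sampler or partitioner code. It only shows what can happen to partition sizes when split points are derived from input keys but applied to output keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class SamplerKeyMismatch {

    // Hypothetical mapper key transform: the emitted key differs from the input key.
    static final Function<Integer, Integer> MAP_KEY = k -> k % 10;

    // Input keys 0..99, standing in for keys read from the input format.
    static List<Integer> inputKeys() {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            keys.add(i);
        }
        return keys;
    }

    // Sort the sample and take numPartitions - 1 evenly spaced split points.
    static List<Integer> splitPoints(List<Integer> sample, int numPartitions) {
        List<Integer> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<Integer> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splits;
    }

    // Partition index = number of split points that are <= key.
    static int partitionFor(int key, List<Integer> splits) {
        int p = 0;
        while (p < splits.size() && key >= splits.get(p)) {
            p++;
        }
        return p;
    }

    // Route every mapper OUTPUT key through the given split points and count per partition.
    static int[] partitionSizes(List<Integer> inputKeys, List<Integer> splits) {
        int[] sizes = new int[splits.size() + 1];
        for (int k : inputKeys) {
            sizes[partitionFor(MAP_KEY.apply(k), splits)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> in = inputKeys();
        List<Integer> out = new ArrayList<>();
        for (int k : in) {
            out.add(MAP_KEY.apply(k));
        }
        // Split points sampled from input keys, applied to output keys.
        System.out.println(Arrays.toString(partitionSizes(in, splitPoints(in, 4))));
        // Split points sampled from output keys, applied to output keys.
        System.out.println(Arrays.toString(partitionSizes(in, splitPoints(out, 4))));
    }
}
```

With these made-up keys, the first line printed is `[100, 0, 0, 0]` (every record lands in one partition) and the second is `[20, 30, 20, 30]`. In this toy setup the partitions still come out in key order either way; what is lost is the balance, and with different key types the split points would not even be comparable to the emitted keys.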