Re: Hadoop sampler related query!
Mighty users@hadoop,

Anyone on this?
On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a question about Hadoop's input sampler, which is used to
> investigate the data set beforehand using random selection, sampling,
> etc. It is mainly used for total sort, and in Pig's skewed join
> implementation as well.
>
> The question is:
>
> Mapper<K,V,OK,OV>
>
> K and V are the mapper's input key and value, essentially coming in from
> the input format. OK and OV are the output key and value emitted by the
> mapper.
>
> Looking at the input sampler's code, it looks like it creates the
> partitions based on the mapper's input key.
>
> I think the partitions should be created considering the output key (OK)
> and the output key sort comparator should be used for sorting the samples.
>
> If partitioning is done on the input key and the mapper emits a
> different key, then the total sort would not hold.
>
> Is there a condition that the input sampler can only be used with
> Mapper<K,V,K,V1>?
>
>
> Thanks,
> Rahul
>
>
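To make the concern in the quoted question concrete, here is a minimal plain-Java sketch. It does not use the Hadoop API. The class SamplerSketch, its methods, and the mapper rule (output key = input key / 100) are all hypothetical names and choices made for illustration. For determinism it treats every key as a sample instead of sampling randomly. It derives split points once from the input keys and once from the mapper's output keys, then counts how many emitted records land in each partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SamplerSketch {

    /** Sort the samples and pick numPartitions - 1 evenly spaced split points. */
    static int[] splitPoints(List<Integer> samples, int numPartitions) {
        List<Integer> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return splits;
    }

    /** Partition index = number of split points that are <= key. */
    static int partition(int key, int[] splits) {
        int p = 0;
        while (p < splits.length && key >= splits[p]) {
            p++;
        }
        return p;
    }

    /** Hypothetical mapper: the input key is a record offset, the output key is offset / 100. */
    static int mapOutputKey(int inputKey) {
        return inputKey / 100;
    }

    /** Route the output key of records 0..numRecords-1 and count records per partition. */
    static int[] partitionCounts(int[] splits, int numRecords) {
        int[] counts = new int[splits.length + 1];
        for (int k = 0; k < numRecords; k++) {
            counts[partition(mapOutputKey(k), splits)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int numRecords = 1000;
        int numPartitions = 4;
        List<Integer> inputKeys = new ArrayList<>();
        List<Integer> outputKeys = new ArrayList<>();
        for (int k = 0; k < numRecords; k++) {
            inputKeys.add(k);
            outputKeys.add(mapOutputKey(k));
        }
        int[] fromInput = splitPoints(inputKeys, numPartitions);
        int[] fromOutput = splitPoints(outputKeys, numPartitions);
        System.out.println("splits from input keys:  " + Arrays.toString(fromInput)
                + " -> counts " + Arrays.toString(partitionCounts(fromInput, numRecords)));
        System.out.println("splits from output keys: " + Arrays.toString(fromOutput)
                + " -> counts " + Arrays.toString(partitionCounts(fromOutput, numRecords)));
    }
}
```

In this toy setup the split points taken from the input keys are 250, 500, 750, so every emitted key (0 to 9) falls into partition 0 and the other three partitions stay empty. Split points taken from the output keys are 2, 5, 7, and the records spread as 200, 300, 200, 300. When the input and output keys also differ in type, the input-key split points cannot even be compared against the emitted keys, which is the situation the question is asking about.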