Re: What is implemented behind the PIG Joins
Thejas Nair 2011-08-22, 20:58
What version of Pig does your distribution use?
On 8/22/11 3:42 AM, byambaa wrote:
> I have a cluster of 11 nodes, each with 16 GB RAM, a 6-core CPU and a
> 1 TB HDD, and I am using the Cloudera distribution CHD4b with Pig. I have
> two join queries, a parallel and a replicated version of the Pig join,
> as well as MapReduce reduce-side and map-side joins.
> In theory the replicated join could be faster than the parallel join,
> but in my case the parallel join is faster.
> I have two questions:
> 1. Why is the replicated join so slow? How does it work, and what is
> implemented behind it?
> 2. The MR reduce-side join was faster than the parallel Pig join. What is
> implemented behind the parallel Pig join? I guess Pig also implements
> an MR reduce-side join.
> Could you explain how the Pig joins work and what runs behind the Pig
> scripts?
> Dataset (size)               | Replicated Join in HDFS | Replicated Join in Hbase | MR Reduce side join | MR Joins (Singleton pattern)
> obr_wp_annotation (1786MB)   | 29 sec                  | 50 sec                   | 36 sec              | 19
> obr_ct_annotation (5916MB)   | 799 sec                 | 523 sec                  | 108 sec             | 69
> obr_pm_annotation (16983MB)  | 1794 sec                | 707 sec                  | 248 sec             | 138
> The relation file is 659MB.
> Thank you very much.
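
For anyone following the thread, here is a minimal sketch of the two strategies the question compares. It is plain in-memory Python, not Pig's actual implementation and not the poster's code. The sample relations `annotations` and `relation` are hypothetical. The first function mimics a reduce-side join, where records from both inputs are grouped by key before being paired. The second mimics a replicated (map-side) join, where the small input is held in a hash table and the large input is streamed past it.

```python
from collections import defaultdict

def reduce_side_join(left, right):
    """Group both inputs by key (standing in for the shuffle), then pair per key."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    joined = []
    for key in sorted(groups):
        lefts, rights = groups[key]
        for l in lefts:
            for r in rights:
                joined.append((key, l, r))
    return joined

def replicated_join(large, small):
    """Load the small input into an in-memory hash table, then stream the large one."""
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    return [(key, l, r) for key, l in large for r in table.get(key, [])]

# Hypothetical sample data: (key, value) pairs.
annotations = [(1, "a1"), (2, "a2"), (1, "a3")]
relation = [(1, "r1"), (3, "r3")]

print(reduce_side_join(annotations, relation))
print(replicated_join(annotations, relation))
```

Both functions return the same pairs. The difference is where the work happens: the replicated variant avoids the grouping step, but every task has to hold the whole small input in memory, so its advantage depends on that input being small enough to load cheaply.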