Pig >> mail # user >> What is implemented behind the PIG Joins


Re: What is implemented behind the PIG Joins
Hi Byambajargal,
What version of Pig does your distribution use?
-Thejas

On 8/22/11 3:42 AM, byambaa wrote:
> Hello,
> I have a cluster of 11 nodes, each with 16 GB RAM, a 6-core CPU, and a
> 1 TB HDD, and I am using the Cloudera distribution CHD4b with Pig. I have
> two Pig join queries, a parallel and a replicated version of the Pig join,
> as well as MapReduce reduce-side and map-side joins.
>
> In theory the replicated join should be faster than the parallel join, but
> in my case the parallel join is faster.
> I have two questions:
>
> 1. Why is the replicated join so slow? How does it work, and what is
> implemented behind it?
> 2. The MR reduce-side join was faster than the parallel Pig join. What is
> implemented behind the parallel Pig join? I guess Pig also implements an
> MR reduce-side join.
>
> Could you explain how the Pig joins work and what runs behind the Pig
> scripts?
>
>
> Dataset (size)              | Replicated Join in HDFS | Replicated Join in HBase | MR reduce-side join | MR Joins (Singleton pattern)
> obr_wp_annotation (1786MB)  | 29 sec                  | 50 sec                   | 36 sec              | 19
> obr_ct_annotation (5916MB)  | 799 sec                 | 523 sec                  | 108 sec             | 69
> obr_pm_annotation (16983MB) | 1794 sec                | 707 sec                  | 248 sec             | 138
>
> The relation file is 659MB.
>
> Thank you very much
>
> Byambajargal
>
>