Pig, mail # user - Pig optimization rules


Re: Pig optimization rules
abhishek dodda 2012-10-17, 03:15
Hi Thejas,

Thanks for the reply.
Which factors should be considered when setting the number of reducers for
an outer join query with 'replicated'? Will increasing the number of reducers
in an outer join query improve performance?

Yes, it should increase performance. One thing to watch for is skew among
the reduce runtimes. If the reduce runtimes are very skewed, you might want
to consider a skew join.
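
For reference, a minimal sketch of what the skew join suggestion could look like in a script. The relation names, paths, and schemas are hypothetical, and PARALLEL 20 is only a placeholder, not a recommended value. As far as I know, a 'replicated' join is carried out in the map phase, so PARALLEL mainly matters for reduce-side joins such as the default or 'skewed' join.

```pig
-- Hypothetical inputs; only the JOIN clause is the point of this sketch.
B = LOAD 'big_input' AS (id:int, val:chararray);
C = LOAD 'other_input' AS (id:int, info:chararray);

-- Reduce-side outer join that samples the key distribution so that
-- heavily repeated keys can be spread over several reducers.
-- PARALLEL 20 is a placeholder reducer count, to be tuned per cluster.
A = JOIN B BY id LEFT OUTER, C BY id USING 'skewed' PARALLEL 20;

STORE A INTO 'joined_output';
```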

I have some questions

1) I understood your answer, but my question is about the NUMBER OF REDUCERS:
what should I consider when choosing how many reducers to use?

example :

 A = JOIN B BY ID LEFT OUTER, C BY ID USING 'replicated' PARALLEL ??;
     <------------ How do I select this number?

2) What is the importance of the property
mapred.job.reduce.markreset.buffer.percent? How does it affect performance,
and what is the optimal value for this parameter?

3) I have read that the Bloom filter in Pig 0.10 affects join performance.
How efficient is a Bloom filter compared to a replicated join? Can a Bloom
filter be applied to an outer join?

Regards
Abhishek

On Tue, Oct 16, 2012 at 10:04 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:

> On 10/15/12 8:47 PM, abhishek dodda wrote:
>
>> hi all,
>>
>> I am trying to learn and implement Pig optimization rules. Can anyone
>> help me understand the properties below?
>>
>> The amount of memory allocated to bags is determined by
>> pig.cachedbag.memusage; the default is set to 20% (0.2) of available
>> memory. Note that this memory is shared across all large bags used by
>> the application.
>>
>> Which memory is this? 20% of which memory is allocated?
>>
>
> This is 20% of the memory available to the map/reduce task, i.e. the JVM
> maximum memory limit.
>
>
>> Which factors should be considered when setting the number of reducers
>> for an outer join query with 'replicated'?
>> Will increasing the number of reducers in an outer join query improve
>> performance?
>>
>>
> Yes, it should increase performance. One thing to watch for is skew among
> the reduce runtimes. If the reduce runtimes are very skewed, you might want
> to consider a skew join.
>
> -Thejas
>
>
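
To make the pig.cachedbag.memusage answer quoted above concrete, here is a small configuration sketch. The 1024 MB heap size is an assumption chosen only for the arithmetic, not a recommendation; with that heap, the default fraction of 0.2 works out to roughly 205 MB shared by all large bags in one task.

```pig
-- Assumed task JVM heap for illustration only: 1024 MB.
set mapred.child.java.opts '-Xmx1024m';

-- Default fraction from the thread: 0.2 * 1024 MB is about 205 MB,
-- shared across all large bags in the same map or reduce task.
set pig.cachedbag.memusage 0.2;
```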