abhishek dodda 2012-10-16, 03:47
Thejas Nair 2012-10-17, 02:04
Re: Pig optimization rules
abhishek dodda 2012-10-17, 03:15
Hi Thejas,
Thanks for the reply.

Which factors should be considered when setting the number of reducers for an outer join query with replicated? Will increasing the number of reducers in an outer join query improve performance?

Yes, it should increase performance. One thing to watch for is skew among the reduce runtimes. If the reduce runtimes are very skewed, you might want to consider skew join.

I have some questions:

1) I got your answer, but my question here is about the REDUCER NUMBER: what should I consider when choosing the number of reducers? Example:

    A = JOIN B BY ID LEFT OUTER, C BY ID using 'replicated' parallel ??,   <------------ How do I select this number?

2) What is the importance of the property mapred.job.reduce.markreset.buffer.percent? How does it affect performance, and what is the optimal value for this parameter?

3) I have read that the Bloom filter in Pig 0.10 affects join performance. How efficient is a Bloom filter compared to a replicated join? Can a Bloom filter be applied to an outer join?

Regards
Abhishek

On Tue, Oct 16, 2012 at 10:04 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:

> On 10/15/12 8:47 PM, abhishek dodda wrote:
>
>> hi all,
>>
>> I am trying to learn and implement pig optimization rules. Can anyone
>> help me understand the properties below?
>>
>> The amount of memory allocated to bags is determined by
>> pig.cachedbag.memusage; the default is set to 20% (0.2) of available
>> memory. Note that this memory is shared across all large bags used by
>> the application.
>>
>> Which memory is this? 20% of which memory is allocated?
>
> This is 20% of the map/reduce task's available memory, i.e. the JVM
> maximum memory limit.
>
>> Which factors should be considered when setting the number of reducers
>> for an outer join query with replicated?
>> Will increasing the number of reducers in an outer join query improve
>> performance?
>
> Yes, it should increase performance. One thing to watch for is skew among
> the reduce runtimes. If the reduce runtimes are very skewed, you might
> want to consider skew join.
>
> -Thejas
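
On question 1, a rough starting point (not something stated in this thread) is the input-size estimate that Pig's documentation describes for jobs with no explicit PARALLEL: roughly one reducer per pig.exec.reducers.bytes.per.reducer bytes of input, capped at pig.exec.reducers.max. The sketch below only reproduces that arithmetic in Python; the function name is hypothetical, and the default values (1,000,000,000 bytes and 999) are assumptions to check against your Pig version. Also note that PARALLEL only affects operators that run a reduce phase, and a 'replicated' join is documented as a map-side join, so the number matters mainly for a regular or skewed join.

```python
# Defaults as described in the Pig documentation; treat them as assumptions
# and verify against the Pig version you actually run.
BYTES_PER_REDUCER = 1000 * 1000 * 1000  # pig.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999                      # pig.exec.reducers.max


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper: input-size based reducer estimate.

    Returns ceil(total_input_bytes / bytes_per_reducer),
    clamped to the range [1, max_reducers].
    """
    if total_input_bytes < 0 or bytes_per_reducer <= 0 or max_reducers < 1:
        raise ValueError("sizes must be non-negative and limits positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    wanted = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    # Example: 50 GB (decimal) of join input.
    print(estimate_reducers(50 * 10**9))
```

The result is only a first guess for the PARALLEL value: the cluster's available reduce slots and any skew in the reduce runtimes (the point raised in the reply above) would still need to be checked on a real run.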