Pig, mail # user - About SpillableMemoryManager


Re: About SpillableMemoryManager
W W 2012-11-18, 16:32
The problem has been solved; it is related to the bug PIG-2923 (see also
PIG-2917, PIG-2918). (Refer to [EMAIL PROTECTED].)

Dmitriy actually fixed it two months ago. Now that I use pig-0.11, my problem
is gone, and the GC time fell from 80s to 0.5s.

Thanks for your effort, Dmitriy.
Xingbang Wang

2012/11/7 W W <[EMAIL PROTECTED]>

> Thanks for your help Dmitriy!
>
> I've found the reason why the powerful machine is slower than the weak
> machine.
>
> The heap size is not the answer to the problem of the powerful machine being
> slower than the weak one.
>
> It's because the GC time on the powerful machine is more than twice that on
> the weak ones.
> In my case, the JVM by default assigns the powerful machine 18 GC threads
> (there are 24 cores on one node), while the weak machine gets only 4 GC
> threads (it has only 4 cores). The memory is the same, so the overhead of
> GC on the powerful machine dominates.
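>
> (As far as I can tell, the HotSpot default for -XX:ParallelGCThreads is
> roughly the following; the exact heuristic may differ between JVM versions:
>
>     threads = ncpus                     when ncpus <= 8
>     threads = 8 + (ncpus - 8) * 5/8     when ncpus >  8
>
> which gives 18 on a 24-core node and 4 on a 4-core node, matching what I
> observed.)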
>
> I think that's the main reason for my problem.
>
> Besides, I think the SurvivorRatio of the Java heap also contributes to that.
>  My guess is that for Pig, most of the data in the flow will eventually be
> garbage collected, so if the survivor area is too big (given that the New
> Generation of the JVM is constant), the Eden area is smaller and more GC is
> needed. There should be a pivotal point for the SurvivorRatio.
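>
> (For reference, my understanding is that -XX:SurvivorRatio sets the ratio of
> Eden to one survivor space, and there are two survivor spaces, so roughly:
>
>     survivor = young_gen / (SurvivorRatio + 2)
>     eden     = young_gen - 2 * survivor
>
> With SurvivorRatio=20, each survivor space is about 1/22 of the young
> generation and Eden is about 91% of it. Please double-check this against
> your JVM version.)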
>
>
> My solution is to add the following to mapred-site.xml:
>         <property>
>                 <name>mapred.child.java.opts</name>
>                 <value>-XX:ParallelGCThreads=4 -XX:SurvivorRatio=20</value>
>         </property>
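>
> (If a larger heap is also needed, as Dmitriy suggested, the -Xmx flag has to
> go into the same value, since this property replaces the default child JVM
> options rather than appending to them. A sketch -- the 1024m figure is only
> an assumption to adjust for your cluster:
>
>         <property>
>                 <name>mapred.child.java.opts</name>
>                 <value>-Xmx1024m -XX:ParallelGCThreads=4 -XX:SurvivorRatio=20</value>
>         </property>
> )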
>
>
> Thanks
> Regards
> Xingbang Wang
>
> 2012/11/3 Dmitriy Ryaboy <[EMAIL PROTECTED]>
>
>> mapred.child.java.opts should be in the gigabytes, 200M is way too low.
>> Check this stack overflow thread for comments on how to ensure your
>> setting
>> actually takes effect -- it's possible you are not propagating it to the
>> job. If you change it in the hadoop config files, you need to restart the
>> MR daemons (JT and TTs).
>> http://stackoverflow.com/questions/8464048/out-of-memory-error-in-hadoop
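>>
>> (One way to propagate it per script, rather than cluster-wide, is Pig's SET
>> command at the top of the script -- just a sketch, pick a heap size that
>> fits your nodes:
>>
>>   SET mapred.child.java.opts '-Xmx1024m';
>> )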
>>
>> I'll take a look at your script next time I have a few minutes, but try
>> this first -- 200M is definitely too low to get much done in Hadoop.
>>
>>
>> On Fri, Nov 2, 2012 at 3:17 AM, W W <[EMAIL PROTECTED]> wrote:
>>
>> > hi Dmitriy
>> > Thanks for your explanation!
>> > I think splitting on $2 is not easy because what I am doing is actually
>> > rolling up a table, which means the result cannot be obtained by a join.
>> > Here is the whole script with the schema, although I omitted many FLATTENs.
>> >
>> > IDF_VALID = LOAD '/user/hadoop/idf.dat'
>> > USING PigStorage('^A') AS (
>> >   ast_id : int,
>> >   value : chararray,
>> >   pro_id : int,
>> >   pag_id : int,
>> >   bgr_id : int
>> > );
>> >
>> > grouped_recs= GROUP IDF_VALID BY ast_id PARALLEL 40;
>> >
>> > rollup = FOREACH grouped_recs {
>> >         bombay_code = FILTER IDF_VALID BY $2 == 76;
>> >         singapore_code = FILTER IDF_VALID BY $2 == 90;
>> >         GENERATE
>> >                 FLATTEN(group) AS nda_id,
>> >                 FLATTEN((IsEmpty(bombay_code) ? null : bombay_code.$1)) AS bombay_code,
>> >                 FLATTEN((IsEmpty(singapore_code) ? null : singapore_code.$1)) AS singapore_code;
>> > }
>> >
>> > STORE rollup INTO 'idf-out-full' USING PigStorage('^A');
>> >
>> >
>> >
>> > Besides, how can I "increase the amount of available heap"? I've changed
>> > mapred.child.java.opts from -Xmx200m to -Xmx1024m. It seems it doesn't
>> > help, and that threshold value is still the same.
>> > When I monitor the Java process with the top command, the setting of
>> > mapred.child.java.opts has no influence on either VIRT or RES; it seems
>> > mapred.child.java.opts has been overridden by Pig.
>> > Do you have any idea about that?
>> >
>> > Thanks and Regards
>> > Xingbang
>> >
>> >
>> >
>> > 2012/11/2 Dmitriy Ryaboy <[EMAIL PROTECTED]>
>> >
>> > > Rather than increase memory, rewrite the script so it does not need so