Pig user mailing list: About SpillableMemoryManager


Re: About SpillableMemoryManager
The problem has been solved; it is related to bug PIG-2923 (see also PIG-2917 and
PIG-2918). (Refer to [EMAIL PROTECTED].)

Dmitriy actually fixed it two months ago. Now that I use Pig 0.11, my problem
is gone, and the GC time has fallen from 80s to 0.5s.

Thanks for your effort, Dmitriy.
Xingbang.Wang

2012/11/7 W W <[EMAIL PROTECTED]>

> Thanks for your help, Dmitriy!
>
> I've found the problem of the powerful machine being slower than the weak
> machine.
>
> The heap size is not the answer to the problem of the powerful machine being
> slower than the weak one.
>
> It's because the GC time on the powerful machine is more than twice that
> on the weak ones.
> In my case, the JVM by default assigns the powerful machine 18 GC threads
> (there are 24 cores on one node), while the weak machine gets only 4 GC
> threads (it has only 4 cores). The memory is the same, so the overhead
> of GC on the powerful machine dominates.
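>
> A rough check of that default (assuming HotSpot's usual rule for the parallel
> collector: ParallelGCThreads = 8 + (cores - 8) * 5/8 when there are more than
> 8 cores, and ParallelGCThreads = cores otherwise):
>
>         24 cores: 8 + (24 - 8) * 5/8 = 8 + 10 = 18 GC threads
>          4 cores: 4 GC threads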
>
> I think that's the main reason for my problem.
>
> Besides, I think the SurvivorRatio of the Java heap also contributes to that.
> My guess is that for Pig, most of the data in the flow will eventually be
> garbage collected, so if the survivor area is too big (given that the new
> generation of the JVM heap has a fixed size), the Eden area is smaller and
> more GC is needed. There should be a pivot point for the SurvivorRatio.
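>
> As a rough sketch of that trade-off (taking SurvivorRatio = Eden / one
> survivor space, young generation = Eden + 2 survivor spaces, and assuming the
> commonly cited default SurvivorRatio of 8 with no adaptive resizing):
>
>         SurvivorRatio=8  : Eden = 8/10  = 80% of the young generation
>         SurvivorRatio=20 : Eden = 20/22 ~ 91% of the young generation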
>
>
> My solution is to add the following to mapred-site.xml:
>         <property>
>                 <name>mapred.child.java.opts</name>
>                 <value>-XX:ParallelGCThreads=4 -XX:SurvivorRatio=20</value>
>         </property>
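>
> If the larger heap Dmitriy recommended is wanted as well, the flags can go
> in the same value; this is a sketch only (the -Xmx1024m figure is
> illustrative, not a benchmarked setting), and the TaskTrackers still need a
> restart after editing mapred-site.xml:
>
>         <property>
>                 <name>mapred.child.java.opts</name>
>                 <value>-Xmx1024m -XX:ParallelGCThreads=4 -XX:SurvivorRatio=20</value>
>         </property>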
>
>
> Thanks
> Regards
> Xingbang Wang
>
> 2012/11/3 Dmitriy Ryaboy <[EMAIL PROTECTED]>
>
>> mapred.child.java.opts should be in the gigabytes, 200M is way too low.
>> Check this stack overflow thread for comments on how to ensure your
>> setting
>> actually takes effect -- it's possible you are not propagating it to the
>> job. If you change it in the hadoop config files, you need to restart the
>> MR daemons (JT and TTs).
>> http://stackoverflow.com/questions/8464048/out-of-memory-error-in-hadoop
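>>
>> One way to force the setting onto a single job (a sketch, assuming your Pig
>> version passes SET statements through to the job configuration) is to set it
>> in the script itself instead of the Hadoop config files:
>>
>> set mapred.child.java.opts '-Xmx1024m';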
>>
>> I'll take a look at your script next time I have a few minutes, but try
>> this first -- 200M is definitely too low to get much done in Hadoop.
>>
>>
>> On Fri, Nov 2, 2012 at 3:17 AM, W W <[EMAIL PROTECTED]> wrote:
>>
>> > Hi Dmitriy,
>> > Thanks for your explanation!
>> > I think splitting on $2 is not easy, because what I am doing is actually
>> > rolling up a table, which means the result cannot be obtained with a join.
>> > Here is the whole script with its schema, although I omitted many FLATTENs.
>> >
>> > IDF_VALID = LOAD '/user/hadoop/idf.dat'
>> > USING PigStorage('^A') AS (
>> >
>> >   ast_id : int,
>> >   value : chararray,
>> >   pro_id : int,
>> >   pag_id : int,
>> >   bgr_id : int
>> >
>> > );
>> >
>> > grouped_recs= GROUP IDF_VALID BY ast_id PARALLEL 40;
>> >
>> > rollup= FOREACH grouped_recs {
>> >
>> >         bombay_code= FILTER IDF_VALID BY $2 == 76 ;
>> >         singapore_code= FILTER IDF_VALID BY $2 == 90 ;
>> >
>> > GENERATE
>> >
>> >         FLATTEN(group) AS nda_id,
>> >         FLATTEN((IsEmpty(bombay_code) ? null : bombay_code.$1)) AS bombay_code,
>> >         FLATTEN((IsEmpty(singapore_code) ? null : singapore_code.$1)) AS singapore_code;
>> >
>> > };
>> >
>> > STORE rollup INTO 'idf-out-full' USING PigStorage('^A');
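>> >
>> > One memory-saving rewrite to try first (a rough sketch, not the script
>> > above; it assumes only ast_id, value and pro_id are used downstream) is to
>> > project away the unused columns before the GROUP, so the bags built per
>> > ast_id are smaller:
>> >
>> > SLIM = FOREACH IDF_VALID GENERATE ast_id, value, pro_id;
>> > grouped_recs = GROUP SLIM BY ast_id PARALLEL 40;
>> > -- the nested FILTER statements then refer to SLIM instead of IDF_VALID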
>> >
>> >
>> >
>> > Besides, how can I "increase the amount of available heap"? I've changed
>> > mapred.child.java.opts from -Xmx200m to -Xmx1024m. It seems it doesn't
>> > help, and that threshold value is still the same.
>> > When I monitor the Java process with the top command, it seems the setting
>> > of mapred.child.java.opts has NO influence on either VIRT or RES; it seems
>> > mapred.child.java.opts has been overridden by Pig.
>> > Do you have any idea about that?
>> >
>> > Thanks and Regards
>> > Xingbang
>> >
>> >
>> >
>> > 2012/11/2 Dmitriy Ryaboy <[EMAIL PROTECTED]>
>> >
>> > > Rather than increase memory, rewrite the script so it does not need so