Pig >> mail # user >> About SpillableMemoryManager


About SpillableMemoryManager
Hello,

I have just come across a problem with SpillableMemoryManager. I have searched many discussions containing this keyword, but they all differ from my problem.

The problem is:

When I run a Pig script, the same task takes longer to finish on the more powerful machine. The part of the task-node log that is not clear to me is below.

Weak Node:

2001-06-28 04:04:39,356 INFO
org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
call - Collection threshold init = 86048768(84032K) used 86048752(84031K) committed = 125304832(122368K) max 139853824(136576K)
2001-06-28 04:04:39,940 INFO
org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
call- Usage threshold init = 86048768(84032K) used = 98041880(95744K)
committed = 125304832(122368K) max = 139853824(136576K)
2001-06-28 04:06:10,048 INFO org.apache.hadoop.mapred.Task:
Task:attempt_201211010504_0007_r_000018_0 is done. And is in the
process of commiting
Powerful Node:

2012-11-01 06:12:56,801 INFO
org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
call- Usage threshold init = 139853824(136576K) used 99240424(96914K) committed = 139853824(136576K) max 139853824(136576K)
2012-11-01 06:13:22,733 INFO
org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
call - Collection threshold init = 139853824(136576K) used 77466824(75651K) committed = 139853824(136576K) max 139853824(136576K)
2012-11-01 06:15:41,178 INFO org.apache.hadoop.mapred.Task:
Task:attempt_201211010504_0007_r_000014_0 is done. And is in the
process of commiting
My question is: how can I control the numbers following labels like "Usage threshold init"? It seems I can't set them in the config files. Do they default to values derived from the hardware?
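Regarding where those numbers come from: as far as I can tell they are not Pig settings at all. SpillableMemoryManager registers a notification listener on the JVM's memory pool MXBeans, and the init/used/committed/max figures in the log are standard java.lang.management.MemoryUsage values for the heap. They follow the task JVM's -Xms/-Xmx (i.e. whatever mapred.child.java.opts passes to the child JVM), not anything in pig.properties. A minimal sketch (not Pig code, just the standard management API) that prints the same four figures for the current JVM:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    public static void main(String[] args) {
        // The init/used/committed/max figures in SpillableMemoryManager's log
        // are java.lang.management.MemoryUsage values for the heap; they are
        // governed by the task JVM's -Xms/-Xmx flags, not by Pig config.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("init      = " + heap.getInit() + "(" + heap.getInit() / 1024 + "K)");
        System.out.println("used      = " + heap.getUsed() + "(" + heap.getUsed() / 1024 + "K)");
        System.out.println("committed = " + heap.getCommitted() + "(" + heap.getCommitted() / 1024 + "K)");
        System.out.println("max       = " + heap.getMax() + "(" + heap.getMax() / 1024 + "K)");
    }
}
```

If you run this with different -Xmx values, init/committed/max move accordingly, which would explain why the two node types log different thresholds even though the Hadoop configs look the same.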
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The description of the cluster

I have a heterogeneous cluster with:
 6 virtual machines, each with 4 cores and 8 GB of memory.
 4 physical machines, each with 24 cores and 32 GB of memory.

The Hadoop configs are the same on all nodes (I assigned the same number of M/R slots to the powerful machines, even though that wastes some of their capacity).
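Since the heap-derived thresholds scale with the task JVM size, one option (a sketch, assuming Hadoop 1.x property names and that per-host overrides are acceptable in your setup; the -Xmx value is hypothetical) is to give the larger physical machines a bigger child heap in their local mapred-site.xml:

```xml
<!-- mapred-site.xml on the 24-core/32 GB machines (hypothetical value):
     raising the per-task heap raises init/committed/max, and with them the
     points at which the memory handler fires. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```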
The Pig script that causes the problem:

grouped_recs = GROUP IDF_VALID BY ast_id PARALLEL 40;

rollup = FOREACH grouped_recs {
    bombay_code    = FILTER IDF_VALID BY $2 == 76;
    singapore_code = FILTER IDF_VALID BY $2 == 90;
    GENERATE
        FLATTEN(group) AS nda_id,
        FLATTEN((IsEmpty(bombay_code) ? null : bombay_code.$1)) AS bombay_code,
        FLATTEN((IsEmpty(singapore_code) ? null : singapore_code.$1)) AS singapore_code;
}

Thanks & Regards,
Xingbang