About SpillableMemoryManager
W W 2012-11-01, 09:59
Hello,

I have just come across a problem with SpillableMemoryManager. I have searched many discussions containing this keyword, but they are all different from my problem.

The problem is that when I run a Pig script, the same task takes longer to finish on the powerful machine. The part of the task node log that is not clear to me is:

Weak node:
2001-06-28 04:04:39,356 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 86048768(84032K) used 86048752(84031K) committed = 125304832(122368K) max 139853824(136576K)
2001-06-28 04:04:39,940 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call- Usage threshold init = 86048768(84032K) used = 98041880(95744K) committed = 125304832(122368K) max = 139853824(136576K)
2001-06-28 04:06:10,048 INFO org.apache.hadoop.mapred.Task: Task:attempt_201211010504_0007_r_000018_0 is done. And is in the process of commiting

Powerful node:
2012-11-01 06:12:56,801 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call- Usage threshold init = 139853824(136576K) used 99240424(96914K) committed = 139853824(136576K) max 139853824(136576K)
2012-11-01 06:13:22,733 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 139853824(136576K) used 77466824(75651K) committed = 139853824(136576K) max 139853824(136576K)
2012-11-01 06:15:41,178 INFO org.apache.hadoop.mapred.Task: Task:attempt_201211010504_0007_r_000014_0 is done. And is in the process of commiting

My question is how to control the numbers that follow labels such as "Usage threshold init". It seems I can't set them in the config files. Do they default to some hardware parameters?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Description of the cluster

I have a heterogeneous cluster with:
6 virtual machines, each with 4 cores and 8G of memory.
4 physical machines, each with 24 cores and 32G of memory.
The Hadoop configs are the same on all nodes (I assigned the same number of M/R slots to the powerful machines, even though that wastes some of their capacity).

The Pig script that causes the problem:

grouped_recs = GROUP IDF_VALID BY ast_id PARALLEL 40;
rollup = FOREACH grouped_recs {
    bombay_code = FILTER IDF_VALID BY $2 == 76;
    singapore_code = FILTER IDF_VALID BY $2 == 90;
    GENERATE
        FLATTEN(group) as nda_id,
        FLATTEN((IsEmpty(bombay_code) ? null : bombay_code.$1)) AS bombay_code,
        FLATTEN((IsEmpty(singapore_code) ? null : singapore_code.$1)) AS singapore_code;
}

Thanks & Regards,
Xingbang
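
A note on the "init / used / committed / max" fields in the logs above: they have the same shape as the string form of java.lang.management.MemoryUsage, which suggests they describe a JVM heap memory pool rather than a hardware parameter. The following is a minimal sketch under that assumption, not Pig's actual code. The class and method names (HeapPoolReport, weakNodeSample, largestHeapPool) are made up for illustration, and choosing the largest heap pool that supports usage thresholds is an assumption about which pool is being reported.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class for illustration; it is not part of Pig or Hadoop.
class HeapPoolReport {

    // Rebuilds a MemoryUsage from the numbers in the first weak-node log line
    // and returns its standard string form.
    static String weakNodeSample() {
        MemoryUsage usage = new MemoryUsage(
                86048768L,   // init
                86048752L,   // used
                125304832L,  // committed
                139853824L); // max
        return usage.toString();
    }

    // Returns the heap pool with the largest max size among those that support
    // usage thresholds, or null if there is none. Assumption: this is the kind
    // of pool the log lines refer to.
    static MemoryPoolMXBean largestHeapPool() {
        MemoryPoolMXBean best = null;
        long bestMax = -1L;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            long max = pool.getUsage().getMax();
            if (best == null || max > bestMax) {
                best = pool;
                bestMax = max;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same field layout as the log lines in the post.
        System.out.println(weakNodeSample());

        MemoryPoolMXBean pool = largestHeapPool();
        if (pool == null) {
            System.out.println("No heap pool supports usage thresholds in this JVM");
            return;
        }
        // Values here depend on the JVM's heap settings (for example -Xmx).
        System.out.println(pool.getName() + ": " + pool.getUsage());
    }
}
```

Running this with different heap options, for example java -Xmx200m HeapPoolReport versus java -Xmx1g HeapPoolReport, changes the second line of output. If the assumption holds, the numbers in the task logs would follow the heap options of the task JVM (on Hadoop these are commonly passed through mapred.child.java.opts, which may or may not be how this cluster is configured).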