Pig, mail # user - About SpillableMemoryManager


Re: About SpillableMemoryManager
Dmitriy Ryaboy 2012-11-02, 00:00
Rather than increasing memory, rewrite the script so it does not need so
much RAM to begin with.
You can split on $2, group and generate what you need, then join things
back.
It is hard to tell exactly what you are going for without schemas and
expected inputs/outputs.

If the hadoop configs are the same, the fact that it's the powerful machine
that fails doesn't mean anything -- you are running out of RAM, and you
gave all machines the same amount of RAM for the reduce processes. It just
happens to be the one that a big group is hashing to.
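
To illustrate why the failing node is arbitrary, here is a minimal Java
sketch. It assumes the common hash-partitioning scheme (non-negative hash
code modulo the number of reducers); the class name, key names and record
counts are made up, and Java's String.hashCode stands in for whatever hash
the real key type uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupSkewSketch {
    // Assumed scheme: non-negative hash code modulo the reducer count.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sum the records each reducer would receive for the given key counts.
    static long[] recordsPerReducer(Map<String, Long> recordsPerKey, int numReducers) {
        long[] load = new long[numReducers];
        for (Map.Entry<String, Long> e : recordsPerKey.entrySet()) {
            load[partitionFor(e.getKey(), numReducers)] += e.getValue();
        }
        return load;
    }

    public static void main(String[] args) {
        // Hypothetical key distribution: one very large group, two small ones.
        Map<String, Long> recordsPerKey = new LinkedHashMap<>();
        recordsPerKey.put("hot_key", 5_000_000L);
        recordsPerKey.put("key_a", 10L);
        recordsPerKey.put("key_b", 20L);

        int numReducers = 40; // matches PARALLEL 40 in the script
        long[] load = recordsPerReducer(recordsPerKey, numReducers);
        int hot = partitionFor("hot_key", numReducers);
        System.out.println("reducer " + hot + " receives " + load[hot] + " records");
    }
}
```

The reducer index depends only on the key and the reducer count, not on
the hardware of the node that ends up running that reducer.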

The threshold you are asking about is the point after which Pig will
try to spill what it can, since GC is imminent. It is defined as 70% of the
largest memory pool found in the JVM. This threshold itself is not what you
want to increase -- you want to increase the amount of available heap, if
possible.
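
As a rough check of that 70% figure against the logs quoted below, here is
a small Java sketch. It is not Pig's actual code: the class and method
names are invented, and only the java.lang.management calls are standard
JDK API. With the logged max of 139853824 bytes, 70% is 97897676 bytes,
and both "Usage threshold" log lines report a "used" value just above that.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class SpillThresholdSketch {
    // 70% as described in this thread; an assumption, not read from Pig's source.
    static final int USAGE_THRESHOLD_PERCENT = 70;

    // Integer arithmetic avoids floating-point rounding surprises.
    static long thresholdFor(long poolMaxBytes, int percent) {
        return poolMaxBytes * percent / 100;
    }

    // Returns the max size of the largest heap pool that supports a usage
    // threshold, or -1 if no such pool reports a defined max.
    static long largestHeapPoolMax() {
        long largest = -1;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null && usage.getMax() > largest) {
                largest = usage.getMax();
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        // Value taken from the "max =" field in the quoted task logs.
        long loggedMax = 139853824L;
        System.out.println("70% of logged max: "
                + thresholdFor(loggedMax, USAGE_THRESHOLD_PERCENT));

        long localMax = largestHeapPoolMax();
        if (localMax > 0) {
            System.out.println("70% of this JVM's largest heap pool: "
                    + thresholdFor(localMax, USAGE_THRESHOLD_PERCENT));
        }
    }
}
```

Because the number is derived from the pool size, the way to move it is to
change the heap given to the task JVM, which is the point made above.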

You can set pig.spill.gc.activation.size (invoke GC if we managed to spill
at least this much) and pig.spill.size.threshold (how big a spill must be
before it makes sense to spill anything) if you want.

D
On Thu, Nov 1, 2012 at 2:59 AM, W W <[EMAIL PROTECTED]> wrote:

> hello
>
> I have just come across a problem with SpillableMemoryManager.
> I've searched many discussions that mention this keyword, but they are
> all different from my problem.
>
> The problem is
>
> When I run a pig script, it takes longer to finish the same task on the
> powerful machine. The part of the task node log that is not clear to me
> is
>
> Weak Node:
>
> 2001-06-28 04:04:39,356 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
> call - Collection threshold init = 86048768(84032K) used = 86048752(84031K)
> committed = 125304832(122368K) max = 139853824(136576K)
> 2001-06-28 04:04:39,940 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
> call- Usage threshold init = 86048768(84032K) used = 98041880(95744K)
> committed = 125304832(122368K) max = 139853824(136576K)
> 2001-06-28 04:06:10,048 INFO org.apache.hadoop.mapred.Task:
> Task:attempt_201211010504_0007_r_000018_0 is done. And is in the
> process of commiting
>
>
> Powerful Node:
>
> 2012-11-01 06:12:56,801 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
> call- Usage threshold init = 139853824(136576K) used = 99240424(96914K)
> committed = 139853824(136576K) max = 139853824(136576K)
> 2012-11-01 06:13:22,733 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: first memory handler
> call - Collection threshold init = 139853824(136576K) used = 77466824(75651K)
> committed = 139853824(136576K) max = 139853824(136576K)
> 2012-11-01 06:15:41,178 INFO org.apache.hadoop.mapred.Task:
> Task:attempt_201211010504_0007_r_000014_0 is done. And is in the
> process of commiting
>
>
> My question is how to control the numbers that follow labels such as
> "Usage threshold init". It seems I can't set them in the config files.
> Do they default to some hardware parameters?
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
>
>
> The description of the cluster
>
> I have a heterogeneous cluster with
>  6 virtual machines, each with 4 cores and 8G of memory.
>  4 physical machines, each with 24 cores and 32G of memory.
>
> The hadoop configs are the same for all nodes (I assigned the same number
> of M/R slots to the powerful machines, even though that wastes capacity)
>
>
>
>
> The pig script that causes the problem:
>
> grouped_recs = GROUP IDF_VALID BY ast_id PARALLEL 40;
>
> rollup = FOREACH grouped_recs {
>     bombay_code = FILTER IDF_VALID BY $2 == 76;
>     singapore_code = FILTER IDF_VALID BY $2 == 90;
>     GENERATE
>         FLATTEN(group) AS nda_id,
>         FLATTEN((IsEmpty(bombay_code) ? null : bombay_code.$1)) AS bombay_code,
>         FLATTEN((IsEmpty(singapore_code) ? null : singapore_code.$1)) AS singapore_code;
> }
>
>
>
> Thanks&Regards
> Xingbang
>