Pig, mail # dev - Replicated join: is there a setting to make this better?


Aniket Mokashi 2013-02-20, 21:18
Re: Replicated join: is there a setting to make this better?
Johnny Zhang 2013-02-22, 02:30
Hi, Aniket:
Your image is blank :) Not sure whether this happens only to me, though.

Johnny
On Thu, Feb 21, 2013 at 6:08 PM, Aniket Mokashi <[EMAIL PROTECTED]> wrote:

> I think the email was filtered out. Resending.
>
>
> ---------- Forwarded message ----------
> From: Aniket Mokashi <[EMAIL PROTECTED]>
> Date: Wed, Feb 20, 2013 at 1:18 PM
> Subject: Replicated join: is there a setting to make this better?
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>
>
> Hi devs,
>
> I was looking into the size/record limits of fragment-replicated
> join (map join) in Pig. To test this, I loaded a map (aka fragment) of longs
> into an alias and joined it with another alias that had a few other columns.
> With a map file of 50 MB, I saw GC overhead on the mappers. I took a heap
> dump of a mapper to see what was causing the GC overhead and found that the
> memory footprint of the fragment itself was high.
>
> [image: Inline image 1]
>
> Note that the hashmap was able to load only about 1.8 million records:
> [image: Inline image 2]
> The reason was that every map record has an overhead of about 1.5 KB. Most of
> it is part of the retained heap, but it needs to be garbage collected.
> [image: Inline image 3]
>
> So, it turns out:
>
> Size of heap required by a map join from above = 1.5 KB * Number of
> records + Size of input (uncompressed DataByteArray)... (assuming the key
> is a long).
>
> So, to run your replicated join, you need to satisfy the following criterion:
>
> *1.5 KB * Number of records + Size of input (uncompressed) < estimated
> free memory in the mapper (total heap - io.sort.mb - some minor constant
> etc.)*
>
> Is that the right conclusion? Is there a setting or other way to make this better?
>
> Thanks,
>
> Aniket
>
>
>
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>
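
The rule of thumb in the quoted message can be written as a small calculation. This is only a sketch of the arithmetic as stated in the email: the 1.5 KB per-record overhead is the figure reported from that one heap dump, and the function names, the 1 GiB heap, and the 100 MB io.sort.mb value are hypothetical values chosen for illustration, not Pig or Hadoop defaults.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Per-record overhead reported in the email (about 1.5 KB, key is a long).
PER_RECORD_OVERHEAD_BYTES = 1536


def replicated_join_heap_bytes(num_records, input_bytes,
                               per_record_overhead=PER_RECORD_OVERHEAD_BYTES):
    """Estimated heap needed to hold the replicated (fragment) side in memory."""
    return per_record_overhead * num_records + input_bytes


def fits_in_mapper(num_records, input_bytes, total_heap_bytes,
                   io_sort_bytes, other_bytes=0):
    """Apply the criterion from the email: required heap < estimated free heap."""
    free_bytes = total_heap_bytes - io_sort_bytes - other_bytes
    return replicated_join_heap_bytes(num_records, input_bytes) < free_bytes


if __name__ == "__main__":
    # 1.8 million records and a 50 MB input, as mentioned in the email.
    required = replicated_join_heap_bytes(1_800_000, 50 * MB)
    print(f"estimated heap required: {required / GB:.2f} GiB")

    # Hypothetical mapper: 1 GiB heap, io.sort.mb = 100.
    print("fits:", fits_in_mapper(1_800_000, 50 * MB, 1 * GB, 100 * MB))
```

Under this estimate the per-record overhead term dominates the input size, so for a fragment of longs the record count matters far more than the file size.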
Aniket Mokashi 2013-02-22, 03:02
Prashant Kommireddi 2013-02-22, 03:07
Aniket Mokashi 2013-02-22, 08:42
Jonathan Coveney 2013-02-22, 09:17