Pig >> mail # user >> java.lang.OutOfMemoryError while running Pig Job


sonia gehlot 2011-05-12, 17:43
Re: java.lang.OutOfMemoryError while running Pig Job
The stack trace shows that the OOM error occurs while the distinct is
being applied. It looks like, in some record(s) of the relation group_it, one
or more of the following bags is very large: logic.c_users, logic.nc_users, or
logic.registered_users.

Try setting the property pig.cachedbag.memusage to 0.1 or lower
(-Dpig.cachedbag.memusage=0.1 on the java command line). It controls the
memory used by Pig's internal bags, including those used by distinct.

If that does not work, you can try computing the count-distinct for each type
of user separately and then combining the results.
You might also want to look at this approach to optimizing count-distinct
queries where skew can be a problem:
https://issues.apache.org/jira/browse/PIG-1846
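
A rough sketch of the "compute separately, then combine" idea, for
illustration only. The relation names, input paths, and the fields grp_key
and user_id are hypothetical, since the grouping part of the original script
is not shown here. The point is that a top-level DISTINCT followed by
GROUP/COUNT avoids building one large nested bag of users per group:

```pig
-- Hypothetical inputs: one row per (group key, user id) for each user type.
c_events  = LOAD 'c_users_input'  AS (grp_key:int, user_id:chararray);
nc_events = LOAD 'nc_users_input' AS (grp_key:int, user_id:chararray);

-- Cookied users: de-duplicate at the top level, then count per key.
c_uniq    = DISTINCT c_events;
c_grouped = GROUP c_uniq BY grp_key;
c_counts  = FOREACH c_grouped GENERATE group AS grp_key,
                                       COUNT(c_uniq) AS c_users;

-- Non-cookied users: same pattern, run independently.
nc_uniq    = DISTINCT nc_events;
nc_grouped = GROUP nc_uniq BY grp_key;
nc_counts  = FOREACH nc_grouped GENERATE group AS grp_key,
                                         COUNT(nc_uniq) AS nc_users;

-- Combine the per-type counts. An inner join drops keys that appear on
-- only one side; use FULL OUTER if those keys must be kept.
joined   = JOIN c_counts BY grp_key, nc_counts BY grp_key;
combined = FOREACH joined GENERATE c_counts::grp_key AS grp_key,
                                   c_counts::c_users AS c_users,
                                   nc_counts::nc_users AS nc_users;
```

The same pattern would be repeated for registered users and joined in the
same way.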

-thejas

On 5/12/11 10:43 AM, "sonia gehlot" <[EMAIL PROTECTED]> wrote:

> Hi Guys,
>
> I am running following Pig script in Pig 0.8 version
>
> page_events = LOAD '/user/sgehlot/day=2011-05-10' as
> (event_dt_ht:chararray,event_dt_ut:chararray,event_rec_num:int,event_type:int,
> client_ip_addr:long,hub_id:int,is_cookied_user:int,local_ontology_node_id:int,
> page_type_id:int,content_id:int,product_id:int,referrer_edition_id:int,page_nu
> mber:int,is_iab_robot:int,browser_id:int,os_id:int,dw_pubsys_id:int,refresh:in
> t,asset_id:int,asset_type_id:int,content_type_id:int,product_type_id:int,outbo
> und_email_id:long,gbal_clc:int,mtype:int,user_action_id:int,referring_partner_
> id:int,ontology_node_id:int,content_namespace_id:int,product_namespace_id:int,
> transparent_edition_id:int,default_edition_id:int,event_seq_num:int,is_last_pa
> ge:int,is_new_user:int,page_duration:int,page_seq_num:int,session_id:long,time
> _since_sess_start:int,reg_cookie:chararray,urs_app_id:int,is_reg_user:int,edit
> ion_id:int,user_agent_id:int,page_type_key:int,referrer_id:int,channel_id:int,
> level2_id:int,level3_id:int,brand_id:int,content_key:int,product_key:int,editi
> on_key:int,partner_key:int,business_unit_id:int,anon_cookie:chararray,machine_
> name:chararray,pagehost:chararray,filenameextension:chararray,referrerpath:cha
> rarray,referrerhost:chararray,referring_oid:chararray,referring_legacy_oid:cha
> rarray,ctype:chararray,cval:chararray,link_tag:chararray,link_type:chararray,s
> ticky_tag:chararray,page_url:chararray,search_category:chararray,partner_subje
> ct:chararray,referring_partner_name:chararray,robot_pattern:chararray,browser:
> chararray,browser_major_version:chararray,browser_minor_version:chararray,os:c
> hararray,os_family:chararray,ttag:chararray,dest_oid:chararray,global_id:chara
> rray,hostname:chararray,path:chararray,filename:chararray,extension:chararray,
> query:chararray,user_agent:chararray,xrq:chararray,xref:chararray,page_guid:ch
> ararray,test_name:chararray,test_group:chararray,test_version:chararray,page_v
> ersion:chararray,o_sticky_tag:chararray,new_referring_oid:chararray,day:charar
> ray,network_ip:int,site_id:int,search_phrase:chararray,search_attributes:chara
> rray,web_search_phrase:chararray,ip_address:chararray,is_pattern_match_robot:i
> nt,protocol:chararray,skc_title:chararray,skc_url:chararray,has_site_search_ph
> rase:int,has_site_search_attribs:int,has_web_search_phrase:int,title_id:charar
> ray,url_id:chararray,network_rev:int);
>
> referrer_group_map = LOAD '/user/sgehlot/oozie/db_data/referrer_group_map'
> as
> (referrer_id:int, has_web_search_phrase:int, hostname:chararray,
> referral_type_id:int,
> referral_type_name:chararray,
> referrer_group_id:int,referrer_group_name:chararray,referrer_group_cat_id:int,
> referrer_group_cat:chararray);
>
> filter_pe = FILTER page_events BY is_iab_robot == 0 AND
> is_pattern_match_robot == 0 AND day == '2011-05-10';
>
> select_pe_col = FOREACH filter_pe GENERATE day, is_cookied_user,
> anon_cookie, reg_cookie, referrer_id, has_web_search_phrase,
> business_unit_id;
>
> select_ref_col = FOREACH referrer_group_map GENERATE referrer_id,
> has_web_search_phrase, referral_type_id;
>
> jn = JOIN select_ref_col BY (referrer_id, has_web_search_phrase),
> org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:33)