Pig >> mail # user >> Pig Job Failure With More Number Of Input Files


Pig Job Failure With More Number Of Input Files
Hi all,

I have a Pig job that is failing.

*Pig Script*

REGISTER /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/pig/piggybank.jar;

/* Read the last 30 days of data. */
/* The xyz table is partitioned by dt. */

au = LOAD 'xyz' USING org.apache.hcatalog.pig.HCatLoader();

ad = FILTER au by ($a) and $1 == '0100';

dd = limit ad 10;

dump dd;

*Property file*

a="dt == '20140501' or dt == '20140430' or dt == '20140429' or dt ==
'20140428' or dt == '20140427' or dt == '20140426' or dt == '20140425' or
dt == '20140424' or dt == '20140423' or dt == '20140422' or dt ==
'20140421' or dt == '20140420' or dt == '20140419' or dt == '20140418' or
dt == '20140417' or dt == '20140416' or dt == '20140415' or dt ==
'20140414' or dt == '20140413' or dt == '20140412' or dt == '20140411' or
dt == '20140410' or dt == '20140409' or dt == '20140408' or dt ==
'20140407' or dt == '20140406' or dt == '20140405' or dt == '20140404' or
dt == '20140403' or dt == '20140402'"
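
For reference, a filter string like the one above does not have to be typed by hand. Below is a minimal sketch in Python that builds the same 30-date expression; the helper name `build_dt_filter` is hypothetical and not part of Pig or HCatalog, and the output format simply mirrors the property file shown here.

```python
from datetime import date, timedelta


def build_dt_filter(end_day: date, num_days: int) -> str:
    """Build "dt == 'YYYYMMDD' or ..." for num_days days, newest first.

    Hypothetical helper: it only produces the text of the filter
    expression; it does not talk to Pig or HCatalog.
    """
    days = (end_day - timedelta(days=i) for i in range(num_days))
    return " or ".join(f"dt == '{d:%Y%m%d}'" for d in days)


if __name__ == "__main__":
    # 30 days ending on 2014-05-01, i.e. back to 2014-04-02 as listed above.
    print('a="' + build_dt_filter(date(2014, 5, 1), 30) + '"')
```

The printed line can be pasted into the property file as the value of `a`.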

The job succeeds when run over 3 days of data, but fails when run over more
than 3 days. Each date has more than 30 files. We thought it was hitting
https://issues.apache.org/jira/browse/MAPREDUCE-2779, so I changed
pig.maxCombinedSplitSize to 256 MB to reduce the number of mappers/splits,
but even that did not help.

*Error*
java.io.IOException: Split class hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng not found

at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:348)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:641)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346)
... 7 more

Any input on resolving this issue would be appreciated. Thanks for your help.

Thanks,
Abhishek