Re: Errors in one Hive script using LZO compression
Ok guys, I solved it in a not-so-elegant way, but because of time constraints I need to move forward and deploy this to production :-)

I divided the script into two stages:
Stage 1: The Hive script creates TXT files and writes them to HDFS.
Stage 2: I wrote an LZO file creator and indexer that converts the TXT files on HDFS to .lzo and .lzo.index files.
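
For anyone curious, the Stage 2 converter boils down to something like the sketch below. This is a minimal outline, assuming hadoop-lzo is on the classpath; the class name TxtToLzoIndexer and the argument handling are mine for illustration, not the actual production code.

import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import com.hadoop.compression.lzo.LzoIndexer;
import com.hadoop.compression.lzo.LzopCodec;

// Stage 2 sketch: re-compress a plain-text HDFS file as .lzo, then index it.
public class TxtToLzoIndexer {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path txt = new Path(args[0]);            // input TXT file on HDFS
        Path lzo = new Path(args[0] + ".lzo");   // compressed output

        LzopCodec codec = new LzopCodec();
        codec.setConf(conf);

        // Stream the TXT bytes through the lzop compressor back into HDFS
        try (InputStream in = fs.open(txt);
             OutputStream out = codec.createOutputStream(fs.create(lzo))) {
            IOUtils.copyBytes(in, out, conf, false);
        }

        // Write the companion .lzo.index so MapReduce can split the file
        new LzoIndexer(conf).index(lzo);
    }
}

(For the indexing step alone you can also run hadoop-lzo's built-in indexer directly, e.g. hadoop jar hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer <file.lzo>.)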

I still don't know what makes this specific Hive script throw this error… but I've got to keep moving ahead.

If anyone can shed more light on this error in the future, I will still be interested in knowing the root cause.

Thanks

sanjay
From: Sanjay Subramanian <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Monday, June 17, 2013 11:59 PM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Errors in one Hive script using LZO compression

Hi

I am using LZO compression in our scripts, but one script is still throwing errors:

Diagnostic Messages for this Task:
Error: java.io.IOException: java.io.EOFException: Premature EOF from inputStream
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:160)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.EOFException: Premature EOF from inputStream
        at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
        at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
        at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
        at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1871)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240)
        ... 9 more
SCRIPT
======
set mapred.output.compression.type=BLOCK;
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapreduce.map.output.compress=true;
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
set mapreduce.output.fileoutputformat.compress=true;
set hive.exec.compress.intermediate=true;
set mapreduce.job.maps=500;
set mapreduce.job.reduces=8;
set mapreduce.tasktracker.map.tasks.maximum=12;
set mapreduce.tasktracker.reduce.tasks.maximum=8;
add jar /home/nextag/sasubramanian/mycode/impressions/jar/impressions-hiveudfs-1.0-20130615-155038.jar;
create temporary function collect as 'com.wizecommerce.utils.hive.udf.GenericUDAFCollect';
create temporary function isnextagip as 'com.wizecommerce.utils.hive.udf.IsNextagIP';
create temporary function isfrombot as 'com.wizecommerce.utils.hive.udf.IsFromBot';
create temporary function processblankkeyword as 'com.wizecommerce.utils.hive.udf.ProcessBlankKeyword';
create temporary function getValidHiddenSellers as 'com.wizecommerce.utils.hive.udf.GetValidHiddenSellers';
INSERT OVERWRITE DIRECTORY '/user/beeswax/warehouse/keyword_impressions_ptitles_log/2013-03-19'
SELECT
     hp.header_date,
     hp.impression_id,
     hp.header_searchsessionid,
     hp.cached_visit_id,
     split(hp.header_servername,'[\.]')[0],
     hp.cached_ip,
     hp.header_adnode,
     IF (concat_ws(',' , collect_set(concat_ws('|', cast(hp.seller_id as STRING), cast(IF(hp.seller_pricetier IS NULL, -1L, hp.seller_pricetier) as STRING), cast(hp.seller_price as STRING), cast(IF(hp.ptitle_rank IS  NULL, -1L, hp.ptitle_rank) as STRING)))) = '-1|-1',NULL,concat_ws(',' , collect_set(concat_ws('|', cast(hp.seller_id as STRING), cast(IF(hp.seller_pricetier IS NULL, -1L, hp.seller_pricetier) as STRING), cast(hp.seller_price as STRING), cast(IF(hp.ptitle_rank IS  NULL, -1L, hp.ptitle_rank) as STRING))))),
     IF(concat_ws(',' , getValidHiddenSellers(collect_set(concat_ws('|', cast(sh.seller_id as STRING), cast(sh.ptitle_id as STRING), cast(sh.tag_id as STRING), cast(IF(sh.price_tier IS NULL, -1L, sh.price_tier) as STRING))))) = '',NULL, concat_ws(',' , getValidHiddenSellers(collect_set(concat_ws('|', cast(sh.seller_id as STRING), cast(sh.ptitle_id as STRING), cast(sh.tag_id as STRING), cast(IF(sh.price_tier IS NULL, -1L, sh.price_tier) as STRING))))))
FROM
     (SELECT
          h.header_date,
          h.header_servername,
          h.impression_id,
          h.header_searchsessionid,
          h.cached_visit_id,
          h.cached_ip,
          h.hea