Hive >> mail # user >> Issue while querying Hive


Re: Issue while querying Hive
With regards to splitting and compression, there are really two options as of now:

If you are using Sequence Files, then Snappy.
If you are using TXT files, then LZO is great (you have to jump through a few minor hoops to get LZO working, and I can provide guidance on that).

Please don't use GZ (not splittable), or worse, BZ2 (too slow to compress/decompress for comfort).

The only compelling reason to use GZIP, and the reason I use it in production, is that my log files span MULTIPLE LINES… so if I used regular TXT files, splitting could happen in the middle of a record.

sanjay
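A minimal sketch of the SequenceFile-plus-Snappy option described above, assuming hypothetical table names (`logs_seq`, `logs_text`) and the standard Hadoop 1.x-era property names:

```sql
-- Compress query output as block-compressed Snappy SequenceFiles.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

-- Table stored as SequenceFile.
CREATE TABLE logs_seq (line STRING)
STORED AS SEQUENCEFILE;

-- Rewrite an existing text table into the compressed SequenceFile table.
INSERT OVERWRITE TABLE logs_seq SELECT line FROM logs_text;
```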
From: Nitin Pawar <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Monday, September 16, 2013 5:07 AM
To: [EMAIL PROTECTED]
Subject: Re: Issue while querying Hive

As per my understanding, Hadoop 1.x does not give you any help with processing compressed files in parallel (at least that was the case a few months back).

Splittable bzip2 support was added in Hadoop 2.x, as per my understanding.
On Mon, Sep 16, 2013 at 5:18 PM, Garg, Rinku <[EMAIL PROTECTED]> wrote:
Thanks Nitin,

That way it worked, but in that case Hadoop will not be able to split my file into chunks/blocks and run multiple maps in parallel. That can cause under-utilization of my cluster's 'mapping' power. Is that true?

Thanks & Regards,
Rinku Garg
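Roughly, yes. As a back-of-envelope illustration (the 128 MB block size and the one-mapper-per-gzip-file behavior are assumptions for the sketch, not measured cluster behavior):

```python
import math

# Assumed HDFS block size; the real value is a cluster setting (dfs.block.size).
BLOCK_SIZE = 128 * 1024 * 1024

def expected_map_tasks(file_size, splittable):
    """Rough estimate of map tasks launched for a single input file."""
    # A non-splittable file (e.g. .gz) is consumed by a single mapper,
    # regardless of how many HDFS blocks it occupies.
    if not splittable:
        return 1
    # A splittable file yields roughly one map task per block.
    return max(1, math.ceil(file_size / BLOCK_SIZE))

print(expected_map_tasks(10 * BLOCK_SIZE, splittable=True))   # splittable: ~10 mappers
print(expected_map_tasks(10 * BLOCK_SIZE, splittable=False))  # gzip: 1 mapper
```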

From: Nitin Pawar [mailto:[EMAIL PROTECTED]]
Sent: 16 September 2013 15:57

To: [EMAIL PROTECTED]
Subject: Re: Issue while querying Hive

Does your .gz file contain the data in SequenceFile format, or is it a plain CSV?

Judging by the file name, it is a plain CSV file, so I would recommend that you create a normal table with TextInputFormat (the default), load the data into the new table, and give it a try.
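A sketch of that suggestion, with a hypothetical table name and columns (adjust the schema and delimiter to the actual CSV); Hive's TextInputFormat decompresses .gz files transparently on read:

```sql
-- Plain text table; TextInputFormat is the default storage format.
CREATE TABLE cpj_tbl_text (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- The gzip file is readable as-is; no SequenceFile conversion needed.
LOAD DATA INPATH '/user/hive/warehouse/cpj_tbl/cpj.csv.gz'
INTO TABLE cpj_tbl_text;
```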
On Mon, Sep 16, 2013 at 3:36 PM, Garg, Rinku <[EMAIL PROTECTED]> wrote:
Hi Nitin,

Yes, I created the table with sequencefile.

Thanks & Regards,
Rinku Garg

From: Nitin Pawar [mailto:[EMAIL PROTECTED]]
Sent: 16 September 2013 14:19
To: [EMAIL PROTECTED]
Subject: Re: Issue while querying Hive

Look at the error message:

Caused by: java.io.IOException: hdfs://localhost:54310/user/hive/warehouse/cpj_tbl/cpj.csv.gz not a SequenceFile

Did you create the table with SEQUENCEFILE?
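The mismatch the error describes can be confirmed directly from the file's leading bytes; a small sketch, run outside Hive against a local copy of the file:

```python
# Distinguish a gzip stream from a Hadoop SequenceFile by magic bytes:
# SequenceFiles begin with b"SEQ"; gzip streams begin with 0x1f 0x8b.
def sniff_format(path):
    with open(path, "rb") as f:
        head = f.read(3)
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    if head == b"SEQ":
        return "sequencefile"
    return "unknown"
```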

On Mon, Sep 16, 2013 at 1:33 PM, Garg, Rinku <[EMAIL PROTECTED]> wrote:
Hi All,

I have set up Hadoop and Hive and am trying to load a gzip file into the Hadoop cluster. The files are loaded successfully and can be viewed on the web UI, but executing a SELECT query gives me the error mentioned below.

ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:nxtbig (auth:SIMPLE) cause:java.io.IOException: java.lang.reflect.InvocationTargetException
2013-09-16 09:11:18,971 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:369)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:316)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:430)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:355)
        ... 10 more
Caused by: java.io.IOException: hdfs://localhost:54310/user/hive/warehouse/cpj_tbl/cpj.csv.gz not a SequenceFile
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        ... 15 more

Can anybody help me with this?

Thanks & Regards,
Rinku Garg