Hive user mailing list: hive - snappy and sequence file vs RC file

Chalcy Raja 2012-06-26, 13:05
Bejoy Ks 2012-06-26, 13:21

RE: hive - snappy and sequence file vs RC file
Thanks, Bejoy! I'll let you know which way we go.

Thanks,
Chalcy

From: Bejoy Ks [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 26, 2012 9:22 AM
To: [EMAIL PROTECTED]
Subject: Re: hive - snappy and sequence file vs RC file

Hi Chalcy

AFAIK, RC File format is a good fit when your queries touch only specific columns rather than the whole row. For general-purpose use, Sequence File is a better choice. It is also widely adopted, so more tools support Sequence Files.

Regards
Bejoy KS
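
To make the row-versus-column trade-off concrete, here is a toy Python model that counts the bytes a reader must scan under each layout. It is an illustration under stated assumptions, not RCFile's actual format: the sample table and function names are hypothetical, and real RCFile splits rows into row groups and compresses each column within a group, which this sketch ignores.

```python
"""Toy comparison of bytes scanned by a row layout vs a column layout.

Assumption: this is NOT RCFile's real on-disk format. RCFile groups rows into
row groups and stores each group column by column; this toy ignores row
groups, compression, and headers entirely.
"""

# Hypothetical sample table: (name, age, city)
ROWS = [
    ("alice", 34, "NY"),
    ("bob", 27, "SF"),
    ("carol", 45, "LA"),
]


def field_size(value) -> int:
    """Approximate a field's stored size as the length of its text encoding."""
    return len(str(value).encode("utf-8"))


def bytes_scanned_row_layout(rows, wanted_cols):
    """A row-oriented reader reads every field of every row.

    wanted_cols is accepted only for a symmetric signature; a row layout
    cannot use it to skip anything.
    """
    return sum(field_size(v) for row in rows for v in row)


def bytes_scanned_column_layout(rows, wanted_cols):
    """A column-oriented reader can skip columns the query does not use."""
    return sum(field_size(row[c]) for row in rows for c in wanted_cols)


if __name__ == "__main__":
    only_city = [2]
    all_cols = [0, 1, 2]
    print("row layout, city only:   ", bytes_scanned_row_layout(ROWS, only_city))
    print("column layout, city only:", bytes_scanned_column_layout(ROWS, only_city))
    print("column layout, all cols: ", bytes_scanned_column_layout(ROWS, all_cols))
```

Selecting only `city` scans 6 bytes in the column layout versus 25 in the row layout, while selecting every column scans 25 bytes either way, which mirrors the point that the columnar advantage fades for whole-row queries.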

________________________________
From: Chalcy Raja <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Tuesday, June 26, 2012 6:35 PM
Subject: hive - snappy and sequence file vs RC file

Hi Hive users,

We are going to use snappy for compression.

What is the best file format, sequence file or RC file? Both are splittable and therefore will work well for us. RC file performance seems to be better than Sequence file performance. It looks like Sqoop may support the --as-sequencefile option sometime in the future, but RC file is not listed among the sqoop import options.

Any input on this is highly appreciated.

Thanks,
Chalcy

-----Original Message-----
From: Chalcy Raja [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 19, 2012 8:23 AM
To: [EMAIL PROTECTED]; '[EMAIL PROTECTED]'
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Describe formatted tablename is a great DDL command.

For one table that sqoop imported into Hive as a sequence file, I see the metadata starts with "SEQ-!".

I created another table like the one that shows SEQ in its metadata and loaded data into it, but I do not see SEQ in the new table's metadata. I'll try the head command and see what is going on.

Thanks,
Chalcy

-----Original Message-----
From: Bejoy KS [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 19, 2012 2:59 AM
To: [EMAIL PROTECTED]
Subject: Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Hi Chalcy

When you create a table, you specify the format in which its data is stored in HDFS. You can check this at any later point with describe extended or describe formatted.

Try out
Describe formatted <tableName>;

To confirm that the file in HDFS is in SequenceFile format, you can check its metadata. The first few bytes of a Sequence file hold this metadata, including details such as the compression codec used. Try the Linux head command on the sequence file to see them.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
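
The header check suggested here can also be done programmatically. The Python sketch below decodes the fields at the start of a SequenceFile: the `SEQ` magic bytes, a version byte, the key and value class names, two compression flags, and the codec class name when the file is compressed. It assumes header version 6 and that the leading bytes were fetched beforehand, for example with `hadoop fs -cat <path-to-file> | head -c 4096 > header.bin` (the path is a placeholder). The function names are hypothetical, and the metadata map and sync marker that follow are not decoded.

```python
"""Sketch: decode the header at the start of a Hadoop SequenceFile.

Assumptions: header version 6, and the leading bytes were fetched separately
(e.g. hadoop fs -cat <path> | head -c 4096). Only the fields needed to answer
"is this a SequenceFile, and which codec?" are decoded.
"""
import io
import struct


def read_vint(stream) -> int:
    """Decode a Hadoop WritableUtils variable-length integer."""
    first = struct.unpack(">b", stream.read(1))[0]  # signed first byte
    if first >= -112:
        return first  # small values fit in the first byte itself
    if first < -120:
        size, negative = -119 - first, True
    else:
        size, negative = -111 - first, False
    value = 0
    for b in stream.read(size - 1):  # remaining bytes, big-endian
        value = (value << 8) | b
    return ~value if negative else value


def read_text(stream) -> str:
    """Decode a Hadoop Text string: vint length followed by UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def read_bool(stream) -> bool:
    """Decode a Java DataOutput boolean (one byte, nonzero means true)."""
    return stream.read(1) != b"\x00"


def parse_sequencefile_header(data: bytes) -> dict:
    """Return version, key/value classes, compression flags and codec."""
    stream = io.BytesIO(data)
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing 'SEQ' magic bytes")
    version = stream.read(1)[0]
    if version != 6:
        raise ValueError(f"sketch only handles header version 6, got {version}")
    header = {
        "version": version,
        "key_class": read_text(stream),
        "value_class": read_text(stream),
    }
    header["compressed"] = read_bool(stream)
    header["block_compressed"] = read_bool(stream)
    header["codec"] = read_text(stream) if header["compressed"] else None
    # The metadata map and 16-byte sync marker follow; not decoded here.
    return header
```

Passing in the bytes from `header.bin` returns the key and value classes and, for a compressed file, the codec class name (for Snappy, `org.apache.hadoop.io.compress.SnappyCodec`). A file without the `SEQ` prefix, such as a plain text file, is rejected.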

-----Original Message-----
From: Chalcy Raja <[EMAIL PROTECTED]>
Date: Tue, 19 Jun 2012 01:32:52
To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>; '[EMAIL PROTECTED]'<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

I did figure out how to compress the data of an uncompressed Hive table. I also created a table in sequence file format.

Is there a way to know whether a Hive table (the HDFS file underneath) is in sequence file format? Describe extended does not give the file format.

Thanks,
Chalcy

-----Original Message-----
From: Chalcy Raja [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 18, 2012 3:28 PM
To: [EMAIL PROTECTED]; '[EMAIL PROTECTED]'
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Snappy with sequence file works well for us.  We'll have to decide which one suits our needs.

Is there a way to convert existing HDFS files in text format to sequence files?

Thanks for all your input,
Chalcy

From: Chalcy Raja [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 18, 2012 1:47 PM
To: [EMAIL PROTECTED]; '[EMAIL PROTECTED]'
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

It is there. I have io.compression.codecs in core-site.xml. There is no error or warning in the sqoop-to-Hive import that indicates anything.

The only reason we want to move to lzo is that snappy is not splittable.

Thanks,
Chalcy
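
A quick way to double-check what `io.compression.codecs` actually contains is to parse core-site.xml directly. The sketch below is a hypothetical helper, not a Hadoop tool: it takes the XML text as a string, assumes Hadoop's usual `<configuration>/<property>/<name>/<value>` layout, and uses `com.hadoop.compression.lzo.LzoCodec` (the codec class shipped by the separately built hadoop-lzo library) only as an example value.

```python
"""Sketch: check whether a codec class is listed in io.compression.codecs.

Assumptions: the caller supplies the text of core-site.xml, the file uses the
usual <configuration><property><name/><value/></property> layout, and the
codec class names in SAMPLE are examples only.
"""
import xml.etree.ElementTree as ET


def configured_codecs(core_site_xml: str) -> list:
    """Return the comma-separated class names in io.compression.codecs."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "io.compression.codecs":
            value = prop.findtext("value") or ""
            # Class names may be split across lines; strip surrounding whitespace.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []  # property not set at all


def codec_is_registered(core_site_xml: str, codec_class: str) -> bool:
    """True if codec_class appears in the io.compression.codecs list."""
    return codec_class in configured_codecs(core_site_xml)


# Hypothetical core-site.xml contents used for illustration.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,
           org.apache.hadoop.io.compress.SnappyCodec,
           com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(configured_codecs(SAMPLE))
    print(codec_is_registered(SAMPLE, "com.hadoop.compression.lzo.LzoCodec"))
```

A `False` result points at the configuration. A `True` result only shows that the class is listed; it says nothing about whether the native LZO library actually loads on each node.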

From: Bejoy KS [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 18, 2012 10:39 AM
To: [EMAIL PROTECTED]
Subject: Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Hi Chalcy

Lzo indexing is not working; is the Lzo codec class listed in the 'io.compression.codecs' property in core-site.xml?

Snappy is not splittable on its own, but sequence files are, so when the two are used together, snappy-compressed data gains splittability.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
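
The splittability point above can be illustrated with a toy container written in Python. Records are compressed in independent blocks, each preceded by a random 16-byte sync marker, so a reader handed an arbitrary byte offset can scan forward to the next marker and start decompressing there. This only mimics the idea behind SequenceFile sync markers; it is not the real file layout, `zlib` stands in for Snappy because Snappy is not in the Python standard library, and all names are hypothetical.

```python
"""Toy model of why a block-compressed container can be split.

Assumptions: zlib stands in for Snappy, and the layout (a 16-byte sync marker
before each independently compressed block) only mimics the idea behind
SequenceFile sync markers; it is not the real SequenceFile format.
"""
import os
import zlib

SYNC = os.urandom(16)  # random per-file marker


def write_container(records, records_per_block=2) -> bytes:
    """Compress records in independent blocks, each preceded by SYNC."""
    out = bytearray()
    for i in range(0, len(records), records_per_block):
        block = "\n".join(records[i:i + records_per_block]).encode("utf-8")
        compressed = zlib.compress(block)
        # Layout per block: SYNC | 4-byte big-endian length | compressed bytes
        out += SYNC + len(compressed).to_bytes(4, "big") + compressed
    return bytes(out)


def read_split(data: bytes, start: int, end: int) -> list:
    """Read the blocks whose sync marker begins in [start, end).

    The reader scans forward from an arbitrary offset to the next marker
    and decompresses from there. A single compressed stream has no such
    re-entry points, so it would have to be read from byte 0.
    """
    records = []
    pos = data.find(SYNC, start)
    while pos != -1 and pos < end:
        length = int.from_bytes(data[pos + 16:pos + 20], "big")
        block = zlib.decompress(data[pos + 20:pos + 20 + length])
        records.extend(block.decode("utf-8").split("\n"))
        pos = data.find(SYNC, pos + 20 + length)
    return records


if __name__ == "__main__":
    recs = [f"row{i}" for i in range(7)]
    data = write_container(recs)
    mid = len(data) // 2
    print(read_split(data, 0, mid), read_split(data, mid, len(data)))
```

Reading the two halves of the container as separate splits returns every record exactly once and in order, which is the property that lets different mappers process different byte ranges of one file. A single Snappy stream with no such markers would have to be decompressed from its first byte.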

From: Chalcy Raja <[EMAIL PROTECTED]>
Date: Mon, 18 Jun 2012 14:31:36
To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>; '[EMAIL PROTECTED]'<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

Hi Bejoy,

The weird thing is I did not get any errors. The sqoop import does not go on to the second phase, where it creates the lzo index.

We did deploy the native libraries, except the hadoop-lzo lib, which we copied over after building it on another machine. We did the same thing on the test machine too.

I'll try snappy with sequence file as well. Is snappy with sequence file naturally splitta

Owen OMalley 2012-06-26, 16:49
yongqiang he 2012-06-27, 04:40
Chalcy Raja 2012-06-27, 23:01