Hadoop >> mail # user >> has bzip2 compression been deprecated?

Re: has bzip2 compression been deprecated?
Yes. Hive doesn't format data when you load it. The only exception is when you do an INSERT OVERWRITE ...
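
A minimal sketch of what this means in practice. The table names, columns, delimiter, and HDFS path are hypothetical and do not come from this thread:

```sql
-- Hypothetical names and path, for illustration only.
-- This statement only records metadata: no files are read, moved,
-- or validated when it runs.
CREATE EXTERNAL TABLE job_output (
  id    STRING,
  value STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
LOCATION '/path/to/job_output';

-- A managed (non-EXTERNAL) table declared with a different file type.
CREATE TABLE job_output_text (
  id    STRING,
  value STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- The exception: this runs a job that reads job_output and writes
-- the rows out in the target table's declared format (TEXTFILE here).
INSERT OVERWRITE TABLE job_output_text
SELECT id, value FROM job_output;
```

If the files under the LOCATION are not actually SequenceFiles, the CREATE still succeeds; the mismatch only shows up when a query tries to read them.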


On Jan 10, 2012, at 6:08, Tony Burton <[EMAIL PROTECTED]> wrote:

> Thanks for this Bejoy, very helpful.
> So, to summarise: when I CREATE EXTERNAL TABLE in Hive, the STORED AS, ROW FORMAT and other parameters you mention are telling Hive what to expect when it reads the data I want to analyse, despite not checking the data to see if it meets these criteria?
> Do these guidelines still apply if the table is not EXTERNAL?
> Tony
> -----Original Message-----
> From: Bejoy Ks [mailto:[EMAIL PROTECTED]]
> Sent: 09 January 2012 19:00
> Subject: Re: has bzip2 compression been deprecated?
> Hi Tony
> As I understand your requirement, your mapreduce job produces a
> Sequence File as output and you need to use this file as input to a Hive
> table.
> When you CREATE an EXTERNAL table in Hive you specify a location
> where your data is stored and also the format of that data (like
> the field delimiter, row delimiter, file type etc. of your data). You are
> not actually loading data anywhere when you create a Hive external
> table (issue the DDL), just specifying where the data lies in the file
> system. In fact, no validation is even performed at that time to check the
> data quality. When you query/retrieve your data through Hive QL, the
> parameters specified along with CREATE TABLE, such as ROW FORMAT, FIELDS
> TERMINATED, STORED AS etc., are used to execute the right MapReduce job(s).
> In short, STORED AS refers to the type of files that a table's data
> directory holds.
> For details
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
> Hope it helps!..
> Regards
> Bejoy.K.S
> On Mon, Jan 9, 2012 at 11:32 PM, Tony Burton <[EMAIL PROTECTED]> wrote:
>> Thanks Bejoy - I'm fairly new to Hive so may be wrong here, but I was
>> under the impression that the STORED AS part of a CREATE TABLE in Hive
>> refers to how the data in the table will be stored once the table is
>> created, rather than the compression format of the data used to populate
>> the table. Can you clarify which is the correct interpretation? If it's the
>> latter, how would I read a sequence file into a Hive table?
>> Thanks,
>> Tony
>> -----Original Message-----
>> From: Bejoy Ks [mailto:[EMAIL PROTECTED]]
>> Sent: 09 January 2012 17:33
>> Subject: Re: has bzip2 compression been deprecated?
>> Hi Tony
>> Adding on to Harsh's comments: if you want the generated sequence
>> files to be utilized by a Hive table, define your Hive table as
>> ...
>> ...
>> ....
>> Regards
>> Bejoy.K.S
>> On Mon, Jan 9, 2012 at 10:32 PM, alo.alt <[EMAIL PROTECTED]> wrote:
>>> Tony,
>>> snappy is also available:
>>> http://code.google.com/p/hadoop-snappy/
>>> best,
>>> Alex
>>> --
>>> Alexander Lorenz
>>> http://mapredit.blogspot.com
>>> On Jan 9, 2012, at 8:49 AM, Harsh J wrote:
>>>> Tony,
>>>> * Yeah, SequenceFiles aren't human-readable, but "fs -text" can read them
>>>> out (instead of a plain "fs -cat"). But if you are gonna export your files
>>>> into a system you do not have much control over, probably best to have the
>>>> resultant files not be in SequenceFile/Avro-DataFile format.
>>>> * Intermediate (M-to-R) files use a custom IFile format these days,
>>>> which is built purely for that purpose.
>>>> * Hive can use SequenceFiles very well. There is also documented info on
>>>> this in Hive's wiki pages (check the DDL pages, IIRC).
>>>> On 09-Jan-2012, at 9:44 PM, Tony Burton wrote:
>>>>> Thanks for the quick reply and the clarification about the documentation.
>>>>> Regarding sequence files: am I right in thinking that they're a good