Re: Using Hive-generated SequenceFiles and RC files with Java MapReduce and Pig
Thilina Gunarathne 2014-01-29, 02:37
Thanks for the information Edward.
> When you use the default SerDe (LazySimpleSerDe) and sequence files, Hive writes a
> SequenceFile (create table x .... stored as sequencefile); the key is null
> and Hive serializes all the columns into a Text Writable that is easy for
> other tools to read.
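Here is a minimal sketch of what "easy for other tools to read" means in practice. It assumes Hive's default text layout: fields separated by Ctrl-A (U+0001) and SQL NULL written as `\N`. Under that assumption, a Java MapReduce mapper can ignore the key and split `value.toString()` on that delimiter. The class name `HiveTextRow` is hypothetical, and nested collection or map columns, which use further delimiters, are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: splits one row written with Hive's default text
 * layout. Assumes fields are separated by Ctrl-A (U+0001) and that SQL NULL
 * is written as the two characters \N. Only flat columns are handled;
 * nested collection and map delimiters are left uninterpreted.
 */
class HiveTextRow {
    static final String FIELD_DELIM = "\u0001";
    static final String NULL_MARKER = "\\N";

    /** Splits a row into columns, mapping the null marker to Java null. */
    static List<String> parse(String row) {
        List<String> columns = new ArrayList<>();
        // limit -1 keeps trailing empty columns instead of dropping them
        for (String field : row.split(Pattern.quote(FIELD_DELIM), -1)) {
            columns.add(NULL_MARKER.equals(field) ? null : field);
        }
        return columns;
    }

    public static void main(String[] args) {
        String row = "42" + FIELD_DELIM + "alice" + FIELD_DELIM + NULL_MARKER;
        System.out.println(parse(row)); // prints [42, alice, null]
    }
}
```

The same parsing would apply to a plain TextFile table with the default SerDe, because only the container differs and the row bytes stay the same.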
Does this mean that using the default SerDe would not give us much advantage
over using a TextFile, other than the splittability of SequenceFiles (and the
compression options that it enables)?
> Hive does not dictate the input format or the output format; usually you
> can get Hive to produce exactly what you want by mixing and matching SerDe
> and output format options.
> On Tue, Jan 28, 2014 at 8:05 PM, Thilina Gunarathne <[EMAIL PROTECTED]> wrote:
>> We have a requirement to store a large data set (more than 5 TB) mapped to
>> a Hive table. This Hive table would be populated (and appended to
>> periodically) using a Hive query from another Hive table. In addition to
>> the Hive queries, we need to be able to run Java MapReduce and preferably
>> Pig jobs as well on top of this data.
>> I'm wondering what would be the best storage format for this Hive table.
>> How easy is it to use Java MapReduce on Hive-generated sequence files (e.g.
>> STORED AS SEQUENCEFILE)? How easy is it to use Java MapReduce on RC files?
>> Any pointers to examples of these would be really great. Would compressed
>> text files (deflate) be the best option for this use case?
>> BTW, we are stuck with Hive 0.9 for the foreseeable future, so ORC is not
>> an option.
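Plain `java.util.zip` can illustrate two points from this thread: the compressed-TextFile option, and Edward's remark that the row format (delimiters, null marker) is a separate choice from the file container. The sketch below is not Hive's or Hadoop's implementation. The class `DeflatedDelimitedText` is hypothetical. It assumes the data is a single zlib-wrapped DEFLATE stream, which is what `DeflaterOutputStream` writes. The delimiter and null marker are constructor parameters, loosely mirroring Hive's ROW FORMAT options.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/**
 * Hypothetical sketch: delimited text rows stored in one zlib/DEFLATE stream.
 * The field delimiter and null marker are parameters; values must not contain
 * the delimiter or a line break, since no escaping is done here.
 */
class DeflatedDelimitedText {
    private final String fieldDelim;
    private final String nullMarker;

    DeflatedDelimitedText(String fieldDelim, String nullMarker) {
        this.fieldDelim = fieldDelim;
        this.nullMarker = nullMarker;
    }

    /** Serializes rows as newline-terminated delimited lines, then compresses them. */
    byte[] write(List<List<String>> rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(
                new DeflaterOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                List<String> encoded = new ArrayList<>();
                for (String value : row) {
                    encoded.add(value == null ? nullMarker : value);
                }
                out.write(String.join(fieldDelim, encoded));
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer finished the DEFLATE stream, so the bytes are complete.
        return bytes.toByteArray();
    }

    /** Inflates the whole stream from its first byte and splits each line. */
    List<List<String>> read(byte[] compressed) {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new InflaterInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                List<String> row = new ArrayList<>();
                for (String field : line.split(Pattern.quote(fieldDelim), -1)) {
                    row.add(nullMarker.equals(field) ? null : field);
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

`read` can only decode the stream starting from its first byte. For the same reason, Hadoop gives each deflate-compressed text file to a single mapper, which is the splittability trade-off raised earlier in the thread. SequenceFiles avoid this because they place sync markers between blocks. On a cluster you would normally let Hadoop choose the codec (for example through `CompressionCodecFactory.getCodec(path)`) rather than hard-coding `java.util.zip`. Whether Hive's `.deflate` output uses exactly this zlib framing is an assumption worth checking against a sample file.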