Re: Using Hive generated SequenceFiles and RC files with Java MapReduce and PIG
When you use the default SerDe (LazySimpleSerDe) with sequence files
(create table x ... stored as sequencefile), Hive writes a SequenceFile in
which the key is unused (effectively null) and all the columns of a row are
serialized into a single Text Writable, which is easy for other tools to
read. Hive does not dictate the input or output format; you can usually get
Hive to produce exactly what you want by mixing and matching SerDe and
output format options.
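
To illustrate what "easy for other tools to read" means, here is a minimal
sketch of splitting one such Text value back into columns. It assumes the
LazySimpleSerDe defaults (fields separated by Ctrl-A, i.e. \001, and NULL
written as \N), no escaping, and only primitive column types. The class and
method names (HiveRowParser, parseRow) are made up for this example; they are
not part of Hive or Hadoop. In a real Mapper you would call something like
this on value.toString().

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Hive or Hadoop): splits one row that
 * Hive's default LazySimpleSerDe serialized into a single Text value.
 * Assumes default delimiters, no escaping, and primitive columns only.
 */
public class HiveRowParser {
    // Default field delimiter used by LazySimpleSerDe: Ctrl-A (octal 001).
    private static final char FIELD_DELIM = '\001';
    // Default text representation of NULL: a backslash followed by N.
    private static final String NULL_MARKER = "\\N";

    /** Returns the columns of the row; NULL columns become Java null. */
    public static List<String> parseRow(String row) {
        List<String> columns = new ArrayList<>();
        int start = 0;
        while (true) {
            int end = row.indexOf(FIELD_DELIM, start);
            String field = (end < 0) ? row.substring(start)
                                     : row.substring(start, end);
            columns.add(NULL_MARKER.equals(field) ? null : field);
            if (end < 0) {
                break; // last field consumed
            }
            start = end + 1;
        }
        return columns;
    }

    public static void main(String[] args) {
        // Simulates the string form of one Text value from a Hive table.
        String row = "42" + FIELD_DELIM + "alice" + FIELD_DELIM + NULL_MARKER;
        System.out.println(parseRow(row)); // prints [42, alice, null]
    }
}
```

If the table was created with custom delimiters or contains complex types
(maps, arrays, structs), the split logic would need to change accordingly.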
On Tue, Jan 28, 2014 at 8:05 PM, Thilina Gunarathne <[EMAIL PROTECTED]> wrote:
> We have a requirement to store a large data set (more than 5TB) mapped to
> a Hive table. This Hive table would be populated (and appended
> periodically) using a Hive query from another Hive table. In addition to
> the Hive queries, we need to be able to run Java MapReduce and preferably
> Pig jobs as well on top of this data.
> I'm wondering what would be the best storage format for this Hive table.
> How easy is it to use Java MapReduce on Hive generated sequence files (eg:
> stored as SequenceFile)? How easy is it to use Java MapReduce on RC files?
> Any pointers to examples of these would be really great. Does using
> compressed Text Files (deflate) sound like the best option for this use case?
> BTW we are stuck with Hive 0.9 for the foreseeable future and ORC is out
> of the options.