Hive >> mail # user >> Using Hive generated SequenceFiles and RC files with Java MapReduce and Pig


Thilina Gunarathne 2014-01-29, 01:06
Edward Capriolo 2014-01-29, 02:00
Re: Using Hive generated SequenceFiles and RC files with Java MapReduce and Pig
Thanks for the information Edward.

> When you use the default SerDe (LazySimpleSerDe) with sequence files, Hive writes a
> SequenceFile (create table x ... stored as sequencefile); the key is null
> and Hive serializes all the columns into a Text Writable that is easy for
> other tools to read.
>
Does this mean that using the default SerDe would not give us much of an
advantage over using a TextFile, other than the splittability of
SequenceFiles (and the compression options that come with it)?
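
To make the question concrete, here is a minimal sketch of what a Java
MapReduce job would have to do with each record, assuming the table uses
Hive's default field delimiter (Ctrl-A) and the default NULL marker (\N).
HiveRowParser and parseRow are hypothetical names made up for this
illustration; in a real mapper the row string would come from the Text value
of each SequenceFile record and the key would simply be ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits one Hive row (the Text value) into columns. */
public class HiveRowParser {
    // Default field delimiter of Hive's default SerDe (Ctrl-A). This is an
    // assumption: it only holds if the table does not declare its own delimiter.
    static final char FIELD_DELIMITER = '\u0001';
    // Default marker Hive writes for a NULL column (backslash followed by N).
    static final String NULL_MARKER = "\\N";

    /** Splits a row on the delimiter, mapping the NULL marker to a Java null. */
    static List<String> parseRow(String row) {
        List<String> columns = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i <= row.length(); i++) {
            // A field ends at each delimiter and at the end of the row.
            if (i == row.length() || row.charAt(i) == FIELD_DELIMITER) {
                String field = row.substring(start, i);
                columns.add(NULL_MARKER.equals(field) ? null : field);
                start = i + 1;
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String row = "42" + FIELD_DELIMITER + "alice" + FIELD_DELIMITER + "\\N";
        System.out.println(parseRow(row)); // prints [42, alice, null]
    }
}
```

If that assumption holds, the per-record parsing is the same whether the row
came from a TextFile line or from a SequenceFile value; only the input format
configured on the job would differ.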

thanks,
Thilina
> Hive does not dictate the input format or the output format; usually you
> can get Hive to produce exactly what you want by mixing and matching SerDe
> and output format options.
>
>
> On Tue, Jan 28, 2014 at 8:05 PM, Thilina Gunarathne <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> We have a requirement to store a large data set (more than 5TB) mapped to
>> a Hive table. This Hive table would be populated (and appended
>> periodically) using a Hive query from another Hive table. In addition to
>> the Hive queries, we need to be able to run Java MapReduce and preferably
>> Pig jobs as well on top of this data.
>>
>> I'm wondering what would be the best storage format for this Hive table.
>> How easy is it to use Java MapReduce on Hive-generated sequence files (e.g.,
>> stored as SequenceFile)? How easy is it to use Java MapReduce on RC files?
>> Any pointers to examples of these would be really great. Does using
>> compressed text files (deflate) sound like the best option for this use case?
>>
>> BTW, we are stuck with Hive 0.9 for the foreseeable future, so ORC is not
>> an option.
>>
>> thanks,
>> Thilina
>>
>> --
>> https://www.cs.indiana.edu/~tgunarat/
>> http://www.linkedin.com/in/thilina
>> http://thilina.gunarathne.org
>>
>
>
--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org

 