Re: best way to load millions of gzip files in hdfs to one table in hive?
Options:

1. Create a managed table and put the files under the table's directory.

2. Create an external table and point its LOCATION at the directory containing the files.

3. If the files are small, I recommend writing a new set of files with a
simple MR program, specifying the number of reduce tasks. The goal is to make
each file larger than the HDFS block size (this saves NameNode memory and
reads will be faster). A HiveQL sketch of these options follows below.
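Here is a minimal HiveQL sketch of the three options, assuming tab-delimited text inside the gzip files, a three-column schema, and an input directory of /data/input (all placeholder assumptions; adjust to your actual data). Hive reads .gz text files transparently with the default TextFile format, so the external table can be queried as-is. For option 3, instead of a standalone MR job, the same compaction can be done from Hive itself by inserting into a second table with a fixed reducer count:

  -- Option 2: external table over the existing gzip files (no data movement).
  -- Column names, delimiter, and LOCATION are placeholders.
  CREATE EXTERNAL TABLE logs_ext (
    col1 STRING,
    col2 STRING,
    col3 INT
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE
  LOCATION '/data/input';

  -- Option 1 is the same DDL without EXTERNAL and LOCATION; you would then
  -- move the files under the table's warehouse directory, e.g.
  --   hadoop fs -mv /data/input/* /user/hive/warehouse/logs/

  -- Option 3 (Hive variant of the compaction step): rewrite the data into
  -- fewer, larger files by forcing a reduce stage with a controlled number
  -- of reducers; each reducer writes one output file.
  CREATE TABLE logs_compact (col1 STRING, col2 STRING, col3 INT);
  SET mapred.reduce.tasks=100;   -- choose so each output file > one HDFS block
  INSERT OVERWRITE TABLE logs_compact
  SELECT col1, col2, col3
  FROM logs_ext
  DISTRIBUTE BY col1;            -- DISTRIBUTE BY forces the reduce stage

Note that gzip is not a splittable codec, so each .gz input file is processed by a single mapper; compacting millions of tiny files (option 3) therefore also cuts task-scheduling overhead, not just NameNode memory.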
On Tue, Oct 2, 2012 at 3:53 PM, zuohua zhang <[EMAIL PROTECTED]> wrote:

> I have millions of gzip files in hdfs (with the same fields), would like
> to load them into one table in hive with a specified schema.
> What is the most efficient way to do that?
> Given that my data is already in hdfs, and also gzipped, does that mean I
> could simply set up the table directly, bypassing the unnecessary overhead
> of the typical loading approach?
>
> Thanks!
>