Hive, mail # user - best way to load millions of gzip files in hdfs to one table in hive?


Re: best way to load millions of gzip files in hdfs to one table in hive?
Alexander Pivovarov 2012-10-02, 20:16
Options:
1. create a table and put the files under the table's directory (see the DDL
sketch after this list)

2. create an external table and point it at the files' directory

3. if the files are small, I recommend creating a new set of files using a
simple MR program and specifying the number of reduce tasks. The goal is to
make file sizes larger than the HDFS block size (it saves NN memory and reads
will be faster); see the consolidation sketch after this list.
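
For options 1 and 2, a minimal DDL sketch (the table name "logs", the columns,
the delimiter, and the HDFS paths are hypothetical placeholders; Hive reads .gz
text files transparently because the input format picks a codec from the file
extension):

  -- Option 2: external table over the existing gzip files. Hive leaves the
  -- data where it is and decompresses the .gz files on read.
  CREATE EXTERNAL TABLE logs (
    col1 STRING,
    col2 INT
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE
  LOCATION '/data/gzip_files';

  -- Option 1: the same statement without EXTERNAL and without LOCATION
  -- creates a managed table; then move the files under its directory,
  -- e.g. (assuming the default warehouse path):
  --   hadoop fs -mv /data/gzip_files/* /user/hive/warehouse/logs/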
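
For option 3 Alexander suggests a plain MR job; a swapped-in alternative that
gets the same consolidation from inside Hive itself (sketched under the
assumption that the external table "logs" above exists) is a CTAS with a
forced reduce stage, where the reducer count controls the number of output
files:

  -- Era-appropriate (Hive 0.x / MR1) settings; pick the reducer count so
  -- that each output file ends up larger than one HDFS block.
  SET mapred.reduce.tasks=100;
  SET hive.exec.compress.output=true;
  SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

  -- DISTRIBUTE BY rand() forces a reduce phase, so the output file count
  -- equals the reducer count rather than the input file count.
  CREATE TABLE logs_merged AS
  SELECT * FROM logs
  DISTRIBUTE BY rand();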
On Tue, Oct 2, 2012 at 3:53 PM, zuohua zhang <[EMAIL PROTECTED]> wrote:

> I have millions of gzip files in HDFS (all with the same fields) and would
> like to load them into one table in Hive with a specified schema.
> What is the most efficient way to do that?
> Given that my data is already in HDFS, and gzipped, does that mean I could
> simply set the table up on top of it, bypassing some of the unnecessary
> overhead of the typical load approach?
>
> Thanks!
>