Hive >> mail # user >> Can I merge files after I loaded them into hive?


Re: Can I merge files after I loaded them into hive?
Example:
INSERT OVERWRITE TABLE my_table PARTITION (year=2012, month=9, day=4)
SELECT `data`, `timestamp`, `hour`, `minute`, `second`
FROM my_table
WHERE year=2012 AND month=9 AND day=4;
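
For the rewrite above to actually produce fewer, larger files, the merge settings mentioned in the quoted reply need to be enabled in the same Hive session before the statement runs. A minimal sketch follows; the property names are standard Hive configuration properties, but the two size values are assumptions for illustration only and should be tuned to the cluster's block size.

```sql
-- Merge small files at the end of a map-only job.
SET hive.merge.mapfiles=true;
-- Also merge small files at the end of a map-reduce job.
SET hive.merge.mapredfiles=true;
-- Assumed target size of each merged file, in bytes (about 256 MB).
SET hive.merge.size.per.task=256000000;
-- Assumed threshold: run the extra merge step when the average
-- output file size of the job is below this many bytes (about 128 MB).
SET hive.merge.smallfiles.avgsize=128000000;
```

Because INSERT OVERWRITE replaces the partition's contents, it may be safer to try this on a copy of one partition first and compare row counts before and after.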
2012/11/15 Bejoy KS <[EMAIL PROTECTED]>

> Hi Chen
>
> You can do it in Hive as well. Enable Hive's merge setting and INSERT
> OVERWRITE the partition once again with SELECT *.
>
> hive.merge.mapfiles=true
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: "Bejoy KS" <[EMAIL PROTECTED]>
> Date: Thu, 15 Nov 2012 08:10:12
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Can I merge files after I loaded them into hive?
>
> Hi chen
>
> You can use Flume for ingestion into HDFS. Flume takes care of the file
> sizes, combining the files and storing them as one large file. This is a
> better approach.
>
> You can also write a custom MR job to merge these files in HDFS. Use
> CombineFileInputFormat and start a map-only job with an identity mapper,
> with the split size set to the required large file size.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Cheng Su <[EMAIL PROTECTED]>
> Date: Thu, 15 Nov 2012 16:03:44
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Can I merge files after I loaded them into hive?
>
> Hi, all.
>
> Can I merge files after I loaded them into hive?
> This is my situation:
>
> There is a log table, partitioned by date, which stores the nginx access
> logs.
> The raw log files are loaded into Hive every hour.
> For now, a single log file is small, say 10 MB or even smaller.
> So there are 24 small files in one partition.
> This is inefficient in my opinion, and will consume more Hadoop heap space.
> That's why I want to merge the small files.
>
> Can Hive merge those files automatically?
> Or does Hive provide some tools to merge files?
> Or can I just use hadoop dfs -cat to do that?
>
> --
>
> Regards,
> Cheng Su
>