Hive >> mail # user >> Can I merge files after I loaded them into hive?


Cheng Su 2012-11-15, 08:03
Re: Can I merge files after I loaded them into hive?
Hi Cheng

You can use Flume for ingestion into HDFS. Flume takes care of the file sizes: it combines the incoming data and stores it as one large file. This is the better approach.
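
As a rough illustration of how Flume controls file size, here is a sketch of an HDFS sink configuration that rolls files by size only. The agent, sink, and channel names (`agent1`, `hdfsSink`, `memCh`) and the HDFS path are hypothetical, and the 128 MB roll size is just an example value, not something from your setup.

```properties
# Hypothetical names: agent1, hdfsSink, memCh. The path is a placeholder.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memCh
agent1.sinks.hdfsSink.hdfs.path = /flume/nginx_access_logs
# Write plain text instead of the default SequenceFile.
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
# Roll to a new file once about 128 MB has been written (value is in bytes).
agent1.sinks.hdfsSink.hdfs.rollSize = 134217728
# 0 turns off rolling by time interval and by event count.
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```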

You can also write a custom MR job to merge these files in HDFS. Use CombineFileInputFormat and start a map-only job with an identity mapper, with the split size set to the required large file size.
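
To see why the split size decides how many merged files you end up with, here is a small Python sketch of the grouping idea. It is not Hadoop code: it greedily packs files into splits up to a maximum size, which is a simplification of CombineFileInputFormat (the real class also takes node and rack locality into account). The function name `pack_into_splits` and the file names are made up for illustration. In a map-only job each split goes to one mapper, and each mapper writes one output file.

```python
from typing import List, Tuple


def pack_into_splits(files: List[Tuple[str, int]], max_split_bytes: int) -> List[List[str]]:
    """Greedily group (path, size) pairs so each group stays within max_split_bytes.

    A single file larger than max_split_bytes still gets a split of its own.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        # Close the current split when adding this file would overflow it.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


MB = 1024 * 1024
# 24 hourly files of 10 MB each, as in the question (file names are hypothetical).
hourly = [(f"access-{h:02d}.log", 10 * MB) for h in range(24)]
splits = pack_into_splits(hourly, max_split_bytes=128 * MB)
print([len(s) for s in splits])  # → [12, 12]
```

With a 128 MB split size the 24 files of one partition collapse into 2 output files. Raising the split size to 240 MB or more would give a single file per partition.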
Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Cheng Su <[EMAIL PROTECTED]>
Date: Thu, 15 Nov 2012 16:03:44
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Can I merge files after I loaded them into hive?

Hi, all.

Can I merge files after I loaded them into hive?
This is my situation:

There is a log table partitioned by date, which stores the nginx access logs.
The raw log files are loaded into hive every hour.
Right now, a single log file is small, say 10 MB or even smaller.
So there are 24 small files in one partition.
This is inefficient in my opinion, and will consume more hadoop heap.
That's why I want to merge the small files.

Can hive merge those files automatically?
Or does hive provide some tools to merge files?
Or can I just use hadoop dfs -cat to do that?

--

Regards,
Cheng Su
Bejoy KS 2012-11-15, 10:08
Роман Павленко 2012-11-15, 10:20
Cheng Su 2012-11-15, 10:41