Hive, mail # user - Can I merge files after I loaded them into hive?


Cheng Su 2012-11-15, 08:03
Bejoy KS 2012-11-15, 08:10
Bejoy KS 2012-11-15, 10:08
Re: Can I merge files after I loaded them into hive?
Роман Павленко 2012-11-15, 10:20
Example:
insert overwrite table my_table PARTITION (year=2012,month=9,day=4) select
`data`, `timestamp`, `hour`, `minute`, `second`  from my_table WHERE
year=2012 AND month=9 AND day=4;
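
A minimal sketch that combines the merge setting Bejoy mentions with the statement above. The extra `hive.merge.*` properties and their values are illustrative assumptions, not something stated in this thread; check them against your Hive version before relying on them.

```sql
-- Ask Hive to merge small output files at the end of the job.
-- hive.merge.mapfiles covers map-only jobs (a plain SELECT like this one);
-- hive.merge.mapredfiles covers jobs that also have a reduce phase.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- Assumed thresholds (illustrative values): merge when the average output
-- file is smaller than ~16 MB, and aim for merged files of ~256 MB.
SET hive.merge.smallfiles.avgsize=16000000;
SET hive.merge.size.per.task=256000000;

-- Rewrite the partition onto itself so the merge step runs over its files.
-- The partition columns are fixed in the PARTITION clause, so they are
-- left out of the SELECT list.
INSERT OVERWRITE TABLE my_table PARTITION (year=2012, month=9, day=4)
SELECT `data`, `timestamp`, `hour`, `minute`, `second`
FROM my_table
WHERE year=2012 AND month=9 AND day=4;
```

Because the statement overwrites the partition it reads from, it may be worth trying it on a copy of the partition first.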
2012/11/15 Bejoy KS <[EMAIL PROTECTED]>

> Hi Cheng
>
> You can do it in Hive as well. Enable Hive merge and INSERT OVERWRITE the
> partition once again with SELECT *.
>
> hive.merge.mapfiles=true
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: "Bejoy KS" <[EMAIL PROTECTED]>
> Date: Thu, 15 Nov 2012 08:10:12
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Can I merge files after I loaded them into hive?
>
> Hi Cheng
>
> You can use Flume for ingestion into HDFS. Flume takes care of the file
> sizes, combines the files, and stores them as one large file. This is a
> better approach.
>
> You can also write a custom MR job to merge these files in HDFS. Use
> CombineFileInputFormat and start a map-only job with an identity mapper,
> with the split size set to the required large file size.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Cheng Su <[EMAIL PROTECTED]>
> Date: Thu, 15 Nov 2012 16:03:44
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Can I merge files after I loaded them into hive?
>
> Hi, all.
>
> Can I merge files after I loaded them into hive?
> This is my situation:
>
> There is a log table partitioned by date, which stores the nginx access
> logs.
> The raw log files are loaded into Hive every hour.
> For now, a single log file is small, say 10 MB or even smaller.
> So there are 24 small files in one partition.
> This is inefficient in my opinion, and will consume more Hadoop heap memory.
> That's why I want to merge the small files.
>
> Can Hive merge those files automatically?
> Or does Hive provide some tools to merge files?
> Or can I just use hadoop dfs -cat to do that?
>
> --
>
> Regards,
> Cheng Su
>
Cheng Su 2012-11-15, 10:41