Hive user mailing list: Can I merge files after I loaded them into hive?


Cheng Su 2012-11-15, 08:03
Bejoy KS 2012-11-15, 08:10
Bejoy KS 2012-11-15, 10:08
Роман Павленко 2012-11-15, 10:20
Re: Can I merge files after I loaded them into hive?
Thank you guys.
I will try this later.
Sorry for the additional questions:
if I do this, could the file become too big? Does Hive have a config
to control the max file size? Can Hive split files automatically?

On Thu, Nov 15, 2012 at 6:20 PM, Роман Павленко
<[EMAIL PROTECTED]> wrote:
> Example:
> insert overwrite table my_table PARTITION (year=2012,month=9,day=4) select
> `data`, `timestamp`, `hour`, `minute`, `second`  from my_table WHERE
> year=2012 AND month=9 AND day=4;
>
>
>
>
> 2012/11/15 Bejoy KS <[EMAIL PROTECTED]>
>>
>> Hi Cheng
>>
>> You can do it in Hive as well. Enable Hive merge and INSERT OVERWRITE the
>> partition once again with SELECT *.
>>
>> hive.merge.mapfiles=true
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: "Bejoy KS" <[EMAIL PROTECTED]>
>> Date: Thu, 15 Nov 2012 08:10:12
>> To: <[EMAIL PROTECTED]>
>> Reply-To: [EMAIL PROTECTED]
>> Subject: Re: Can I merge files after I loaded them into hive?
>>
>> Hi Cheng
>>
>> You can use Flume for ingestion into HDFS. Flume takes care of the file
>> sizes, combines the files and stores them as one large file. This is a better
>> approach.
>>
>> You can also write custom MR jobs to merge these files in HDFS. Use
>> CombineFileInputFormat and start a map-only job with an identity mapper, with
>> the split size set to the required large file size.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Cheng Su <[EMAIL PROTECTED]>
>> Date: Thu, 15 Nov 2012 16:03:44
>> To: <[EMAIL PROTECTED]>
>> Reply-To: [EMAIL PROTECTED]
>> Subject: Can I merge files after I loaded them into hive?
>>
>> Hi, all.
>>
>> Can I merge files after I loaded them into hive?
>> This is my situation:
>>
>> There is a log table partitioned by date, which stores the nginx access
>> logs.
>> The raw log files are loaded into Hive every hour.
>> Currently, a single log file is small, say 10 MB or even smaller.
>> So there are 24 small files in one partition.
>> This is inefficient in my opinion, and will consume more Hadoop heap.
>> That's why I want to merge the small files.
>>
>> Can Hive merge those files automatically?
>> Or does Hive provide some tools to merge files?
>> Or can I just use hadoop dfs -cat to do that?
>>
>> --
>>
>> Regards,
>> Cheng Su
>
>

--

Regards,
Cheng Su
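
Putting the suggestions in this thread together, the merge could look roughly like the sketch below. It reuses the table, columns and partition from the example quoted above. The hive.merge.* names are standard Hive configuration properties, but the numeric values are illustrative assumptions, not recommendations from anyone in the thread. On the file-size question: as far as I know, hive.merge.size.per.task acts as an approximate target size for merged files rather than a hard cap, and Hive does not split a stored file afterwards; HDFS stores large files as blocks on its own.

```sql
-- Sketch only; the byte values below are assumptions chosen for illustration.

-- Merge small files produced by map-only jobs.
SET hive.merge.mapfiles=true;
-- Also merge small files produced by map-reduce jobs.
SET hive.merge.mapredfiles=true;
-- Approximate target size, in bytes, of each merged file.
SET hive.merge.size.per.task=256000000;
-- Run the extra merge step when the average output file size is below this many bytes.
SET hive.merge.smallfiles.avgsize=16000000;

-- Rewrite the partition onto itself so the merge step gets a chance to run.
INSERT OVERWRITE TABLE my_table PARTITION (year=2012, month=9, day=4)
SELECT `data`, `timestamp`, `hour`, `minute`, `second`
FROM my_table
WHERE year=2012 AND month=9 AND day=4;
```

Checking the partition directory with hadoop dfs -ls before and after should show whether the file count actually dropped.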