Automating the partition creation process


Sadananda Hegde 2013-01-29, 04:05
Mark Grover 2013-01-29, 04:47

Re: Automating the partition creation process
This is very easy to do with a shell script, and you can have cron run it
nightly, say just after midnight, so the year, month, and day calculations
below return the correct values:

#!/bin/bash

year=$(date +%Y)
month=$(date +%m)
day=$(date +%d)

hive -e "use mydb; ALTER TABLE foo ADD IF NOT EXISTS PARTITION
(year='$year', month='$month', day='$day') location
'/path/to/mydb.db/foo/year=$year/month=$month/day=$day';"

(The last line wrapped)
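For the cron side, an entry like the one below would do it (a sketch; the script path and log location are made-up examples, not from the thread). The date variables the script relies on can be checked standalone:

```shell
# Hypothetical crontab entry: run at 00:05 so the date commands below
# pick up the new day (the paths here are illustrative only):
#   5 0 * * * /usr/local/bin/add_partition.sh >> /var/log/add_partition.log 2>&1

# Reproduce the statement the script builds, for inspection:
year=$(date +%Y)
month=$(date +%m)
day=$(date +%d)
stmt="ALTER TABLE foo ADD IF NOT EXISTS PARTITION (year='$year', month='$month', day='$day')"
echo "$stmt"
```

`date +%m` and `date +%d` zero-pad to two digits, so string-typed partition columns will sort correctly.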

If the year, month and day are integers, then omit the single quotes, i.e.,
(year=$year, month=$month, day=$day)

Including IF NOT EXISTS lets you run the script multiple times per day, if
you're paranoid... ;)

By the way, I used "use mydb; ALTER TABLE foo..." instead of "ALTER TABLE
mydb.foo..." because hive (v0.10.0) didn't like the latter.

dean

On Mon, Jan 28, 2013 at 10:47 PM, Mark Grover
<[EMAIL PROTECTED]> wrote:

> Sadananda,
> See if this helps:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>
>
> On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> My hive table is partitioned by year, month and day. I have defined it as
>> external table. The M/R job correctly loads the files into the daily
>> subfolders. The hdfs files will be loaded to
>> <hivetable>/year=yyyy/month=mm/day=dd/ folders by the scheduled M/R jobs.
>> The M/R job has some business logic in determining the values for year,
>> month and day; so one run might create / load files into multiple
>> sub-folders (multiple days). I am able to query the tables after adding
>> partitions using ALTER TABLE ADD PARTITION statement. But how do I automate
>> the partition creation step? Basically this script needs to identify the
>> subfolders created by the M/R job and create corresponding ALTER TABLE ADD
>> PARTITION statements.
>>
>> For example, say the M/R job loads files into the following 3 sub-folders
>>
>> /user/hive/warehouse/sales/year=2013/month=1/day=21
>> /user/hive/warehouse/sales/year=2013/month=1/day=22
>> /user/hive/warehouse/sales/year=2013/month=1/day=23
>>
>> Then it should create 3 alter table statements
>>
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>>
>> I thought of changing M/R jobs to load all files into same folder,
>> then first load the files into non-partitioned table and then to load the
>> partitioned table from non-partitioned table (using dynamic partition); but
>> would prefer to avoid that extra step if possible (esp. since data is
>> already in the correct sub-folders).
>>
>> Any help would be greatly appreciated.
>>
>> Regards,
>> Sadu
>>
>>
>>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
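The discovery step Sadu asks about (find the day= folders the M/R job created and emit matching DDL) can be sketched with a directory listing piped through `sed`. This is a sketch, not from the thread: the `hadoop fs -ls -R` command and the `IF NOT EXISTS` guard are assumptions, and the listing is stubbed with the three example paths from the question so the transformation runs as-is:

```shell
#!/bin/bash
# Stub of the real listing; in production this might be something like:
#   paths=$(hadoop fs -ls -R /user/hive/warehouse/sales | awk '{print $NF}' | grep 'day=')
paths="/user/hive/warehouse/sales/year=2013/month=1/day=21
/user/hive/warehouse/sales/year=2013/month=1/day=22
/user/hive/warehouse/sales/year=2013/month=1/day=23"

# Turn each .../year=Y/month=M/day=D path into an ALTER TABLE statement.
ddl=$(echo "$paths" | sed -n \
  's|.*/year=\([0-9]*\)/month=\([0-9]*\)/day=\([0-9]*\)$|ALTER TABLE sales ADD IF NOT EXISTS PARTITION (year=\1, month=\2, day=\3);|p')

echo "$ddl"
```

The generated statements could then be handed to Hive in one session, e.g. `hive -e "use mydb; $ddl"`, along the lines of Dean's script above.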
Sadananda Hegde 2013-01-30, 01:44
Sadananda Hegde 2013-01-30, 01:09
Mark Grover 2013-01-30, 01:17
Edward Capriolo 2013-01-30, 01:21
Sadananda Hegde 2013-01-30, 01:49
Dean Wampler 2013-01-30, 02:05
abhishek 2013-01-29, 04:47