Hive >> mail # user >> Automating the partition creation process

Automating the partition creation process

My Hive table is partitioned by year, month and day, and I have defined it as
an external table. Scheduled M/R jobs load the HDFS files into the daily
subfolders, <hivetable>/year=yyyy/month=mm/day=dd/. The M/R job has some
business logic for determining the values of year, month and day, so one run
might create and load files into multiple subfolders (multiple days). I am
able to query the table after adding partitions with the ALTER TABLE ADD
PARTITION statement. But how do I automate the partition creation step?
Basically, the script needs to identify the subfolders created by the M/R job
and generate the corresponding ALTER TABLE ADD PARTITION statements.

For example, say the M/R job loads files into the following 3 sub-folders


Then it should create 3 ALTER TABLE statements:

ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
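
To make the idea concrete, here is a rough sketch of the kind of script I have in mind. It is only an illustration under some assumptions: the function name partition_statements and the /data/sales/... paths are hypothetical, the partition values are assumed to be numeric, and the list of paths is assumed to come from some HDFS directory listing taken after the M/R job finishes. It adds IF NOT EXISTS so that re-running it is harmless, and an explicit LOCATION so that the partition points at the folder the job actually wrote.

```python
import re

# Matches the year=/month=/day= folder layout described above.
# Assumption: partition values are numeric.
PARTITION_RE = re.compile(
    r"^(?P<dir>.*/year=(?P<year>\d+)/month=(?P<month>\d+)/day=(?P<day>\d+))(?:/|$)"
)


def partition_statements(table, paths):
    """Return one ALTER TABLE statement per distinct partition folder in paths."""
    seen = {}
    for path in paths:
        match = PARTITION_RE.match(path.strip())
        if match is None:
            continue  # not inside a year/month/day partition folder
        key = (
            int(match.group("year")),
            int(match.group("month")),
            int(match.group("day")),
        )
        # Keep the first folder seen for each partition.
        seen.setdefault(key, match.group("dir"))
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year={y}, month={m}, day={d}) LOCATION '{loc}';"
        for (y, m, d), loc in sorted(seen.items())
    ]


if __name__ == "__main__":
    # Hypothetical file paths; in practice these would come from an HDFS listing.
    example_paths = [
        "/data/sales/year=2013/month=1/day=21/part-00000",
        "/data/sales/year=2013/month=1/day=21/part-00001",
        "/data/sales/year=2013/month=1/day=22/part-00000",
    ]
    for statement in partition_statements("sales", example_paths):
        print(statement)
```

The printed statements could be written to a file and run with hive -f after each M/R run.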

I thought of changing the M/R jobs to load all files into the same folder,
then loading those files into a non-partitioned table, and then loading the
partitioned table from the non-partitioned table (using dynamic partitioning);
but I would prefer to avoid that extra step if possible (especially since the
data is already in the correct subfolders).

Any help would be greatly appreciated.

Mark Grover 2013-01-29, 04:47
Dean Wampler 2013-01-29, 16:37
Sadananda Hegde 2013-01-30, 01:44
Sadananda Hegde 2013-01-30, 01:09
Mark Grover 2013-01-30, 01:17
Edward Capriolo 2013-01-30, 01:21
Sadananda Hegde 2013-01-30, 01:49
Dean Wampler 2013-01-30, 02:05
abhishek 2013-01-29, 04:47