Re: Automating the partition creation process
Sadananda,

Take a look at an Oozie workflow.

Regards
Abhishek
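
A minimal sketch of the Hive script such a workflow could invoke after each
M/R load, assuming the scheduler passes the year/month/day values in as Hive
variables (the variable names Y, M and D and the file name add_partition.q
are illustrative, not from the thread):

    -- add_partition.q: run once per loaded day, e.g.
    --   hive -hiveconf Y=2013 -hiveconf M=1 -hiveconf D=21 -f add_partition.q
    ALTER TABLE sales ADD IF NOT EXISTS
      PARTITION (year=${hiveconf:Y}, month=${hiveconf:M}, day=${hiveconf:D})
      LOCATION '/user/hive/warehouse/sales/year=${hiveconf:Y}/month=${hiveconf:M}/day=${hiveconf:D}';

IF NOT EXISTS keeps the statement safe to re-run, and the LOCATION clause can
be dropped when the sub-folders already sit under the table's default
location, as in the layout described below.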

On Jan 28, 2013, at 11:05 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Hello,
>  
> My Hive table is partitioned by year, month and day, and I have defined it as an external table. The scheduled M/R jobs correctly load the HDFS files into the daily sub-folders, <hivetable>/year=yyyy/month=mm/day=dd/. The M/R job has some business logic for determining the values of year, month and day, so one run might create and load files into multiple sub-folders (multiple days). I am able to query the table after adding partitions with ALTER TABLE ADD PARTITION statements. But how do I automate the partition creation step? Basically, the script needs to identify the sub-folders created by the M/R job and create the corresponding ALTER TABLE ADD PARTITION statements (see the sketch after this message).
>  
> For example, say the M/R job loads files into the following 3 sub-folders:
>  
> /user/hive/warehouse/sales/year=2013/month=1/day=21
> /user/hive/warehouse/sales/year=2013/month=1/day=22
> /user/hive/warehouse/sales/year=2013/month=1/day=23
>  
> Then it should create these 3 ALTER TABLE statements:
>  
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>  
> I thought of changing the M/R jobs to load all files into the same folder, first loading those files into a non-partitioned table, and then loading the partitioned table from the non-partitioned one (using dynamic partitioning); but I would prefer to avoid that extra step if possible (especially since the data is already in the correct sub-folders).
>  
> Any help would be greatly appreciated.
>  
> Regards,
> Sadu
>  
>  
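
Since the sub-folders already follow Hive's year=.../month=.../day=... naming
convention, one way to automate the discovery step described above, depending
on the Hive version, is to let Hive scan the table location itself rather
than generating individual ALTER TABLE statements (MSCK REPAIR TABLE is not
mentioned in the thread, so treat this as a sketch of an alternative):

    -- Compares the partitions in the metastore with the directories under
    -- the table's location and adds any that are missing, which has the
    -- same effect as one ALTER TABLE ... ADD PARTITION per new sub-folder.
    MSCK REPAIR TABLE sales;

On Amazon EMR's Hive build the equivalent command is, if I recall correctly,
ALTER TABLE sales RECOVER PARTITIONS.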