Sadananda Hegde 2013-01-29, 04:05
Mark Grover 2013-01-29, 04:47
Dean Wampler 2013-01-29, 16:37
Sadananda Hegde 2013-01-30, 01:44
Sadananda Hegde 2013-01-30, 01:09
Re: Automating the partition creation process
Mark Grover 2013-01-30, 01:17
Sorry to hear that.
It got committed, don't worry about the "ABORTED". Here is the commit on
However, there is no Apache Hive release with that patch in it.
You have two options:
1. Download the patch, rebuild Hive, and use it.
2. Find a hacky way to recover your partitions when they are empty and
populate them later.
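For the second option, one possible sketch (in Python, not something that ships with Hive) is to turn the directory paths written by the M/R job into ALTER TABLE ... ADD PARTITION statements. The function names, the example paths under /data/sales, and the idea of feeding in a plain list of path strings are all assumptions for illustration; how you obtain the listing from HDFS is up to you. Numeric values are rendered as integers to match the statements in the original question, and an explicit LOCATION is emitted so that zero-padded directory names such as month=01 still resolve.

```python
import re

# Matches one key=value path component, e.g. "year=2013".
_PART = re.compile(r"^([A-Za-z_]\w*)=(.+)$")


def _render(value):
    # Assumption: all-digit values belong to integer partition columns.
    return str(int(value)) if value.isdigit() else "'" + value + "'"


def add_partition_statements(table, paths, keys=("year", "month", "day")):
    """Build one ADD PARTITION statement per distinct partition directory.

    `paths` may contain file or directory paths; anything that does not
    carry every key in `keys` is skipped.
    """
    seen = set()
    statements = []
    for path in paths:
        parts = path.split("/")
        found = {}
        last = -1
        for i, component in enumerate(parts):
            m = _PART.match(component)
            if m and m.group(1) in keys:
                found[m.group(1)] = m.group(2)
                last = i
        if not all(k in found for k in keys):
            continue
        spec = tuple((k, found[k]) for k in keys)
        if spec in seen:
            continue
        seen.add(spec)
        # The partition directory ends at the last key=value component.
        location = "/".join(parts[: last + 1])
        rendered = ", ".join(k + "=" + _render(v) for k, v in spec)
        statements.append(
            "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION ("
            + rendered + ") LOCATION '" + location + "';"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical listing of files written by one M/R run.
    listing = [
        "/data/sales/year=2013/month=01/day=21/part-00000",
        "/data/sales/year=2013/month=01/day=22/part-00000",
    ]
    for stmt in add_partition_statements("sales", listing):
        print(stmt)
```

The printed statements can be saved to a file and run with hive -f. IF NOT EXISTS keeps the script safe to rerun after each load.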
Sorry for the inconvenience.
On Tue, Jan 29, 2013 at 5:09 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
> Thanks Mark,
> Recover partition feature will satisfy my needs; but MSCK Repair Partition
> <tablename> option is not working for me. It does not give any error; but
> does not add any partitions either. It looks like it adds partitions only
> when the sub-folder is empty; but not when the sub-folder has the data
> files. I see a fix to this issue here.
> But probably it's not committed yet, since the final result says "ABORTED".
> On Mon, Jan 28, 2013 at 10:47 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
>> See if this helps:
>> On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>> My hive table is partitioned by year, month and day. I have defined it
>>> as external table. The M/R job correctly loads the files into the daily
>>> subfolders. The hdfs files will be loaded to
>>> <hivetable>/year=yyyy/month=mm/day=dd/ folders by the scheduled M/R jobs.
>>> The M/R job has some business logic in determining the values for year,
>>> month and day; so one run might create / load files into multiple
>>> sub-folders (multiple days). I am able to query the tables after adding
>>> partitions using ALTER TABLE ADD PARTITION statement. But how do I automate
>>> the partition creation step? Basically this script needs to identify the
>>> subfolders created by the M/R job and create corresponding ALTER TABLE ADD
>>> PARTITION statements.
>>> For example, say the M/R job loads files into the following 3 sub-folders
>>> Then it should create 3 ALTER TABLE statements:
>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>>> I thought of changing M/R jobs to load all files into same folder,
>>> then first load the files into non-partitioned table and then to load the
>>> partitioned table from non-partitioned table (using dynamic partition); but
>>> would prefer to avoid that extra step if possible (esp. since data is
>>> already in the correct sub-folders).
>>> Any help would be greatly appreciated.
Edward Capriolo 2013-01-30, 01:21
Sadananda Hegde 2013-01-30, 01:49
Dean Wampler 2013-01-30, 02:05
abhishek 2013-01-29, 04:47