Whenever you create a partition in Hive, it needs to be registered with the
metastore. So the short answer is that partition information is looked up in
the metastore rather than discovered from the actual source data.
Having a lot of partitions (around 10,000 or more) does slow Hive down. I
have not normally seen anyone use hourly partitions. You may want to look at
partitioning by day and bucketing by hour instead.
But if you are writing data directly into partition directories, then you
have to register those partitions with the metastore yourself; I don't know
of an alternative apart from ALTER TABLE ... ADD PARTITION.
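
If it helps, the manual registration can at least be scripted. Below is a
rough sketch, not a tested production job: it generates one ALTER TABLE
statement per day for a YYYY/MM/DD directory layout. The table name "logs",
the string partition column "dt", and the root path "/data/logs" are all
made up for illustration; substitute whatever your table actually uses.

```python
from datetime import date, timedelta


def add_partition_statements(table, root, start, end):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in [start, end].

    Assumes (hypothetically) a single string partition column named `dt`
    and data laid out under <root>/YYYY/MM/DD.
    """
    root = root.rstrip("/")
    day = start
    while day <= end:
        # Partition value such as 2013-04-15, directory such as 2013/04/15.
        location = f"{root}/{day:%Y/%m/%d}"
        yield (
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (dt='{day.isoformat()}') LOCATION '{location}';"
        )
        day += timedelta(days=1)


if __name__ == "__main__":
    # Hypothetical table name and HDFS root, for illustration only.
    for stmt in add_partition_statements(
        "logs", "/data/logs", date(2013, 4, 14), date(2013, 4, 15)
    ):
        print(stmt)
```

You could write the output to a file and run it with hive -f from a daily
cron job. IF NOT EXISTS makes it safe to re-run for days that are already
registered.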
If you are using HCatalog as the metadata store, it does provide an API to
register your partitions, so you can automate data loading and partition
registration in a single flow.
Others will correct me if I have made any wrong assumptions.
On Mon, Apr 15, 2013 at 8:15 PM, Steve Hoffman <[EMAIL PROTECTED]> wrote:
> Looking for some pointers on where the partitioning is figured out in the
> source when a query is executed.
> I'm investigating an alternative partitioning scheme based on date patterns
> (using external tables).
> The situation is that I have data being written to some HDFS root directory
> with some dated pattern (i.e. YYYY/MM/DD). Today I have to run an alter
> table to insert this partition every day. It gets worse if you have hourly
> partitions. This seems like it can be described once (root + date
> partition pattern in the metastore).
> So looking for some pointers on where in the code this is currently