You may want to look at Dynamic partitions
From: Ian <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Reply-To: "[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>" <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>, Ian <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Date: Thursday, March 28, 2013 10:29 AM
To: "[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>" <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Subject: External table for hourly log files
We use Hive "Insert Overwrite Directory" to copy the hourly logs to hdfs. So there are lots of directories like these:
Now we want to create external table to query the log data. So we use the "Add Partition".
CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY(dt string);
ALTER TABLE testpart ADD PARTITION(dt='2013-03-08-01') LOCATION '/my/logs/2013-03-08/01';
This works fine. However if we want say one week worth of logs, then we need to repeat "Add Partition" 24*7 times. I'm wondering if there is other way to avoid specifying Partition statements so many times, maybe something like wildcard "2013-03-08/*"? If not, what's the general practice to handle these hourly logs?
=====================This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.