We use Hive's "Insert Overwrite Directory" to copy the hourly logs to HDFS, so there are lots of directories like these:
Now we want to create an external table to query the log data, so we use "Add Partition":
CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY(dt string);
ALTER TABLE testpart ADD PARTITION(dt='2013-03-08-01') LOCATION '/my/logs/2013-03-08/01';
This works fine. However, if we want, say, one week's worth of logs, we need to repeat "Add Partition" 24*7 = 168 times. Is there a way to avoid writing so many partition statements, maybe something like a wildcard such as "2013-03-08/*"? If not, what is the general practice for handling these hourly logs?
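To make the repetition concrete, here is a minimal sketch of generating the statements with a script instead of typing them by hand. It assumes the '/my/logs/YYYY-MM-DD/HH' layout and the 'YYYY-MM-DD-HH' value for dt shown in the example above. The function name add_partition_statements and its parameters are hypothetical, and IF NOT EXISTS is added only so the script can be rerun safely.

```python
from datetime import datetime, timedelta


def add_partition_statements(start, hours, table="testpart", base="/my/logs"):
    """Return one ALTER TABLE ... ADD PARTITION statement per hour.

    Assumes directories are laid out as <base>/YYYY-MM-DD/HH and that the
    partition value is 'YYYY-MM-DD-HH', as in the example above.
    """
    statements = []
    for i in range(hours):
        ts = start + timedelta(hours=i)
        day = ts.strftime("%Y-%m-%d")
        hour = ts.strftime("%H")
        statements.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION(dt='{d}-{h}') "
            "LOCATION '{b}/{d}/{h}';".format(t=table, d=day, h=hour, b=base)
        )
    return statements


if __name__ == "__main__":
    # One week of hourly partitions starting at 2013-03-08 00:00.
    for stmt in add_partition_statements(datetime(2013, 3, 8, 0), 24 * 7):
        print(stmt)
```

The output can be redirected to a file and run with hive -f. This only automates the repetition; it is not a wildcard, and it does not check that each directory actually exists.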
Sanjay Subramanian 2013-03-28, 17:41
Ian 2013-03-28, 18:26