Hive user mailing list
External table for hourly log files
Hi,
 
We use Hive "INSERT OVERWRITE DIRECTORY" to copy the hourly logs to HDFS. As a result, there are many directories like these:
    /my/logs/2013-03-08/01/000000_0
    /my/logs/2013-03-08/02/000000_0
    /my/logs/2013-03-08/03/000000_0
    ...
 
Now we want to create an external table to query the log data, so we use ADD PARTITION:
    CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY(dt string);
    ALTER TABLE testpart ADD PARTITION(dt='2013-03-08-01') LOCATION '/my/logs/2013-03-08/01';
 
This works fine. However, if we want, say, one week's worth of logs, we need to repeat ADD PARTITION 24*7 = 168 times. Is there another way to avoid writing so many partition statements, maybe something like a wildcard "2013-03-08/*"? If not, what is the general practice for handling these hourly logs?
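Until there is a better answer on wildcards, one stopgap is to generate the repeated statements with a small script instead of typing them. The sketch below is only an illustration under stated assumptions: the hourly directories are named /my/logs/YYYY-MM-DD/HH with HH running 00..23, and the partition value follows the 'YYYY-MM-DD-HH' pattern from the single ADD PARTITION example above. The names partition_statements, BASE_DIR and TABLE are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical settings. Assumes directories /my/logs/YYYY-MM-DD/HH with
# HH = 00..23 and partition values 'YYYY-MM-DD-HH', as in the example above.
BASE_DIR = "/my/logs"
TABLE = "testpart"


def partition_statements(start: datetime, hours: int) -> list[str]:
    """Return one ALTER TABLE ... ADD PARTITION statement per hour."""
    statements = []
    for i in range(hours):
        ts = start + timedelta(hours=i)
        day = ts.strftime("%Y-%m-%d")
        hour = ts.strftime("%H")
        statements.append(
            f"ALTER TABLE {TABLE} ADD PARTITION(dt='{day}-{hour}') "
            f"LOCATION '{BASE_DIR}/{day}/{hour}';"
        )
    return statements


if __name__ == "__main__":
    # One week of hourly partitions starting 2013-03-08: 24 * 7 = 168 statements.
    for stmt in partition_statements(datetime(2013, 3, 8), 24 * 7):
        print(stmt)
```

The printed output can be redirected to a file (for example `python gen.py > add_partitions.hql`) and run with `hive -f add_partitions.hql`. This only automates the repetition; it still creates one partition per hourly directory.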
 
Thanks