Hive, mail # user - External table for hourly log files

External table for hourly log files
Ian 2013-03-28, 17:29
We use Hive's "INSERT OVERWRITE DIRECTORY" to copy the hourly logs to HDFS, so we end up with many directories like these:
    /my/logs/2013-03-08/01/000000_0
    /my/logs/2013-03-08/02/000000_0
Now we want to create an external table to query the log data, so we use "ADD PARTITION":
    CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY(dt string);
    ALTER TABLE testpart ADD PARTITION(dt='2013-03-08-01') LOCATION '/my/logs/2013-03-08/01';
This works fine. However, if we want, say, one week's worth of logs, we need to repeat "ADD PARTITION" 24*7 = 168 times. Is there another way that avoids issuing so many partition statements, maybe something like a wildcard location "2013-03-08/*"? If not, what is the general practice for handling hourly logs like these?
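To make the repetition concrete, here is a rough sketch of how the 168 statements could be generated by a script rather than typed by hand. It assumes the directory layout and the `dt='YYYY-MM-DD-HH'` naming shown above, and that hours run from 00 to 23 (only 01 and 02 appear in the listing). The function name `add_partition_statements` and its parameters are made up for illustration.

```python
from datetime import date, timedelta


def add_partition_statements(start, days, table="testpart", root="/my/logs"):
    """Yield one ALTER TABLE ... ADD PARTITION statement per hourly directory.

    Assumes directories are laid out as <root>/<YYYY-MM-DD>/<HH> and that
    every hour from 00 to 23 exists for each day (an assumption).
    """
    for offset in range(days):
        day = (start + timedelta(days=offset)).isoformat()  # e.g. 2013-03-08
        for hour in range(24):
            hh = f"{hour:02d}"  # zero-padded hour, matching the directory names
            yield (
                f"ALTER TABLE {table} ADD PARTITION(dt='{day}-{hh}') "
                f"LOCATION '{root}/{day}/{hh}';"
            )


if __name__ == "__main__":
    # One week of hourly partitions starting 2013-03-08.
    for stmt in add_partition_statements(date(2013, 3, 8), days=7):
        print(stmt)
```

The output could be redirected to a file and run with `hive -f`. This only automates the typing; it still issues one ADD PARTITION per hour.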