Hive >> mail # user >> External table for hourly log files
Hi,
 
We use Hive's INSERT OVERWRITE DIRECTORY to copy the hourly logs to HDFS, so there are lots of directories like these:
    /my/logs/2013-03-08/01/000000_0
    /my/logs/2013-03-08/02/000000_0
    /my/logs/2013-03-08/03/000000_0
    ...
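For reference, the hourly export is roughly like the statement below; the raw_logs table and the log_date/log_hour columns are placeholders for our actual source, not the exact statement we run.
    -- Rough sketch of the hourly export (table/column names are placeholders)
    INSERT OVERWRITE DIRECTORY '/my/logs/2013-03-08/01'
    SELECT log_line
    FROM raw_logs
    WHERE log_date = '2013-03-08' AND log_hour = '01';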
 
Now we want to create an external table to query the log data, so we use ADD PARTITION:
    CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY(dt string);
    ALTER TABLE testpart ADD PARTITION(dt='2013-03-08-01') LOCATION '/my/logs/2013-03-08/01';
 
This works fine. However, if we want, say, one week's worth of logs, we need to repeat ADD PARTITION 24*7 times. I'm wondering if there is another way to avoid specifying partition statements so many times, maybe something like a wildcard such as "2013-03-08/*"? If not, what's the general practice for handling these hourly logs?
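For example, what I am hoping for is something roughly like the sketch below, where several partitions are added in one ALTER TABLE statement; this is only an illustration of the idea, I have not verified that this syntax is supported:
    -- Sketch only: adding several hourly partitions in a single statement
    ALTER TABLE testpart ADD IF NOT EXISTS
        PARTITION (dt='2013-03-08-01') LOCATION '/my/logs/2013-03-08/01'
        PARTITION (dt='2013-03-08-02') LOCATION '/my/logs/2013-03-08/02'
        PARTITION (dt='2013-03-08-03') LOCATION '/my/logs/2013-03-08/03';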
 
Thanks
Sanjay Subramanian 2013-03-28, 17:41
Ian 2013-03-28, 18:26