Hive >> mail # user >> Noob question on creating tables


Re: Noob question on creating tables
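Hi,

An external table plus one ADD PARTITION per day directory should cover your first two questions. Hive leaves external data where it is and does not move or modify it: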

CREATE EXTERNAL TABLE IF NOT EXISTS log_data (
  col1 datatype1, col2 datatype2, . . . colN datatypeN
)
PARTITIONED BY (YEAR INT, MONTH INT, DAY INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

ALTER TABLE log_data ADD PARTITION (YEAR=2013, MONTH=2, DAY=27)
LOCATION '/path/to/YEAR/MONTH/DAY/directory/ON/HDFS';
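Since each day directory needs its own partition, here is a rough sketch of registering two days in one statement and then querying a single day. The /logs/... paths are made up for illustration, so substitute your real HDFS directories; the table and partition columns are the ones defined above.

```sql
-- Hypothetical paths: replace /logs/... with your actual HDFS day directories.
-- IF NOT EXISTS makes the statement safe to re-run.
ALTER TABLE log_data ADD IF NOT EXISTS
  PARTITION (YEAR=2013, MONTH=2, DAY=27) LOCATION '/logs/2013/02/27'
  PARTITION (YEAR=2013, MONTH=2, DAY=28) LOCATION '/logs/2013/02/28';

-- List the partitions Hive now knows about.
SHOW PARTITIONS log_data;

-- Filtering on the partition columns lets Hive read only the matching directory.
SELECT COUNT(*) FROM log_data
WHERE YEAR = 2013 AND MONTH = 2 AND DAY = 27;
```

A new day's directory only becomes visible to queries after its partition has been added, so this is typically run (or scripted) once per day.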

Hive reads gzip and bz2 files out of the box, so if you have hourly log
files in gzip format in your /YEAR/MONTH/DAY directory, they will be read
as they are.
Snappy and LZO need some jar installs and config changes:
https://github.com/toddlipcon/hadoop-lzo
https://code.google.com/p/snappy/

Note that gzip is not splittable, so a gzip file cannot be divided among
several mappers. Huge gzip files are therefore not recommended as map input.

Hope this helps

sanjay
On 3/29/13 10:19 AM, "Mark" <[EMAIL PROTECTED]> wrote:

>We have existing log data in directories in the format of YEAR/MONTH/DAY.
>
>- How can we create a table over this data without Hive modifying and/or
>moving it?
>- How can we tell Hive to partition this data so it knows about each day
>of logs?
>- Does Hive read compressed files out of the box?
>
>Thanks