Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Noob question on creating tables


+
Mark 2013-03-29, 17:19
+
Sanjay Subramanian 2013-03-29, 17:40
+
Mark 2013-03-29, 18:16
Copy link to this message
-
Re: Noob question on creating tables
That's a drawback of external tables, but it's actually not as difficult as
it sounds. It's easy to write a nightly "cron" job that creates the
partition for the next day (or a job per month...), if someone on your team
has some bash experience. Other job scheduling tools should support this
too. Here's an example. First, a hive script that uses parameters for the
date (Hive v0.8 or newer):

-- addlogpartition.hql
ALTER TABLE log ADD IF NOT EXISTS PARTITION (year = ${YEAR}, month ${MONTH}, day = ${DAY});

Then, run this bash script AFTER MIDNIGHT:

#!/bin/bash
YEAR=$(date +%Y)       # returns the string "2013" today.
MONTH=$(date +%m)   # returns the string "03" today, with the leading zero.
DAY=$(date +%d)          # returns the string "29" today. Will prefix with
0 for dates < 10.

# Assumes /path/to/2013/03/29 is the correct directory name:
/path/to/hive -f /path/to/addlogpartition.hql -d YEAR=$YEAR -d MON=$MONTH
-d DAY=$DAY
(Of course, all the /path/to will be different...)

So, be careful of how how "03" vs. "3" is handled in both the ALTER
TABLE statement and the path. Off hand, I don't know if Hive will complain
if you use 03 as an integer value in the ALTER TABLE statement.
On Fri, Mar 29, 2013 at 1:16 PM, Mark <[EMAIL PROTECTED]> wrote:

> Thanks
>
> Does this mean I need to create a partition for each day manually? There
> is no way to have infer that from my directory structure?
>
> On Mar 29, 2013, at 10:40 AM, Sanjay Subramanian <
> [EMAIL PROTECTED]> wrote:
>
> > Hi
> >
> > CREATE EXTERNAL TABLE IF NOT EXISTS log_data(col1 datatype1, col2
> > datatype2, . . . colN datatypeN) PARTITIONED BY (YEAR INT, MONTH INT, DAY
> > INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> >
> >
> > ALTER table log_data ADD PARTITION (YEAR=2013 , MONTH=2, DAY=27) LOCATION
> > '/path/to/YEAR/MONTH/DAY/directory/ON/HDFS';"
> >
> > Hive will read gzip and bz2 files out of the box.(so suppose you had
> > hourly log files in gzip format in your /YEAR/MONTH/DAY directory then it
> > will be read)
> > Snappy and LZO will need some jar installs and configs
> > https://github.com/toddlipcon/hadoop-lzo
> >
> > https://code.google.com/p/snappy/
> >
> >
> > Note that for example - gzip format is not splittable..so huge gzip files
> > without splits are not recommended as input to maps
> >
> > Hope this helps
> >
> > sanjay
> >
> >
> > On 3/29/13 10:19 AM, "Mark" <[EMAIL PROTECTED]> wrote:
> >
> >> We have existing log data in directories in the format of
> YEAR/MONTH/DAY.
> >>
> >> - How can we create a table over this table without hive modifying
> and/or
> >> moving it?
> >> - How can we tell Hive to partition this data so it knows about each day
> >> of logs?
> >> - Does hive out of the box work with reading compressed files?
> >>
> >> Thanks
> >
> >
> > CONFIDENTIALITY NOTICE
> > =====================> > This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
> >
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
+
Sanjay Subramanian 2013-03-29, 19:45
+
Mark 2013-03-29, 20:25
+
Sanjay Subramanian 2013-03-29, 20:38
+
Dean Wampler 2013-03-29, 17:32
+
Mark 2013-03-29, 18:16
+
Nitin Pawar 2013-03-29, 18:36