-
Partitioning EXTERNAL TABLE without copying or moving files
Vince Hoang 2011-12-08, 20:46
Hi,
I am running Hive 0.7.0 with Hadoop 0.20.2. I have one HDFS folder full of web server logs dating back several months.
Is it possible to partition an EXTERNAL TABLE without copying/moving files or altering the layout of the directory?
For example, in HDFS, I have:
> /logs/log-2011-09-01
> /logs/log-2011-09-02
> …
> /logs/log-2011-12-01
I'd like to know if it's possible to partition the EXTERNAL TABLE by date without having to create subdirectories:
> /logs/2011-09-01/log-2011-09-01
> /logs/2011-09-02/log-2011-09-02
> …
> /logs/2011-12-01/log-2011-12-01
Is it possible?
Thanks,
Vince

The contents of this message, together with any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain information that is confidential and exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this message, or any attachment, is strictly prohibited. If you have received this message in error, please notify the original sender immediately by telephone or by return E-mail and delete this message, along with any attachments, from your computer. Thank you.
-
RE: Partitioning EXTERNAL TABLE without copying or moving files
Tucker, Matt 2011-12-08, 21:25
Hi Vince,
External tables shouldn't issue copy or move commands against your data files. You should set the base table location to '/logs' and issue ALTER TABLE statements to add a partition for each date.
Example:
CREATE EXTERNAL TABLE logs (
  Data STRING
)
PARTITIONED BY (cal_date STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
LOCATION '/logs';

ALTER TABLE logs ADD IF NOT EXISTS PARTITION (cal_date = '2011-09-01') LOCATION 'log-2011-09-01';
Matt Tucker Associate eBusiness Analyst Walt Disney Parks and Resorts Online Ph: 407-566-2545 Tie: 8-296-2545
-
Re: Partitioning EXTERNAL TABLE without copying or moving files
Vince Hoang 2011-12-08, 23:32
Hi Matt
Thanks for the response. We tried the example you provided without success. When we tried to add a partition by specifying the location as a file (log-2011-09-01.log), Hive complained with "Parent path is not a directory". I think Hive expects a directory.
Our directory structure, again, is:
/logs/log-2011-09-01.log
/logs/log-2011-09-02.log
Thanks, Vince
-
Re: Partitioning EXTERNAL TABLE without copying or moving files
Jasper Knulst 2011-12-08, 23:59
Hi Vince,
Hive partitions can only exist as separate directories in HDFS. There is no way to partition the data in a Hive table without adding extra file paths/directories in HDFS.
For an external table, you have to redistribute the data yourself into the corresponding directories and then register each new partition in the Hive metadata.
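A minimal sketch of that redistribution, written as a Python helper that only prints the commands rather than running them. It assumes the files are named log-YYYY-MM-DD directly under /logs, as in the first message (adjust the pattern if yours end in .log), and it reuses the table name logs and partition column cal_date from Matt's example. The function name partition_plan and its parameters are hypothetical, made up for this illustration.

```python
from datetime import date, timedelta


def partition_plan(start, end, base="/logs", table="logs", column="cal_date"):
    """Build, but do not run, the commands for one partition per day.

    Returns (shell_commands, hiveql_statements). The file naming
    pattern log-YYYY-MM-DD is an assumption taken from the thread.
    """
    shell, hiveql = [], []
    day = start
    while day <= end:
        d = day.isoformat()            # e.g. "2011-09-01"
        src = f"{base}/log-{d}"        # current flat layout
        part_dir = f"{base}/{d}"       # one directory per partition
        shell.append(f"hadoop fs -mkdir {part_dir}")
        shell.append(f"hadoop fs -mv {src} {part_dir}/log-{d}")
        hiveql.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({column} = '{d}') LOCATION '{part_dir}';"
        )
        day += timedelta(days=1)
    return shell, hiveql


if __name__ == "__main__":
    shell, hiveql = partition_plan(date(2011, 9, 1), date(2011, 9, 2))
    print("\n".join(shell))
    print("\n".join(hiveql))
```

The printed hadoop fs lines would be run first, then the ALTER TABLE statements in the Hive CLI. Each partition LOCATION here is a directory, which is what Hive expects.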
Cheers,
Jasper
-
Re: Partitioning EXTERNAL TABLE without copying or moving files
Aniket Mokashi 2011-12-09, 02:17
It is a Hadoop limitation. An HDFS move operation is inexpensive, though. I am assuming that is not an option for you because you want to preserve the path structure (for backward compatibility's sake).
Something like symbolic links (I think they are not supported in 0.20, but I'm not sure) or a path filter might help, but it would be a hack.
Thanks, Aniket