File Path and Partition names
carla.staeben@... 2012-10-02, 10:55
Quick question about using hive to create new hdfs file paths.
Generally speaking, we like to keep our data files with a path similar to
Dataset/year/month/day/hour
I need to create a new table in Hive and populate it with data from a different dataset, using a HiveQL query. If I do this:
CREATE EXTERNAL TABLE IF NOT EXISTS new_table
(field1 string
,field2 string
,field3 string
)
partitioned by (reg_yr string, reg_mon string, reg_day string, reg_hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE ;
and then run an INSERT OVERWRITE into that table, I end up with this path in HDFS:
Dataset/reg_yr=2012/reg_mon=10/reg_day=02/reg_hour=07
Is there an *easy* way to remove the partition name from the creation of the hdfs path?
Thanks Carla
Re: File Path and Partition names
Bejoy KS 2012-10-02, 12:54
Hi Carla
If you would like a custom directory structure for your partitions, you can create directories of your choice in HDFS and load data into them (if the data comes from another Hive table, you can use 'INSERT OVERWRITE DIRECTORY ...' to populate an HDFS directory). You then need to register each directory as a new partition on the required table using 'ALTER TABLE ... ADD PARTITION ...'.
Regards
Bejoy KS
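A minimal sketch of the two steps described above, for a single hour of data. The source table name (source_table), its partition-like columns, and the absolute path /Dataset/2012/10/02/07 are assumptions made for illustration; only new_table and its columns come from the original question. The ROW FORMAT clause on INSERT OVERWRITE DIRECTORY assumes Hive 0.11 or later; without it Hive writes its default Ctrl-A delimiter, which would not match the tab-delimited table definition.

```sql
-- Step 1: write one hour of data directly into the directory layout you want.
-- source_table and the filter columns are hypothetical.
INSERT OVERWRITE DIRECTORY '/Dataset/2012/10/02/07'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT field1, field2, field3
FROM source_table
WHERE reg_yr = '2012' AND reg_mon = '10' AND reg_day = '02' AND reg_hour = '07';

-- Step 2: register that directory as a partition of the external table.
-- The LOCATION overrides the default key=value path Hive would otherwise use.
ALTER TABLE new_table ADD IF NOT EXISTS
PARTITION (reg_yr = '2012', reg_mon = '10', reg_day = '02', reg_hour = '07')
LOCATION '/Dataset/2012/10/02/07';
```

Because new_table is EXTERNAL, dropping the partition later removes only the metadata and leaves the files in place.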
Sent from handheld, please excuse typos.
RE: File Path and Partition names
carla.staeben@... 2012-10-02, 12:56
Thanks Bejoy. I was hoping to avoid all of the 'extra' work. It would be nice if Hive didn't include the partition name when it creates the path; I was hoping there was a 'set' parameter or config option I was missing.
Thanks Carla
Re: File Path and Partition names
Doug Houck 2012-10-02, 13:10
Hi Carla, I assume you are using dynamic partitioning for this, correct?
If so, I have the same question and am trying to figure it out; I will let you know if I do.
If you are using static partitions, you just need to specify the location in the 'alter table' command when the partition(s) are added:
alter table my_table add if not exists partition(year=2012,month=10,day=02) location '2012/10/02';
Again, I have not yet figured out whether I can get this to work with dynamic partitions.
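As a hedged illustration of the static approach applied to the table from the original question, the sketch below registers two hourly directories in one statement and then checks where Hive thinks each partition lives. The absolute /Dataset/... paths are assumptions, and the directories are assumed to already exist in HDFS with tab-delimited files; adding several partitions in one ALTER TABLE assumes Hive 0.8 or later.

```sql
-- Assumption: both directories already exist and hold files that match
-- new_table's column layout.
ALTER TABLE new_table ADD IF NOT EXISTS
  PARTITION (reg_yr = '2012', reg_mon = '10', reg_day = '02', reg_hour = '07')
    LOCATION '/Dataset/2012/10/02/07'
  PARTITION (reg_yr = '2012', reg_mon = '10', reg_day = '02', reg_hour = '08')
    LOCATION '/Dataset/2012/10/02/08';

-- List the registered partitions, then inspect one to confirm its Location.
SHOW PARTITIONS new_table;
DESCRIBE FORMATTED new_table
  PARTITION (reg_yr = '2012', reg_mon = '10', reg_day = '02', reg_hour = '07');
```

SHOW PARTITIONS still prints the partitions in key=value form because that is how Hive names them in metadata; the Location field in the DESCRIBE FORMATTED output is what reflects the custom path.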
RE: File Path and Partition names
carla.staeben@... 2012-10-02, 13:16
Yep, dynamic.
Let me know if you figure something out. I'd hate to have to go through all of the trouble to ETL the data and then create tables on top of it with the ALTER TABLE command. Such a waste of time and effort...
Carla