|
|
-
partition column not allowed in clustered by clause ?
ameet chaubal 2012-05-10, 18:24
Hi All,
I am not able to create a table with partition column also included in the clustered by clause. create table abc ( col1, col2, col3 ) partitioned by (col3) clustered by (col1,col3) sorted by (col1,col3) into 10 buckets;
fails with : FAILED: Error in semantic analysis: Invalid column reference
Any reason why this is the case? Sincerely, Ameet
-
Re: partition column not allowed in clustered by clause ?
Mark Grover 2012-05-10, 18:49
Hi Ameet, That's the correct behaviour.
In Hive, clustering and sorting happens within a partition. Inside each partition, there is only one value associated with the partition column therefore it would have no impact on clustering and sorting. Therefore, putting the partition column in clustered by/sorted by in the create table statement doesn't make sense.
Also, your create table statement should be something like (note the removal of col3 from the column list): create table abc ( col1 string, col2 string) partitioned by (col3 string) clustered by (col1) sorted by (col1) into 10 buckets;
Mark
Mark Grover, Business Intelligence Analyst OANDA Corporation
www: oanda.com www: fxtrade.com
----- Original Message ----- From: "ameet chaubal" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Thursday, May 10, 2012 2:24:02 PM Subject: partition column not allowed in clustered by clause ?
Hi All, I am not able to create a table with partition column also included in the clustered by clause. create table abc ( col1, col2, col3 ) partitioned by (col3) clustered by (col1,col3) sorted by (col1,col3) into 10 buckets; fails with : FAILED: Error in semantic analysis: Invalid column reference Any reason why this is the case?
Sincerely, Ameet
-
Re: partition column not allowed in clustered by clause ?
ameet chaubal 2012-05-10, 18:59
sure, makes sense.
here's what I was trying to achieve - a sort merge bucket join. as I understand, to get an SMB join, the bucket columns have to be the join key columns. I am joining two tables- the key for each row in those tables is a composite key= acct + date. now, does that mean, I cannot partition this table based on date? If so, am I going to lose the benefit of partition pruning or does the SMB provides more benefits offsetting this situation? Sincerely, Ameet ________________________________ Hi Ameet, That's the correct behaviour.
In Hive, clustering and sorting happens within a partition. Inside each partition, there is only one value associated with the partition column therefore it would have no impact on clustering and sorting. Therefore, putting the partition column in clustered by/sorted by in the create table statement doesn't make sense.
Also, your create table statement should be something like (note the removal of col3 from the column list): create table abc ( col1 string, col2 string) partitioned by (col3 string) clustered by (col1) sorted by (col1) into 10 buckets;
Mark
|
|