Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> question..partitioned table...


Copy link to this message
-
Re: question..partitioned table...
The partition column is automatically included as part of the table
definition, and hence can be used like any other column.

So you can issue queries like SELECT c1, c2, c3 from callback_parts WHERE
c2='foo'; for table defined as follows:

create table callbacks_part
(
 c1 INT,
 c3 STRING
)
PARTITIONED BY(c2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Dilip

On Fri, Aug 20, 2010 at 7:58 PM, Ronak Bhatt <[EMAIL PROTECTED]> wrote:

> I'm from oracle/sql background...the thinking there (in oracle) is -
>
> 1. one has a table with columns (c1, c2, c3)
> 2. define the partition on one of the columns (or combination of more than
> one) say c2
> 3. When data is inserted, it will be stored in right partition based on the
> value of c2.
> 4. when table is queries, if the WHERE predicate has partition column (c2
> in this case), optimizer will go to that partition without user taking any
> specific action (e.g. without user specifying the partition)
>
> whereas,
>
> In the scenario below that I mentioned, I've a text file that has one of
> the fields as exec_date ....after some more research, I also learned that in
> HIVE partition cannot be on existing column of defined table e.g. I've to
> have a column name that is not part of table..
>
> What I don't understand is that how do I write the queries so that I can
> utilize the partition if the partition column is not part of the table
> definition..
>
> Any inputs would be greatly appreciated
>
>
>
> *thanks, ronak*
> *
> *
> *408 504 4847*
> *My Blog : http://ronakbaps.posterous.com*
> *
> *
> *
> *
>
>
>
> On Fri, Aug 20, 2010 at 7:34 PM, Dilip Joseph <
> [EMAIL PROTECTED]> wrote:
>
>> You don't need to specify "EXEC_DATE STRING," in your column definition as
>> it is already specified by "PARTITIONED BY(EXEC_DATE STRING)
>>
>> Dilip
>>
>>
>> On Fri, Aug 20, 2010 at 6:41 PM, Ronak Bhatt <[EMAIL PROTECTED]> wrote:
>>
>>> for the following create table, I'm getting the error : FAILED: Error in
>>> semantic analysis: Column repeated in partitioning columns...
>>>
>>> any suggestion?
>>>
>>> thanks, ronak
>>>
>>>
>>> create table callbacks_part
>>> (
>>> ID INT,
>>> BOOKMARK_STEP_ID INT,
>>> CLIENT_ID INT,
>>> TXN_ID STRING,
>>> SESSION_ID STRING,
>>> IP_ADDRESS STRING,
>>> METHOD STRING,
>>> PAGE_ID INT,
>>> RULE_NAME_ID INT,
>>> CALLBACK_TYPE_ID INT,
>>> PAGE_ELEMENT_ID INT,
>>> CALLBACK_SENSE_ID INT,
>>> CALLBACK_INFO STRING,
>>> NUM_OCCURANCES INT,
>>> UNIQUE_ID STRING,
>>> EXEC_DATE STRING,
>>> EXEC_DURATION FLOAT
>>> )
>>> PARTITIONED BY(EXEC_DATE STRING)
>>> CLUSTERED BY(PAGE_ID) SORTED BY(PAGE_ID) INTO 32 BUCKETS
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY ','
>>> STORED AS TEXTFILE;
>>>
>>>
>>>
>>>
>>>
>>> *thanks, ronak*
>>> *
>>> *
>>> *408 504 4847*
>>> *My Blog : http://ronakbaps.posterous.com*
>>> *
>>> *
>>> *
>>> *
>>>
>>>
>>
>>
>> --
>> _________________________________________
>> Dilip Antony Joseph
>> http://csgrad.blogspot.com
>> http://www.marydilip.info
>>
>
>
--
_________________________________________
Dilip Antony Joseph
http://csgrad.blogspot.com
http://www.marydilip.info