

Re: dynamic Partition not splitting properly
Which UDF? The partition clause does not accept a to_date(event_date) column.
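
For reference, a minimal sketch of how a UDF could be applied here: the PARTITION clause only names the column, and to_date() goes in the SELECT list. The tables old_events and new_events and their columns are made up for illustration, and event_date is assumed to be a string like "2012-06-24 06:04:11.9".

```sql
-- Hypothetical tables for illustration; names and columns are assumptions.
CREATE TABLE old_events (id INT, user_id BIGINT, event_date STRING);

CREATE TABLE new_events (id INT, user_id BIGINT)
PARTITIONED BY (event_date STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The PARTITION clause names only the partition column.
-- The UDF is applied in the SELECT list, and the dynamic partition
-- column must be the last expression there (it is matched by position).
INSERT OVERWRITE TABLE new_events PARTITION (event_date)
SELECT id, user_id, to_date(event_date) AS event_date
FROM old_events;
```

With this layout each distinct date part should end up in its own event_date=YYYY-MM-DD partition.
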
On Fri, Jun 14, 2013 at 11:54 AM, Nitin Pawar <[EMAIL PROTECTED]>wrote:

> Use the already existing UDFs to split or transform your values the way you
> want.
>
>
> On Fri, Jun 14, 2013 at 12:09 PM, Hamza Asad <[EMAIL PROTECTED]>wrote:
>
>> OIC, I got it. Thanks a lot, Nitin :). One more thing I want to ask related to
>> this issue: if the old table contains event_date in the format "2012-06-24
>> 06:04:11.9", how can I partition it according to the date part only? The
>> partition column does not accept the to_date(event_date) form.
>>
>>
>> On Thu, Jun 13, 2013 at 5:07 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>>
>>> If the input column value is NULL or an empty string, the row will be put into a special partition, whose name is controlled by the Hive parameter hive.exec.default.dynamic.partition.name. The default value is `__HIVE_DEFAULT_PARTITION__`. Basically this partition will contain all
>>> "bad" rows whose values are not valid partition names.
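
To see how many rows would end up in that default partition, a quick check along these lines may help. It is only a sketch: old_table is the placeholder name used in this thread, and event_date is assumed to be stored as a string.

```sql
-- Count rows whose partition value is NULL or empty; with dynamic
-- partitioning these rows go to __HIVE_DEFAULT_PARTITION__.
SELECT COUNT(*) AS bad_rows
FROM old_table
WHERE event_date IS NULL OR event_date = '';
```

If this count is small but the default partition still holds all the data, the partition column is probably not the last column in the SELECT list.
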
>>>
>>> So basically you do the following.
>>>
>>> When you create a partitioned table, the partition column is normally at the end of the table, so when you insert data into this partitioned table, I would recommend listing the column names instead of using select *.
>>>
>>> Your insert query should look like this:
>>>
>>> set hive.exec.dynamic.partition=true;
>>>
>>>
>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>>
>>>
>>>
>>> insert overwrite table new_table partition(event_date) select col1, col2 .... coln, event_date from old_table;
>>>
>>>
>>>
>>> On Thu, Jun 13, 2013 at 5:24 PM, Hamza Asad <[EMAIL PROTECTED]>wrote:
>>>
>>>> When I browse it in the browser, all the data is in
>>>> *event_date=__HIVE_DEFAULT_PARTITION__<http://10.0.0.14:50075/browseDirectory.jsp?dir=%2Fvar%2Flog%2Fpring%2Fhive%2Fwarehouse%2Fnydus.db%2Fnew_rc_partition_cluster_table%2Fevent_date%3D__HIVE_DEFAULT_PARTITION__&namenodeInfoPort=50070>*;
>>>> the rest of the files do not contain data.
>>>>
>>>>
>>>> On Thu, Jun 13, 2013 at 4:52 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>>>>
>>>>> What do you mean when you say "it won't split correctly"?
>>>>>
>>>>>
>>>>> On Thu, Jun 13, 2013 at 5:19 PM, Hamza Asad <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>>> What if I have data for more than 500 days? How can I create
>>>>>> partitions on the date column by specifying each and every date? (I know that
>>>>>> is not required with dynamic partitioning, but with dynamic partitioning it won't
>>>>>> split correctly.)
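
With more than 500 distinct dates, the dynamic partition limits may also matter. A hedged sketch of the relevant settings follows; the values are arbitrary assumptions, and the defaults (1000 in total and 100 per node, as far as I know) should be checked against your Hive version.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Upper bound on dynamic partitions created by one statement
-- (the value 2000 is an assumption, chosen to exceed 500 dates).
SET hive.exec.max.dynamic.partitions=2000;

-- Upper bound on dynamic partitions created by a single mapper or reducer
-- (the value 1000 is likewise an assumption).
SET hive.exec.max.dynamic.partitions.pernode=1000;
```
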
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 13, 2013 at 4:12 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>>>>>>
>>>>>>> You can't partition an existing table unless the HDFS data is laid out in
>>>>>>> partitioned fashion.
>>>>>>> Your best bet is to create a new partitioned table,
>>>>>>> enable dynamic partitioning,
>>>>>>> then read from the old table and write into the new table.
>>>>>>>
>>>>>>> you can verify the new partitions by using command "show partitions"
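
The verification step could look roughly like this. It is a sketch that assumes the table name new_rc_partition_cluster_table from this thread and that the dynamic partition insert has already run.

```sql
-- List the partitions Hive has registered for the new table.
SHOW PARTITIONS new_rc_partition_cluster_table;

-- Row count per partition; one large __HIVE_DEFAULT_PARTITION__ row
-- here would suggest the partition values were NULL or empty.
SELECT event_date, COUNT(*) AS row_count
FROM new_rc_partition_cluster_table
GROUP BY event_date;
```
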
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 13, 2013 at 4:40 PM, Hamza Asad <[EMAIL PROTECTED]>wrote:
>>>>>>>
>>>>>>>> Now I have created a partitioned table like this:
>>>>>>>> CREATE TABLE new_rc_partition_cluster_table(
>>>>>>>>
>>>>>>>>   id int,
>>>>>>>>   event_id int,
>>>>>>>>   user_id BIGINT,
>>>>>>>>
>>>>>>>>   intval_1 int ,
>>>>>>>>   intval_2 int,
>>>>>>>>   intval_3 int,
>>>>>>>>   intval_4 int,
>>>>>>>>   intval_5 int,
>>>>>>>>   intval_6 int,
>>>>>>>>   intval_7 int,
>>>>>>>>   intval_8 int,
>>>>>>>>   intval_9 int,
>>>>>>>>   intval_10 int,
>>>>>>>>   intval_11 int,
>>>>>>>>   intval_12 int,
>>>>>>>>   intval_13 int,
>>>>>>>>   intval_14 int,
>>>>>>>>   intval_15 int,
>>>>>>>>   intval_16 int,
>>>>>>>>   intval_17 int,
>>>>>>>>   intval_18 int,
>>>>>>>>   intval_19 int,
>>>>>>>>   intval_20 int,
>>>>>>>>   intval_21 int,
>>>>>>>>   intval_22 int,
>>>>>>>>   intval_23 int,
>>>>>>>>   intval_24 int,
>>>>>>>>   intval_25 int,
>>>>>>>>   intval_26 int)
>>>>>>>>   PARTITIONED BY (event_date string)
>>>>>>>>
>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS

*Muhammad Hamza Asad*