Hive, mail # user - how to load data to partitioned table


Daniel,Wu 2011-08-12, 06:30
wd 2011-08-12, 06:33
Vikas Srivastava 2011-08-12, 12:01
bejoy_ks@... 2011-08-12, 13:58
hadoopman 2011-08-14, 14:57

Re: how to load data to partitioned table
bejoy_ks@... 2011-08-14, 16:15
Yes, I very much agree with you on that. Using the basic approach as-is can run into memory issues with large datasets. I resolved some of those by using the DISTRIBUTE BY clause and similar tweaks. In short, a small workaround in your Hive queries can help you out in some cases.
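
As a minimal sketch of the DISTRIBUTE BY idea, assuming the sales and sales_temp tables from the original question and assuming period_key is the last column of sales_temp (so that SELECT * lines up with the partition column): distributing by the partition column routes all rows for one period_key to the same reducer, so each reducer writes to fewer partitions at once. The exact query used in practice is not given in this thread; this is only an illustration.

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Assumption: period_key is the last column of sales_temp.
-- DISTRIBUTE BY sends all rows sharing a period_key to one reducer.
FROM sales_temp
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT *
DISTRIBUTE BY period_key;
```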
Regards
Bejoy K S

-----Original Message-----
From: hadoopman <[EMAIL PROTECTED]>
Date: Sun, 14 Aug 2011 08:57:12
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: how to load data to partitioned table

Something else I've noticed when loading LOTS of historical data: if you
can, load it a month at a time, and load just THAT month of data and
only that month.  I've been able to load several years of data
(depending on the data) in a single load; however, there have been
times when loading a large dataset that I would run into memory issues
during the reduce phase (usually during shuffle/sort), with everything
from out-of-memory to stack overflow messages (I've compiled a list of
the more fun ones).

Then I noticed that loading data from, say, a single month went
quickly and without the memory headaches during the reduce.

Something to keep in mind and it works great!
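
A month-at-a-time load might look like the sketch below. It assumes the sales and sales_temp tables from the original question, and it assumes period_key is a 'YYYY-MM-DD' string stored as the last column of sales_temp; the thread does not state the column type, so adjust the filter to match the real data.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Load only July 2011; repeat with a different date range per month.
-- Assumption: period_key is a 'YYYY-MM-DD' string.
FROM sales_temp
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT *
WHERE period_key >= '2011-07-01'
  AND period_key <  '2011-08-01';
```

Each run then shuffles and sorts only one month of rows, which is the smaller reduce workload described above.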

On 08/12/2011 07:58 AM, [EMAIL PROTECTED] wrote:
> Hi Daniel
> Looking at your requirement, the most hassle-free way to load data
> into a partitioned Hive table from any input file would be:
> 1. Load the data into a non-partitioned table that has a similar
> structure to the target table.
> 2. Populate the target table with the data from the non-partitioned one
> using Hive's dynamic partition
> approach.
> With dynamic partitions you don't need to manually identify the data
> partitions and distribute the data accordingly.
>
> A similar implementation is described in the blog post
> www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
>
> Hope it helps
>
> Regards
> Bejoy K S
>
> ------------------------------------------------------------------------
> *From: * Vikas Srivastava <[EMAIL PROTECTED]>
> *Date: *Fri, 12 Aug 2011 17:31:28 +0530
> *To: *<[EMAIL PROTECTED]>
> *ReplyTo: * [EMAIL PROTECTED]
> *Subject: *Re: how to load data to partitioned table
>
> Hey ,
>
> Simply run a query like this:
>
> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key)
> SELECT *
>
>
> Regards
> Vikas Srivastava
>
>
> 2011/8/12 Daniel,Wu <[EMAIL PROTECTED]>
>
>     Suppose the table is partitioned by period_key, and the CSV file
>     also has a column named period_key. The CSV file contains
>     multiple days of data. How can we load it into the table?
>
>     I thought of a workaround: first load the data into a
>     non-partitioned table, and then insert the data from the
>     non-partitioned table into the partitioned table.
>
>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>     FAILED: Error in semantic analysis: need to specify partition
>     columns because the destination table is partitioned.
>
>
>     However, that doesn't work either. Please help.
>
>
>
>
>
> --
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>
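
Putting the quoted suggestions together, the two-step load could be sketched as follows. Only the table names sales and sales_temp and the column period_key come from the thread; the other columns (sale_id, amount), the column types, and the file path /tmp/sales.csv are hypothetical placeholders. The failing statement in the original question has no PARTITION clause, which is likely why Hive reported "need to specify partition columns".

```sql
-- Staging table: same columns as the CSV, no partitioning.
-- sale_id and amount are hypothetical; only period_key is from the thread.
CREATE TABLE sales_temp (
  sale_id    BIGINT,
  amount     DOUBLE,
  period_key STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Target table: period_key moves into the PARTITIONED BY clause.
CREATE TABLE sales (
  sale_id BIGINT,
  amount  DOUBLE
)
PARTITIONED BY (period_key STRING);

-- Step 1: load the raw CSV into the staging table (placeholder path).
LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales_temp;

-- Step 2: let Hive derive the partitions from the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT sale_id, amount, period_key
FROM sales_temp;
```

Listing the columns explicitly, rather than using SELECT *, avoids depending on period_key happening to be the last column of the staging table.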
hadoopman 2011-08-14, 23:22
Aggarwal, Vaibhav 2011-08-12, 17:18