Re: how to load data to partitioned table
Yes, I very much agree with you on those lines. The straightforward approach does run into memory issues with large datasets. I had some of those resolved by using the DISTRIBUTE BY clause and similar tweaks. In short, a little work on top of your Hive queries can help you out in some cases.
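As a rough sketch of that kind of workaround (reusing the sales/sales_temp layout discussed further down in this thread), one common trick is to distribute by the partition key during a dynamic-partition insert, so each key's rows land on a single reducer and per-reducer memory and open-file counts stay down:

  FROM sales_temp
  INSERT OVERWRITE TABLE sales PARTITION (period_key)
  SELECT *
  DISTRIBUTE BY period_key;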
Regards
Bejoy K S

-----Original Message-----
From: hadoopman <[EMAIL PROTECTED]>
Date: Sun, 14 Aug 2011 08:57:12
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: how to load data to partitioned table

Something else I've noticed when loading LOTS of historical data: if you can load a month of data at a time, load just THAT month and only that month.  I've been able to load several years of data (depending on the data) in a single load, but there have been times when loading a large dataset that I ran into memory issues during the reduce phase (usually during shuffle/sort) -- everything from out-of-memory to stack overflow messages (I've compiled a list of the more fun ones).

Then I noticed that loading data from, say, a single month went quickly and without the memory headaches during the reduce.

Something to keep in mind and it works great!
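As a rough sketch of what "load just that month" can look like (the table and column names here are made up for illustration; logs is partitioned by log_month and raw_logs is an unpartitioned staging table):

  -- load a single month from staging into its partition, filtering to only that month
  INSERT OVERWRITE TABLE logs PARTITION (log_month='2011-07')
  SELECT host, request, response_time
  FROM raw_logs
  WHERE log_month = '2011-07';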

On 08/12/2011 07:58 AM, [EMAIL PROTECTED] wrote:
> Hi Daniel
> Having a look at your requirement, to load data into a partitioned
> Hive table from any input file the most hassle-free approach would be:
> 1. Load the data into a non-partitioned table that has the same
> structure as the target table.
> 2. Populate the target table from the non-partitioned one using Hive's
> dynamic partition approach (see the sketch below).
> With dynamic partitions you don't need to manually identify the data's
> partitions and distribute the data accordingly.
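> A rough sketch of those two steps (assuming the sales/sales_temp layout
> from further down in this thread; item and amount are made-up columns,
> period_key is the partition column, and the csv path is just an example):
>
>   -- 1. staging table with the same columns; period_key is an ordinary column here
>   CREATE TABLE sales_temp (item STRING, amount DOUBLE, period_key STRING)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
>   LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales_temp;
>
>   -- 2. enable dynamic partitioning and let Hive route each row by period_key
>   SET hive.exec.dynamic.partition=true;
>   SET hive.exec.dynamic.partition.mode=nonstrict;
>   INSERT OVERWRITE TABLE sales PARTITION (period_key)
>   SELECT item, amount, period_key FROM sales_temp;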
>
> A similar implementation is described in the blog post
> www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
>
> Hope it helps
>
> Regards
> Bejoy K S
>
> ------------------------------------------------------------------------
> *From: * Vikas Srivastava <[EMAIL PROTECTED]>
> *Date: *Fri, 12 Aug 2011 17:31:28 +0530
> *To: *<[EMAIL PROTECTED]>
> *ReplyTo: * [EMAIL PROTECTED]
> *Subject: *Re: how to load data to partitioned table
>
> Hey,
>
> Simply, you can run a query like this:
>
> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key)
> SELECT *
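> (Depending on the Hive version, that query usually also needs dynamic
> partitioning enabled, and nonstrict mode if no static partition column
> is given; see the SET statements sketched earlier in this message.)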
>
>
> Regards
> Vikas Srivastava
>
>
> 2011/8/12 Daniel,Wu <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>
>
>     Suppose the table is partitioned by period_key, and the csv file
>     also has a column named period_key. The csv file contains
>     multiple days of data; how can we load it into the table?
>
>     I thought of a workaround: first load the data into a
>     non-partitioned table, and then insert the data from the
>     non-partitioned table into the partitioned table.
>
>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>     FAILED: Error in semantic analysis: need to specify partition
>     columns because the destination table is partitioned.
>
>
>     However, that doesn't work either. Please help.
>
>
>
>
>
> --
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>