Re: HIVE issues when using large number of partitions
Edward Capriolo 2013-03-09, 17:53
> 2) Getting 'out of memory' java exception while adding partitions > 50000
> 3) Sometimes getting 'out of memory' java exception for select queries for
> partitions > 10000
So hive/hadoop have to "plan" the job. Planning involves building all the
partitions into an in-memory list on the client. It also involves the hadoop
jobtracker calculating all the split information. With too many partitions
you push the limits of your client and the job tracker, neither of which can
be distributed. You can raise your client heap and job tracker memory to a
point, but this is more of an anti-pattern.
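For reference, a rough sketch of where those heap knobs live on a Hadoop 1.x
cluster (the values below are purely illustrative, not recommendations):

```
# hadoop-env.sh -- illustrative values only
export HADOOP_HEAPSIZE=2048                                      # default daemon heap, in MB
export HADOOP_CLIENT_OPTS="-Xmx2g $HADOOP_CLIENT_OPTS"           # hive CLI / client JVM heap
export HADOOP_JOBTRACKER_OPTS="-Xmx4g $HADOOP_JOBTRACKER_OPTS"   # jobtracker JVM heap
```

Again, this only buys headroom up to a point; it does not fix the underlying
scaling problem.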
Do not plan on having much success with a task spanning 20K+ partitions.
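For what it is worth, on the DDL side you can cut metastore round trips: a
single ALTER TABLE ... ADD PARTITION statement can take several partition
specs at once, and if the directories already follow the key=value layout,
MSCK REPAIR TABLE can discover them in one pass. A sketch (table, column, and
path names are made up):

```
-- one statement, many partitions (names are illustrative)
ALTER TABLE logs ADD
  PARTITION (dt='2013-03-01') LOCATION '/data/logs/dt=2013-03-01'
  PARTITION (dt='2013-03-02') LOCATION '/data/logs/dt=2013-03-02'
  PARTITION (dt='2013-03-03') LOCATION '/data/logs/dt=2013-03-03';

-- or, if directories are already laid out as .../dt=YYYY-MM-DD,
-- let Hive discover them itself:
MSCK REPAIR TABLE logs;
```

That speeds up the add, but the planning-time and memory limits above still
apply once you query across that many partitions.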
On Sat, Mar 9, 2013 at 11:49 AM, Ramki Palle <[EMAIL PROTECTED]> wrote:
> Check this for your first question:
> Please post if you find any solution for your 2nd and 3rd questions.
> On Thu, Mar 7, 2013 at 8:01 PM, Suresh Krishnappa <
> [EMAIL PROTECTED]> wrote:
>> Hi All,
>> I have a hadoop cluster with data present in large number of directories
>> ( > 10,000)
>> To run HIVE queries over this data I created an external partitioned
>> table and pointed each directory as a partition to the external table using
>> 'alter table add partition' command.
>> Is there a better way to create a HIVE external table over large number
>> of directories?
>> Also I am facing the following issues due to the large number of
>> partitions:
>> 1) The DDL operations of creating the table and adding partitions to the
>> table take a very long time. Takes about an hour to add around 10,000
>> partitions.
>> 2) Getting 'out of memory' java exception while adding partitions > 50000
>> 3) Sometimes getting 'out of memory' java exception for select queries
>> for partitions > 10000
>> What is the recommended limit to the number of partitions that we can
>> create with a HIVE table?
>> Are there any configuration settings in hive/hadoop to support large
>> number of partitions?
>> I am using HIVE 0.10.0. I re-ran the tests by replacing derby with
>> postgresql as metastore and still faced similar issues.
>> Would appreciate any inputs on this.