Hive >> mail # user >> Maximum Number of Hive Partitions = 256?


RE: Maximum Number of Hive Partitions = 256?
I have way more than 256 partitions per table. AFAIK, there is no partition limit.

From your stack trace, you have a host name issue somewhere.
From: Time Less [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 03, 2011 6:52 PM
To: [EMAIL PROTECTED]
Subject: Maximum Number of Hive Partitions = 256?

I created a partitioned table, partitioned daily. If I query the earlier partitions, everything works. The later ones fail with error:

hive> select substr(user_name,1,1),count(*) from u_s_h_b where dtpartition='2010-10-24' group by substr(user_name,1,1) ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:556)
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:524)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:235)
......snip.......
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Job Submission failed with exception 'java.lang.ArrayIndexOutOfBoundsException(0)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

It turns out that 2010-10-24 is 257 days from the very first partition in my dataset (2010-02-09):

+-----------------------------------------+
| date_sub('2010-10-24',interval 257 day) |
+-----------------------------------------+
| 2010-02-09                              |
+-----------------------------------------+
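
The date arithmetic above can be double-checked outside the database. This is a minimal sketch using only Python's standard `datetime` module; the variable names are illustrative and do not come from the original message. It also counts the daily partitions in the inclusive range, on the assumption (not confirmed in the thread) that there is exactly one partition per calendar day with no gaps.

```python
from datetime import date, timedelta

# Partition named in the failing query.
failing_partition = date(2010, 10, 24)

# Same interval as the date_sub call: 257 days.
offset = timedelta(days=257)

# Date that lies 257 days before the failing partition.
first_partition = failing_partition - offset
print(first_partition.isoformat())  # 2010-02-09

# Days between the two dates, and the number of daily partitions
# if both endpoints are counted (assumes one partition per day, no gaps).
days_apart = (failing_partition - first_partition).days
partitions_inclusive = days_apart + 1
print(days_apart, partitions_inclusive)  # 257 258
```

Under that one-partition-per-day assumption, the failing partition would be the 258th rather than the 257th, so the match with a 256 boundary is approximate at best.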

That seems like an interesting coincidence. But try as I might, the Great Googles will not show me a way to tune this, or even if it is tuneable, or expected. Has anyone else run into a 256-partition limit in Hive? How do you work around it? Why is that even the limit?! Shouldn't it be more like 32-bit maxint??!!

Thanks!

--
Tim Ellis
Riot Games