Hive, mail # user - Maximum Number of Hive Partitions = 256?


Re: Maximum Number of Hive Partitions = 256?
Viral Bajaria 2011-05-04, 02:53
Same here ... we have way more than 256 partitions in multiple tables. I am
sure the issue has something to do with an empty string being passed to the
substr function. Can you verify that the table has no null/empty strings in
user_name, or try running the query with length(user_name) > 0 (not sure
about the exact query syntax)?

On Tue, May 3, 2011 at 7:02 PM, Steven Wong <[EMAIL PROTECTED]> wrote:

> I have way more than 256 partitions per table. AFAIK, there is no partition
> limit.
>
> From your stack trace, you have some host name issue somewhere.
>
> *From:* Time Less [mailto:[EMAIL PROTECTED]]
> *Sent:* Tuesday, May 03, 2011 6:52 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Maximum Number of Hive Partitions = 256?
>
>
>
> I created a partitioned table, partitioned daily. If I query the earlier
> partitions, everything works. The later ones fail with error:
>
> hive> select substr(user_name,1,1),count(*) from u_s_h_b where dtpartition='2010-10-24' group by substr(user_name,1,1) ;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> java.lang.ArrayIndexOutOfBoundsException: 0
>     at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:556)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:524)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:235)
> ......snip.......
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Job Submission failed with exception 'java.lang.ArrayIndexOutOfBoundsException(0)'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> It turns out that 2010-10-24 is 257 days after the very first partition in
> my dataset (2010-02-09):
>
> | date_sub('2010-10-24',interval 257 day) |
> +-----------------------------------------+
> | 2010-02-09                              |
>
> That seems like an interesting coincidence. But try as I might, the Great
> Googles will not show me a way to tune this, or even if it is tuneable, or
> expected. Has anyone else run into a 256-partition limit in Hive? How do you
> work around it? Why is that even the limit?! Shouldn't it be more like
> 32-bit maxint??!!
>
> Thanks!
>
> --
> Tim Ellis
> Riot Games
>
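
For anyone who wants to double-check the date arithmetic in the quoted message without a database handy, here is a minimal sketch in plain Python. It only reproduces the date_sub result shown above; the variable names are made up for illustration, and it says nothing about what actually causes the ArrayIndexOutOfBoundsException.

```python
from datetime import date, timedelta

# Partition that fails in the quoted query.
failing = date(2010, 10, 24)

# Equivalent of date_sub('2010-10-24', interval 257 day) from the quoted output.
start = failing - timedelta(days=257)
print(start)  # 2010-02-09

# Sanity check in the other direction: days between the two dates.
print((failing - start).days)  # 257
```

Note that a gap of 257 days is suggestive but not proof of a 256-partition limit, especially since other posters in this thread report tables with far more partitions.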