Hive, mail # user - Partition performance


Re: Partition performance
David Morel 2013-07-03, 12:19
On 2 Jul 2013, at 16:51, Owen O'Malley wrote:

> On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Owen,
>>
>> I’m curious about this advice about partitioning. Is there some
>> fundamental reason why Hive is slow when the number of partitions is
>> 10,000 rather than 1,000?
>>
>>
>
> The precise numbers don't matter. I wanted to give people a ballpark range
> that they should be looking at. Most tables at 1000 partitions won't cause
> big slow downs, but the cost scales with the number of partitions. By the
> time you are at 10,000 the cost is noticeable. I have one customer who has
> a table with 1.2 million partitions. That causes a lot of slow downs.

That still doesn't really answer the question: why is it slower to run a
query on a heavily partitioned table than on the same number of files in a
less heavily partitioned table?

David