RE: Dealing with large number of partitions
Ashish Thusoo 2010-06-11, 23:09
+1 to that. That should help, provided you are running Hadoop 0.20.
From: wd [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 10, 2010 11:36 PM
To: [EMAIL PROTECTED]
Subject: Re: Dealing with large number of partitions
Try running set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; before your query. This may help.
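A minimal session sketch of that suggestion follows. It assumes Hive with CombineHiveInputFormat on Hadoop 0.20. The optional mapred.max.split.size cap is a property read by Hadoop's CombineFileInputFormat, but the 256 MB value and the table and column names (my_table, dt) are illustrative assumptions, not settings from this thread.

```sql
-- Pack many small files into fewer input splits, so fewer map tasks are launched
-- (this thread notes it needs Hadoop 0.20).
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Optional: upper bound, in bytes, on the size of each combined split.
-- 268435456 bytes = 256 MB is an illustrative value only.
set mapred.max.split.size=268435456;

-- Then run the query as usual in the same session.
-- my_table and dt are hypothetical names.
SELECT dt, COUNT(*) FROM my_table GROUP BY dt;
```

Fewer, larger splits mean fewer map tasks, each reading many small files. That is what should cut the per-file task overhead described in the original question.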
2010/6/11 Sammy Yu <[EMAIL PROTECTED]>
I am having an issue with a large number of partitions (4000), each consisting of very small files (<10k). Any query I run that involves these partitions takes an extremely long time to complete (10+ hours). Is there an easy way in Hive to improve performance without having to merge the files? I can see that the MapReduce jobs take a long time because there are so many separate raw data files to read. I saw that HIVE-1332 dealt with using HAR files for partitions. Could this help performance rather than hurt it, given that the queries will read all the partitions in the HAR file?