Re: Dealing with large number of partitions
wd 2010-06-11, 06:36
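Try running
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
before your query; this may help. It combines many small files into fewer
input splits, so fewer map tasks need to be launched.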
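
A minimal sketch of what such a session could look like. The table name
"logs" and the split-size value are assumptions for illustration only and
are not from the original question; mapred.max.split.size is an optional
Hadoop setting and the right value depends on your cluster.

```sql
-- Combine many small files into fewer input splits (fewer map tasks).
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Optional, assumed value: cap each combined split at 256 MB (in bytes).
set mapred.max.split.size=268435456;

-- Hypothetical table name; substitute your own partitioned table.
SELECT COUNT(*) FROM logs;
```

The settings apply only to the current session, so they have to be issued
again (or put in a script) for each new Hive CLI session.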
2010/6/11 Sammy Yu <[EMAIL PROTECTED]>
> I am having an issue with a large number of partitions, 4000 (each being
> very small <10k files). Any queries that I do which involve these
> partitions take an extremely long time to complete (10+ hours). I was
> wondering if there is any easy way in Hive, without having to merge the
> files, to improve its performance. I can see the map reduce jobs are taking a
> long time due to the fact that there are so many separated raw data files
> that need to be read. I saw that HIVE-1332 dealt with using HAR files for
> partitioning. Could this perhaps help performance rather than hurt it,
> given that the queries will be using all the partitions in the HAR file?