
Hive user mailing list

Re: Dealing with large number of partitions
Try running

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

before your query. This may help.
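As a rough illustration of why this setting can help: with many tiny files, Hive's default input format ends up with roughly one input split (and so one map task) per file, while CombineHiveInputFormat packs several small files into each split. The sketch below is a simplified model under stated assumptions, not Hive's actual code. It ignores the node and rack locality grouping the real combining logic performs, and the 256 MB split limit and the "one 8 KB file per partition" workload are hypothetical values chosen only to show the effect.

```python
"""Simplified model of why combining small files reduces the number of map tasks.

Assumptions (hypothetical, for illustration only):
- Without combining, every file becomes its own split, i.e. one map task per file.
- With combining, files are packed greedily, in order, into splits no larger
  than max_split_bytes. Real combining also groups files by node and rack;
  that is ignored here.
"""

from typing import List


def splits_per_file(file_sizes: List[int]) -> int:
    # One split (and therefore one map task) per file.
    return len(file_sizes)


def combined_splits(file_sizes: List[int], max_split_bytes: int) -> List[List[int]]:
    """Greedily pack file sizes into splits of at most max_split_bytes.

    A single file larger than the limit still gets a split of its own.
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Close the current split if adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    # Hypothetical workload: 4,000 partitions, each holding one 8 KB file.
    sizes = [8 * 1024] * 4000
    limit = 256 * 1024 * 1024  # hypothetical 256 MB split limit
    print("one split per file:", splits_per_file(sizes))
    print("combined splits:", len(combined_splits(sizes, limit)))
```

In this toy model the 4,000 files (about 31 MB in total) fit into a single combined split, so the per-task startup overhead is paid once instead of 4,000 times. On a real cluster the split count will also depend on the configured split-size limits and on where the blocks live.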

2010/6/11 Sammy Yu <[EMAIL PROTECTED]>

> Hi,
>    I am having an issue with a large number of partitions (4,000, each
> made up of very small files, <10k). Any query I run that involves these
> partitions takes an extremely long time to complete (10+ hours). I was
> wondering whether there is an easy way in Hive to improve its performance
> without having to merge the files. I can see that the MapReduce jobs take a
> long time because there are so many separate raw data files that need to
> be read. I saw that HIVE-1332 dealt with using HAR files for partitioning.
> Could this perhaps help performance rather than hurt it, given that the
> queries will be using all the partitions in the HAR file?
> Thanks,
> Sammy