Hive, mail # user - HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop


HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop
Roberto Congiu 2009-09-29, 01:57
Hi guys,
I've been working on integrating Hive with a legacy file format we use
here. I wrote the appropriate InputFormat and SerDe and everything
works, but it's painfully slow.
The reason is that I am reading a large number of files and Hive uses
one mapper per file.
I saw the HIVE-74 patches, but those use CombineFileInputFormat, which
is available in Hadoop 0.20, and we use 0.19. Is there any reason the
same goal could not be achieved using the deprecated (but present
before 0.20) MultiFileInputFormat?
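
To make the goal concrete: the idea is to have one mapper handle several small files instead of one file each. The sketch below shows only that grouping step, in plain Java with no Hadoop dependency. The class SmallFilePacker, the method pack, and the 64 MB target are hypothetical names and values chosen for illustration; this is not the HIVE-74 code, nor the actual split logic of MultiFileInputFormat or CombineFileInputFormat.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop or Hive.
public class SmallFilePacker {

    // Groups files (path -> size in bytes) into splits of at most roughly
    // targetBytes each. Files are taken in iteration order and a file is
    // never divided across splits, so a single oversized file gets its own split.
    public static List<List<String>> pack(Map<String, Long> fileSizes, long targetBytes) {
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            // Close the current split before it would exceed the target.
            if (!current.isEmpty() && currentBytes + e.getValue() > targetBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(e.getKey());
            currentBytes += e.getValue();
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) {
            files.put("part-" + i, 10L * 1024 * 1024); // ten files of 10 MB each (made-up sizes)
        }
        List<List<String>> splits = pack(files, 64L * 1024 * 1024);
        System.out.println(files.size() + " files -> " + splits.size() + " splits");
    }
}
```

With one mapper per split rather than per file, the number of map tasks depends on the total data size instead of the file count, which is the effect being asked about here.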

Thanks,
Roberto