Hive >> mail # user >> HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop


HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop
Hi guys,
I've been working on integrating Hive with a legacy file format we use
here. I wrote the appropriate InputFormat and SerDe and everything
works, but it's painfully slow.
The reason is that I am reading many files and Hive uses one mapper
per file.
I saw the HIVE-74 patches, but those use CombineFileInputFormat, which
is available in Hadoop 0.20, and we use 0.19. Is there any reason the
same goal could not be achieved using the deprecated MultiFileInputFormat,
which is present in versions before 0.20?

Thanks,
Roberto