Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> mapper is slower than hive' mapper


Copy link to this message
-
RE: mapper is slower than hive' mapper
This is actually not surprising. Hive is essentially a MapReduce compiler. It is common for regular compilers (C, C#, Fortran) to emit faster assembler code than you write yourself. Compilers know the tricks of their target language.

Chuck Connell
Nuance R&D Data Team
Burlington, MA
-----Original Message-----
From: Yue Guan [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 01, 2012 10:29 AM
To: [EMAIL PROTECTED]
Subject: mapper is slower than hive' mapper

Hi, there

I'm writing mapreduce to replace some hive query and I find that my mapper is slow than hive's mapper. The Hive query is like:

select sum(column1) from table group by column2, column3;

My mapreduce program likes this:

     public static class HiveTableMapper extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

         public void map(BytesWritable key, Text value, Context context) throws IOException, InterruptedException {
                 String[] sLine = StringUtils.split(value.toString(),
StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
             context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]), new DoubleWritable(Double.parseDouble(sLine[2])));
         }

     }

I assume hive is doing something similar. Is there any trick in hive to speed this thing up? Thank you!

Best,
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB