NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hive >> mail # user >> mapper is slower than hive' mapper


Yue Guan 2012-08-01, 14:28
Connell, Chuck 2012-08-01, 14:35
Re: mapper is slower than hive' mapper
As mentioned, if you avoid calling new by re-using objects, and possibly
use buffer objects, you may be able to match or beat Hive's speed. But in
the general case, Hive saves you time by letting you not worry about
low-level details like this.
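For illustration, here is a minimal plain-Java sketch of the object-reuse idea. It deliberately avoids the Hadoop API so it can run on its own, and it assumes Hive's default Ctrl-A (octal 001) field delimiter and rows that contain no escaped delimiters. The names GroupRecord and ReusingMapper are hypothetical. In an actual Hadoop mapper the equivalent would be to keep the key and value Writables as fields and call set(...) on them in each map() call instead of allocating new ones, which is generally considered safe because context.write() serializes the pair before map() returns.

```java
// Hypothetical record holding one parsed row: (int key, string key, double value).
// A real Hadoop mapper would instead keep reusable Writable fields.
final class GroupRecord {
    int intKey;       // field 0 of the row
    String stringKey; // field 1 of the row
    double value;     // field 2 of the row
}

// Hypothetical mapper that reuses one GroupRecord for every input line.
final class ReusingMapper {
    // Assumed Hive default field delimiter: Ctrl-A (octal 001).
    static final char FIELD_DELIMITER = '\001';

    // Allocated once per mapper instance, not once per record.
    private final GroupRecord reused = new GroupRecord();

    // Parses the first three fields by scanning for the delimiter.
    // Assumes no escaped delimiters inside fields. The returned object is
    // overwritten by the next call, so the caller must consume it first.
    GroupRecord map(String line) {
        int first = line.indexOf(FIELD_DELIMITER);
        int second = first < 0 ? -1 : line.indexOf(FIELD_DELIMITER, first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("expected at least three fields");
        }
        int end = line.indexOf(FIELD_DELIMITER, second + 1);
        if (end < 0) {
            end = line.length(); // field 2 is the last field on the line
        }
        reused.intKey = Integer.parseInt(line.substring(0, first));
        reused.stringKey = line.substring(first + 1, second);
        reused.value = Double.parseDouble(line.substring(second + 1, end));
        return reused;
    }
}
```

Each call returns the same GroupRecord instance, so per-record allocation drops to the substrings needed for parsing; the trade-off is that callers must not hold on to a returned record across calls.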

On Wed, Aug 1, 2012 at 10:35 AM, Connell, Chuck
<[EMAIL PROTECTED]> wrote:
> This is actually not surprising. Hive is essentially a MapReduce compiler. It is common for regular compilers (C, C#, Fortran) to emit faster assembler code than you would write by hand. Compilers know the tricks of their target language.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
>
>
> -----Original Message-----
> From: Yue Guan [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 01, 2012 10:29 AM
> To: [EMAIL PROTECTED]
> Subject: mapper is slower than hive' mapper
>
> Hi there,
>
> I'm writing MapReduce jobs to replace some Hive queries, and I find that my mapper is slower than Hive's mapper. The Hive query looks like this:
>
> select sum(column1) from table group by column2, column3;
>
> My MapReduce program looks like this:
>
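>      // MyKey and HIVE_FIELD_DELIMITER_CHAR are defined elsewhere (not shown).
>      public static class HiveTableMapper extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {
>
>          @Override
>          public void map(BytesWritable key, Text value, Context context) throws IOException, InterruptedException {
>              // Split the Hive row on its field delimiter (escape-aware split).
>              String[] sLine = StringUtils.split(value.toString(), StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
>              // Emit (field 0, field 1) as the key and field 2 as the value; new objects are allocated for every record.
>              context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]), new DoubleWritable(Double.parseDouble(sLine[2])));
>          }
>
>      }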
>
> I assume Hive is doing something similar. Is there any trick Hive uses to speed this up? Thank you!
>
> Best,
>
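One technique Hive is known to apply to GROUP BY queries like the one quoted above is map-side partial aggregation (controlled by the hive.map.aggr setting): each mapper keeps a hash table of partial results and emits one record per distinct group rather than one per input row, so much less data goes through the shuffle. Whether that accounts for the difference reported in this thread is not established here. The plain-Java sketch below (no Hadoop dependency) illustrates the idea; PartialSumMapper, maxGroups, and the single String group key are hypothetical, and emit stands in for context.write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of in-mapper combining (map-side partial aggregation).
final class PartialSumMapper {
    // Running partial sums, one entry per distinct group key seen so far.
    private final Map<String, Double> partialSums = new HashMap<>();
    // Assumed cap on distinct groups held in memory before flushing.
    private final int maxGroups;
    // Stands in for context.write(key, value) in a real Hadoop mapper.
    private final BiConsumer<String, Double> emit;

    PartialSumMapper(int maxGroups, BiConsumer<String, Double> emit) {
        this.maxGroups = maxGroups;
        this.emit = emit;
    }

    // Adds one row's value to its group's partial sum instead of emitting it.
    void map(String groupKey, double value) {
        partialSums.merge(groupKey, value, Double::sum);
        if (partialSums.size() >= maxGroups) {
            flush();
        }
    }

    // Emits one (group, partial sum) pair per group and clears the table.
    // Must also be called once after the last row (cleanup() in Hadoop).
    void flush() {
        partialSums.forEach(emit);
        partialSums.clear();
    }
}
```

A combiner gives a similar reduction in Hadoop, but only after each row has already been written out as a map output record; combining inside the mapper skips that per-row cost at the price of memory, which the flush threshold bounds.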
Bertrand Dechoux 2012-08-01, 15:40
Yue Guan 2012-08-01, 17:36
Bertrand Dechoux 2012-08-06, 12:15
Edward Capriolo 2012-08-01, 15:49
Bertrand Dechoux 2012-08-01, 16:02
Bertrand Dechoux 2012-08-01, 14:41
Yue Guan 2012-08-01, 15:11