Hive, mail # user - mapper is slower than hive' mapper


Re: mapper is slower than hive' mapper
Edward Capriolo 2012-08-01, 15:13
As mentioned, if you avoid calling new for every record, by reusing
objects and possibly using buffer objects, you may be able to match or
beat Hive's speed. But in the general case Hive saves you time by
letting you not worry about low-level details like this.
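
A minimal sketch of the object-reuse idea, written in plain Java so it runs without Hadoop on the classpath. MutableKey, MutableDouble, and Sink are hypothetical stand-ins for MyKey, DoubleWritable, and Context.write from the question below. The '\u0001' delimiter assumes Hive's default field separator, which may not match HIVE_FIELD_DELIMITER_CHAR in the original code, and escape characters are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReuseSketch {

    // Assumption: Hive's default field delimiter (Ctrl-A).
    static final char DELIMITER = '\u0001';

    /** Hypothetical stand-in for the MyKey class in the question. */
    static final class MutableKey {
        int column2;
        String column3;

        void set(int c2, String c3) {
            column2 = c2;
            column3 = c3;
        }
    }

    /** Hypothetical stand-in for Hadoop's DoubleWritable. */
    static final class MutableDouble {
        private double value;

        void set(double v) { value = v; }

        double get() { return value; }
    }

    /** Hypothetical stand-in for Context.write; it must consume the pair immediately. */
    interface Sink {
        void write(MutableKey key, MutableDouble value);
    }

    static final class ReusingMapper {
        // Allocated once per mapper instead of once per record.
        private final MutableKey outKey = new MutableKey();
        private final MutableDouble outValue = new MutableDouble();

        void map(String line, Sink sink) {
            // Locate the first three fields by index rather than building a String[].
            int first = line.indexOf(DELIMITER);
            int second = first < 0 ? -1 : line.indexOf(DELIMITER, first + 1);
            if (second < 0) {
                return; // fewer than three fields: skip the malformed record
            }
            int third = line.indexOf(DELIMITER, second + 1);
            int end = third < 0 ? line.length() : third;

            outKey.set(Integer.parseInt(line.substring(0, first)),
                       line.substring(first + 1, second));
            outValue.set(Double.parseDouble(line.substring(second + 1, end)));
            sink.write(outKey, outValue);
        }
    }

    /** Drives the mapper and sums values per key, mirroring the GROUP BY query. */
    static Map<String, Double> sumByKey(List<String> lines) {
        Map<String, Double> sums = new HashMap<>();
        ReusingMapper mapper = new ReusingMapper();
        // The sink copies what it needs right away, so reusing the holders is safe.
        Sink sink = (key, value) ->
                sums.merge(key.column2 + "|" + key.column3, value.get(), Double::sum);
        for (String line : lines) {
            mapper.map(line, sink);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1" + DELIMITER + "a" + DELIMITER + "2.5",
                "1" + DELIMITER + "a" + DELIMITER + "1.5",
                "2" + DELIMITER + "b" + DELIMITER + "4.0");
        System.out.println(sumByKey(lines));
    }
}
```

In a real Mapper the same pattern would mean keeping one key object and one DoubleWritable as fields and calling set(...) on them inside map(). That assumes MyKey exposes a setter, which the original post does not show. Reuse works there because context.write serializes the pair before map() is called again.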

On Wed, Aug 1, 2012 at 10:35 AM, Connell, Chuck
<[EMAIL PROTECTED]> wrote:
> This is actually not surprising. Hive is essentially a MapReduce compiler. It is common for regular compilers (C, C#, Fortran) to emit faster assembler code than you write yourself. Compilers know the tricks of their target language.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
>
>
> -----Original Message-----
> From: Yue Guan [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 01, 2012 10:29 AM
> To: [EMAIL PROTECTED]
> Subject: mapper is slower than hive' mapper
>
> Hi, there
>
> I'm writing a MapReduce job to replace a Hive query, and I find that my mapper is slower than Hive's mapper. The Hive query looks like this:
>
> select sum(column1) from table group by column2, column3;
>
> My MapReduce program looks like this:
>
>      public static class HiveTableMapper extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {
>
>          public void map(BytesWritable key, Text value, Context context) throws IOException, InterruptedException {
>              String[] sLine = StringUtils.split(value.toString(), StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
>              context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]), new DoubleWritable(Double.parseDouble(sLine[2])));
>          }
>
>      }
>
> I assume hive is doing something similar. Is there any trick in hive to speed this thing up? Thank you!
>
> Best,
>