multi-thread mapreduce


Re: multi-thread mapreduce
I suppose you could also leverage the job configuration, or per-input
mapper implementations via MultipleInputs, to do this.
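
A minimal sketch of the MultipleInputs route, assuming plain-text logs
whose first two separator-delimited fields become the key and value; the
class names, separators, and paths here are illustrative, not from the
original thread:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiLogJob {

 // shared parsing logic; each concrete mapper only supplies its separator
 static abstract class LogMapper extends Mapper<LongWritable, Text, Text, Text> {
  abstract String sep();
  @Override
  protected void map(LongWritable key, Text value, Context ctx)
    throws IOException, InterruptedException {
   String[] f = value.toString().split(sep(), 2);
   if (f.length == 2) ctx.write(new Text(f[0]), new Text(f[1]));
  }
 }
 public static class CommaLogMapper extends LogMapper { String sep() { return ","; } }
 public static class TabLogMapper extends LogMapper { String sep() { return "\t"; } }

 public static void main(String[] args) throws Exception {
  Job job = new Job(new Configuration(), "MultiLogJob");
  job.setJarByClass(MultiLogJob.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(Text.class);

  // one job, one input path per log format, each with its own mapper
  MultipleInputs.addInputPath(job, new Path("/logs/comma"),
    TextInputFormat.class, CommaLogMapper.class);
  MultipleInputs.addInputPath(job, new Path("/logs/tab"),
    TextInputFormat.class, TabLogMapper.class);

  FileOutputFormat.setOutputPath(job, new Path("/out/merged"));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
 }
}

The job-configuration route would instead keep a single mapper that reads
its separator from context.getConfiguration() in setup(), rather than from
a static field.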

On Thu, Dec 13, 2012 at 5:44 PM, Yu Yang <[EMAIL PROTECTED]> wrote:
> Thank you all. In fact, I don't expect this approach to enhance
> performance. I need to process 3 different logs (each with a different
> format), and I just want to start processing all 3 logs at the same time,
> all in this one program. I can give each thread a different separator so
> that it creates mappers to handle its own log.
>
>
> 2012/12/13 Yang <[EMAIL PROTECTED]>
>>
>> But I have run across some situations where I could benefit from
>> multi-threading: if your Hadoop mapper is prone to random-access IO (such as
>> looking up a TFile, or HBase, which ultimately makes a network call and then
>> looks into a file segment), having multiple threads could keep the CPU
>> busy while the IO is going on.
>>
>>
>> On Wed, Dec 12, 2012 at 10:47 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>> Exactly - a job is already designed to be properly parallel w.r.t. its
>>> input, and this would just add the extra overhead of job setup and
>>> scheduling. If your per-record processing requires threaded work,
>>> consider using the MultithreadedMapper/Reducer classes instead.
>>>
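As a rough sketch of the MultithreadedMapper route (the helper class name
and the thread count of 8 are arbitrary illustrations, not recommendations):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class ThreadedSetup {
 // run the real per-record mapper with several concurrent threads inside
 // each map task; the wrapped mapper must be thread-safe
 static void enableThreads(Job job,
   Class<? extends Mapper<LongWritable, Text, Text, Text>> realMapper) {
  job.setMapperClass(MultithreadedMapper.class);
  MultithreadedMapper.setMapperClass(job, realMapper);
  MultithreadedMapper.setNumberOfThreads(job, 8);
 }
}

This keeps one job and one set of input splits, but overlaps the per-record
work, which is mainly useful when the mapper blocks on random-access IO as
described earlier in the thread.
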
>>> On Wed, Dec 12, 2012 at 10:53 PM, Yang <[EMAIL PROTECTED]> wrote:
>>> > I think it won't help much: in a Hadoop cluster, the "slots" are
>>> > already allocated to match the number of cores, so the inherent
>>> > parallelism should already be exploited, since different
>>> > mappers/reducers are completely independent.
>>> >
>>> >
>>> > On Wed, Dec 12, 2012 at 2:09 AM, Yu Yang <[EMAIL PROTECTED]> wrote:
>>> >>
>>> >> Dear all,
>>> >>
>>> >> I suddenly got the idea of running a MapReduce job in a multi-threaded
>>> >> way, but I don't know if it can work. Could anyone give me some advice?
>>> >> Here is the Java code:
>>> >>
>>> >>
>>> >> import java.io.IOException;
>>> >> import org.apache.hadoop.conf.Configuration;
>>> >> import org.apache.hadoop.fs.Path;
>>> >> import org.apache.hadoop.io.LongWritable;
>>> >> import org.apache.hadoop.io.Text;
>>> >> import org.apache.hadoop.mapreduce.Job;
>>> >> import org.apache.hadoop.mapreduce.Mapper;
>>> >> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>> >> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
>>> >> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>> >> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
>>> >>
>>> >> public class LogProcessApp extends Thread {
>>> >>
>>> >>  // NOTE: sep is static, so each thread's constructor overwrites the
>>> >>  // same field, and in a real cluster the mapper task JVMs never see it
>>> >>  private static String sep;
>>> >>  private String x2; // input path
>>> >>  private String x3; // output path
>>> >>
>>> >>  public LogProcessApp(String arg1, String arg2, String arg3){
>>> >>   sep = arg1;
>>> >>   x2 = arg2;
>>> >>   x3 = arg3;
>>> >>  }
>>> >>
>>> >>  public static class CM extends Mapper<LongWritable, Text, Text, Text> {
>>> >>   private Text keyvar = new Text();
>>> >>   private Text valvar = new Text();
>>> >>
>>> >>   @Override
>>> >>   public void map(LongWritable key, Text value, Context context)
>>> >>     throws IOException, InterruptedException {
>>> >>    String line = value.toString();
>>> >>    try {
>>> >>     // emit the first two separator-delimited fields as key and value
>>> >>     String[] data = line.split(sep);
>>> >>     keyvar.set(data[0]);
>>> >>     valvar.set(data[1]);
>>> >>     context.write(keyvar, valvar);
>>> >>    } catch (Exception e) {
>>> >>     return; // skip malformed lines
>>> >>    }
>>> >>   }
>>> >>  }
>>> >>
>>> >>  public void run(){
>>> >>   try {
>>> >>    Configuration conf = new Configuration();
>>> >>    Job job = new Job(conf);
>>> >>
>>> >>    job.setJobName("XXXJob");
>>> >>    job.setJarByClass(LogProcessApp.class);
>>> >>
>>> >>    job.setOutputKeyClass(Text.class);
>>> >>    job.setOutputValueClass(Text.class);
>>> >>
>>> >>    job.setMapperClass(CM.class);
>>> >>
>>> >>    job.setInputFormatClass(TextInputFormat.class);
>>> >>    job.setOutputFormatClass(TextOutputFormat.class);
>>> >>
>>> >>    FileInputFormat.addInputPath(job, new Path(x2));
>>> >>    FileOutputFormat.setOutputPath(job, new Path(x3));
>>> >>
>>> >>    job.waitForCompletion(true);
>>> >>   } catch (Exception e) {
>>> >>    e.printStackTrace();
>>> >>   }
>>> >>  }
>>> >> }

Harsh J
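
A driver that launches the three log jobs concurrently, as described
earlier in the thread, might look something like this; a hypothetical
sketch, with made-up separators and paths:

public class LogProcessMain {
 public static void main(String[] args) throws InterruptedException {
  // one thread per log format: separator, input path, output path
  LogProcessApp commaLogs = new LogProcessApp(",", "/logs/a", "/out/a");
  LogProcessApp tabLogs = new LogProcessApp("\t", "/logs/b", "/out/b");
  LogProcessApp spaceLogs = new LogProcessApp(" ", "/logs/c", "/out/c");
  commaLogs.start();
  tabLogs.start();
  spaceLogs.start();
  commaLogs.join();
  tabLogs.join();
  spaceLogs.join();
 }
}

Note that because sep is static, the three constructors overwrite each
other before the jobs even start, so every job would split on the last
separator set; passing the separator through the job Configuration, as
suggested above, avoids this.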