Re: Re: Help!! The problem about Hadoop
Alejandro Abdelnur 2010-10-05, 10:07
Or you could try using MultiFileInputFormat for your MR job.
http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred/MultiFileInputFormat.html

Alejandro

On Tue, Oct 5, 2010 at 4:55 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> 500 small files comprising one gigabyte? Perhaps you should try
> concatenating them all into one big file and running the job again, as a
> mapper should ideally run for at least a minute. Small files also don't
> make good use of the HDFS block feature.
>
> Have a read: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>
> 2010/10/5 Jander <[EMAIL PROTECTED]>:
>> Hi Jeff,
>>
>> Thank you very much for your reply.
>>
>> I know Hadoop has overhead, but is it too large in my case?
>>
>> The 1GB text input produces about 500 map tasks because the input is
>> composed of small text files. Each map task takes from 8 to 20 seconds.
>> I use compression via conf.setCompressMapOutput(true).
>>
>> Thanks,
>> Jander
>>
>> At 2010-10-05 16:28:55, "Jeff Zhang" <[EMAIL PROTECTED]> wrote:
>>
>>>Hi Jander,
>>>
>>>Hadoop has overhead compared to a single-machine solution. How many tasks
>>>do you get when you run your Hadoop job? And how much time does each map
>>>and reduce task take?
>>>
>>>There are lots of tips for Hadoop performance tuning, such as
>>>compression and JVM reuse.
>>>
>>>2010/10/5 Jander <[EMAIL PROTECTED]>:
>>>> Hi, all
>>>> I built an application using Hadoop.
>>>> With 1GB of text data as input, the results are as follows:
>>>> (1) a cluster of 3 PCs: the time consumed is 1020 seconds.
>>>> (2) a cluster of 4 PCs: the time is about 680 seconds.
>>>> But the application took about 280 seconds before I used Hadoop, so at
>>>> the speeds above I would need 8 PCs to match the earlier speed. My
>>>> question: is this expected?
>>>>
>>>> Jander,
>>>> Thanks.
>>>
>>>--
>>>Best Regards
>>>
>>>Jeff Zhang
>
> --
> Harsh J
> www.harshj.com
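
For reference, Harsh's suggestion of concatenating the small files before loading them could be sketched roughly as below. This is only a minimal illustration using plain Java on the local filesystem, not anything from Jander's actual job: the class name SmallFileConcat and the method concatenate are made up for this example, and it assumes the inputs are line-oriented text files small enough to read into memory one at a time, and that the output file lives outside the input directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: merges many small local text files into one large file
// so that a later MapReduce job sees a few large splits instead of ~500 tiny ones.
class SmallFileConcat {

    // Appends every regular file in inputDir (sorted by path) to output.
    // Returns the number of files merged. The output file is assumed to be
    // located outside inputDir, otherwise it would be picked up as an input.
    static int concatenate(Path inputDir, Path output) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // deterministic order

        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path p : files) {
                byte[] data = Files.readAllBytes(p); // fine for files of a few MB
                out.write(data);
                // Keep records from adjacent files apart if a file lacks a final newline.
                if (data.length > 0 && data[data.length - 1] != '\n') {
                    out.write('\n');
                }
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SmallFileConcat <inputDir> <outputFile>");
            return;
        }
        int merged = concatenate(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + merged + " files into " + args[1]);
    }
}
```

The merged file would then be copied into HDFS (for example with hadoop fs -put) and used as the job input in place of the original directory. Whether this actually brings the runtime near the 280 seconds mentioned above depends on the job itself; it only removes the per-task startup overhead of having hundreds of short map tasks.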