Hadoop has overhead compared to a single-machine solution. How many tasks
did you get when you ran your Hadoop job? And how much time did each map
and reduce task take?
There are lots of tips for performance tuning Hadoop, such as
compression and JVM reuse.
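For those two tips, a minimal sketch of the relevant settings (these are the classic `mapred.*` property names from the Hadoop 0.20 era; verify them against your version, since later releases renamed them):

```xml
<!-- mapred-site.xml (can also be set per-job via JobConf) -->
<configuration>
  <!-- Compress intermediate map output to reduce shuffle I/O -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <!-- Reuse each task JVM for an unlimited number of tasks,
       avoiding the JVM startup cost on every task -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>
</configuration>
```

JVM reuse in particular helps when a small input produces many short-lived tasks, which is likely with only 1GB of data.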
2010/10/5 Jander <[EMAIL PROTECTED]>:
> Hi, all
> I am building an application using Hadoop.
> With 1GB of text data as input, the results are as follows:
> (1) the cluster of 3 PCs: the time consumed is 1020 seconds.
> (2) the cluster of 4 PCs: the time is about 680 seconds.
> But before I used Hadoop the application took about 280 seconds, so at the speeds above I would need about 8 PCs just to match the original speed. Is this expected?
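As a rough sanity check on the quoted numbers (a back-of-envelope sketch that assumes ideal linear scaling from the 4-PC measurement, which real clusters rarely achieve):

```python
# Measured: 4 PCs took 680 s; the single-machine baseline was 280 s.
# Under ideal linear scaling the total work (PCs * time) stays constant,
# so the cluster size needed to match the baseline is:
pcs = 4
time_4pc = 680.0
baseline = 280.0
work = pcs * time_4pc            # 2720 PC-seconds of work
pcs_needed = work / baseline
print(round(pcs_needed, 1))      # roughly 9.7 PCs
```

So even optimistic linear extrapolation suggests closer to 10 PCs than 8, which is consistent with the point above: at this data size, per-task overhead dominates the runtime.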