Re: Distributed table processing is slower than local table processing
anil gupta 2012-03-29, 23:26
Is the data properly distributed across the cluster in distributed mode? If
it is not, you won't get good results in distributed mode.
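
One rough way to check this, assuming you can list which region server hosts each region of the table (for example by reading it off the master web UI), is to count regions per server and compare the busiest server with the average. The sketch below is only an illustration: the function name `region_skew`, the region names, and the node names are all made up, and an even region count does not by itself guarantee an even read load.

```python
from collections import Counter


def region_skew(assignments, servers):
    """Count regions per server and return (counts, max/mean ratio).

    assignments: iterable of (region_name, server_name) pairs (hypothetical input).
    servers: all region servers in the cluster, so idle ones are counted as 0.
    """
    assignments = list(assignments)
    if not assignments or not servers:
        raise ValueError("need at least one region and one server")
    per_server = Counter({server: 0 for server in servers})
    per_server.update(server for _region, server in assignments)
    mean = len(assignments) / len(per_server)
    return dict(per_server), max(per_server.values()) / mean


# Hypothetical layout: most regions sit on one of three nodes.
sample = [
    ("r1", "node1"), ("r2", "node1"), ("r3", "node1"),
    ("r4", "node1"), ("r5", "node2"), ("r6", "node3"),
]
counts, skew = region_skew(sample, ["node1", "node2", "node3"])
print(counts, skew)
```

A ratio close to 1.0 means the regions are spread evenly; a ratio near the number of servers means almost everything is served by a single node, so the extra nodes add little.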
On Thu, Mar 29, 2012 at 8:37 AM, Alexander Goryunov <[EMAIL PROTECTED]> wrote:
> I'm running a cluster with 3 data nodes (8-core Xeon, 16G) plus 1 node for
> the jobtracker and namenode, with Hadoop and HBase, and I'm getting strange
> performance results.
> The same map job runs at about 300 000 records per second for a table on 1
> node and about 100 000 records per second for a table distributed across 3
> nodes.
> Scan caching is 1000, each row is about 0.2K, compression is off,
> setCacheBlocks is false.
> 7 map tasks run in parallel on each node (281 in total for the big table
> and 16 for the small table).
> The map job reads some sequential data and writes out a few records from
> it. No reduce tasks are set for this job.
> Both tables have the same data and have sizes of about 10M records (the
> first one) and 150M records (the second one).
> Do you have any idea what could be the reason for such behavior?
Thanks & Regards,