-
Distributed table processing is slower than local table processing
Alexander Goryunov 2012-03-29, 15:37
Hello,
I'm running a cluster of 3 data nodes (8-core Xeon, 16 GB) plus 1 node for the JobTracker and NameNode, with Hadoop and HBase, and I'm seeing strange performance results.
The same map job runs at about 300,000 records per second on a table hosted on 1 node, but at only about 100,000 records per second on a table distributed across 3 nodes.
Scan caching is 1000, each row is about 0.2 KB, compression is off, and setCacheBlocks is false.
Each node runs 7 map tasks in parallel (281 map tasks in total for the big table and 16 for the small table).
The map job reads sequential data and writes out a small part of it. No reduce tasks are set for this job. Both tables have the same data; the first has about 10M records and the second about 150M records.
Do you have any idea what could be causing this behavior?
Thanks.
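To put the reported figures side by side, here is a minimal sketch of the arithmetic they imply. It uses only the numbers quoted in the message; reading "0.2K" as 0.2 KiB per row is an assumption, and the function name `summarize` is purely illustrative.

```python
# Figures come from the message above.
# Assumption: "0.2K" per row is read as 0.2 KiB.
ROW_BYTES = 0.2 * 1024

tables = {
    "single-node table": {"records": 10_000_000, "records_per_sec": 300_000},
    "3-node table": {"records": 150_000_000, "records_per_sec": 100_000},
}


def summarize(records, records_per_sec):
    """Return (data volume in GiB, implied job runtime in s, scan rate in MiB/s)."""
    volume_gib = records * ROW_BYTES / 1024**3
    runtime_s = records / records_per_sec
    rate_mib_s = records_per_sec * ROW_BYTES / 1024**2
    return volume_gib, runtime_s, rate_mib_s


for name, t in tables.items():
    volume_gib, runtime_s, rate_mib_s = summarize(**t)
    print(f"{name}: {volume_gib:.1f} GiB, ~{runtime_s:.0f} s, ~{rate_mib_s:.1f} MiB/s")
```

If the quoted rates are averages over the whole job, the small-table run lasts roughly half a minute while the large-table run lasts about 25 minutes, so the two measurements cover very different amounts of data and time.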
-
Re: Distributed table processing is slower than local table processing
anil gupta 2012-03-29, 23:26
Hi Alexander,
Is the data properly distributed over the cluster in distributed mode? If it is not, you won't get good results in distributed mode.
Thanks, Anil Gupta
-
Re: Distributed table processing is slower than local table processing
Alexander Goryunov 2012-03-30, 08:35
Hi Anil,
Yes, the second table is distributed and the first is not, and I get 3x better results for the non-distributed table.
I use distributed Hadoop mode in all cases.
Thanks.
-
Re: Distributed table processing is slower than local table processing
anil gupta 2012-03-30, 20:57
Hi Alexander,
If you can provide more details about what you are doing, that would be helpful. Are you sure that your cluster is running in distributed mode? Did you run the job with 1 node in the cluster and then add 2 more nodes to the same cluster?
Thanks, Anil
-
Re: Distributed table processing is slower than local table processing
Alexander Goryunov 2012-03-31, 08:45
Hi Anil,
Yes, I'm sure I'm running the cluster in distributed mode (I see 21 parallel map tasks in the JobTracker and processes on each node). The maximum number of map tasks is set to 7 for each node.
I run my job with the same cluster configuration on two tables:
1. A table located on only 1 node (I see it on the HBase master page) - 10M records
2. A table evenly distributed across 3 nodes (also checked on the HBase master page) - 150M records.
Thanks.
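The slot and task counts mentioned in this thread can be turned into a rough per-task picture. The sketch below is only arithmetic on the quoted numbers (3 nodes, 7 map slots each, 16 and 281 map tasks); it assumes every available slot stays busy while the job runs, and the helper name `task_profile` is hypothetical.

```python
import math

NODES = 3
MAP_SLOTS_PER_NODE = 7                 # maximum map tasks per node, from the thread
SLOTS = NODES * MAP_SLOTS_PER_NODE     # 21 parallel map tasks


def task_profile(records, map_tasks, records_per_sec, slots=SLOTS):
    """Estimate scheduling waves and per-task load for one job.

    Assumption: all usable slots are busy for the whole job, so the
    aggregate rate is split evenly among concurrently running tasks.
    """
    concurrent = min(map_tasks, slots)
    return {
        "waves": math.ceil(map_tasks / slots),
        "concurrent_tasks": concurrent,
        "records_per_task": records / map_tasks,
        "records_per_sec_per_task": records_per_sec / concurrent,
    }


small = task_profile(10_000_000, 16, 300_000)
big = task_profile(150_000_000, 281, 100_000)
print("small table:", small)
print("big table:  ", big)
```

Under these assumptions the small table needs a single wave of 16 tasks, while the big table needs 14 waves across 21 slots, and each running task on the big table processes records several times more slowly than on the small one. That narrows the question to per-task scan speed rather than the amount of parallelism.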