HBase - hitting only one node on insert ...
pasaliczaharije 2010-01-18, 11:54
Hi,

We have a small Hadoop cluster with 7 nodes (8 GB RAM and 8 cores per node) plus 1 master, and we deployed HBase on the same 7 nodes.

We are currently importing ~50 million records from CSV files into HBase. A CSV file can have about 100 columns, and the row key is a UUID generated with java.util.UUID.

We have about 50 files on HDFS, which are imported into HBase by a MapReduce job.

At the start everything works fine, but after a few minutes we see a heavy load on the second node. Here is the list from the HBase master.jsp:

hadoop-node01:60030 1263591474251 requests=184, regions=148, usedHeap=1196, maxHeap=1991
hadoop-node02:60030 1263591474109 requests=663, regions=148, usedHeap=1489, maxHeap=1991
hadoop-node03:60030 1263591474082 requests=161, regions=147, usedHeap=1526, maxHeap=1991
hadoop-node04:60030 1263632774794 requests=142, regions=147, usedHeap=1213, maxHeap=1991
hadoop-node06:60030 1263596977608 requests=152, regions=147, usedHeap=749, maxHeap=1991
hadoop-node07:60030 1263597118777 requests=156, regions=148, usedHeap=1749, maxHeap=1991
hadoop-node08:60030 1263597239565 requests=179, regions=148, usedHeap=1681, maxHeap=1991

The second node is getting roughly 4 to 5 times more requests than the other nodes, and at some point we see requests=0 on all nodes except node 2 (where we see about 600-1800).

We chose UUIDs as row keys to get a roughly uniform load across all nodes. I'm not sure whether this is a UUID issue (keys not uniformly distributed) or something else.

Also, we are using the default Hadoop configuration (7 nodes result in 14 maps running in parallel). Is this optimal for this kind of job?

Any comments?

Thanks,
-Zaharije

--
View this message in context: http://old.nabble.com/HBase---hiting-only-one-node-on-insert-...-tp27209452p27209452.html
Sent from the HBase User mailing list archive at Nabble.com.
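One way to check the "is it a UUID thing" question in isolation is to bucket a batch of random UUID strings by their leading hex digit and see how evenly they spread. The following is a minimal sketch, not taken from the original post: the class name UuidKeySpread, the method names, and the sample size are assumptions made for illustration. It only looks at the key distribution produced by java.util.UUID.randomUUID() and says nothing about how HBase assigns regions to servers.

```java
import java.util.UUID;

public class UuidKeySpread {
    /** Counts how many of n random UUID strings start with each hex digit 0..f. */
    public static int[] countByFirstHexDigit(int n) {
        int[] counts = new int[16];
        for (int i = 0; i < n; i++) {
            // UUID.toString() yields lowercase hex, e.g. "3f2b...-....-4...-....-..."
            String key = UUID.randomUUID().toString();
            counts[Character.digit(key.charAt(0), 16)]++;
        }
        return counts;
    }

    /** Largest relative deviation of any bucket from the ideal equal share. */
    public static double maxRelativeDeviation(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double expected = total / (double) counts.length;
        double worst = 0.0;
        for (int c : counts) {
            worst = Math.max(worst, Math.abs(c - expected) / expected);
        }
        return worst;
    }

    public static void main(String[] args) {
        int[] counts = countByFirstHexDigit(160_000);
        for (int d = 0; d < counts.length; d++) {
            System.out.printf("%x: %d%n", d, counts[d]);
        }
        System.out.printf("max relative deviation: %.4f%n", maxRelativeDeviation(counts));
    }
}
```

If the buckets come out close to equal, the row keys themselves are unlikely to explain one server receiving several times more requests than the rest.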
Re: HBase - hitting only one node on insert ...
pasaliczaharije 2010-01-18, 11:55
Sorry for the messed-up text. [Repost of the message above with proper formatting.]
Re: HBase - hitting only one node on insert ...
Cosmin Lehene 2010-01-18, 16:56
I'm not sure why there would be 0 requests for most region servers, but I usually see a higher number of requests (even when the cluster is idle) on the region server that serves .META. My guess is that, on your cluster, hadoop-node02 serves .META.
Cosmin
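To put a number on the skew in the posted snapshot, the sketch below compares hadoop-node02 with the mean of the other six region servers. The requests= values are copied from the master.jsp listing in the original message; the class and method names (RequestSkew, ratioToOthers, postedRequests) are hypothetical and exist only for this illustration. It quantifies the imbalance and does not by itself confirm the cause.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RequestSkew {
    /** Ratio of one server's request count to the mean of all other servers. */
    public static double ratioToOthers(Map<String, Integer> requests, String server) {
        int target = requests.get(server);
        double sumOthers = 0;
        int others = 0;
        for (Map.Entry<String, Integer> e : requests.entrySet()) {
            if (!e.getKey().equals(server)) {
                sumOthers += e.getValue();
                others++;
            }
        }
        return target / (sumOthers / others);
    }

    /** requests= values copied from the master.jsp listing in the original post. */
    public static Map<String, Integer> postedRequests() {
        Map<String, Integer> r = new LinkedHashMap<>();
        r.put("hadoop-node01", 184);
        r.put("hadoop-node02", 663);
        r.put("hadoop-node03", 161);
        r.put("hadoop-node04", 142);
        r.put("hadoop-node06", 152);
        r.put("hadoop-node07", 156);
        r.put("hadoop-node08", 179);
        return r;
    }

    public static void main(String[] args) {
        double ratio = ratioToOthers(postedRequests(), "hadoop-node02");
        System.out.printf("hadoop-node02 vs mean of others: %.2fx%n", ratio);
    }
}
```

For this snapshot the ratio comes out at about 4.1, which matches the "several times more requests" described in the question.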
Re: HBase - hitting only one node on insert ...
Zaharije Pasalic 2010-01-18, 17:12
Yes, that node hosts the .META. table. So can I expect this on whichever node(s) host .META.?
Re: HBase - hitting only one node on insert ...
Jean-Daniel Cryans 2010-01-18, 17:52
Yes.
J-D