HBase >> mail # dev >> commit semantics


RE: commit semantics
Btw, is there much gain in having a large number of regions per region server, i.e. on the order of 500?

I understand that having multiple regions per region server allows finer-grained rebalancing when new nodes are added or a node goes down. But would having a smaller number of regions per region server (say ~50) be really bad? If a region server goes down, 50 other nodes would each pick up ~1/50 of its work. That is not as good as 500 other nodes each picking up 1/500 of its work, but it still seems acceptable. Are there other advantages to having a large number of regions per region server?
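To put rough numbers on that comparison, here is a minimal sketch. It assumes every region carries equal load and that a failed server's regions are spread as evenly as possible over the surviving nodes; `share_per_node` is a hypothetical helper for illustration, not anything in HBase.

```python
def share_per_node(regions_on_failed_server: int, surviving_nodes: int) -> float:
    """Fraction of the failed server's load taken on by the busiest survivor.

    Assumes equal-load regions, reassigned as evenly as possible.
    """
    if regions_on_failed_server <= 0 or surviving_nodes <= 0:
        raise ValueError("both arguments must be positive")
    # A region is the unit of reassignment, so at most one node per region helps.
    receivers = min(regions_on_failed_server, surviving_nodes)
    # Ceiling division: the busiest receiver gets this many regions.
    max_regions = -(-regions_on_failed_server // receivers)
    return max_regions / regions_on_failed_server


# On a 1000-node cluster: 50 regions vs 500 regions per server.
print(share_per_node(50, 1000))   # each of 50 nodes takes 1/50
print(share_per_node(500, 1000))  # each of 500 nodes takes 1/500
# On a 100-node cluster the gain from 500 regions is capped by cluster size.
print(share_per_node(500, 100))   # busiest node takes 5/500
```

Under these assumptions, the finer granularity only helps up to the number of surviving nodes, so on a small cluster the gap between ~50 and ~500 regions narrows.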

regards,
Kannan
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jean-Daniel Cryans
Sent: Tuesday, January 12, 2010 9:42 AM
To: [EMAIL PROTECTED]
Subject: Re: commit semantics

wrt 1 HLog per region server, this is from the Bigtable paper. Their
main concern is the number of open files, since if you have 1000
region servers * 500 regions then you may have 500 000 HLogs to
manage. Also you can have more than one file per HLog, so let's say
you have on average 5 log files per HLog, that's 2 500 000 files on HDFS.
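Working the multiplication through, 1000 region servers with 500 regions each gives 500 000 per-region logs, and at 5 files per log that is 2 500 000 files. A small sketch of the comparison, where `hlog_file_count` is a hypothetical helper and the inputs are just the example figures from this message:

```python
def hlog_file_count(region_servers: int,
                    regions_per_server: int,
                    files_per_log: int,
                    per_region_logs: bool) -> int:
    """Estimate the number of log files on HDFS for a given logging layout."""
    # One log per server, or one log per region on every server.
    logs_per_server = regions_per_server if per_region_logs else 1
    return region_servers * logs_per_server * files_per_log


per_server = hlog_file_count(1000, 500, 5, per_region_logs=False)
per_region = hlog_file_count(1000, 500, 5, per_region_logs=True)
print(per_server)  # one HLog per region server
print(per_region)  # one HLog per region
```

With one HLog per region server the same cluster would hold on the order of 5 000 log files, which is the gap the Bigtable design is trying to avoid.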

J-D

On Tue, Jan 12, 2010 at 12:24 AM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
> Hi Ryan,
>
> Thanks for your response.
>
>>Right now each regionserver has 1 log, so if 2 puts on different
>>tables hit the same RS, they hit the same HLog.
>
> I understand. My point was that the application could insert the same record
> into two different tables on two different HBase instances on two different
> pieces of hardware.
>
> On a related note, can somebody explain what the tradeoff is if each region
> has its own HLog? Are you worried about the number of files in HDFS? Or
> maybe the number of sync threads in the region server? Can multiple HLog
> files provide faster region splits?
>
>
>> I've thought about this issue quite a bit, and I think the sync every
>> 1 rows combined with optional no-sync and low time sync() is the way
>> to go. If you want to discuss this more in person, maybe we can meet
>> up for brews or something.
>>
>
> The group-commit thing I can understand. HDFS does a very similar thing. But
> can you explain your alternative "sync every 1 rows combined with optional
> no-sync and low time sync"? For those applications that have the natural
> characteristics of updating only one row per logical operation, how can they
> be sure that their data has reached some-sort-of-stable-storage unless they
> sync after every row update?
>
> thanks,
> dhruba
>
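For readers following the group-commit point, here is a toy, single-threaded sketch of the idea: writers append to a shared log, and a writer is acknowledged only once a sync covering its entry has completed. This is an illustration under those assumptions, not HBase's or HDFS's implementation; `GroupCommitLog` and its methods are hypothetical names.

```python
class GroupCommitLog:
    """Toy write-ahead log: entries are durable only after sync()."""

    def __init__(self):
        self.pending = []    # appended but not yet synced
        self.durable = []    # covered by a completed sync
        self.sync_calls = 0  # stands in for the expensive operation

    def append(self, row):
        self.pending.append(row)

    def sync(self):
        """Make every pending entry durable; return the entries to acknowledge."""
        self.sync_calls += 1
        acked = self.pending
        self.durable.extend(acked)
        self.pending = []
        return acked


rows = ["r1", "r2", "r3", "r4", "r5", "r6"]

# Sync after every row: one sync per update.
per_row = GroupCommitLog()
for row in rows:
    per_row.append(row)
    per_row.sync()

# Group commit: three concurrent writers share each sync.
grouped = GroupCommitLog()
for i in range(0, len(rows), 3):
    for row in rows[i:i + 3]:
        grouped.append(row)
    grouped.sync()

print(per_row.sync_calls, grouped.sync_calls)
```

In both runs every row is durable before its writer is acknowledged; the grouped run just pays for fewer syncs. A single-row application still waits for one sync per operation, but under concurrent load that sync can be shared with other writers.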