|
Adrien Mogenet
2013-01-13, 00:06
Ted
2013-01-13, 01:38
anil gupta
2013-01-13, 02:30
Andrew Purtell
2013-01-13, 02:39
Ted Yu
2013-01-13, 02:48
Andrew Purtell
2013-01-13, 03:58
ramkrishna vasudevan
2013-01-13, 10:04
Adrien Mogenet
2013-01-13, 10:42
Michel Segel
2013-01-13, 13:25
Anoop John
2013-01-13, 16:12
Wei Tan
2013-01-15, 18:44
Varun Sharma
2013-01-15, 18:56
Andrew Purtell
2013-01-15, 19:20
Wei Tan
2013-01-15, 22:41
Anoop Sam John
2013-01-16, 04:39
|
-
Coprocessor / threading model
Adrien Mogenet 2013-01-13, 00:06
Hi there,

I'm having some issues with coprocessors (CPs). I'm trying to implement an indexing solution (inspired by Anoop's slides). In prePut, I trigger another Put() on an external table (to build the secondary index). It works perfectly with one client, but when I insert data from 2 separate clients I run into issues with the HTable object (the one used in prePut), because it's not thread-safe. I decided to move to HTablePool and that fixed my issue.

But when I increase the write load (and concurrency), HBase throws an OOM exception because it can't create new native threads. Looking at the HBase "threads count" metric, I see that roughly 3500 threads have been created.

I'm looking for documentation about how CPs work with threads: what should I protect against concurrency issues, and when? How can I solve my issue?

Help is welcome :-)

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
-
Re: Coprocessor / threading model
Ted 2013-01-13, 01:38
Please take a look at HBASE-6651, which improves the thread safety of the table pool.

Are you using HBase 0.94?

Thanks
-
Re: Coprocessor / threading model
anil gupta 2013-01-13, 02:30
I also ran into a similar problem with one of my secondary index implementations, but I could not dig into it because I had to shift focus to other work. I am also interested in knowing how this kind of problem is resolved in coprocessors.

--
Thanks & Regards,
Anil Gupta
-
Re: Coprocessor / threading model
Andrew Purtell 2013-01-13, 02:39
> In pre-put, I trigger another Put() in an external table (to build the secondary index).

We should probably call this a coprocessor anti-pattern.

Coprocessors are meant to operate on the region with which they are associated. They are a way to extend HBase functionality while it operates, within the region, on that region's data. Think of them as loadable kernel modules. They are not a general-purpose server-side platform for programming as if you were building an HBase client (with HTable, etc.). Just because you can do this doesn't mean you should.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Coprocessor / threading model
Ted Yu 2013-01-13, 02:48
> Coprocessors are meant to operate on the region to which they are associated.

In Anoop's case, the secondary table(s) have their regions aligned with the corresponding regions of the primary table, meaning that related regions are served by the same region server. Would writes to such regions of the secondary table(s) be acceptable?

Thanks
-
Re: Coprocessor / threading model
Andrew Purtell 2013-01-13, 03:58
Yes, especially if the cross-region communication is in-process.
-
Re: Coprocessor / threading model
ramkrishna vasudevan 2013-01-13, 10:04
In Anoop's solution, the put basically happens directly on the index region rather than going through HTable.

Regards
Ram
-
Re: Coprocessor / threading model
Adrien Mogenet 2013-01-13, 10:42
Thanks for pointing me to the JIRA; it's useful for my understanding.

I'm using HBase 0.94.3, and the regions of the main and index tables are colocated on the same RS, as in Anoop's design. I'll browse the API tomorrow to find out how to use inter-CP communication instead of HTable.

--
Adrien Mogenet
-
Re: Coprocessor / threading model
Michel Segel 2013-01-13, 13:25
There are a couple of different designs that you can use to perform the write to the secondary index.

I wouldn't call this an anti-pattern... (AP's comment)

Using HTablePool wouldn't be my first choice, unless you are writing to a durable queue first, which then uses the pool to write to the table. This could work as part of a more general solution for handling indexing. But that is a longer discussion.

Mike Segel
-
Re: Coprocessor / threading model
Anoop John 2013-01-13, 16:12
In your CP methods you get an ObserverContext object, from which you can get the HRS object:

ObserverContext.getEnvironment().getRegionServerServices()

From this HRS you can get hold of any of the regions served by that RS, and then call methods on HRegion directly to insert data. :)

Good luck.

-Anoop-
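Editor's sketch of the idea Anoop describes, for readers who want to see its shape. This is a plain-Java model and not HBase code: `Region`, `RegionServer`, `getOnlineRegion` and the region name `index-region` are hypothetical stand-ins invented for illustration, and the real lookup on the object returned by `getRegionServerServices()` may look different. It only shows that when the index region lives in the same process, the hook can update it with an ordinary method call on the calling thread, with no client object, pool, or extra thread involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ColocatedIndexSketch {

    /** Hypothetical stand-in for a region hosted by this server (not an HBase class). */
    static class Region {
        final Map<String, String> rows = new ConcurrentHashMap<>();

        void put(String row, String value) {
            rows.put(row, value);
        }
    }

    /** Hypothetical stand-in for the server-side registry of online regions. */
    static class RegionServer {
        final Map<String, Region> onlineRegions = new ConcurrentHashMap<>();

        Region getOnlineRegion(String name) {
            return onlineRegions.get(name);
        }
    }

    /**
     * Models a pre-put hook running on the handler thread: it looks up the
     * colocated index region and writes the index entry (value -> row) with a
     * direct in-process call.
     */
    static void prePut(RegionServer server, String row, String value) {
        Region index = server.getOnlineRegion("index-region");
        if (index == null) {
            throw new IllegalStateException("index region is not hosted on this server");
        }
        index.put(value, row);
    }

    public static void main(String[] args) {
        RegionServer server = new RegionServer();
        server.onlineRegions.put("index-region", new Region());

        prePut(server, "row1", "val1");

        System.out.println("index entry for val1: "
                + server.getOnlineRegion("index-region").rows.get("val1"));
    }
}
```

The model fails loudly when the index region is not local, which mirrors the precondition discussed in the thread: the approach depends on the index regions being assigned to the same region server as the primary regions.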
-
Re: Coprocessor / threading model
Wei Tan 2013-01-15, 18:44
Andrew, could you explain in more detail why a cross-table operation is an anti-pattern when using CPs? Durability might be an issue, as far as I understand. Thanks,

Best Regards,
Wei
-
Re: Coprocessor / threading model
Varun Sharma 2013-01-15, 18:56
You should look at the jstack output. I think HTablePool is the reason for the large number of threads. Note that HTablePool is a reusable pool of HTable(s), and each HTable contains an ExecutorService with 1 thread by default. Are you closing the HTable you obtain from HTablePool? If you are not closing it, your thread count will keep increasing. Also, on 64-bit machines, I think each thread is allocated 256K or 512K of stack by default.

Varun
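Editor's note: to put Varun's stack-size remark in numbers, here is a small back-of-the-envelope calculation. It takes the roughly 3500 threads reported at the start of the thread and the 256K and 512K per-thread stack sizes Varun suggests; both stack sizes are his estimates, not measured values, and the actual default depends on the JVM and platform (it can be set with `-Xss`). The class and method names are made up for this sketch.

```java
public class ThreadStackEstimate {

    /** Address space reserved for thread stacks, in MiB, for a given thread count. */
    static long stackReservationMiB(int threads, int stackKiB) {
        return (long) threads * stackKiB / 1024;
    }

    public static void main(String[] args) {
        int threads = 3500; // thread count reported in the original question

        for (int stackKiB : new int[] {256, 512}) {
            System.out.println(threads + " threads x " + stackKiB + " KiB = "
                    + stackReservationMiB(threads, stackKiB) + " MiB of stack");
        }
    }
}
```

Under these assumptions the stacks alone reserve somewhere between 875 MiB and 1750 MiB outside the Java heap. The "can't create new native threads" failure can also come from an operating-system limit on threads or processes per user, so the jstack output and the OS limits are both worth checking.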
-
Re: Coprocessor / threading model
Andrew Purtell 2013-01-15, 19:20
HTable is a blocking interface. When a client issues a put, for example, we do not want to return until we can confirm the store has been durably persisted. For client convenience, many additional details of remote region invocation are hidden, for example META table lookups for relocated regions, reconnection, and retries.

Just about all coprocessor upcalls for the Observer interface happen in the RPC handler context. RPC handlers are drawn from a fixed pool of threads. Your CP code is tying up one unit of a fixed resource for as long as it has control. And in here you are running the complex HTable machinery. For many reasons your method call on HTable may block (potentially for a long time), and therefore the RPC handler your invocation is executing within will also block. An accidental cycle can cause a deadlock once there are no free handlers somewhere, which will happen as part of normal operation when the cluster is loaded, and the higher the load, the more likely it is.

Instead you can do what Anoop has described in this thread: install a CP into the master that ensures index regions are assigned to the same regionserver as the primary table, and then call from a region of the primary table into a colocated region of the index table, or vice versa, bypassing HTable and the RPC stack. This is just making an in-process method call on one object from another.

Or, you could allocate a small executor pool for cross-region RPC. When the upcall into your CP happens, dispatch the work to the executor and return immediately to release the RPC worker thread back to the pool. This would avoid the possibility of deadlock, but it may not give you the semantics you want, because that background work could lag unpredictably.

--
Best regards,

   - Andy
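Editor's sketch of the handler exhaustion Andrew describes, modelled with plain `java.util.concurrent` rather than HBase. A fixed thread pool stands in for the RPC handlers, and a task waiting on a latch stands in for an upcall stuck inside a blocking HTable call. The class name, method name, pool sizes, and timeout are assumptions chosen for illustration only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerStarvationSketch {

    /**
     * Occupies `blockedCalls` handlers with upcalls that block, then submits one
     * quick request. Returns true if the quick request completes within
     * `waitMillis`, false if it is still queued behind the blocked handlers.
     */
    static boolean quickRequestServed(int handlers, int blockedCalls, long waitMillis) {
        ExecutorService handlerPool = Executors.newFixedThreadPool(handlers);
        CountDownLatch slowCallDone = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedCalls; i++) {
                // Stand-in for coprocessor code stuck in a blocking remote call.
                Callable<Void> blockedUpcall = () -> {
                    slowCallDone.await();
                    return null;
                };
                handlerPool.submit(blockedUpcall);
            }

            Callable<String> quickCall = () -> "ok";
            Future<String> quick = handlerPool.submit(quickCall);
            try {
                quick.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // no free handler picked the request up in time
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            slowCallDone.countDown(); // release the blocked upcalls
            handlerPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("one of two handlers blocked: served = "
                + quickRequestServed(2, 1, 500));
        System.out.println("both handlers blocked:       served = "
                + quickRequestServed(2, 2, 500));
    }
}
```

With one handler still free the quick request is served; once every handler is blocked it waits in the queue until something outside the pool releases them. If the thing the handlers are waiting for itself needs a free handler on some server, nothing ever releases them, which is the accidental cycle mentioned above.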
-
Re: Coprocessor / threading model
Wei Tan 2013-01-15, 22:41
Thanks, Andrew, for your detailed clarification.

Now I understand that, in general, the system is subject to the CAP theorem. If you want good consistency AND latency, then partition tolerance needs to be sacrificed: this is the "local index" approach, i.e., colocate the index and the data and avoid RPC. Otherwise, if you can tolerate weaker consistency but not higher latency, you put the RPCs in a queue and process them in the background. This way you can have a "global" index with some lag.

Best Regards,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
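Editor's sketch of the second option, a queued "global" index that lags, again in plain Java rather than HBase. The hook hands the index write to a small executor and returns at once; the `Thread.sleep(50)` is an assumed stand-in for a slow cross-region write. All names and sizes here are hypothetical. Note that `Executors.newFixedThreadPool` uses an unbounded queue, so a real implementation would probably want a bounded queue and a policy for what to do when it fills up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BackgroundIndexSketch {

    // Small dedicated pool, separate from the (modelled) RPC handler threads.
    private final ExecutorService indexWriters = Executors.newFixedThreadPool(2);
    private final Map<String, String> index = new ConcurrentHashMap<>();

    /** Models the upcall: queue the index update and return immediately. */
    void prePut(String row, String value) {
        indexWriters.execute(() -> slowIndexWrite(value, row));
    }

    /** Stand-in for a slow remote write to the index table. */
    private void slowIndexWrite(String indexKey, String row) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        index.put(indexKey, row);
    }

    /** Returns the indexed row for a key, or null if the update has not landed yet. */
    String lookup(String indexKey) {
        return index.get(indexKey);
    }

    /** Stops accepting work and waits for queued index updates to finish. */
    void drain() {
        indexWriters.shutdown();
        try {
            if (!indexWriters.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("index updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BackgroundIndexSketch sketch = new BackgroundIndexSketch();
        sketch.prePut("row1", "val1");

        // The caller is already back; the index entry has usually not landed yet.
        System.out.println("right after prePut: " + sketch.lookup("val1"));

        sketch.drain();
        System.out.println("after drain:        " + sketch.lookup("val1"));
    }
}
```

The window between the two lookups is the lag Wei mentions: the primary write is acknowledged before the index entry exists, and if the process dies in that window the queued update is lost unless it was also persisted somewhere durable.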
-
RE: Coprocessor / threading model
Anoop Sam John 2013-01-16, 04:39
Thanks, Andrew. A detailed and useful reply. Nothing more is needed to explain the anti-pattern. :)

-Anoop-