Re: Python client lib for Accumulo?
Edmon Begoli 2012-07-27, 03:15
Hi folks,

I have just joined the list with the purpose of volunteering ideas, design and development (and whatever else in the lifecycle) related to development of the Python client for Accumulo.

I have developed several RESTful clients and libraries before using web.py, and I am about to write another in Tornado (http://www.tornadoweb.org/).

I think that we could have a very nice, scalable and fast RESTful API for Accumulo through Tornado.

I would also like to develop a pure Python library for Accumulo similar to HappyBase for HBase (https://github.com/wbolster/happybase).

I work at Oak Ridge National Lab as a software engineer and tech lead on "big data" projects. I can devote time, possibly bring in more team members, and I would be happy to collaborate. Collaborations are welcome.

I could certainly start a small wiki outlining the ideas and open them for discussion.

Regards and please advise,
Edmon

On Wed, May 2, 2012 at 11:31 AM, Jason Trost <[EMAIL PROTECTED]> wrote:
> I noticed that there are no JIRAs for a Python client
> interface/lib/API for Accumulo. How involved would it be to develop
> AND maintain a Python client for Accumulo?
>
> I realize that Jython can be used, but I am interested in a native
> Python lib that can be used more broadly with systems that don't work
> with Jython.
>
> In order to do this, it seems like we would need to:
> 1. generate the Python Thrift bindings code (this is trivial)
> 2. develop and maintain the Python glue code to use the Thrift code
> and Python ZooKeeper code to interact with the various Accumulo
> components. The current Java "glue" code looks quite long. How often
> does this code change (in terms of new features or changes in
> protocol, not bug fixes)?
>

I would advise against rewriting the Accumulo client code in Python. The code that finds tablets, retries in case of failure, parallelizes reads/writes, etc. is fairly complex. I think the proxy option is best. David and Eric mentioned REST and Thrift proxies.

If we were to go down the route of writing the client code in another language, I think C++ with a C API would be the best option because many languages can easily bind to a C API.

> Ideally the Python API would be very similar to the Java interface
> (Connector, Instance, Scanner, BatchScanner, BatchWriter, Key, Value,
> Mutation, etc).
>
> I guess what I am trying to get at is, does the Accumulo dev community
> think it's worth the time and effort to develop and maintain a Python
> API? I personally think it is, in order to help with adoption and
> integration with other systems (Django is the primary system I want to
> be able to use with it). I have some time to help this along, but I
> don't think I have enough time to take this on alone. Is anyone else
> interested in working together on this?
>
> Thanks,
>
> --Jason
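To make the "similar to the Java interface" idea a bit more concrete, here is a rough sketch of what such a Python surface might look like. Everything in it is hypothetical: the class and method names only borrow the Java vocabulary mentioned above (Connector, Key, Mutation, scanning), nothing here exists in Accumulo or in any published library, and an in-memory dict stands in for whatever proxy would really sit behind it.

```python
# Hypothetical sketch only: these classes are invented for illustration and
# are not part of Accumulo. A dict stands in for a real proxy connection.
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Key:
    """Sortable, hashable cell key (row, column family, column qualifier)."""
    row: str
    family: str = ""
    qualifier: str = ""


@dataclass
class Mutation:
    """A set of updates to a single row."""
    row: str
    updates: list = field(default_factory=list)

    def put(self, family, qualifier, value):
        self.updates.append((family, qualifier, value))


class InMemoryConnector:
    """Stand-in for a Connector that would really talk to a proxy."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name):
        self._tables.setdefault(name, {})

    def write(self, table, mutation):
        """Apply every update in the mutation (the BatchWriter role)."""
        cells = self._tables[table]
        for family, qualifier, value in mutation.updates:
            cells[Key(mutation.row, family, qualifier)] = value

    def scan(self, table, start_row=None, end_row=None):
        """Yield (Key, value) pairs in sorted key order (the Scanner role)."""
        cells = self._tables[table]
        for key in sorted(cells):
            if start_row is not None and key.row < start_row:
                continue
            if end_row is not None and key.row > end_row:
                break
            yield key, cells[key]
```

A caller would build a `Mutation`, hand it to `write`, and iterate over `scan`; a real client would swap the dict for calls to a Thrift or REST proxy while keeping the same surface.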
Re: Python client lib for Accumulo?
Keith Turner 2012-07-27, 16:43
Does anyone know anything about Py4J? http://py4j.sourceforge.net/index.html

I have never used it, but I am wondering if it would fit the bill?
Re: Python client lib for Accumulo?
David Medinets 2012-07-27, 09:27
On Thu, Jul 26, 2012 at 11:15 PM, Edmon Begoli <[EMAIL PROTECTED]> wrote:
> I have just joined the list with the purpose of volunteering ideas,
> design and development (and whatever else in lifecycle)
> related to development of the Python client for accumulo.
Welcome to the list. There are a lot of Python developers and I'm sure that your client would be well received by the community. My own advice is to write whatever is simplest (fastest to develop) and iterate towards a more complex complete solution.
Would Jython be of any use in providing Python access to the existing Java API without any rewrite or plumbing?
Re: Python client lib for Accumulo?
Edmon Begoli 2012-07-27, 11:19
Hi David,
I think that Jython is a good idea, at least as a prototype or as a bridge towards a full-blown Python library.
It is probably not a good end state because most Python developers do not want a JVM and Java environment, and there is also a performance overhead.
Personally, I program in both languages, so I am good.
Is there a particular protocol for contributing to the Accumulo project?
Re: Python client lib for Accumulo?
Jim Klucar 2012-07-27, 11:37
Welcome, Edmon. I think as far as a pure Python library goes, you would have to interface with the Thrift protocols. My sense is that the devs would discourage that at this point. I do have some experience with it, though: I made an attempt to interface with Accumulo from Node.js. It turned into me writing a JavaScript version of TCompactProtocol, which is still incomplete at this point. I would vote for developing either an officially supported Thrift interface or an officially supported REST interface using a JVM language. Then the language barrier would be easier to overcome.
Jim
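For a sense of why hand-writing TCompactProtocol in a new language gets tedious, here is a small sketch of just one piece of it. As far as I understand the Thrift compact protocol, 32-bit integers are zigzag-encoded and then written as variable-length integers; the function names below are my own, and this covers nothing else in the protocol (field headers, strings, containers, and so on).

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit int to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def unzigzag32(n: int) -> int:
    """Inverse of zigzag32."""
    return (n >> 1) ^ -(n & 1)


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode one varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
    return result
```

Round-tripping a value is `unzigzag32(decode_varint(encode_varint(zigzag32(value))))`. Multiply this by every type and message in the service definitions and the appeal of an officially supported proxy becomes clear.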
Re: Python client lib for Accumulo?
David Medinets 2012-07-27, 12:50
Which reminds me. There was a discussion of using a REST interface on this list. Several people liked that approach because it would provide loose coupling between client and server. Also the client could use any language. At the time, nobody could spare the time to implement it.
Re: Python client lib for Accumulo?
Jim Klucar 2012-07-27, 13:01
I have a small proof of concept going. I'm still not sure what the best way to do results paging is (e.g. when your scan has a billion results and won't fit in memory). My initial work is moving towards opening up an HTTP/1.1 chunked-encoded stream like Twitter does for its streaming API. The other thing I've been playing with is WebSockets. That may restrict you to using JavaScript for now, but I'm sure more client-side WebSocket libraries are coming.
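To illustrate the chunked-stream idea, here is a minimal sketch of the framing only, not the proof of concept itself. It assumes each scan result is serialized as one JSON line; `scan_rows` is a made-up stand-in for a real scanner. The wire format follows HTTP/1.1 chunked transfer coding: a hex length, CRLF, the data, CRLF, and a zero-length chunk to finish.

```python
import json


def chunked(parts):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    for part in parts:
        if part:  # a zero-length chunk would signal the end of the body
            yield b"%x\r\n%s\r\n" % (len(part), part)
    yield b"0\r\n\r\n"  # last-chunk marker, no trailers


def scan_rows(count):
    """Hypothetical stand-in for a scanner yielding results lazily."""
    for i in range(count):
        yield {"row": "row%05d" % i, "value": i}


def stream_scan(rows):
    """Serialize each result as a JSON line and frame it as one chunk."""
    return chunked((json.dumps(row) + "\n").encode("utf-8") for row in rows)
```

Because everything is a generator, the server never holds more than one result in memory, which is the point when a scan returns far more than fits in RAM.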
Re: Python client lib for Accumulo?
Edmon Begoli 2012-07-27, 13:06
Just let me know how and if we want to collaborate on this.
As for the RESTful API and paging, I think we could also look into OData-like protocol conventions, which specify a way to scroll through the result set using 'skip' and 'top', in addition to opening the stream.
Edmon
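As a rough sketch of the 'skip' and 'top' convention (OData spells them `$skip` and `$top` in the query string), paging over a lazily produced result set could look like this. The function name and defaults are my own assumptions, and a plain range stands in for a scan.

```python
from itertools import islice


def page(results, skip=0, top=10):
    """Return at most `top` items after skipping the first `skip`."""
    if skip < 0 or top < 0:
        raise ValueError("skip and top must be non-negative")
    return list(islice(results, skip, skip + top))


# Example: the third page of five results from a stand-in result set.
third_page = page(iter(range(100)), skip=10, top=5)
```

One caveat worth discussing: skipping by offset re-reads every skipped entry on each request, so for deep pages a server might prefer to resume from the last key returned. That is a design assumption on my part, not something settled in this thread.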
Re: Python client lib for Accumulo?
Adam Fuchs 2012-07-27, 13:16
One of the big challenges of connecting directly to the existing Thrift services is that there is a lot of logic embedded in the Java client libraries that would have to be recreated. This includes things like finding tablets, managing multiple connections, handling tablet migration, handling read and write threads, etc. Sapan Shah was working on building a Thrift proxy that would make native Python bindings a lot simpler: see ACCUMULO-482. Maybe we can encourage him to continue to work on that if we all ask nicely.
Adam
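To give a flavor of the logic that would have to be recreated, here is a toy sketch of two of those pieces: finding the tablet for a row and retrying a failed call. It is not how the Java client does it. It assumes each tablet covers the rows after the previous split point up to and including its own, and all names (`locate`, `with_retries`, the server labels) are made up for illustration.

```python
import bisect


def locate(splits, servers, row):
    """Pick the server for a row, given sorted split points.

    Assumes tablet i holds rows greater than splits[i - 1] and less than
    or equal to splits[i]; the last tablet has no upper bound.
    """
    if len(servers) != len(splits) + 1:
        raise ValueError("need exactly one more server than split points")
    return servers[bisect.bisect_left(splits, row)]


def with_retries(operation, attempts=3):
    """Call operation(), retrying on ConnectionError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as err:
            # A real client would also refresh its cached tablet locations
            # here, since the tablet may have migrated to another server.
            last_error = err
    raise last_error
```

Even this toy version skips caching, parallel reads and writes, and migration handling, which is why a proxy that reuses the Java client looks attractive.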