Pig >> mail # dev >> Accumulo-Pig Integration


Re: Accumulo-Pig Integration
Opened PIG-3573 for the necessary work and additional discussion.

Actually, they share a common parent class, but AccumuloStorage doesn't
depend on AccumuloKVStorage. I would imagine that both have some worth
for inclusion into Pig -- if there happen to be any concerns, we can
address them later.

Thanks for your help, Daniel!

On 11/13/13, 2:19 PM, Daniel Dai wrote:
> Would be good to open a Jira ticket and attach the patch there. We can
> discuss from there.
>
> If the code is only relevant to Pig, we should put it in Pig. I would imagine you
> will need to use AccumuloKVStorage to convert Pig Tuple to Accumulo in
> AccumuloStorage, right? Sounds like this should go into Pig.
>
> Thanks,
> Daniel
>
>
> On Wed, Nov 13, 2013 at 11:40 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>
>> Daniel,
>>
>> I definitely see the AccumuloStorage class as a good addition (the class
>> that mimics the functionality that HBaseStorage also provides).
>>
>> There also exists the AccumuloKVStorage which maps a Tuple into an
>> Accumulo Key (which is a 5-tuple) and Value. I did this as a quick example,
>> but I'm not sure how relevant/useful for users it would be. I would imagine
>> that most people who would want to use this integration would be expecting
>> some amount of a data model. In the follow-on work, I also considered
>> additional data abstractions (e.g. inverted indexes, edge-list
>> representations) but I'm not sure if we would want to draw a line as to
>> what goes into Pig or not.
>>
>> I'm open to suggestions here. I think both classes that I currently have in
>> the previously mentioned branch would be good for inclusion presently.
>>
>>
>> On 11/13/13, 11:23 AM, Daniel Dai wrote:
>>
>>> Hi, Josh,
>>>
>>> That will be a great addon to Pig. When you say "some" in "including some
>>> or all of this functionality into Pig itself", what boundary are you
>>> thinking of?
>>>
>>> Thanks,
>>> Daniel
>>>
>>>
>>> On Wed, Nov 13, 2013 at 9:58 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>>
>>>> All,
>>>>
>>>> I wanted to "announce" some work that I've been doing on allowing Pig to
>>>> interface with Accumulo [1]. The code is available in the ACCUMULO-1783
>>>> branch [2] but still relies on some changes I made to Accumulo that aren't
>>>> yet in a released version of Accumulo.
>>>>
>>>> The origins of this work have been around for some time in a Git repo
>>>> underneath the Accumulo "umbrella" but had been mostly ignored as of late.
>>>> My recent efforts have been to bring it up to speed with upstream Accumulo
>>>> releases and ensure a full breadth of Pig Latin functionality.
>>>>
>>>> Much of the design was modeled off of how HBaseStorage works, with some
>>>> differences owing to how HBase and Accumulo themselves differ. I've tried
>>>> to make a decent write-up on what currently works, a high-level view on
>>>> API/usage, an actual example with non-contrived data, and where I see
>>>> future work leading with the integration [3].
>>>>
>>>> A few questions for the Pig community: would you be interested in
>>>> including some or all of this functionality into Pig itself? I'd be happy
>>>> to work with you all to take this out of an "Accumulo" repo and into Pig
>>>> itself. Additionally, any feedback on convention, style and/or best
>>>> practices would be greatly appreciated as I'm still relatively new to
>>>> working with Pig.
>>>>
>>>> Thanks for your time!
>>>>
>>>> - Josh
>>>>
>>>>
>>>> [1] http://accumulo.apache.org
>>>> [2] https://git-wip-us.apache.org/repos/asf?p=accumulo-pig.git
>>>> [3] http://people.apache.org/~elserj/accumulo-pig/
>>>>
>>>>
>>>
>
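
For readers unfamiliar with the "5-tuple" mentioned in the thread: an Accumulo Key is made up of a row, column family, column qualifier, column visibility, and timestamp, and each Key is paired with a Value. The sketch below only illustrates that shape using plain Java. The class name KeyValueSketch, the Entry type, and the six-field order (five key components followed by the value) are assumptions made for illustration; this is not the AccumuloKVStorage code from the branch, and it does not use the Pig or Accumulo APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class KeyValueSketch {

    /** Hypothetical stand-in for an Accumulo Key (five components) plus its Value. */
    public static final class Entry {
        public final String row;
        public final String columnFamily;
        public final String columnQualifier;
        public final String columnVisibility;
        public final long timestamp;
        public final byte[] value;

        public Entry(String row, String columnFamily, String columnQualifier,
                     String columnVisibility, long timestamp, byte[] value) {
            this.row = row;
            this.columnFamily = columnFamily;
            this.columnQualifier = columnQualifier;
            this.columnVisibility = columnVisibility;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Maps a six-field tuple (a List standing in for a Pig Tuple) to an Entry.
     * The field order is an assumption made for this sketch.
     */
    public static Entry fromTuple(List<?> tuple) {
        if (tuple.size() != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + tuple.size());
        }
        return new Entry(
                String.valueOf(tuple.get(0)),          // row
                String.valueOf(tuple.get(1)),          // column family
                String.valueOf(tuple.get(2)),          // column qualifier
                String.valueOf(tuple.get(3)),          // column visibility
                ((Number) tuple.get(4)).longValue(),   // timestamp
                String.valueOf(tuple.get(5)).getBytes(StandardCharsets.UTF_8)); // value
    }

    public static void main(String[] args) {
        Entry e = fromTuple(Arrays.asList("row1", "cf", "cq", "public", 1384365540000L, "hello"));
        System.out.println(e.row + " " + e.columnFamily + ":" + e.columnQualifier
                + " [" + e.columnVisibility + "] " + e.timestamp
                + " -> " + e.value.length + " bytes");
    }
}
```

A flat mapping like this leaves the data model entirely to the user, which is the concern raised in the thread about how useful a raw key/value storage class would be compared with the HBaseStorage-style AccumuloStorage.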