Pig >> mail # dev >> Accumulo-Pig Integration


Re: Accumulo-Pig Integration
Opened PIG-3573 for the necessary work and additional discussion.

Actually, they share a common parent class, but AccumuloStorage doesn't
depend on AccumuloKVStorage. I would imagine that both have some worth
for inclusion into Pig -- if there happen to be any concerns, we can
address them later.

Thanks for your help, Daniel!

On 11/13/13, 2:19 PM, Daniel Dai wrote:
> Would be good to open a Jira ticket and attach the patch there. We can
> discuss from there.
>
> If the code is only relevant to Pig, we should put it in Pig. I would imagine
> you will need to use AccumuloKVStorage to convert a Pig Tuple to Accumulo in
> AccumuloStorage, right? Sounds like this should go into Pig.
>
> Thanks,
> Daniel
>
>
> On Wed, Nov 13, 2013 at 11:40 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>
>> Daniel,
>>
>> I definitely see the AccumuloStorage class as a good addition (the class
>> that mimics the functionality that HBaseStorage also provides).
>>
>> There also exists the AccumuloKVStorage which maps a Tuple into an
>> Accumulo Key (which is a 5-tuple) and Value. I did this as a quick example,
>> but I'm not sure how relevant/useful for users it would be. I would imagine
>> that most people who would want to use this integration would be expecting
>> some amount of a data model. In the follow-on work, I also considered
>> additional data abstractions (e.g. inverted indexes, edge-list
>> representations) but I'm not sure if we would want to draw a line as to
>> what goes into Pig or not.
>>
>> I'm open to suggestions here. I think both classes I currently have in the
>> previously mentioned branch would be good for inclusion as they stand.
>>
>>
>> On 11/13/13, 11:23 AM, Daniel Dai wrote:
>>
>>> Hi, Josh,
>>>
>>> That will be a great add-on to Pig. When you say "some" in "including some
>>> or all of this functionality into Pig itself", what boundary are you
>>> thinking of?
>>>
>>> Thanks,
>>> Daniel
>>>
>>>
>>> On Wed, Nov 13, 2013 at 9:58 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>>
>>>> All,
>>>>
>>>> I wanted to "announce" some work that I've been doing on allowing Pig to
>>>> interface with Accumulo [1]. The code is available in the ACCUMULO-1783
>>>> branch [2] but still relies on some changes I made to Accumulo that
>>>> aren't yet in a released version of Accumulo.
>>>>
>>>> The origins of this work have been around for some time in a Git repo
>>>> underneath the Accumulo "umbrella" but had been mostly ignored as of late.
>>>> My recent efforts have been to bring it up to speed with upstream Accumulo
>>>> releases and ensure a full breadth of Pig Latin functionality.
>>>>
>>>> Much of the design was modeled on how HBaseStorage works, with some
>>>> differences that reflect how HBase and Accumulo themselves differ. I've
>>>> tried to write up what currently works, a high-level view of the
>>>> API/usage, an actual example with non-contrived data, and where I see
>>>> future work leading with the integration [3].
>>>>
>>>> A few questions for the Pig community: would you be interested in
>>>> including some or all of this functionality into Pig itself? I'd be happy
>>>> to work with you all to take this out of an "Accumulo" repo and into Pig
>>>> itself. Additionally, any feedback on convention, style and/or best
>>>> practices would be greatly appreciated as I'm still relatively new to
>>>> working with Pig.
>>>>
>>>> Thanks for your time!
>>>>
>>>> - Josh
>>>>
>>>>
>>>> [1] http://accumulo.apache.org
>>>> [2] https://git-wip-us.apache.org/repos/asf?p=accumulo-pig.git
>>>> [3] http://people.apache.org/~elserj/accumulo-pig/
>>>>
>>>>
>>>
>
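
For readers following the thread, the Tuple-to-Key/Value mapping described for AccumuloKVStorage can be sketched roughly as below. This is only an illustration, not the code from the branch: the class names (TupleToKeyValueSketch, SketchKey, SketchEntry) and the assumed field order (row, column family, column qualifier, column visibility, timestamp, value) are hypothetical, and plain Java collections stand in for the Pig Tuple and Accumulo Key/Value types. The one thing taken from Accumulo itself is that a Key has five components.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TupleToKeyValueSketch {

    /** Stand-in for an Accumulo Key: the five components that identify an entry. */
    static final class SketchKey {
        final String row;
        final String family;
        final String qualifier;
        final String visibility;
        final long timestamp;

        SketchKey(String row, String family, String qualifier,
                  String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier
                    + " [" + visibility + "] " + timestamp;
        }
    }

    /** Stand-in for a Key/Value pair; the value is kept as raw bytes. */
    static final class SketchEntry {
        final SketchKey key;
        final byte[] value;

        SketchEntry(SketchKey key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Maps a 6-field tuple onto a key and a value.
     * Assumed (hypothetical) field order: row, family, qualifier,
     * visibility, timestamp, value.
     */
    static SketchEntry fromTuple(List<Object> tuple) {
        if (tuple.size() != 6) {
            throw new IllegalArgumentException(
                    "expected 6 fields, got " + tuple.size());
        }
        SketchKey key = new SketchKey(
                String.valueOf(tuple.get(0)),
                String.valueOf(tuple.get(1)),
                String.valueOf(tuple.get(2)),
                String.valueOf(tuple.get(3)),
                ((Number) tuple.get(4)).longValue());
        byte[] value = String.valueOf(tuple.get(5))
                .getBytes(StandardCharsets.UTF_8);
        return new SketchEntry(key, value);
    }

    public static void main(String[] args) {
        SketchEntry entry = fromTuple(Arrays.<Object>asList(
                "row1", "cf", "cq", "public", 1384365480000L, "hello"));
        System.out.println(entry.key + " -> "
                + new String(entry.value, StandardCharsets.UTF_8));
    }
}
```

A flat mapping like this exposes the raw Key structure directly, which is presumably why the thread questions how useful it would be to users who expect a higher-level data model.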