Re: Review Request 16533: Add StoreFunc and LoadFunc classes to Pig for Accumulo


On Jan. 11, 2014, 1:56 a.m., Josh Elser wrote:
> > That's much better, thanks.
> >
> > The caster is added. But AccumuloBinaryConverter should keep the data in binary form. E.g., toBytes(Integer) should not convert the integer to a String and then take the bytes of that string; it should keep the 4-byte integer representation (like Bytes.toBytes(Integer) in HBase; I'm not sure what's best in Accumulo). Also, in addition to specifying the caster as a constructor parameter, it would be better to have a configuration entry for it. I assume users who prefer BinaryConverter will always use BinaryConverter and won't want to specify the option every time.
> >
> > I still feel the syntax for AccumuloStorage is not straightforward. Can users use the following style (similar to HBaseStorage)?
> >
> > a = load 'accumulo://....' using AccumuloStorage('info:age info2:address', 'other options');
> > -- produces a 3-item tuple (key, info:age, info2:address). info/info2 are the colFams, age/address are the colQuals
> > -- the user might optionally specify info:*, which will produce a map containing every colQual in that colFam
> > -- I see an aggregate flag in the code, but I cannot think of many use cases where it would be useful; what do you think?
> >
> > store x into 'accumulo://....' using AccumuloStorage('info:age info2:address', 'other options');
> > -- the input tuple contains 3 items: (key, info:age, info2:address)
> > -- mirroring the input side, the user can pass a map instead; he then needs to use a wildcard when constructing the storage: AccumuloStorage('info:*');

I was planning to have a new patch uploaded tonight, but I just found a bug, so I'll fix that tomorrow and should have a new patch up early in the day. Thanks for the push to mimic HBaseStorage's columns; I think it did clean things up from a usage perspective.
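
For reference, the column syntax proposed above ('info:age info2:address', with an optional 'info:*' wildcard) could be parsed roughly as follows. This is only a sketch of the idea, not code from the patch: the class ColumnSpecSketch, the nested Column type, and the parse method are hypothetical names, and it uses nothing beyond the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a spec such as "info:age info2:*" into
// (column family, column qualifier) pairs. Not the patch's actual parser.
public class ColumnSpecSketch {

    static final class Column {
        final String family;
        final String qualifier; // null means "every qualifier in this family"

        Column(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }

        boolean isWildcard() {
            return qualifier == null;
        }

        @Override
        public String toString() {
            return family + ":" + (isWildcard() ? "*" : qualifier);
        }
    }

    static List<Column> parse(String spec) {
        List<Column> columns = new ArrayList<>();
        // Entries are separated by whitespace, as in 'info:age info2:address'.
        for (String token : spec.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty spec
            }
            int sep = token.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException(
                        "Expected colFam:colQual, got: " + token);
            }
            String family = token.substring(0, sep);
            String qualifier = token.substring(sep + 1);
            // 'info:*' requests a map of every qualifier in the family.
            columns.add(new Column(family, "*".equals(qualifier) ? null : qualifier));
        }
        return columns;
    }

    public static void main(String[] args) {
        for (Column c : parse("info:age info2:address info3:*")) {
            System.out.println(c + " wildcard=" + c.isWildcard());
        }
    }
}
```

Under this reading, a non-wildcard entry would map to one tuple field and a wildcard entry to one map field, with the row key always first.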

I also added a few things over what HBaseStorage currently does, mainly because Accumulo tables can have any number of column families without altering the table. I'll go into details when I post the patch.

re: configuration entry for specific caster, don't lines ~180-184 in AbstractAccumuloStorage address a specific caster without needing to configure it for every AccumuloStorage invocation?

re: AccumuloBinaryConverter, yeah, I was being lazy. I'll write something that doesn't rely on String to serialize the numerics.
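
To make the difference concrete, here is a minimal sketch of the two encodings under discussion. It is an illustration only, not the converter from the patch: BinaryIntSketch and its methods are hypothetical names, and it relies solely on java.nio. The fixed-width version writes a 4-byte big-endian integer, which is my understanding of what Bytes.toBytes(int) does in HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch comparing binary and String-based int serialization.
public class BinaryIntSketch {

    // Binary style: always 4 bytes, big-endian (ByteBuffer's default order).
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Inverse of toBytes(int); rejects anything that is not exactly 4 bytes.
    static int toInt(byte[] bytes) {
        if (bytes.length != Integer.BYTES) {
            throw new IllegalArgumentException(
                    "Expected " + Integer.BYTES + " bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getInt();
    }

    // String style (the approach to move away from): the length depends on
    // the number of decimal digits.
    static byte[] toStringBytes(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int age = 1234567;
        System.out.println("binary: " + Arrays.toString(toBytes(age)));
        System.out.println("string: " + Arrays.toString(toStringBytes(age)));
        System.out.println("round trip: " + toInt(toBytes(age)));
    }
}
```

For 1234567 the binary form is 4 bytes and the String form is 7 bytes, one per decimal digit.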
- Josh
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16533/#review31537
-----------------------------------------------------------
On Jan. 10, 2014, 7:20 p.m., Josh Elser wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16533/
> -----------------------------------------------------------
>
> (Updated Jan. 10, 2014, 7:20 p.m.)
>
>
> Review request for pig.
>
>
> Bugs: PIG-3573
>     https://issues.apache.org/jira/browse/PIG-3573
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Provides basic StoreFunc and LoadFunc implementations. Based off of code that was in an Accumulo contrib project.
>
>
> Diffs
> -----
>
>   build.xml 575c9ae
>   ivy.xml 180eb2c
>   ivy/libraries.properties 14abdf8
>   src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloBinaryConverter.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageOptions.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/FixedByteArrayOutputStream.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION
>   test/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorageTest.java PRE-CREATION
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloPigClusterTest.java PRE-CREATION
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageConfigurationTest.java PRE-CREATION