Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # dev >> Review Request 16533: Add StoreFunc and LoadFunc classes to Pig for Accumulo

Copy link to this message
Re: Review Request 16533: Add StoreFunc and LoadFunc classes to Pig for Accumulo

This is an automatically generated e-mail. To reply, visit:
Thanks Josh, this is a very well written code to make Accumulo available to Pig. I can make it work locally. I have two major comment:
1. Bear in mind, Pig is good to manipulate tuple, Ok for map, and horrible for bag. A columnar data would be perfect. AccumuloWholeRowStorage does not seems pretty useful since it requires user to manipulate bag in Pig script. User will have to write a UDF for it. AccumuloKVStorage is fine, but it is also awkward to manipulate the flattened data. AccumuloStorage is the best style among these three, user can project map entries to get the columnar data. But personally, I would still prefer HBastStorage style in which user can specify the column he's interested in construct arguments. So I would suggest to improve AccumuloStorage (or a new class) to include that.

2. caster is not right. You will need to return correct caster in getLoadCaster. Otherwise cast data into Pig type will fail. Eg:
a = load 'accumulo://....' using org.apache.pig.backend.hadoop.accumulo.AccumuloStorage() as (key:bytearray, m:map[]);
b = foreach a generate (double)m#'gpa';
I see you give Utf8StorageConverter as the converter in some of the loaders, but Utf8StorageConverter will assume numerical data in string style, and try to parse the string into numerical. I see there is objToBytes in both AbstractAccumuloStorage and Utils. These should be consolidated into a caster (refer to HBaseBinaryConverter)

    Guess we don't need to create a profile for Accumulo. HBase create it because they have jar layout change between 94/95. Not the case for Accumulo


    Better to ship dependent jars automatically, so user don't have to register these jars in Pig script. See HBaseStorage.initializeHBaseClassLoaderResources. But that we can do in a follow up ticket.


    Missing Javadoc for the usage.


    Boolean construct argument does not work. Pig will always pass String. Need to parse the String to boolean in the construct.
- Daniel Dai
On Dec. 31, 2013, 2:02 a.m., Josh Elser wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16533/
> -----------------------------------------------------------
> (Updated Dec. 31, 2013, 2:02 a.m.)
> Review request for pig.
> Bugs: PIG-3573
>     https://issues.apache.org/jira/browse/PIG-3573
> Repository: pig-git
> Description
> -------
> Provides basic StoreFunc and LoadFunc implementations. Based off of code that was in an Accumulo contrib project.
> Diffs
> -----
>   build.xml 575c9ae
>   ivy.xml 180eb2c
>   ivy/libraries.properties 14abdf8
>   src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorage.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloWholeRowStorage.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION
>   test/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorageTest.java PRE-CREATION
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorageTest.java PRE-CREATION
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloPigClusterTest.java PRE-CREATION