Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # dev >> Review Request 16533: Add StoreFunc and LoadFunc classes to Pig for Accumulo


Copy link to this message
-
Re: Review Request 16533: Add StoreFunc and LoadFunc classes to Pig for Accumulo


> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > Thanks Josh, this is a very well written code to make Accumulo available to Pig. I can make it work locally. I have two major comment:
> > 1. Bear in mind, Pig is good to manipulate tuple, Ok for map, and horrible for bag. A columnar data would be perfect. AccumuloWholeRowStorage does not seems pretty useful since it requires user to manipulate bag in Pig script. User will have to write a UDF for it. AccumuloKVStorage is fine, but it is also awkward to manipulate the flattened data. AccumuloStorage is the best style among these three, user can project map entries to get the columnar data. But personally, I would still prefer HBastStorage style in which user can specify the column he's interested in construct arguments. So I would suggest to improve AccumuloStorage (or a new class) to include that.
> >
> > 2. caster is not right. You will need to return correct caster in getLoadCaster. Otherwise cast data into Pig type will fail. Eg:
> > a = load 'accumulo://....' using org.apache.pig.backend.hadoop.accumulo.AccumuloStorage() as (key:bytearray, m:map[]);
> > b = foreach a generate (double)m#'gpa';
> > I see you give Utf8StorageConverter as the converter in some of the loaders, but Utf8StorageConverter will assume numerical data in string style, and try to parse the string into numerical. I see there is objToBytes in both AbstractAccumuloStorage and Utils. These should be consolidated into a caster (refer to HBaseBinaryConverter)

For #1, I agree. I was never really sure about using the WholeRowStorage variant, I think it probably just introduces more complexity, and, like you said, isn't a great example for Pig in the first place. I can probably add in another constructor variant that lets you set a column projection. Perhaps the "fetch_column" option should be removed from the URI completely?

As a follow on, I want to reduce the args to that accumulo URI (probably pulling some file off of the classpath) so the user doesn't always have to re-type the same four args (user, pass, instance name and zookeeper hosts).

For #2, I think I understand, but I haven't been able to generate a failure given the little snippet you provided. I'll look at the HBaseStorage and HBaseBinaryConverter as an example.
> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > build.xml, line 193
> > <https://reviews.apache.org/r/16533/diff/1/?file=407112#file407112line193>
> >
> >     Guess we don't need to create a profile for Accumulo. HBase create it because they have jar layout change between 94/95. Not the case for Accumulo

Ok, I'll remove the profile.
> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java, line 380
> > <https://reviews.apache.org/r/16533/diff/1/?file=407115#file407115line380>
> >
> >     Better to ship dependent jars automatically, so user don't have to register these jars in Pig script. See HBaseStorage.initializeHBaseClassLoaderResources. But that we can do in a follow up ticket.

Oh that is awesome. REGISTER'ing the jars was kind of a pain. I'll get something working similar to the HBase code.
> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java, line 49
> > <https://reviews.apache.org/r/16533/diff/1/?file=407117#file407117line49>
> >
> >     Missing Javadoc for the usage.

Done. Added some on the constructors too for good measure.
> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java, line 66
> > <https://reviews.apache.org/r/16533/diff/1/?file=407117#file407117line66>
> >
> >     Boolean construct argument does not work. Pig will always pass String. Need to parse the String to boolean in the construct.

Ah, good call. I would've missed that.
- Josh
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16533/#review31404
On Dec. 31, 2013, 2:02 a.m., Josh Elser wrote: