Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # dev - Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files


Copy link to this message
-
Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files
Cheolsoo Park 2013-03-04, 01:46


> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
> >    
> >     Open to argument, though.
>
> Taku Miyakawa wrote:
>     > In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map.
>    
>     Yes ":map[]" is redundant. I will remove it.
>    
>     > if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
>    
>     Does this code meet your point?
>    
>     <code>
>     access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, ua:chararray);
>     ua = FOREACH access GENERATE ua;
>     dump ua;
>     </code>
>    
>     It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.

>> It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.

You don't need access to the schema given by the AS clause inside LoadFunc. You can just make LoadMetaData.getSchema() return null and load every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH operator after LOAD and does type conversion using the schema given by the AS clause.

This is what happens when PigStorage is not given a JSON schema file, but the output schema is defined in the AS clause.
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------
On Feb. 28, 2013, 10:53 p.m., Taku Miyakawa wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
>
> (Updated Feb. 28, 2013, 10:53 p.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
>
> The patch adds LTSVLoader function and its test class.
>
>
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
>
>
> Diffs
> -----
>
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/9685/diff/
>
>
> Testing
> -------
>
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
>
>
> Thanks,
>
> Taku Miyakawa
>
>