Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig, mail # dev - Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files


+
Taku Miyakawa 2013-02-28, 22:53
+
Jonathan Coveney 2013-03-01, 14:54
+
Cheolsoo Park 2013-03-04, 01:46
+
Taku Miyakawa 2013-03-03, 06:35
Copy link to this message
-
Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files
Jonathan Coveney 2013-03-04, 15:05


> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
> >    
> >     Open to argument, though.
>
> Taku Miyakawa wrote:
>     > In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map.
>    
>     Yes ":map[]" is redundant. I will remove it.
>    
>     > if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
>    
>     Does this code meet your point?
>    
>     <code>
>     access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, ua:chararray);
>     ua = FOREACH access GENERATE ua;
>     dump ua;
>     </code>
>    
>     It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
>
> Cheolsoo Park wrote:
>     >> It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
>    
>     You don't need access to the schema given by the AS clause inside LoadFunc. You can just make LoadMetaData.getSchema() return null and load every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH operator after LOAD and does type conversion using the schema given by the AS clause.
>    
>     This is what happens when PigStorage is not given a JSON schema file, but the output schema is defined in the AS clause.

Cheolsoo is totally correct. That said, this should probably be documented better. Also, the fact that we rely on an implicit foreach is so gross... but this is how you do it!
- Jonathan
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------
On Feb. 28, 2013, 10:53 p.m., Taku Miyakawa wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
>
> (Updated Feb. 28, 2013, 10:53 p.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
>
> The patch adds LTSVLoader function and its test class.
>
>
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
>
>
> Diffs
> -----
>
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/9685/diff/
>
>
> Testing
> -------
>
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
>
>
> Thanks,
>
> Taku Miyakawa
>
>

+
Taku Miyakawa 2013-03-05, 23:10
+
Jonathan Coveney 2013-03-19, 16:45
+
Taku Miyakawa 2013-03-05, 23:38
+
Taku Miyakawa 2013-03-05, 23:42
+
Taku Miyakawa 2013-03-07, 22:46
+
Taku Miyakawa 2013-03-07, 23:31
+
Taku Miyakawa 2013-03-09, 08:18