Pig >> mail # dev >> Review Request: PIG-3223 AvroStorage does not handle comma separated input paths


Johnny Zhang 2013-04-08, 22:03
Re: Review Request: PIG-3223 AvroStorage does not handle comma separated input paths

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10351/#review19974
-----------------------------------------------------------

contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/10351/#comment41190>

    Calling globStatus again on a path whose FileStatus is already known is inefficient. It would be better to move this block into a separate method and use that method for the recursion.
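The sketch below shows the restructuring suggested above, under some assumptions. It uses `java.nio.file` as a stand-in for Hadoop's `FileSystem` and `FileStatus`. The actual patch would presumably call `fs.globStatus` once and `fs.listStatus` for directories. The class and method names `GlobOnce`, `expand`, and `collectFiles` are hypothetical. The glob is expanded exactly once. The recursive helper only ever receives entries that are already resolved, so it lists directories instead of globbing them again.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class GlobOnce {

    // Expand the glob exactly once (stand-in for fs.globStatus in the Hadoop
    // version), then hand each resolved match to the recursive helper.
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path match : matches) {
                collectFiles(match, out);
            }
        }
        return out;
    }

    // Hypothetical helper: recurses on entries that are already resolved.
    // Directories are listed (stand-in for fs.listStatus), never re-globbed.
    static void collectFiles(Path entry, List<Path> out) throws IOException {
        if (Files.isDirectory(entry)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(entry)) {
                for (Path child : children) {
                    collectFiles(child, out);
                }
            }
        } else {
            out.add(entry);
        }
    }
}
```

Keeping the glob at the top level means each pattern is evaluated once. Everything below it is a plain directory walk.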

contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/10351/#comment41189>

    The Pattern should be a private static field. Also, this pattern only handles globs of the form {x,y}, while Hadoop's globStatus supports much more:
    
    http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
    
    Pig already has a method that takes care of this logic: LoadFunc.getPathStrings(). Use it for splitting the paths; this should simplify the whole change.
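The sketch below is for illustration only and shows brace-aware splitting. A comma separates two paths only when it sits outside `{...}`, so `{x,y}` alternatives stay inside a single glob. The class `PathSplitter` and the method `splitTopLevel` are hypothetical names. The comment recommends calling Pig's `LoadFunc.getPathStrings()` rather than reimplementing this, and the sketch does not reproduce that method's source. Splitting this way needs no regex, so there is no `Pattern` to store at all.

```java
import java.util.ArrayList;
import java.util.List;

public class PathSplitter {

    // Split on commas that are NOT inside {...} glob braces, so that
    // "a/{x,y}/f,b/g" yields ["a/{x,y}/f", "b/g"] rather than three pieces.
    static List<String> splitTopLevel(String commaSeparatedPaths) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // current {...} nesting level
        int start = 0; // index where the current path begins
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char c = commaSeparatedPaths.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(commaSeparatedPaths.substring(start)); // last (or only) path
        return parts;
    }
}
```

For example, `splitTopLevel("test_dir1/{a,b}.avro,test_dir2/*")` returns `[test_dir1/{a,b}.avro, test_dir2/*]`. Nested braces such as `a/{x,{y,z}}/f` also stay whole, because the method tracks nesting depth rather than matching a fixed `{x,y}` shape.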
    
    
- Rohini Palaniswamy
On April 8, 2013, 10:03 p.m., Johnny Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10351/
> -----------------------------------------------------------
>
> (Updated April 8, 2013, 10:03 p.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> We want to support comma-separated input paths in AvroStorage, for example:
> "test_dir1/test_glob1.avro,test_dir1/test_glob2.avro,test_dir1/test_glob3.avro"
> "test_dir1/*, test_dir2/test_glob4.avro, test_dir2/test_glob5.avro"
>
>
> This addresses bug PIG-3223.
>     https://issues.apache.org/jira/browse/PIG-3223
>
>
> Diffs
> -----
>
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 0ac0225
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java bd7a6d2
>
> Diff: https://reviews.apache.org/r/10351/diff/
>
>
> Testing
> -------
>
> Added two more test cases to TestAvroStorage.java; they all pass.
>
>
> Thanks,
>
> Johnny Zhang
>
>
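The sketch below shows what the requested behavior could mean for the two example inputs in the description. Each comma-separated entry is treated as a glob relative to a base directory, and the matches are combined. The code uses `java.nio.file` globbing as a stand-in for Hadoop's `globStatus`, which is an assumption. `CommaSeparatedInputs` and `resolve` are hypothetical names. Neither example contains `{...}` braces, so the sketch splits on every comma and trims whitespace. Trimming is an assumption about how `"test_dir1/*, test_dir2/..."` should be read. Inputs that contain braces would need brace-aware splitting, as the review points out.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CommaSeparatedInputs {

    // Resolve a comma-separated list of glob patterns (relative to baseDir)
    // into the regular files they match. Assumes no {...} braces in the input.
    // Duplicates collapse because results go into a LinkedHashSet.
    static Set<Path> resolve(Path baseDir, String spec) throws IOException {
        Set<Path> result = new LinkedHashSet<>();
        for (String raw : spec.split(",")) {
            String pattern = raw.trim(); // tolerate "a, b" as well as "a,b"
            if (pattern.isEmpty()) {
                continue;
            }
            PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            try (Stream<Path> walk = Files.walk(baseDir)) {
                result.addAll(walk
                    .filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(baseDir.relativize(p)))
                    .collect(Collectors.toList()));
            }
        }
        return result;
    }
}
```

Consider a tree that contains test_dir1/test_glob1.avro through test_glob3.avro and test_dir2/test_glob4.avro and test_glob5.avro. Against that tree, the first example resolves to three files and the second example resolves to all five.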

Johnny Zhang 2013-05-02, 20:36
Johnny Zhang 2013-05-03, 00:27
Johnny Zhang 2013-05-03, 00:27
Johnny Zhang 2013-05-03, 00:33
Rohini Palaniswamy 2013-05-03, 18:53
Johnny Zhang 2013-05-03, 18:59
Johnny Zhang 2013-05-03, 19:30
Rohini Palaniswamy 2013-05-03, 19:50