Pig >> mail # user >> Process multiple sub directories and match output


Re: Process multiple sub directories and match output
Try looking into MultiStorage. Maybe it can help:
http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/MultiStorage.html
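
A minimal sketch of how MultiStorage might be wired in, using the paths from the question below. Several things here are assumptions, not facts from the thread: the piggybank.jar location is a placeholder, the data is assumed to be tab-delimited, and the first field (index 0) is assumed to hold the value you want to split the output on (for example the child name). MultiStorage creates one sub directory per distinct value of that field, so whether it can reproduce the full YYYY/MM/DD nesting depends on what your records contain.

```pig
-- Placeholder path (assumption): point this at your own piggybank.jar.
REGISTER /path/to/piggybank.jar;

-- Assumption: tab-delimited records; the glob matches child/YYYY/MM/DD.
data = LOAD '/input/parent/*/*/*/*' USING PigStorage('\t');

-- Split the output on field 0: one sub directory under /output/parent
-- is written for each distinct value of that field.
STORE data INTO '/output/parent'
    USING org.apache.pig.piggybank.storage.MultiStorage('/output/parent', '0');
```

If the split value is not already a field in the data, it would have to be derived first, which this sketch does not cover.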

Regards,
Shahab
On Mon, Jun 17, 2013 at 4:44 AM, Shin Chan <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I want to run my Pig script over multiple sub directories and have the
> output match the input directory structure.
>
> Example
>
> /input/parent/child1/
> /input/parent/child2/
> /input/parent/child3/
>
> etc
>
> Output should be
>
> /output/parent/child1/
> /output/parent/child2/
> /output/parent/child3/
>
> Which Pig storage format can I use?
>
> To explain it better:
>
> I want to make sure that my Pig script executes only under folders that
> don't have any further children.
>
> Basically, my folder structure is Hive partitions:
>
> /input/parent/child1/YYYY/MM/DD
> /input/parent/child2/YYYY/MM/DD
> /input/parent/child3/YYYY/MM/DD
>
> I want to process all data in the Hive partitions and have the results in the format
>
> /output/parent/child1/YYYY/MM/DD
> /output/parent/child2/YYYY/MM/DD
> /output/parent/child3/YYYY/MM/DD
>
> Thanks in advance
>