Pig >> mail # user >> Process multiple sub directories and match output


Re: Process multiple sub directories and match output
Try looking into MultiStorage. Maybe it can help:
http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/MultiStorage.html
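
A minimal sketch of how that might look, based on the sample usage in the MultiStorage Javadoc. The jar path, the tab delimiter, and the schema (a first field that names the child directory) are all assumptions for illustration, not details from your setup:

```pig
-- Assumption: piggybank.jar lives at this (hypothetical) path.
REGISTER /path/to/piggybank.jar;

-- Assumption: the data is tab-separated and its first field carries the
-- name of the child directory (child1, child2, ...). Hypothetical schema.
raw = LOAD '/input/parent/*' USING PigStorage('\t')
      AS (child:chararray, payload:chararray);

-- MultiStorage takes the parent output path and the index of the key field.
-- Each distinct value of field 0 gets its own subdirectory under that path.
STORE raw INTO '/output/parent'
      USING org.apache.pig.piggybank.storage.MultiStorage('/output/parent', '0');
```

With this, records whose first field is child1 should end up under /output/parent/child1/, and so on. MultiStorage splits on a single key field, so I have not verified whether it can reproduce a nested YYYY/MM/DD layout; for deeper input trees the LOAD glob would also need one wildcard per directory level.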

Regards,
Shahab
On Mon, Jun 17, 2013 at 4:44 AM, Shin Chan <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I want to run my Pig script over multiple subdirectories and have the
> output match the input directory structure.
>
> Example
>
> /input/parent/child1/
> /input/parent/child2/
> /input/parent/child3/
>
> etc
>
> Output should be
>
> /output/parent/child1/
> /output/parent/child2/
> /output/parent/child3/
>
> Which Pig storage format can I use?
>
> To explain it better:
>
> I want to make sure that my Pig script runs only on folders that
> don't have any further children.
>
> Basically, my folder structure consists of Hive partitions:
>
> /input/parent/child1/YYYY/MM/DD
> /input/parent/child2/YYYY/MM/DD
> /input/parent/child3/YYYY/MM/DD
>
> I want to process all the data in the Hive partitions and get the results in this format:
>
> /output/parent/child1/YYYY/MM/DD
> /output/parent/child2/YYYY/MM/DD
> /output/parent/child3/YYYY/MM/DD
>
> Thanks in advance
>