Pig >> mail # user >> Choosing output directory based on field value


IGZ Nick 2012-01-10, 05:39
Daniel Dai 2012-01-10, 05:57
IGZ Nick 2012-01-10, 06:21

Re: Choosing output directory based on field value
Pig has MultiStorage in piggybank.

https://github.com/apache/pig/blob/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java

I think it has some limitations. You can check its javadoc and the related JIRAs for details.

Thanks,
Aniket
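
Whichever storage route is chosen, the yyyymmdd field value has to be turned into a yyyy/mm/dd subdirectory under the base location at some point. A minimal sketch of just that step follows. It assumes Java 8 or later; the class and method names are hypothetical and are not part of Pig, piggybank, or AvroStorage. Wiring it into a custom StoreFunc is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of Pig or piggybank.
final class DatePartitionPath {
    // Parses compact dates such as "20120110".
    private static final DateTimeFormatter INPUT = DateTimeFormatter.BASIC_ISO_DATE;
    // Formats the date as a nested directory path, e.g. "2012/01/10".
    private static final DateTimeFormatter OUTPUT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    private DatePartitionPath() {}

    // Converts a yyyymmdd field value into a relative yyyy/mm/dd directory.
    // Throws java.time.format.DateTimeParseException if the value is not a valid date.
    static String relativeDir(String yyyymmdd) {
        LocalDate date = LocalDate.parse(yyyymmdd, INPUT);
        return date.format(OUTPUT);
    }

    // Joins the base output location (the $location parameter in the question)
    // with the date-derived subdirectory, tolerating one trailing slash on the base.
    static String outputDir(String baseLocation, String yyyymmdd) {
        String base = baseLocation.endsWith("/")
                ? baseLocation.substring(0, baseLocation.length() - 1)
                : baseLocation;
        return base + "/" + relativeDir(yyyymmdd);
    }
}
```

For example, DatePartitionPath.outputDir("/data/out", "20120110") yields /data/out/2012/01/10. Parsing the value as a real date, rather than slicing the string, means malformed field values fail loudly instead of producing odd directory names.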

On Mon, Jan 9, 2012 at 10:21 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:

> I am able to group the tuples by date. But the problem I am facing is how
> do I ensure that when I finally STORE it, it is stored in separate folders?
>
> On Tue, Jan 10, 2012 at 11:27 AM, Daniel Dai <[EMAIL PROTECTED]> wrote:
>
> > You can use a custom partitioner. Check
> > http://pig.apache.org/docs/r0.9.1/basic.html#partitionby.
> >
> > Daniel
> >
> > On Mon, Jan 9, 2012 at 9:39 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > What I would like to do is to store outputs to different directories
> > > based on record value. Essentially I want to read the date from a
> > > field and store the output in yyyy/mm/dd directory structure. How
> > > should I go about this? I want to use AvroStorage for storing the
> > > stuff. I want to specify STORE xyz INTO '$location' USING MyStorage();
> > > where $location would be the base output directory. MyStorage() would
> > > be the modified version of AvroStorage which stores the values in
> > > $location/yyyy/mm/dd/part-abc files, reading the yyyymmdd from a
> > > particular field in the input records.
> > >
> > > What is the way to achieve this with minimal changes?
> > >
> > > Nick
> > >
> >
>

--
"...:::Aniket:::... Quetzalco@tl"