Pig >> mail # user >> Choosing output directory based on field value


IGZ Nick 2012-01-10, 05:39
Daniel Dai 2012-01-10, 05:57
IGZ Nick 2012-01-10, 06:21
Re: Choosing output directory based on field value
Pig has MultiStorage in piggybank.

https://github.com/apache/pig/blob/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java

I think it has some limitations. You can check its javadoc and the related JIRAs for details.
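
For illustration, a minimal sketch of how MultiStorage might be used for this case. The jar path, the input and output paths, and the field names are hypothetical; the sketch assumes the date is the first field (index 0) of a tab-separated input. Note that MultiStorage writes delimited text rather than Avro, and that it creates one subdirectory per distinct field value, so the nested yyyy/mm/dd layout is not covered here.

```pig
-- Assumption: piggybank.jar has been built and sits at this hypothetical path.
REGISTER /path/to/piggybank.jar;

-- Hypothetical input: tab-separated records whose first field is a yyyymmdd date.
records = LOAD '/data/input' USING PigStorage('\t')
          AS (day:chararray, payload:chararray);

-- MultiStorage(parentPath, splitFieldIndex): records are split into
-- subdirectories of the parent path according to the value of field 0.
STORE records INTO '/data/output'
    USING org.apache.pig.piggybank.storage.MultiStorage('/data/output', '0');
```

With this setup the records for 20120109 should end up under a directory such as /data/output/20120109/, but the exact file naming is best confirmed against the javadoc.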

Thanks,
Aniket

On Mon, Jan 9, 2012 at 10:21 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:

> I am able to group the tuples by date. But the problem I am facing is how
> do I ensure that when I finally STORE it, it is stored in separate folders?
>
> On Tue, Jan 10, 2012 at 11:27 AM, Daniel Dai <[EMAIL PROTECTED]>
> wrote:
>
> > You can use a custom partitioner. Check
> > http://pig.apache.org/docs/r0.9.1/basic.html#partitionby.
> >
> > Daniel
> >
> > On Mon, Jan 9, 2012 at 9:39 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > What I would like to do is to store outputs to different directories based
> > > on record value. Essentially I want to read the date from a field and store
> > > the output in yyyy/mm/dd directory structure. How should I go about this? I
> > > want to use AvroStorage for storing the stuff. I want to specify STORE xyz
> > > INTO '$location' USING MyStorage(); where $location would be the base
> > > output directory. MyStorage() would be the modified version of AvroStorage
> > > which stores the values in $location/yyyy/mm/dd/part-abc files, reading the
> > > yyyymmdd from a particular field in the input records.
> > >
> > > What is the way to achieve this with minimal changes?
> > >
> > > Nick
> > >
> >
>

--
"...:::Aniket:::... Quetzalco@tl"