Pig, mail # user - Store Groups Separately


Re: Store Groups Separately
Norbert Burger 2011-10-10, 18:36
In case it's not obvious, you'd also need a FLATTEN(group) in there before
the FOREACH to break the tuple apart so that the fields could be synthesized
into a filename.

Norbert

On Mon, Oct 10, 2011 at 12:57 PM, Jacob Perkins
<[EMAIL PROTECTED]> wrote:

> You'll have to run a FOREACH...GENERATE over the data first and generate
> a single key to look like the filename you want. Then you can use
> MultiStorage() from the piggybank. See:
>
> org.apache.pig.piggybank.storage.MultiStorage
>
> in the pig api docs.
>
> --jacob
> @thedatachef
>
> On Mon, 2011-10-10 at 18:43 +0200, Dustin Whitney wrote:
> > Hello all,
> >
> > I'm new to Hadoop and Pig, and I've got a question.  I've got a relation
> > that looks like this via GROUP:
> >
> > ((customer1,2011-10-07,GET,200),{....})
> > ((customer1,2011-10-07,PUT,201),{....})
> > ((customer1,2011-10-07,PUT,202),{....})
> > ((customer2,2011-10-07,GET,200),{....})
> > ((customer2,2011-10-07,PUT,201),{....})
> > ((customer2,2011-10-07,PUT,202),{....})
> >
> >
> > I'd like each group (i.e. the data in the {...}) stored separately, and I'd
> > like to use the values in the first tuple to name my file, so the first file
> > would be customer1-2011-10-07-GET-200, and the second would be
> > customer1-2011-10-07-PUT-201, etc.  Is this possible? I can only see how to
> > save a single full relation to file, and I can't find any documentation that
> > states how I might use variables to name things.
> >
> > Thanks,
> > Dustin
>
>
>
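
For reference, the approach suggested in this thread might look roughly like
the sketch below. It is only an illustration under stated assumptions: the
piggybank jar location, the input and output paths, the relation names, and
the field names and types are all hypothetical, since the original question
does not give a schema. The sketch reads the group fields by name
(group.customer and so on) rather than using a separate FLATTEN(group) step,
and flattens the bag so that each record carries the synthesized key in
column 0.

```pig
-- Assumption: adjust to wherever piggybank.jar lives on your system.
REGISTER /path/to/piggybank.jar;

-- Assumption: hypothetical input path and schema; the thread does not give one.
logs = LOAD 'input/logs' USING PigStorage('\t')
       AS (customer:chararray, day:chararray, method:chararray,
           status:chararray, payload:chararray);

-- Produces tuples shaped like ((customer1,2011-10-07,GET,200),{...}).
grouped = GROUP logs BY (customer, day, method, status);

-- Build one key such as customer1-2011-10-07-GET-200 and put it first,
-- then flatten the bag so every record is a row tagged with its key.
keyed = FOREACH grouped GENERATE
            CONCAT(CONCAT(CONCAT(group.customer, '-'),
                          CONCAT(group.day, '-')),
                   CONCAT(CONCAT(group.method, '-'), group.status)) AS key,
            FLATTEN(logs);

-- MultiStorage(parentPath, splitFieldIndex): split the output on field 0.
STORE keyed INTO 'output/by_group'
      USING org.apache.pig.piggybank.storage.MultiStorage('output/by_group', '0');
```

As far as I understand the piggybank docs, MultiStorage writes one
subdirectory per distinct key value under the parent path, with part files
inside it, rather than a single file named exactly after the key, so the
result would be a directory such as output/by_group/customer1-2011-10-07-GET-200
and not a file of that name.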