
Pig user mailing list: Store Groups Separately


Dustin Whitney 2011-10-10, 16:43
Re: Store Groups Separately
You'll have to run a FOREACH...GENERATE over the data first and generate
a single key that looks like the filename you want. Then you can use
MultiStorage() from the piggybank. See:

org.apache.pig.piggybank.storage.MultiStorage

in the Pig API docs.
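To make the two steps concrete without a cluster, here is a small local Python sketch of the same idea. It is not Pig or piggybank code. It assumes hypothetical records of the form (customer, date, method, status, path), where `path` stands in for the rest of each log line. The hypothetical function `make_key` plays the role of the FOREACH...GENERATE step by joining the group fields with '-' to form a name such as customer1-2011-10-07-GET-200. The hypothetical function `split_by_key` plays the role of MultiStorage by writing every record into a file named after its key.

```python
import os
import tempfile
from collections import defaultdict


def make_key(customer, date, method, status):
    """Join the group fields with '-' to form a file name.

    Stands in for the FOREACH...GENERATE step that builds a single key.
    """
    return "-".join([customer, date, method, str(status)])


def split_by_key(records, out_dir):
    """Write each record's remaining field into a file named after its key.

    records: iterable of (customer, date, method, status, path) tuples.
    'path' is a hypothetical stand-in for the rest of a log line.
    Returns a dict mapping each key to the file that was written.
    This is a local illustration only, not the piggybank MultiStorage UDF.
    """
    # Collect records per key (the grouping MultiStorage does implicitly).
    groups = defaultdict(list)
    for customer, date, method, status, path in records:
        groups[make_key(customer, date, method, status)].append(path)

    # One output file per distinct key, named after the key itself.
    written = {}
    for key, paths in groups.items():
        file_path = os.path.join(out_dir, key)
        with open(file_path, "w") as f:
            for p in paths:
                f.write(p + "\n")
        written[key] = file_path
    return written


if __name__ == "__main__":
    sample = [
        ("customer1", "2011-10-07", "GET", 200, "/index.html"),
        ("customer1", "2011-10-07", "PUT", 201, "/upload"),
        ("customer2", "2011-10-07", "GET", 200, "/index.html"),
    ]
    with tempfile.TemporaryDirectory() as out:
        for key in sorted(split_by_key(sample, out)):
            print(key)
```

In Pig the split is done on the value of a single field, so the key is usually generated on the ungrouped rows rather than after a GROUP. Also, per its javadoc, MultiStorage writes into a sub-directory per key value containing part files, so its actual output names will differ from this sketch.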

--jacob
@thedatachef

On Mon, 2011-10-10 at 18:43 +0200, Dustin Whitney wrote:
> Hello all,
>
> I'm new to Hadoop and Pig, and I've got a question. I've got a relation that
> looks like this after a GROUP:
>
> ((customer1,2011-10-07,GET,200),{....})
> ((customer1,2011-10-07,PUT,201),{....})
> ((customer1,2011-10-07,PUT,202),{....})
> ((customer2,2011-10-07,GET,200),{....})
> ((customer2,2011-10-07,PUT,201),{....})
> ((customer2,2011-10-07,PUT,202),{....})
>
>
> I'd like each group (i.e. the data in the {...}) stored separately, and I'd
> like to use the values in the first tuple to name my file, so the first file
> would be customer1-2011-10-07-GET-200, and the second would be
> customer1-2011-10-07-PUT-201, etc.  Is this possible? I can only see how to
> save a single full relation to a file, and I can't find any documentation that
> states how I might use variables to name things.
>
> Thanks,
> Dustin
Norbert Burger 2011-10-10, 18:36
Dustin Whitney 2011-10-10, 19:06
Dustin Whitney 2011-10-10, 19:48