Pig >> mail # user >> Store Groups Separately


Re: Store Groups Separately
You'll have to run a FOREACH...GENERATE over the data first and build
a single key field that looks like the filename you want. Then you can
store the relation with MultiStorage() from the piggybank, splitting on
that key. See:

org.apache.pig.piggybank.storage.MultiStorage

in the Pig API docs.
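
A minimal sketch of that approach, under stated assumptions: the input
path, the output path, the field names, the tab delimiter, and the
location of piggybank.jar are all hypothetical and not taken from the
original question. Nested two-argument CONCAT calls are used because
that form works on older Pig releases.

```pig
-- Assumption: piggybank.jar sits in the working directory.
REGISTER piggybank.jar;

-- Hypothetical input: one log record per line, tab-separated.
logs = LOAD 'input/logs' USING PigStorage('\t')
       AS (customer:chararray, day:chararray, method:chararray,
           status:int, payload:chararray);

-- Build a single key such as customer1-2011-10-07-GET-200
-- and place it in field 0, followed by the data to be stored.
keyed = FOREACH logs GENERATE
        CONCAT(customer, CONCAT('-', CONCAT(day, CONCAT('-',
        CONCAT(method, CONCAT('-', (chararray)status)))))) AS out_key,
        payload;

-- MultiStorage takes the parent output path and the index of the
-- field to split on (here '0', the key built above).
STORE keyed INTO 'output/by_group'
      USING org.apache.pig.piggybank.storage.MultiStorage('output/by_group', '0');
```

No GROUP is needed for this: MultiStorage does the splitting itself,
based on the value of the key field. As far as I know it writes one
subdirectory per distinct key under the output path, with part files
inside, rather than a single flat file per key, so check the output
layout against the piggybank version you are running.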

--jacob
@thedatachef

On Mon, 2011-10-10 at 18:43 +0200, Dustin Whitney wrote:
> Hello all,
>
> I'm new to Hadoop and Pig, and I've got a question.  I've got a relation
> that looks like this via GROUP
>
> ((customer1,2011-10-07,GET,200),{....})
> ((customer1,2011-10-07,PUT,201),{....})
> ((customer1,2011-10-07,PUT,202),{....})
> ((customer2,2011-10-07,GET,200),{....})
> ((customer2,2011-10-07,PUT,201),{....})
> ((customer2,2011-10-07,PUT,202),{....})
>
>
> I'd like each group (i.e. the data in the {...}) stored separately, and I'd
> like to use the values in the first tuple to name my file, so the first file
> would be customer1-2011-10-07-GET-200, and the second would be
> customer1-2011-10-07-PUT-201, etc.  Is this possible? I can only see how to
> save a single full relation to file, and I can't find any documentation that
> states how I might use variables to name things.
>
> Thanks,
> Dustin