Pig >> mail # user >> Store Groups Separately


Dustin Whitney 2011-10-10, 16:43
Jacob Perkins 2011-10-10, 16:57
Re: Store Groups Separately
In case it's not obvious, you'd also need a FLATTEN(group) in there before
the FOREACH to break the tuple apart so that the fields could be synthesized
into a filename.

Norbert

On Mon, Oct 10, 2011 at 12:57 PM, Jacob Perkins <[EMAIL PROTECTED]> wrote:

> You'll have to run a FOREACH...GENERATE over the data first and generate
> a single key to look like the filename you want. Then you can use
> MultiStorage() from the piggybank. See:
>
> org.apache.pig.piggybank.storage.MultiStorage
>
> in the pig api docs.
>
> --jacob
> @thedatachef
>
> On Mon, 2011-10-10 at 18:43 +0200, Dustin Whitney wrote:
> > Hello all,
> >
> > I'm new to Hadoop and Pig, and I've got a question. I've got a relation that
> > looks like this via GROUP:
> >
> > ((customer1,2011-10-07,GET,200),{....})
> > ((customer1,2011-10-07,PUT,201),{....})
> > ((customer1,2011-10-07,PUT,202),{....})
> > ((customer2,2011-10-07,GET,200),{....})
> > ((customer2,2011-10-07,PUT,201),{....})
> > ((customer2,2011-10-07,PUT,202),{....})
> >
> >
> > I'd like each group (i.e. the data in the {...}) stored separately, and I'd
> > like to use the values in the first tuple to name my file, so the first file
> > would be customer1-2011-10-07-GET-200, and the second would be
> > customer1-2011-10-07-PUT-201, etc.  Is this possible? I can only see how to
> > save a single full relation to file, and I can't find any documentation that
> > states how I might use variables to name things.
> >
> > Thanks,
> > Dustin
>
>
>
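
A rough Pig Latin sketch of the approach suggested in this thread: FLATTEN the group tuple, build a single key with FOREACH...GENERATE, then store with MultiStorage from the piggybank. The jar path, the input file, the field names (customer, day, method, status, line), and the output directory are all assumptions made for illustration; none of them come from the original messages. MultiStorage is expected to write one subdirectory per distinct key under the parent path, so the key names a directory rather than a single file; check the piggybank docs for your Pig version.

```pig
-- Assumption: piggybank.jar is in the working directory.
REGISTER piggybank.jar;

-- Hypothetical input: tab-separated request records.
requests = LOAD 'requests.tsv' USING PigStorage('\t')
    AS (customer:chararray, day:chararray, method:chararray,
        status:chararray, line:chararray);

-- Produces tuples shaped like ((customer1,2011-10-07,GET,200),{...}).
grouped = GROUP requests BY (customer, day, method, status);

-- FLATTEN(group) breaks the key tuple into separate fields;
-- flattening the bag turns each grouped record back into a row.
flat = FOREACH grouped GENERATE
    FLATTEN(group) AS (g_customer:chararray, g_day:chararray,
                       g_method:chararray, g_status:chararray),
    FLATTEN(requests.line) AS line;

-- Synthesize one key per row, e.g. customer1-2011-10-07-GET-200.
-- CONCAT is nested because older Pig releases accept only two arguments.
keyed = FOREACH flat GENERATE
    CONCAT(g_customer, CONCAT('-', CONCAT(g_day, CONCAT('-',
        CONCAT(g_method, CONCAT('-', g_status)))))) AS key,
    line;

-- '0' is the index of the field to split on (the key).
STORE keyed INTO 'out/by_group'
    USING org.apache.pig.piggybank.storage.MultiStorage('out/by_group', '0');
```

Since MultiStorage splits on a field value by itself, the GROUP step is probably not required for the split; it is kept here only to mirror the relation shown in the question.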
Dustin Whitney 2011-10-10, 19:06
Dustin Whitney 2011-10-10, 19:48