Dynamically Generate Columns in Pig
Hi,
I have a relation of browsers and the number of people using each, of the
form:

_browser_, _total_
firefox,1234
ie,123
chrome,321
ipad,437

Is there any good way I can rotate this, so that the values in the first
column dynamically become column names and I wind up with a result like:

_firefox_, _ie_, _chrome_, _ipad_
1234, 123, 321, 437

The basic Pig I'm using to load what I have so far is along the lines of:

          -- keep only records with all three fields present
          good = FILTER new BY (browser_identity IS NOT NULL)
                  AND (browser_version IS NOT NULL)
                  AND (ip_address IS NOT NULL);
          distincted = DISTINCT good;
          distincted = FOREACH distincted GENERATE browser_identity, browser_version;
          -- count rows per (browser_identity, browser_version) pair
          grouped = GROUP distincted BY (browser_identity, browser_version);
          counted = FOREACH grouped GENERATE
              group AS colname, COUNT(distincted) AS total;

In case that helps.

I was thinking of writing a UDF for this, but figured the output schema
would be really annoying to deal with, so I thought I'd ask here first in
case there's an easier way, or someone has already done it.
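
To make the rotation concrete, here is a minimal sketch in plain Python (not
Pig) of what such a UDF would have to compute. The function name `pivot` and
the in-memory list of pairs are assumptions for illustration only; the data is
the sample from above. The header length depends on how many browsers appear
in the input, which is why a fixed output schema is awkward.

```python
def pivot(rows):
    """Rotate (name, total) pairs into one header row and one value row.

    `rows` is assumed to be an ordered list of (browser, total) pairs.
    """
    header = [name for name, _ in rows]
    values = [total for _, total in rows]
    return header, values


# Sample data from the message above.
rows = [("firefox", 1234), ("ie", 123), ("chrome", 321), ("ipad", 437)]
header, values = pivot(rows)

print(", ".join(header))
print(", ".join(str(v) for v in values))
```

The output order simply follows the input order, so a real implementation
would likely need to sort the rows first to get stable column positions.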

Cheers,
Eli