Pig, mail # user - Dynamically Generate Columns in Pig

Dynamically Generate Columns in Pig
Eli Finkelshteyn 2012-03-19, 18:43
I have a relation of browsers and the number of people using each, of the form:

_browser_, _total_

Is there any good way to rotate this, so that the values in the first column
dynamically become columns and I wind up with a result like:

_firefox_, _ie_, _chrome_, _ipad_
1234, 123, 321, 437

The basic Pig I'm using to load what I have so far is along the lines of:

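        -- Keep only rows that have all the fields we need.
        good = FILTER new BY (browser_identity IS NOT NULL)
                AND (browser_version IS NOT NULL)
                AND (ip_address IS NOT NULL);
        distincted = DISTINCT good;
        -- The original line was cut off after "browser_identity,"; browser_version
        -- is assumed here because the GROUP below needs it.
        distincted = FOREACH distincted GENERATE browser_identity, browser_version;
        grouped = GROUP distincted BY (browser_identity, browser_version);
        counted = FOREACH grouped GENERATE
            group AS colname, COUNT(distincted) AS total;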

In case that helps.

I was thinking of writing a udf for this, but figured the output schema
would be really annoying to deal with, so I'd ask here first in case
there's an easier way, or someone had already done it.
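
One way to picture the UDF route is to return a single map keyed by browser name instead of a tuple with one named field per browser, since a map does not need its keys declared up front. The sketch below is plain Python showing only the pivot logic. The function name `pivot_counts` and the sample rows are hypothetical, and wiring it into Pig as a UDF (registration, output schema declaration) is not shown. It assumes the input is a sequence of (browser, total) pairs, like the rows of `counted`.

```python
from collections import OrderedDict


def pivot_counts(rows):
    """Turn (browser, total) pairs into one record keyed by browser.

    Hypothetical helper: `rows` stands in for the bag of tuples that a
    Pig UDF would receive after grouping everything into a single bag.
    """
    pivoted = OrderedDict()
    for browser, total in rows:
        # Add up totals in case the same browser appears more than once.
        pivoted[browser] = pivoted.get(browser, 0) + total
    return pivoted


if __name__ == "__main__":
    # Sample values taken from the desired output in the question.
    sample = [("firefox", 1234), ("ie", 123), ("chrome", 321), ("ipad", 437)]
    row = pivot_counts(sample)
    print(", ".join(row.keys()))                    # firefox, ie, chrome, ipad
    print(", ".join(str(v) for v in row.values()))  # 1234, 123, 321, 437
```

The trade-off is that downstream steps look values up by key rather than by a named column, so this fits best when the set of browsers is not known ahead of time.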