Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> Pivot table with zeros instead of empty


Copy link to this message
-
Re: Pivot table with zeros instead of empty
You can group on ip which will result in ip -> bag mappings. Let's say you
loaded (ip, hour, count) dataset as follows

grunt> a = load 'data' using PigStorage(',') as (ip:chararray, hour:int,
count:int);
grunt> b = group a by
ip;
grunt> describe b;
b: {group: chararray,a: {(ip: chararray,hour: int,count: int)}}

The group here is ip and the resulting bag contains (ip, hour, count)
tuples.
grunt> dump b;
(128.187.97.22,{(128.187.97.22,0,180),(128.187.97.22,1,84),(128.187.97.22,2,25),(128.187.97.22,22,31),(128.187.97.22,23,2)})

You can write a very simple UDF that returns a Tuple based on the contents
of this bag that would have what you need. The UDF would traverse through
the tuples in the bag and create a map of hour->count. Your result tuple
should be sorted by hour and only return the counts in an order (m0...m23).

-Prashant

On Wed, Mar 6, 2013 at 7:20 AM, Eugene Morozov <[EMAIL PROTECTED]>wrote:

> Hello!
>
> I have a script that gives me following result:
>
> time_grouped = GROUP joined BY (ip, hour);
> counts = FOREACH time_grouped GENERATE group.ip as ip, group.hour as hour,
> COUNT(joined) as count;
>
> (128.187.97.22, 0, 180)
> (128.187.97.22, 1, 84)
> (128.187.97.22, 2, 25)
> (128.187.97.22, 22, 31)
> (128.187.97.22, 23, 2)
>
> That is IP address, hour of day, counter.
> I'd like to get following:
>
> (128.187.97.22, (m1,m2, m3, ..., m23))
> m1-m23 corresponds to the counter. And if there is nothing for particular
> hour, then I'd like to have 0 instead empty value.
>
> The trick here is that if I do not have anything for particular hour, then
> I won't have count for it.
> Is there a way to achieve the goal?
>
> Thanks in advance
> --
> Evgeny Morozov
> Developer Grid Dynamics
> Skype: morozov.evgeny
> www.griddynamics.com
> [EMAIL PROTECTED]
>