Pig, mail # user - How to find ranges between a field in a set of records


Re: How to find ranges between a field in a set of records
Jonathan Coveney 2012-07-03, 16:57
There is no way to do this in plain Pig, but it is easy with a UDF
(ideally one that implements the Accumulator interface, though with
fewer than 100 records per key it doesn't really matter). Do a nested
sort in a FOREACH block, then pass the sorted dates to the UDF. The docs
should have an example of this.
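
For illustration only, here is a rough sketch of the logic such a UDF
would carry, written as plain Python rather than against the Pig UDF
API. The function name day_gaps and the M/D/YYYY date format are
assumptions taken from the sample rows in the question, and the Pig side
(GROUP, the nested sort, registering the UDF) is left out. The function
re-sorts by parsed date as a safeguard, since sorting these date strings
as text would put 6/10/2012 before 6/3/2012.

```python
from datetime import datetime

# Assumed from the sample rows in the question, e.g. 6/3/2012.
DATE_FORMAT = "%m/%d/%Y"


def day_gaps(date_strings):
    """Return (date_string, gap_in_days) pairs in date order.

    The first record of a key has no predecessor, so its gap is None.
    """
    # Sort on the parsed date, not on the raw string.
    parsed = sorted((datetime.strptime(s, DATE_FORMAT), s) for s in date_strings)

    result = []
    previous = None
    for current, original in parsed:
        gap = None if previous is None else (current - previous).days
        result.append((original, gap))
        previous = current
    return result


# Dates for a single player_id, as in the question.
print(day_gaps(["6/1/2012", "6/3/2012", "6/10/2012"]))
```

A real UDF would receive the bag of records for one key and return a bag
with the extra range field; the loop above is the part that stays the
same.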

2012/7/2 Bob Briski <[EMAIL PROTECTED]>

> Hi,
>
> I need to determine the number of days between dates on a running list
> of records.  The set of records associated with each key will be small
> (fewer than 100), so I should be able to do it in one reducer.  The
> data would look something like this:
>
> Say the headers are:
> player_id, date, other_stuff
>
> values would be:
> 2, 6/1/2012 ...
> 2, 6/3/2012 ...
> 2, 6/10/2012 ...
>
> I want to add the number of days between this record and the previous
> one to get:
> player_id, date, range, other_stuff
>
> 2,6/1/2012,NULL, ...
> 2,6/3/2012,2, ...
> 2,6/10/2012,7, ...
>
> Is there an easy way to do this in Pig?  If not, is it something that
> can be handled with a UDF?
>
> Thanks,
> Bob
>