Re: How to find ranges between a field in a set of records
Jonathan Coveney 2012-07-03, 16:57
There is no way to do this in straight Pig, but it is easy with a UDF
(ideally one implementing Pig's Accumulator interface, though with fewer
than 100 records per key it doesn't really matter). Do a nested sort in a
FOREACH block, then pass the sorted dates to the UDF. The docs should have
an example of this.
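Here is a rough plain-Java sketch of the logic such a UDF would contain. It is not a Pig UDF. It assumes the dates for one key have already been sorted by the nested ORDER in the FOREACH block and arrive as strings in the M/d/yyyy format from the sample data. The names `DayRanges` and `daysSincePrevious` are hypothetical. A real UDF would extend Pig's `EvalFunc` and read the dates out of the input bag rather than a `List<String>`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the core logic a Pig UDF could wrap (no Pig dependency here).
public class DayRanges {
    // Month/day/year as in the sample data; accepts e.g. "6/1/2012" and "6/10/2012".
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    /**
     * Takes one key's dates, already sorted ascending, and returns the number of
     * days since the previous record. The first entry is null (no previous record).
     */
    public static List<Long> daysSincePrevious(List<String> sortedDates) {
        List<Long> ranges = new ArrayList<>();
        LocalDate previous = null;
        for (String s : sortedDates) {
            LocalDate current = LocalDate.parse(s, FMT);
            if (previous == null) {
                ranges.add(null); // first record for this key
            } else {
                ranges.add(ChronoUnit.DAYS.between(previous, current));
            }
            previous = current;
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> dates = List.of("6/1/2012", "6/3/2012", "6/10/2012");
        System.out.println(daysSincePrevious(dates)); // [null, 2, 7]
    }
}
```

One caveat: if the nested ORDER sorts the raw date strings, "6/10/2012" comes before "6/3/2012" lexicographically. The sort key should therefore be a sortable form of the date (for example a converted or zero-padded year-month-day value), not the M/d/yyyy string itself.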
2012/7/2 Bob Briski <[EMAIL PROTECTED]>
> I need to determine the number of days between dates in a running list
> of records. The number of records associated with each key will be small
> (fewer than 100), so I should be able to do it in one reducer. The data
> would look something like this:
> Say the headers are:
> player_id, date, other_stuff
> values would be:
> 2, 6/1/2012 ...
> 2, 6/3/2012 ...
> 2, 6/10/2012 ...
> I want to add the number of days between this record and the previous
> one to get:
> player_id, date, range, other_stuff
> 2,6/1/2012,NULL, ...
> 2,6/3/2012,2, ...
> 2,6/10/2012,7, ...
> Is there an easy way to do this in Pig? If not, is it something that
> can be handled with a UDF?
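The reply suggests an accumulative UDF. The point of Pig's Accumulator interface is that the UDF receives a key's records in batches instead of all at once. For this problem that works because only the previous date has to carry over from one batch to the next. The sketch below is hypothetical: its method names mirror the Accumulator interface (`accumulate`, `getValue`, `cleanup`), but it takes plain Java lists instead of Pig tuples and bags so it runs without the Pig jar. It still assumes each batch arrives already sorted.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator-style sketch; method names mirror Pig's Accumulator,
// but inputs are plain lists of date strings rather than Pig Tuples/DataBags.
public class DayRangesAccumulator {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    private LocalDate previous;                         // last date seen, carried across batches
    private final List<Long> ranges = new ArrayList<>(); // days since previous record

    // Called once per batch of (already sorted) dates for the current key.
    public void accumulate(List<String> batch) {
        for (String s : batch) {
            LocalDate current = LocalDate.parse(s, FMT);
            if (previous == null) {
                ranges.add(null); // first record for this key has no previous date
            } else {
                ranges.add(ChronoUnit.DAYS.between(previous, current));
            }
            previous = current;
        }
    }

    // Returns a copy of the results collected so far for the current key.
    public List<Long> getValue() {
        return new ArrayList<>(ranges);
    }

    // Resets state before the next key is processed.
    public void cleanup() {
        previous = null;
        ranges.clear();
    }

    public static void main(String[] args) {
        DayRangesAccumulator acc = new DayRangesAccumulator();
        acc.accumulate(List.of("6/1/2012", "6/3/2012")); // first batch
        acc.accumulate(List.of("6/10/2012"));            // second batch, same key
        System.out.println(acc.getValue());              // [null, 2, 7]
        acc.cleanup();
    }
}
```

With fewer than 100 records per key, as in the question, the simpler all-at-once version is likely enough. The batched form mainly matters when a single key's bag is too large to hold in memory.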