Pig >> mail # user >> How to find ranges between a field in a set of records


Re: How to find ranges between a field in a set of records
There is no way to do this in plain Pig, but it is easy with a UDF
(ideally an accumulative UDF, though with fewer than 100 records per key
it doesn't really matter). Do a nested sort in a foreach block, then
pass the sorted dates to the UDF. The docs should have an example of this.
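
As a rough illustration of the logic such a UDF would carry, here is a minimal
sketch in plain Python. It is not the Pig UDF itself: the function name
day_ranges, the constant DATE_FORMAT, and the month/day/year date format are
assumptions based on the sample rows in the question. It sorts one key's dates
and pairs each with the number of days since the previous one.

```python
from datetime import datetime

# Assumed from the sample rows in the question (e.g. 6/3/2012).
DATE_FORMAT = "%m/%d/%Y"


def day_ranges(dates):
    """Return (date, days_since_previous) pairs for one key's dates.

    Hypothetical helper: the first record has no predecessor, so its
    gap is None (which would map to NULL in Pig).
    """
    # Sort by the parsed date, keeping the original string alongside it.
    parsed = sorted((datetime.strptime(d, DATE_FORMAT), d) for d in dates)
    result = []
    previous = None
    for current, text in parsed:
        gap = None if previous is None else (current - previous).days
        result.append((text, gap))
        previous = current
    return result
```

The same loop could be wrapped in a Jython or Java UDF that receives the bag
of records for a key; with fewer than 100 records per key, holding the whole
bag in memory should be fine.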

2012/7/2 Bob Briski <[EMAIL PROTECTED]>

> Hi,
>
> I need to determine the number of days between dates on a running list
> of records.  Each key will have few records (fewer than 100), so I
> should be able to do it in one reducer.  The data would
> look something like this:
>
> Say the headers are:
> player_id, date, other_stuff
>
> values would be:
> 2, 6/1/2012 ...
> 2, 6/3/2012 ...
> 2, 6/10/2012 ...
>
> I want to add the number of days between this record and the previous
> one to get:
> player_id, date, range, other_stuff
>
> 2,6/1/2012,NULL, ...
> 2,6/3/2012,2, ...
> 2,6/10/2012,7, ...
>
> Is there an easy way to do this in Pig?  If not, is it something that
> can be handled with a UDF?
>
> Thanks,
> Bob
>