Pig >> mail # user >> creating a graph over time


Re: creating a graph over time
If what you're looking for is an analysis over the full learning
duration, and not just the start interval, then one further insight is
that each original record can be transformed into a sequence of
records, one per 30-second interval that the session overlaps, with
the timestamp rounded down to the interval boundary.  In other words,
you can use a UDF to "explode" the original record:

1,marco,1319708213,500,math

into:

1,marco,1319708190,500,math
1,marco,1319708220,500,math
1,marco,1319708250,500,math
1,marco,1319708280,500,math
1,marco,1319708310,500,math
1,marco,1319708340,500,math
1,marco,1319708370,500,math
1,marco,1319708400,500,math
1,marco,1319708430,500,math
1,marco,1319708460,500,math
1,marco,1319708490,500,math
1,marco,1319708520,500,math
1,marco,1319708550,500,math
1,marco,1319708580,500,math
1,marco,1319708610,500,math
1,marco,1319708640,500,math
1,marco,1319708670,500,math
1,marco,1319708700,500,math

and then use Bill's suggestion to group by course, interval.
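
To make the two steps concrete, here is a minimal sketch of the logic in
plain Python rather than as an actual Pig UDF. The function names
(round_down, explode, count_by_course_interval) are made up for
illustration, and it assumes the last interval touched by
start-time + duration is included, which is what the 18 rows above imply.

```python
from collections import Counter

INTERVAL = 30  # seconds, the bucket size from the original question


def round_down(ts, interval=INTERVAL):
    """Round a Unix timestamp down to the start of its interval."""
    return ts - ts % interval


def explode(record, interval=INTERVAL):
    """Yield one copy of the record per interval the session touches.

    Assumption: the interval containing start + duration is included.
    """
    student_id, name, start, duration, course = record
    first = round_down(start, interval)
    last = round_down(start + duration, interval)
    for ts in range(first, last + 1, interval):
        yield (student_id, name, ts, duration, course)


def count_by_course_interval(records, interval=INTERVAL):
    """Count exploded rows per (course, interval) pair.

    Note: this counts rows, so overlapping sessions of the same
    student in the same course would be counted more than once.
    """
    counts = Counter()
    for record in records:
        for _, _, ts, _, course in explode(record, interval):
            counts[(course, ts)] += 1
    return counts
```

Calling explode((1, "marco", 1319708213, 500, "math")) yields the 18 rows
listed above, from 1319708190 through 1319708700. In Pig the same idea
would presumably be a UDF that returns a bag, followed by FLATTEN and a
GROUP BY on (course, interval).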

Norbert

On Thu, Oct 27, 2011 at 11:05 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
> You can pass your time to a udf that rounds it down to the nearest 30 second
> interval and then group by course, interval to get counts for each course,
> interval.
>
> On Thursday, October 27, 2011, Marco Cadetg <[EMAIL PROTECTED]> wrote:
>> I have a problem where I don't know how or if pig is even suitable to solve
>> it.
>>
>> I have a schema like this:
>>
>> student-id,student-name,start-time,duration,course
>> 1,marco,1319708213,500,math
>> 2,ralf,1319708111,112,english
>> 3,greg,1319708321,333,french
>> 4,diva,1319708444,80,english
>> 5,susanne,1319708123,2000,math
>> 1,marco,1319708564,500,french
>> 2,ralf,1319708789,123,french
>> 7,fred,1319708213,5675,french
>> 8,laura,1319708233,123,math
>> 10,sab,1319708999,777,math
>> 11,fibo,1319708789,565,math
>> 6,dan,1319708456,50,english
>> 9,marco,1319708123,60,english
>> 12,bo,1319708456,345,math
>> 1,marco,1319708789,673,math
>> ...
>> ...
>>
>> I would like to retrieve a graph (interpolation) over time grouped by
>> course. Meaning how many students are learning for a course based on a 30
>> sec interval.
>> The grouping by course is easy but from there I've no clue how I would
>> achieve the rest. I guess the rest needs to be achieved via some UDF,
>> or is there any way to do this in pig? I often think that I need a "for
>> loop" or something similar in pig.
>>
>> Thanks for your help!
>> -Marco
>>
>