jamal sasha 2012-10-11, 20:36
I have a data file in the format:

user, movie, price
123,abc,22.2
123,daw,39
123,abc,99   <- note that the user and movie are the same as in the first row, but the price is different

I want to write a Pig script that counts how many times a user has rented a particular movie:

in = LOAD 'data' USING PigStorage('\\u001') AS (user:long, movie:long, price:float);
filtered_times = FILTER in BY price > 0;
perCust = GROUP filtered_times BY (user, movie);
count = foreach perCust generate group, COUNT(filtered_times.movie);
STORE count INTO 'results' using PigStorage(',');

The output looks like:

(3710100987700,5460986508),14

I don't want these parentheses :( I want plain output delimited by ",".
Arun Ahuja 2012-10-12, 15:27
Instead of
count = foreach perCust generate group, COUNT(filtered_times.movie);
use
count = foreach perCust generate FLATTEN(group), COUNT(filtered_times.movie);
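FLATTEN is a special operator that replaces a tuple with the fields inside it. Here `group` is the tuple (user, movie), so FLATTEN(group) turns it into two top-level fields, and PigStorage(',') then writes each row as user,movie,count with no parentheses.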
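To see the difference concretely, here is a small Python model (not Pig itself) of the FILTER / GROUP / COUNT pipeline from the question, run on the three sample rows. It prints each result row roughly the way PigStorage(',') would write it, first with `group` left as a nested tuple and then after flattening it. The helper names (`group_and_count`, `flatten`, `to_pigstorage`) and the simplified output formatting are assumptions made for illustration only.

```python
from collections import Counter

# Sample rows (user, movie, price), taken from the example in the question.
rows = [
    (123, "abc", 22.2),
    (123, "daw", 39.0),
    (123, "abc", 99.0),
]

def group_and_count(rows):
    """Model of FILTER price > 0, GROUP BY (user, movie), then COUNT per group."""
    counts = Counter((user, movie) for user, movie, price in rows if price > 0)
    # Each result row is (group, count), where group is itself a tuple,
    # like the `group` field Pig produces for a multi-column GROUP BY.
    return [(key, n) for key, n in counts.items()]

def flatten(row):
    """Model of FLATTEN(group): splice the tuple's fields into the outer row."""
    group, n = row
    return (*group, n)

def to_pigstorage(row, delim=","):
    """Approximate PigStorage output: a nested tuple is printed in
    parentheses, and top-level fields are joined by the delimiter."""
    def fmt(field):
        if isinstance(field, tuple):
            return "(" + ",".join(str(f) for f in field) + ")"
        return str(field)
    return delim.join(fmt(f) for f in row)

for row in group_and_count(rows):
    print(to_pigstorage(row), "->", to_pigstorage(flatten(row)))
```

Running it prints `(123,abc),2 -> 123,abc,2` and `(123,daw),1 -> 123,daw,1`. The left side matches the shape of the original output, and the right side matches the shape produced by the FLATTEN(group) version.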