Re: Beginner. Help needed in getting started
TianYi Zhu 2012-08-30, 04:51
Hi Mohit,
assuming you are using Pig 0.9+, please check this link to learn how to write user-defined functions (UDFs) in Python:
http://archive.cloudera.com/cdh4/cdh/4/pig/udf.html#python-udfs

For your problem, you can handle it like this:

1. Load the data from the text file.
2. Pass the data line by line through your UDF. The UDF should take a line as input and output the line with an additional time_information field ("morning", "afternoon", "evening").
3. Group the records by id.
4. For each grouped result, filter by time_information and calculate the sum of the cost.
5. Write the results to a file.

Additional reference: http://ofps.oreilly.com/titles/9781449302641/index.html

--
Thanks,
TianYi
Not a native English speaker, correct me if I made mistakes....

On Thu, Aug 30, 2012 at 2:20 PM, Mohit Singh <[EMAIL PROTECTED]> wrote:
> am new to hadoop and all its derivatives. And I am really getting
> intimidated by the abundance of information available.
>
> But one thing I have realized is that to start implementing/using hadoop or
> distributed codes, one has to basically change the way they think about a
> problem.
>
> I was wondering if someone can help me in the following.
>
> So, basically (like anyone else) I have raw data. I want to parse it and
> extract some information and then run some algorithm and save the results.
>
> Let's say I have a text file "foo.txt" where data is like:
>
> id,$value,garbage_field,time_string\n
> 1, 200, grrrr,2012:12:2:13:00:00
> 2, 12.22,jlfa,2012:12:4:15:00:00
> 1, 2, ajf, 2012:12:22:13:56:00
>
> As you can see, the id can be repeated. The value can be, say, how much
> money a customer has spent. What I want to do is save the result in a file
> which contains how much money each customer has spent in
> "morning", "afternoon", "evening", "night". (You can define your own time
> buckets to define what morning and all is.)
> For example here probably
>
> 1, 0,202,0,0
> 1 is the id, 0--> 0$ spent in morning, 202 in afternoon, 0 in evening and
> night
>
> Now I have Python code for it. But I have to implement this in Pig to
> get started. If anyone can just write/guide me through this, that's all I
> need to get started.
>
> Thanks
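
A minimal sketch of steps 2 to 4 from the reply, written in plain Python so the logic can be checked locally before it is wired into Pig. Everything here is illustrative: the function names time_bucket and spend_by_bucket are hypothetical, the hour boundaries for each bucket are assumptions (the original post leaves them up to you), and the time string is assumed to be year:month:day:hour:minute:second, as the sample rows suggest. In Pig itself, only something like time_bucket would live in the Python UDF; the grouping and summing would be done with Pig's own GROUP and FOREACH statements.

```python
from collections import defaultdict

# Output column order, matching the "1, 0,202,0,0" example in the question.
BUCKETS = ("morning", "afternoon", "evening", "night")


def time_bucket(time_string):
    """Map 'year:month:day:hour:minute:second' to a time-of-day label.

    The hour boundaries below are assumptions, not taken from the thread.
    """
    hour = int(time_string.strip().split(":")[3])
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def spend_by_bucket(lines):
    """Group rows by id and sum the value per time bucket.

    Each line is 'id,value,garbage_field,time_string'. This mimics what
    GROUP plus a per-bucket SUM would produce, without running Pig.
    """
    totals = defaultdict(lambda: dict.fromkeys(BUCKETS, 0.0))
    for line in lines:
        cust_id, value, _garbage, time_string = [f.strip() for f in line.split(",")]
        totals[cust_id][time_bucket(time_string)] += float(value)
    # One row per id, with the sums in BUCKETS order.
    return {cid: [sums[b] for b in BUCKETS] for cid, sums in totals.items()}


# The three sample rows from foo.txt in the original question.
sample = [
    "1, 200, grrrr,2012:12:2:13:00:00",
    "2, 12.22,jlfa,2012:12:4:15:00:00",
    "1, 2, ajf, 2012:12:22:13:56:00",
]
result = spend_by_bucket(sample)
for cust_id in sorted(result):
    print(cust_id, result[cust_id])
```

With the assumed boundaries, both rows for id 1 fall in the afternoon bucket, which agrees with the 202 in the question's expected output. To use the bucketing function from Pig, it would still need to be registered as a Python UDF as described in the linked documentation.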