I am new to Hadoop and all its derivatives, and I am getting really
intimidated by the abundance of information available.
But one thing I have realized is that to start implementing or using Hadoop or
distributed code, one basically has to change the way one thinks about a
problem.
I was wondering if someone could help me with the following.
So, basically (like anyone else), I have some raw data. I want to parse it,
extract some information, then run some algorithm and save the results.
Let's say I have a text file "foo.txt" where the data looks like this:
1, 200, grrrr,2012:12:2:13:00:00
1, 2, ajf, 2012:12:22:13:56:00
As you can see, the id (the first field) can be repeated. The second field
can be something like how much money a customer has spent. What I want to do
is save the result in a file which contains how much money each customer has
spent in the "morning", "afternoon", "evening" and "night". (You can define
your own time buckets for what counts as morning and so on.) For example,
here the output would probably be:
1, 0, 202, 0, 0
where 1 is the id, 0 means $0 spent in the morning, 202 in the afternoon, and
0 in the evening and at night.
Now, I have Python code for this, but I have to implement it in Pig to get
started. If anyone can just write it or guide me through it, that's all I
need to get going.
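
To make the logic concrete, here is a minimal Python sketch of the aggregation described above. It is only an illustration, not my exact code: the bucket boundaries (morning 06-12, afternoon 12-17, evening 17-21, night otherwise) and the function names are assumptions, and it assumes every line has the four comma-separated fields shown in foo.txt.

```python
from collections import defaultdict

# Assumed bucket names, in output order.
BUCKETS = ["morning", "afternoon", "evening", "night"]


def bucket_for_hour(hour):
    """Map an hour (0-23) to a time bucket. The boundaries are assumptions."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def parse_line(line):
    """Split 'id, amount, label, yyyy:mm:dd:hh:mm:ss' into (id, amount, hour)."""
    fields = [f.strip() for f in line.split(",")]
    cust_id, amount, timestamp = fields[0], float(fields[1]), fields[3]
    hour = int(timestamp.split(":")[3])  # fourth component is the hour
    return cust_id, amount, hour


def spend_by_bucket(lines):
    """Sum the amount per customer id and per time bucket."""
    totals = defaultdict(lambda: dict.fromkeys(BUCKETS, 0.0))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cust_id, amount, hour = parse_line(line)
        totals[cust_id][bucket_for_hour(hour)] += amount
    return totals


if __name__ == "__main__":
    sample = [
        "1, 200, grrrr,2012:12:2:13:00:00",
        "1, 2, ajf, 2012:12:22:13:56:00",
    ]
    # One output row per id: id, morning, afternoon, evening, night
    for cust_id, spent in spend_by_bucket(sample).items():
        print(cust_id, *(spent[b] for b in BUCKETS), sep=", ")
```

Both sample rows fall at hour 13, so with these assumed boundaries the whole amount for id 1 lands in the afternoon bucket.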