Hadoop, mail # user - Analysis of Log Files


Re: Analysis of Log Files
Eugene Kirpichov 2012-07-03, 22:46
Ok, I see. So you need to 1) group by date and product_id and count the
records in each group => {date, product_id, count} (this is one
map+reduce), then 2) group that output by product_id and pick the date
for which count is highest (this is a second map+reduce).
Does this sound sensible?

I'm not sure if this can be efficiently done with just 1 stage of map+reduce.
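
A minimal sketch of the two stages described above, written as plain Python that simulates map+reduce locally rather than as Hadoop API code. The (date, product_id) record layout and all function names (run_stage, map_count, reduce_count, map_by_product, reduce_max, busiest_day) are assumptions made for illustration, not anything from the original job.

```python
from collections import defaultdict

def run_stage(records, mapper, reducer):
    """Simulate one map+reduce stage: map, group values by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out

# Stage 1: count accesses per (date, product_id).
def map_count(record):
    date, product_id = record  # assumed layout of a parsed log record
    yield (date, product_id), 1

def reduce_count(key, values):
    date, product_id = key
    yield (date, product_id, sum(values))

# Stage 2: for each product_id, keep the date with the highest count.
def map_by_product(record):
    date, product_id, count = record
    yield product_id, (count, date)

def reduce_max(product_id, values):
    # Tuples compare by count first; ties go to the larger date string.
    count, date = max(values)
    yield (product_id, date, count)

def busiest_day(log):
    counts = run_stage(log, map_count, reduce_count)
    return run_stage(counts, map_by_product, reduce_max)
```

In a real Hadoop job the output of the first stage would be written to HDFS and read back as the input of the second job; here the intermediate list plays that role. So the mapper of stage 1 emits ((date, product_id), 1), and the mapper of stage 2 emits (product_id, (count, date)).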

On Tue, Jul 3, 2012 at 3:36 PM, Shailesh Samudrala
<[EMAIL PROTECTED]> wrote:
> I want to find out how many times a product was searched during a day, and
> then select the day when this is highest.
>
> Until now, I have extracted all the required fields from the search string,
> and I am confused about what exactly I should be passing from the mapper to
> the reducer.
>
> On Tue, Jul 3, 2012 at 3:30 PM, Eugene Kirpichov <[EMAIL PROTECTED]> wrote:
>
>> So you want to compute select max(date) from log group by product?
>> Can you describe how far you have advanced so far and where precisely
>> you are stuck?
>>
>> On Tue, Jul 3, 2012 at 3:23 PM, Shailesh Samudrala
>> <[EMAIL PROTECTED]> wrote:
>> > I am writing a sample application to analyze some log files of webpage
>> > accesses. Basically, the log files record which products were accessed,
>> > and on what date.
>> > I want to write a MapReduce program to determine on what date a
>> > product was most accessed.
>> > Please share your ideas with me. Thanks!
>>
>>
>>
>> --
>> Eugene Kirpichov
>> http://www.linkedin.com/in/eugenekirpichov
>>

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov