Hadoop >> mail # user >> Analysis of Log Files

Re: Analysis of Log Files
Well, then you can simply do it in one job like this:
Map: emit key=product_id, value=date
Reduce (called once per product_id): count how many times each date
occurs, e.g. in a hashtable keyed by date, and return the date with the
highest count

Assuming you've started selling products later than computers were
invented, the per-product hashtable holds at most one entry per distinct
date, so this should be fine w.r.t. performance and memory
consumption :)
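
A minimal local sketch of the single-job idea above, in plain Python
rather than the Hadoop API. The tab-separated "date<TAB>product_id" line
format and the names map_record, reduce_product and run_job are
assumptions made for illustration; run_job only simulates the shuffle
that Hadoop would do between the map and reduce phases.

```python
from collections import Counter, defaultdict


def map_record(line):
    """Map: emit (product_id, date) for one log line.

    Assumes a hypothetical tab-separated format: date<TAB>product_id.
    """
    date, product_id = line.rstrip("\n").split("\t")
    yield product_id, date


def reduce_product(product_id, dates):
    """Reduce: count the dates seen for one product, return the busiest one."""
    counts = Counter(dates)  # the "hashtable" of date -> number of accesses
    # Highest count wins; ties go to the earliest date so output is deterministic.
    best_date, best_count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return product_id, best_date, best_count


def run_job(lines):
    """Simulate the shuffle locally: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_record(line):
            grouped[key].append(value)
    return [reduce_product(k, v) for k, v in sorted(grouped.items())]
```

Each reducer call only keeps one counter entry per distinct date for
its product, which is why memory stays small even for popular products.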

On Tue, Jul 3, 2012 at 3:52 PM, Shailesh Samudrala
<[EMAIL PROTECTED]> wrote:
> Yes, I think that is possible, but I'm looking for a 1 MapReduce job
> solution, if possible.
>
> On Tue, Jul 3, 2012 at 3:46 PM, Eugene Kirpichov <[EMAIL PROTECTED]> wrote:
>
>> Ok, I see, so you need to 1) count everything grouped by date
>> and product_id => {date, product_id, count} (this is 1 map+reduce) 2)
>> group this by product_id and get the date for which the count is
>> highest (this is another map+reduce).
>> Does this sound sensible?
>>
>> I'm not sure if this can be efficiently done with just 1 stage of
>> map+reduce.
>>
>> On Tue, Jul 3, 2012 at 3:36 PM, Shailesh Samudrala
>> <[EMAIL PROTECTED]> wrote:
>> > I want to find out how many times a product was searched for during
>> > a day, and then select the day when this is highest.
>> >
>> > So far, I have extracted all the required fields from the search
>> > string, and I am confused about what exactly I should be passing from
>> > the mapper to the reducer.
>> >
>> > On Tue, Jul 3, 2012 at 3:30 PM, Eugene Kirpichov <[EMAIL PROTECTED]> wrote:
>> >
>> >> So you want to compute select max(date) from log group by product?
>> >> Can you describe how far you have advanced so far and where precisely
>> >> are you stuck?
>> >>
>> >> On Tue, Jul 3, 2012 at 3:23 PM, Shailesh Samudrala
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > I am writing a sample application to analyze some log files of webpage
>> >> > accesses. Basically, the log files record which products were accessed,
>> >> > and on what date.
>> >> > I want to write a MapReduce program to determine on what date a
>> >> > product was most accessed.
>> >> > Please share your ideas with me. Thanks!
>> >>
>> >>
>> >>
>> >> --
>> >> Eugene Kirpichov
>> >> http://www.linkedin.com/in/eugenekirpichov
>> >>

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov