Hadoop >> mail # user >> Analysis of Log Files


Re: Analysis of Log Files
Well, then you can simply do it like this:
Map: emit key=product_id, value=date
Reduce (for a particular product_id): count the dates in a hashtable
and emit the date with the highest count

Assuming you started selling products after computers were invented,
this should be fine w.r.t. performance and memory
consumption :)
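
For concreteness, here is a minimal sketch of that single-job approach in
Java (new org.apache.hadoop.mapreduce API). It assumes each log line is
simply "product_id<TAB>date" and omits the driver/job setup; the class and
field names are illustrative, so adjust the parsing to your actual log
format:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class TopAccessDate {

  // Map: key = product_id, value = date
  public static class AccessMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text productId = new Text();
    private final Text date = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      if (fields.length < 2) return;      // skip malformed lines
      productId.set(fields[0]);
      date.set(fields[1]);
      context.write(productId, date);
    }
  }

  // Reduce: count dates per product in a hashtable, emit the most frequent one
  public static class TopDateReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text productId, Iterable<Text> dates, Context context)
        throws IOException, InterruptedException {
      Map<String, Integer> counts = new HashMap<String, Integer>();
      for (Text d : dates) {
        String day = d.toString();
        Integer c = counts.get(day);
        counts.put(day, c == null ? 1 : c + 1);
      }
      String bestDate = null;
      int bestCount = -1;
      for (Map.Entry<String, Integer> e : counts.entrySet()) {
        if (e.getValue() > bestCount) {
          bestCount = e.getValue();
          bestDate = e.getKey();
        }
      }
      // Output: product_id -> "date<TAB>count" for the busiest day
      context.write(productId, new Text(bestDate + "\t" + bestCount));
    }
  }
}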

On Tue, Jul 3, 2012 at 3:52 PM, Shailesh Samudrala
<[EMAIL PROTECTED]> wrote:
> Yes, I think that would work, but I'm looking for a single MapReduce job
> solution, if possible.
>
> On Tue, Jul 3, 2012 at 3:46 PM, Eugene Kirpichov <[EMAIL PROTECTED]> wrote:
>
>> Ok, I see, so you need to: 1) group everything by date and product_id and
>> count, giving {date, product_id, count} (this is one map+reduce); 2) group
>> that by product_id and take the date for which the count is highest (this
>> is another map+reduce).
>> Does this sound sensible?
>>
>> I'm not sure if this can be efficiently done with just 1 stage of
>> map+reduce.
>>
>> On Tue, Jul 3, 2012 at 3:36 PM, Shailesh Samudrala
>> <[EMAIL PROTECTED]> wrote:
>> > I want to find out how many times a product was searched during a day,
>> > and then select the day on which this count is highest.
>> >
>> > So far, I have extracted all the required fields from the search string,
>> > and I am confused about what exactly I should be passing from the mapper
>> > to the reducer.
>> >
>> > On Tue, Jul 3, 2012 at 3:30 PM, Eugene Kirpichov <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> >> So you want to compute select max(date) from log group by product?
>> >> Can you describe how far you have gotten and where precisely you are
>> >> stuck?
>> >>
>> >> On Tue, Jul 3, 2012 at 3:23 PM, Shailesh Samudrala
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > I am writing a sample application to analyze some log files of
>> >> > webpage accesses. Basically, the log files record which products were
>> >> > accessed, and on what date.
>> >> > I want to write a MapReduce program to determine on what date a
>> >> > product was most accessed.
>> >> > Please share your ideas with me. Thanks!
>> >>
>> >>
>> >>
>> >> --
>> >> Eugene Kirpichov
>> >> http://www.linkedin.com/in/eugenekirpichov
>> >>
>>
>>
>>
>> --
>> Eugene Kirpichov
>> http://www.linkedin.com/in/eugenekirpichov
>>

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov