Hadoop >> mail # user >> Analysis of Log Files


Re: Analysis of Log Files
OK, I see. So you need to:
1) Group by date and product_id and count the records, giving
{date, product_id, count} (this is one map+reduce).
2) Group that output by product_id and take the date for which the
count is highest (this is a second map+reduce).
Does this sound sensible?
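
To make the two stages concrete, here is a minimal in-memory sketch in plain Python. It is not Hadoop code. The helper run_stage and all function names are hypothetical, and it assumes each log record has already been parsed into a (date, product_id) pair. A real job would express the same mappers and reducers through the Hadoop API.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Simulate one map+reduce pass in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: count accesses per (date, product_id).
def map_count(record):
    date, product_id = record
    yield (date, product_id), 1


def reduce_count(key, values):
    date, product_id = key
    yield (date, product_id, sum(values))


# Stage 2: regroup by product_id and keep the date with the highest count.
def map_by_product(record):
    date, product_id, count = record
    yield product_id, (count, date)


def reduce_max(product_id, values):
    # max() on (count, date) tuples; a tie on count goes to the later date.
    count, date = max(values)
    yield (product_id, date, count)


def busiest_day(log):
    counts = run_stage(log, map_count, reduce_count)
    return run_stage(counts, map_by_product, reduce_max)


if __name__ == "__main__":
    sample = [
        ("2012-07-01", "p1"),
        ("2012-07-01", "p1"),
        ("2012-07-02", "p1"),
        ("2012-07-02", "p2"),
    ]
    for row in busiest_day(sample):
        print(row)
```

The mapper in stage 1 therefore emits the composite key (date, product_id) with the value 1, and the mapper in stage 2 emits product_id as the key with (count, date) as the value.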

I'm not sure if this can be efficiently done with just 1 stage of map+reduce.
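
For comparison, one possible single-stage variant keys on product_id alone and lets the reducer count dates itself. This is only a sketch under assumptions: the function name is hypothetical, records are assumed to be (date, product_id) pairs, and the reducer has to hold one counter per distinct date for a product in memory, which is where the efficiency concern comes in.

```python
from collections import Counter, defaultdict


def busiest_day_single_stage(log):
    """Simulate a single map+reduce pass keyed on product_id."""
    # Map phase: emit product_id -> date, grouped by key as the shuffle would.
    grouped = defaultdict(list)
    for date, product_id in log:
        grouped[product_id].append(date)

    # Reduce phase: count dates per product, then pick the most frequent one.
    result = []
    for product_id in sorted(grouped):
        per_date = Counter(grouped[product_id])
        # A tie on count goes to the later date.
        date, count = max(per_date.items(), key=lambda kv: (kv[1], kv[0]))
        result.append((product_id, date, count))
    return result
```

Every access to a product reaches the same reducer here, so a very popular product puts all of its records on one reduce call, with no pre-aggregation unless a combiner is added.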

On Tue, Jul 3, 2012 at 3:36 PM, Shailesh Samudrala
<[EMAIL PROTECTED]> wrote:
> I want to find out how many times a product was searched for during a day,
> and then select the day on which that count is highest.
>
> Until now, I have extracted all the required fields from the search string,
> and I am confused about what exactly I should be passing from the mapper to
> the reducer.
>
> On Tue, Jul 3, 2012 at 3:30 PM, Eugene Kirpichov <[EMAIL PROTECTED]> wrote:
>
>> So you want to compute select max(date) from log group by product?
>> Can you describe how far you have got and where exactly you are
>> stuck?
>>
>> On Tue, Jul 3, 2012 at 3:23 PM, Shailesh Samudrala
>> <[EMAIL PROTECTED]> wrote:
>> > I am writing a sample application to analyze some log files of webpage
>> > accesses. Basically, the log files record which products were accessed,
>> > and on what date.
>> > I want to write a MapReduce program to determine the date on which a
>> > product was most accessed.
>> > Please share your ideas with me. Thanks!
>>
>>
>>
>> --
>> Eugene Kirpichov
>> http://www.linkedin.com/in/eugenekirpichov
>>

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov