Re: how to find top N values using map-reduce ? - Hadoop - [mail # user]
...Hi,  Can you tell more about:  * How big is N  * How big is the input dataset  * How many mappers you have  * Do input splits correlate with the sorting criterion fo...
   Author: Eugene Kirpichov, 2013-02-02, 06:23
Re: Analysis of Log Files - Hadoop - [mail # user]
...Well, then you can simply do it like this: Map: emit key=product_id value=date Reduce for a particular product_id: manually count (in a hashtable) dates and their counts, return the date wit...
   Author: Eugene Kirpichov, 2012-07-03, 23:18
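
The single-job approach in the message above can be sketched locally in plain Python. This is only an illustration, not Hadoop API code: the record layout (`product_id` and `date` fields), the function names, and the local grouping that stands in for the shuffle are all assumptions. The snippet is cut off after "return the date wit...", so the sketch assumes the reducer returns the date with the highest count.

```python
from collections import Counter, defaultdict

def map_record(record):
    """Map step: emit (product_id, date) for one parsed log record."""
    yield record["product_id"], record["date"]

def reduce_product(product_id, dates):
    """Reduce step: count dates in a hash table and keep the most frequent one.

    Assumption: "most frequent date" is what the truncated message asks for.
    """
    counts = Counter(dates)
    # Break ties on the date string so the result is deterministic.
    best_date, best_count = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return product_id, best_date, best_count

def run_job(records):
    """Simulate the shuffle locally by grouping mapper output by key."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return [reduce_product(key, values) for key, values in sorted(grouped.items())]
```

The per-product hash table lives in reducer memory, so this shape fits best when each product has a modest number of distinct dates.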
Re: Analysis of Log Files - Hadoop - [mail # user]
...Ok, I see, so you need to 1) group and count everything group by date and product_id => {date, product_id, count} (this is 1 map+reduce) 2) group this by product_id and get the value of d...
   Author: Eugene Kirpichov, 2012-07-03, 22:46
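
The two-step plan in the message above can be sketched as two chained passes in plain Python. The function names and the `(date, product_id)` tuple layout are hypothetical, and because the snippet is truncated at "get the value of d...", the second step assumes the goal is the date with the largest count per product.

```python
from collections import defaultdict

def count_pairs(records):
    """Step 1: count occurrences of each (date, product_id) pair."""
    counts = defaultdict(int)
    for date, product_id in records:
        counts[(date, product_id)] += 1
    return [(date, pid, n) for (date, pid), n in counts.items()]

def best_date_per_product(rows):
    """Step 2: regroup by product_id and keep the row with the largest count.

    Assumption: ties are broken in favor of the later date string.
    """
    best = {}
    for date, pid, n in rows:
        if pid not in best or (n, date) > best[pid]:
            best[pid] = (n, date)
    return {pid: (date, n) for pid, (n, date) in best.items()}
```

Unlike a single job that counts inside the reducer, this split keeps only one running maximum per product in the second step, at the cost of an extra pass over the intermediate counts.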
Re: Analysis of Log Files - Hadoop - [mail # user]
...So you want to compute select max(date) from log group by product? Can you describe how far you have advanced so far and where precisely are you stuck?  On Tue, Jul 3, 2012 at 3:23 PM, ...
   Author: Eugene Kirpichov, 2012-07-03, 22:30
Re: execute millions of "grep" - Hadoop - [mail # user]
...Hi,  Hm - I was not assuming that you'll have >20mln queries. However, you can partition the query space into as many parts as needed to fit each part in memory, and accordingly incr...
   Author: Eugene Kirpichov, 2011-11-03, 12:16
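
A minimal sketch of the partitioning idea in the message above: split the query set into parts small enough to hold in memory and make one pass over the documents per part. Everything here is an assumption for illustration: the function names, the `max_per_part` parameter, the `(doc_id, text)` document layout, and the use of a plain set of whitespace-normalized phrases as the in-memory structure.

```python
def partition(queries, max_per_part):
    """Split the query list into consecutive chunks of at most max_per_part items."""
    return [queries[i:i + max_per_part] for i in range(0, len(queries), max_per_part)]

def match_part(part, documents):
    """One pass over the documents with only this part of the queries in memory."""
    wanted = set(part)
    # Distinct phrase lengths (in words) that occur in this part.
    lengths = {len(query.split()) for query in wanted}
    hits = set()
    for doc_id, text in documents:
        words = text.split()
        for n in lengths:
            for i in range(len(words) - n + 1):
                phrase = " ".join(words[i:i + n])
                if phrase in wanted:
                    hits.add((doc_id, phrase))
    return hits

def match_all(queries, documents, max_per_part):
    """Union the hits from every partition; documents is re-read once per part."""
    hits = set()
    for part in partition(queries, max_per_part):
        hits |= match_part(part, documents)
    return hits
```

The trade-off is that the number of passes over the documents grows with the number of partitions, so the part size should be as large as memory allows.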
Re: execute millions of "grep" - Hadoop - [mail # user]
...Hi Oliver, I have solved a similar problem before, and it seems to me that the best solution is to build an automaton (with whole words, not letters on edges) on the set of queries (basica...
   Author: Eugene Kirpichov, 2011-11-03, 11:37
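
One way to read "an automaton with whole words on edges" is an Aho-Corasick automaton whose transitions are labeled by words instead of characters. The sketch below assumes that reading; the message is truncated, so this is not necessarily the author's exact construction. Whitespace tokenization and the function names are also assumptions.

```python
from collections import deque

def build_automaton(queries):
    """Build a word-level Aho-Corasick automaton over whitespace-tokenized queries."""
    goto = [{}]   # goto[state][word] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # queries recognized on reaching each state
    for query in queries:
        words = query.split()
        if not words:
            continue  # skip empty queries so the root never reports a match
        state = 0
        for word in words:
            if word not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][word] = len(goto) - 1
            state = goto[state][word]
        out[state].append(query)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for word, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and word not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(word, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_queries(text, automaton):
    """Scan the text once and return every query that occurs as a word sequence."""
    goto, fail, out = automaton
    state, found = 0, set()
    for word in text.split():
        while state and word not in goto[state]:
            state = fail[state]
        state = goto[state].get(word, 0)
        found.update(out[state])
    return found
```

Each document is scanned once regardless of how many queries the automaton holds, which is the property that makes this attractive compared with running one grep per query.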
Re: execute millions of "grep" - Hadoop - [mail # user]
...If you really need to do millions of exact text queries against millions of documents in realtime, a simple grep is not going to be sufficient for you. You'll need smarter datastructures and...
   Author: Eugene Kirpichov, 2011-11-03, 10:52
Re: data locality - Hadoop - [mail # user]
...Thanks!  2011/10/26 Steve Loughran :    Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/...
   Author: Eugene Kirpichov, 2011-10-26, 09:22
Re: data locality - Hadoop - [mail # user]
...But I guess it isn't always possible to achieve optimal scheduling, right? What's done then; any account for network topology perhaps?    26.10.2011, at 4:42, Mapred Learn  wrote...
   Author: Eugene Kirpichov, 2011-10-26, 04:22
Re: Hbase + mapreduce -- operational design question - Hadoop - [mail # user]
...I believe HBase has some kind of TTL (timeout-based expiry) for records and it can clean them up on its own.  On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay  wrote: . Also, pe...
   Author: Eugene Kirpichov, 2011-09-10, 09:23