Results 81 to 90 of 171 (0.109s).
Passing binary files in maps - Hadoop - [mail # user]
...Hi,  I need to put a binary file in map and then emit that map. I do it by encoding it as a string using Base64 encoding, so that's fine, but I am dealing with pretty large files, and I...
   Author: Mark Kerzner, 2010-05-28, 04:33
Re: Import the results into SimpleDB - Hadoop - [mail # user]
...I create this text file in Hadoop. Only I want to make the db import a separate Hadoop job, run it in Amazon EMR, and make it fast by running sufficient number of nodes.  Mark  On ...
3 emails    Author: Mark Kerzner, 2010-05-12, 02:17
Re: Data-Intensive Text Processing with MapReduce - Hadoop - [mail # user]
...Dear Jimmy and Chris:  I am reading your book (thank you for providing the pre-release version) and I find it great in contents and in style. Thank you!  Sincerely, Mark  On S...
   Author: Mark Kerzner, 2010-05-09, 18:06
Accepting contributions for the "Hadoop in Practice" book - Hadoop - [mail # user]
...Hi, guys,  I am working on this book for Manning, and I need your solutions. If you had a specific problem that you solved with Hadoop, and you can share your solution, even in general...
   Author: Mark Kerzner, 2010-05-04, 23:06
Re: Hadoop Cookbook? - Hadoop - [mail # user]
...Thank you  On Tue, May 4, 2010 at 4:52 AM, Steve Loughran  wrote:  ...
2 emails    Author: Mark Kerzner, 2010-05-04, 13:16
leads? - Hadoop - [mail # user]
...Hi, guys,  without imposing, any leads for cloud-based projects will be appreciated, my resume here.  Thank you, Mark...
   Author: Mark Kerzner, 2010-04-27, 14:57
Re: DeDuplication Techniques - Hadoop - [mail # user]
...Joe,  your approach would work, whether you use files to keep old data, or a database. However, it feels like a mix of new and old technologies. It just does not feel right to open a fi...
2 emails    Author: Mark Kerzner, 2010-03-26, 00:24
Re: Parallelizing HTTP calls with Hadoop - Hadoop - [mail # user]
...Phil,  what you are describing is close to what Nutch is already doing. You can look at it - all this coding is non-trivial, and you can save yourself a lot of work and debugging...
   Author: Mark Kerzner, 2010-03-07, 14:34
Re: Hadoop as master's thesis - Hadoop - [mail # user]
...Tonci,  here are Enron email files used in the litigation that they had: http://edrm.net/resources/data-sets/enron-data-set-files  Here is much more stuff: http://infochimps.org/ ...
2 emails    Author: Mark Kerzner, 2010-03-01, 14:28
Re: HDFS behaving strangely - Hadoop - [mail # user]
...You may be facing the other well-known problem in Hadoop - don't use many small files:  http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/  On Mon, Jan 25, 2010 at 7:...
2 emails    Author: Mark Kerzner, 2010-01-26, 03:16
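The first result above ("Passing binary files in maps") describes storing a binary file in a map by Base64-encoding it as a string, with a concern about large files. The sketch below is illustrative only: it uses plain `java.util` collections instead of a Hadoop `MapWritable`, and `java.util.Base64` (Java 8+) as an assumed encoder, since the post does not say which encoder was used. The class name, the `fileName`/`content` keys and the sample bytes are hypothetical. It round-trips the bytes and shows the size growth: padded Base64 emits 4 characters for every started 3-byte group, so a large file grows by roughly a third once encoded.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: carry binary data as a Base64 string value inside a map.
public class Base64MapSketch {

    // Encode raw bytes as a Base64 string so they can be stored as a text value.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the original bytes from the Base64 string.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Length of padded Base64 output: 4 characters per started 3-byte group.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // Stand-in for a binary file's contents (includes non-printable bytes).
        byte[] fileBytes = {0x00, 0x01, (byte) 0xFF, 0x7F, 0x10};

        Map<String, String> record = new HashMap<>();
        record.put("fileName", "example.bin");       // hypothetical metadata entry
        record.put("content", encode(fileBytes));    // binary payload as Base64 text

        byte[] roundTrip = decode(record.get("content"));
        System.out.println("round trip ok: " + Arrays.equals(fileBytes, roundTrip));
        System.out.println("raw bytes: " + fileBytes.length
                + ", encoded chars: " + record.get("content").length());
    }
}
```

Running `main` prints `round trip ok: true` and `raw bytes: 5, encoded chars: 8`. For genuinely large files, one alternative worth checking is Hadoop's `BytesWritable`, which can be stored as a value in a `MapWritable` and carries raw bytes without the encoding overhead; either way, the whole file is still held in memory as a single value.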