Results 1 to 10 of 36 (0.417 s).
Re: Configuring SSH - is it required for pseudo-distributed mode? - MapReduce - [mail # user]
...I never configure the SSH feature. Not for running on a single node and not for a full-size cluster. I simply start all the required daemons (name/data/job/task) and configure them on which ...
   Author: Niels Basjes, 2013-05-19, 09:03
[MAPREDUCE-2094] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. - MapReduce - [issue]
...When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 3...
http://issues.apache.org/jira/browse/MAPREDUCE-2094    Author: Niels Basjes, 2013-05-03, 19:26
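To make the effect in MAPREDUCE-2094 concrete, here is a minimal Python sketch of split planning. It is a simulation, not Hadoop's API: `safe_is_splitable`, `unsafe_is_splitable`, `plan_splits` and the gzip-only suffix list are hypothetical names and assumptions for illustration. It shows that when the default answers "splittable" for a gzip file, one file turns into several map tasks; if each task's record reader ends up decompressing the stream from the start (a gzip stream cannot be entered mid-way), the same data is processed several times, which matches the behaviour the report describes.

```python
# Sketch only: a Python stand-in for split planning, not the Hadoop API.

# Assumption for illustration: only gzip is treated as non-splittable here.
NON_SPLITTABLE_SUFFIXES = (".gz",)


def safe_is_splitable(filename: str) -> bool:
    """Refuse to split files whose compression codec cannot start mid-stream."""
    return not filename.endswith(NON_SPLITTABLE_SUFFIXES)


def unsafe_is_splitable(filename: str) -> bool:
    """The kind of default criticised in MAPREDUCE-2094: every file is assumed splittable."""
    return True


def plan_splits(filename, file_size, split_size, is_splitable):
    """Return (start, length) pairs; each pair becomes one map task."""
    if not is_splitable(filename):
        # One split covering the whole file, so a single mapper reads it exactly once.
        return [(0, file_size)]
    splits = []
    start = 0
    while start < file_size:
        length = min(split_size, file_size - start)
        splits.append((start, length))
        start += length
    return splits


GIB = 1024 ** 3
MIB = 1024 ** 2
print(len(plan_splits("access_log.gz", GIB, 128 * MIB, unsafe_is_splitable)))  # 8
print(len(plan_splits("access_log.gz", GIB, 128 * MIB, safe_is_splitable)))    # 1
```

The safe variant trades parallelism for correctness: a single mapper handles the whole gzip file, but it handles it only once.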
Re: How to process only input files containing 100% valid rows - MapReduce - [mail # user]
...How about a different approach: If you use the multiple output option you can process the valid lines in a normal way and put the invalid lines in a special separate output file. On Apr 18, ...
   Author: Niels Basjes, 2013-04-19, 08:21
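The routing idea in that reply can be sketched in plain Python. This is not Hadoop's MultipleOutputs API: `route_lines` and the three-field validation rule in `has_three_fields` are hypothetical stand-ins. Valid lines go to the normal output and invalid lines go to a separate named output, so nothing is silently dropped.

```python
from typing import Callable, Dict, Iterable, List


def route_lines(lines: Iterable[str], is_valid: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Send each line to the 'valid' or 'invalid' output (a stand-in for named outputs)."""
    outputs: Dict[str, List[str]] = {"valid": [], "invalid": []}
    for line in lines:
        outputs["valid" if is_valid(line) else "invalid"].append(line)
    return outputs


def has_three_fields(line: str) -> bool:
    """Hypothetical validation rule: exactly three tab-separated fields."""
    return len(line.split("\t")) == 3


result = route_lines(["a\tb\tc", "broken row", "x\ty\tz"], has_three_fields)
print(result["valid"])    # ['a\tb\tc', 'x\ty\tz']
print(result["invalid"])  # ['broken row']
```

A later job, or a simple check on whether the invalid output is empty, can then decide whether an input file was 100% valid.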
Re: how to find top N values using map-reduce ? - MapReduce - [mail # user]
...My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want to get the top N% you'll need an additional phase to determine how many ...
   Author: Niels Basjes, 2013-02-02, 12:44
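The suggestion above can be simulated in a few lines of Python. Here `sorted()` stands in for the framework's (secondary) sort, and the single reducer just stops after N records; the helper names are hypothetical. For the top N%, an extra counting pass is needed first to turn the percentage into an absolute N.

```python
import math
from itertools import islice
from typing import Iterable, List, Tuple

Record = Tuple[str, int]


def framework_sort(records: Iterable[Record]) -> List[Record]:
    """Stand-in for the shuffle/secondary sort: order by value, highest first."""
    return sorted(records, key=lambda kv: kv[1], reverse=True)


def single_reducer_top_n(sorted_records: Iterable[Record], n: int) -> List[Record]:
    """The lone reducer sees records already in descending order and keeps the first n."""
    return list(islice(sorted_records, n))


def top_n(records: Iterable[Record], n: int) -> List[Record]:
    return single_reducer_top_n(framework_sort(records), n)


def top_percent(records: Iterable[Record], percent: float) -> List[Record]:
    records = list(records)
    # Extra phase: count the records first so N can be derived from the percentage.
    n = math.ceil(len(records) * percent / 100)
    return top_n(records, n)


print(top_n([("a", 3), ("b", 9), ("c", 5)], 2))  # [('b', 9), ('c', 5)]
```

Because the reducer only has to read the first N sorted records, its work stays small even though all data passes through one reducer.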
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer - MapReduce - [mail # user]
...F. Put a MongoDB replica set on all Hadoop worker nodes and let the tasks query the MongoDB at localhost. (This is what I did recently with a multi-GiB dataset.) With kind regards...
   Author: Niels Basjes, 2012-12-30, 19:38
Re: Doubts on compressed file - MapReduce - [mail # user]
...Hi,   Yes.   Yes, and then the mapper will read the other parts of the file over the network. So what I do is I upload such files with a bigger HDFS blocksize so the mapper has "th...
   Author: Niels Basjes, 2012-11-07, 12:47
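The block-size trick in that reply is simple arithmetic: a compressed file spanning several HDFS blocks forces the single mapper to fetch the other blocks over the network, while a block size at least as large as the file keeps it in one block. A small sketch, assuming (for illustration only) that the block size is rounded up to a whole MiB; the helper names are hypothetical. The property to set when uploading differs between Hadoop versions (`dfs.block.size` in older releases, `dfs.blocksize` in newer ones).

```python
MIB = 1024 * 1024


def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of HDFS blocks a file occupies (ceiling division)."""
    return -(-file_size // block_size)


def block_size_for_single_block(file_size: int, granularity: int = MIB) -> int:
    """Smallest block size, rounded up to `granularity`, that holds the whole file in one block."""
    return max(granularity, blocks_needed(file_size, granularity) * granularity)


size = 300 * MIB
print(blocks_needed(size, 64 * MIB))                           # 5 blocks with a 64 MiB block size
print(block_size_for_single_block(size) // MIB)                # 300
print(blocks_needed(size, block_size_for_single_block(size)))  # 1
```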
Re: Hadoop Real time help - MapReduce - [mail # user]
...Thanks for the pointers, I have stuff to read now :)  On Mon, Aug 20, 2012 at 9:37 AM, Bertrand Dechoux wrote:    Best regards / With kind regards,  Niels Ba...
   Author: Niels Basjes, 2012-08-22, 18:21
Re: Hadoop Real time help - MapReduce - [mail # user]
...Is there a "complete" overview of the tools that allow processing streams of data in real time?  Or even better: what are the terms to google for?  Kind regards, Niels Bas...
   Author: Niels Basjes, 2012-08-19, 19:44
Re: output/input ratio > 1 for map tasks? - MapReduce - [mail # user]
...Hi,  On Mon, Jul 30, 2012 at 8:47 PM, brisk  wrote:  For a simple example: Have a look at the WordCount example.  Input of a single map call is 1 record: "This is a line"...
   Author: Niels Basjes, 2012-07-30, 20:15
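The WordCount point in that reply can be shown directly: a single map call on one input record emits one (word, 1) pair per word, so the output/input record ratio for that call is greater than 1. A minimal Python stand-in for the map function (not the Hadoop Mapper API), splitting on whitespace as an assumption:

```python
def word_count_map(line: str):
    """Emit one (word, 1) pair per whitespace-separated word in the input record."""
    return [(word, 1) for word in line.split()]


pairs = word_count_map("This is a line")
print(pairs)       # [('This', 1), ('is', 1), ('a', 1), ('line', 1)]
print(len(pairs))  # 4 output records for 1 input record
```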
Making gzip splittable for Hadoop - MapReduce - [mail # user]
...Hi,  In many Hadoop production environments you get gzipped files as the raw input. Usually these are Apache HTTPD logfiles. When putting these gzipped files into Hadoop you are stuck w...
   Author: Niels Basjes, 2012-03-30, 14:07