Results 1 to 10 of 36 (0.417s).
Re: Configuring SSH - is it required for a pseudo-distributed mode? - MapReduce - [mail # user]
...I never configure the ssh feature. Not for running on a single node and not for a full size cluster. I simply start all the required daemons (name/data/job/task) and configure them on which ...
Author: Niels Basjes, 2013-05-19, 09:03

[MAPREDUCE-2094] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. - MapReduce - [issue]
...When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 3...
http://issues.apache.org/jira/browse/MAPREDUCE-2094
Author: Niels Basjes, 2013-05-03, 19:26

Re: How to process only input files containing 100% valid rows - MapReduce - [mail # user]
...How about a different approach: If you use the multiple output option you can process the valid lines in a normal way and put the invalid lines in a special separate output file. On Apr 18, ...
Author: Niels Basjes, 2013-04-19, 08:21

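The approach in this reply can be sketched in plain Python, outside Hadoop: each line is routed to a "valid" or an "invalid" output instead of the whole file being rejected. The validation rule (exactly three non-empty comma-separated fields) and the names `is_valid` and `split_lines` are hypothetical, chosen only for illustration; in a real job the two lists would correspond to two separate named outputs.

```python
def is_valid(line: str) -> bool:
    # Hypothetical rule: exactly three non-empty comma-separated fields.
    fields = line.split(",")
    return len(fields) == 3 and all(f.strip() for f in fields)


def split_lines(lines):
    """Return (valid, invalid) lists, preserving input order."""
    valid, invalid = [], []
    for line in lines:
        # Route each line to one of the two outputs.
        (valid if is_valid(line) else invalid).append(line)
    return valid, invalid


if __name__ == "__main__":
    good, bad = split_lines(["a,b,c", "broken", "1,2,3", "x,,z"])
    print("valid:", good)
    print("invalid:", bad)
```

Keeping the invalid lines in their own output makes it possible to inspect or reprocess them later without rerunning the job on the valid data.
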
Re: how to find top N values using map-reduce ? - MapReduce - [mail # user]
...My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want to get the top N% you'll need an additional phase to determine how many ...
Author: Niels Basjes, 2013-02-02, 12:44

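A minimal sketch of why a sorted stream into a single reducer makes top N easy, written as a plain-Python simulation rather than Hadoop code. The function names `shuffle_sort` and `single_reducer` are hypothetical: the first stands in for the framework's sort phase, the second for the one reducer, which only has to keep the first N records it receives.

```python
from itertools import islice


def shuffle_sort(records):
    # Stands in for the sort phase: order (key, value) pairs by value,
    # largest first.
    return sorted(records, key=lambda kv: kv[1], reverse=True)


def single_reducer(sorted_records, n):
    # Because the input is already sorted, the top N are the first N records.
    return list(islice(sorted_records, n))


if __name__ == "__main__":
    records = [("a", 3), ("b", 9), ("c", 1), ("d", 7)]
    print(single_reducer(shuffle_sort(records), 2))
```

The single reducer is what makes this work: with several reducers, each one would see only part of the sorted data and could produce only a local top N.
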
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer - MapReduce - [mail # user]
...F. put a MongoDB replica set on all Hadoop worker nodes and let the tasks query the MongoDB at localhost. (this is what I did recently with a multi GiB dataset) Met vriendelijke g...
Author: Niels Basjes, 2012-12-30, 19:38

Re: Doubts on compressed file - MapReduce - [mail # user]
...Hi, Yes. Yes, and then the mapper will read the other parts of the file over the network. So what I do is I upload such files with a bigger HDFS blocksize so the mapper has "th...
Author: Niels Basjes, 2012-11-07, 12:47

Re: Hadoop Real time help - MapReduce - [mail # user]
...Thanks for the pointers, I have stuff to read now :) On Mon, Aug 20, 2012 at 9:37 AM, Bertrand Dechoux wrote: Best regards / Met vriendelijke groeten, Niels Ba...
Author: Niels Basjes, 2012-08-22, 18:21

Re: Hadoop Real time help - MapReduce - [mail # user]
...Is there a "complete" overview of the tools that allow processing streams of data in real time? Or even better: what are the terms to google for? Kind regards, Niels Bas...
Author: Niels Basjes, 2012-08-19, 19:44

Re: output/input ratio > 1 for map tasks? - MapReduce - [mail # user]
...Hi, On Mon, Jul 30, 2012 at 8:47 PM, brisk wrote: For a simple example: Have a look at the WordCount example. Input of a single map call is 1 record: "This is a line"...
Author: Niels Basjes, 2012-07-30, 20:15

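The snippet above is cut off, so the following is only a sketch of the general WordCount idea, not a reconstruction of the rest of the reply. It is plain Python rather than Hadoop code, and the name `wordcount_map` is hypothetical. It shows how one input record can yield several output records, giving a map output/input ratio above 1.

```python
def wordcount_map(record: str):
    # Emit one (word, 1) pair per whitespace-separated token.
    for word in record.split():
        yield (word, 1)


if __name__ == "__main__":
    outputs = list(wordcount_map("This is a line"))
    # One input record, one output record per word.
    print(len(outputs), outputs)
```

For the record "This is a line", the mapper receives 1 record and emits 4, one per word.
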
Making gzip splittable for Hadoop - MapReduce - [mail # user]
...Hi, In many Hadoop production environments you get gzipped files as the raw input. Usually these are Apache HTTPD logfiles. When putting these gzipped files into Hadoop you are stuck w...
Author: Niels Basjes, 2012-03-30, 14:07