Results 1 to 10 of 39 (0.133s).
Re: gz containing null chars? - MapReduce - [mail # user]
...My best guess is that at a low level a string is often terminated by having a null byte at the end. Perhaps that's where the difference lies. Perhaps the gz decompressor simply stops at the ...
Author: Niels Basjes, 2013-06-10, 20:27

Re: Experimental Hadoop Cluster - Linux Windows machines - MapReduce - [mail # user]
...I've installed CentOS on several different types of old (originally Windows XP) Dell desktops for the last 4 years (i.e. desktops as old as 7 years ago) and so far installing CentOS wa...
Author: Niels Basjes, 2013-06-01, 20:27

Re: Experimental Hadoop Cluster - Linux Windows machines - MapReduce - [mail # user]
...My first suggestion is to go for CentOS as it is free and almost the same as RHEL. Also having a 64 bit OS lets you use a bit more of the installed memory. Then if you can simply instal...
Author: Niels Basjes, 2013-06-01, 19:52

Re: Configuring SSH - is it required? for a psedo distriburted mode? - MapReduce - [mail # user]
...I never configure the ssh feature. Not for running on a single node and not for a full size cluster. I simply start all the required daemons (name/data/job/task) and configure them on which ...
Author: Niels Basjes, 2013-05-19, 09:03

[MAPREDUCE-2094] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. - MapReduce - [issue]
...When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 3...
http://issues.apache.org/jira/browse/MAPREDUCE-2094
Author: Niels Basjes, 2013-05-03, 19:26

Re: How to process only input files containing 100% valid rows - MapReduce - [mail # user]
...How about a different approach: If you use the multiple output option you can process the valid lines in a normal way and put the invalid lines in a special separate output file. On Apr 18, ...
Author: Niels Basjes, 2013-04-19, 08:21

Re: how to find top N values using map-reduce ? - MapReduce - [mail # user]
...My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want to get the top N% you'll need an additional phase to determine how many ...
Author: Niels Basjes, 2013-02-02, 12:44

Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer - MapReduce - [mail # user]
...F. put a mongodb replica set on all hadoop workernodes and let the tasks query the mongodb at localhost. (this is what I did recently with a multi GiB dataset) Met vriendelijke g...
Author: Niels Basjes, 2012-12-30, 19:38

Re: Doubts on compressed file - MapReduce - [mail # user]
...Hi, Yes. Yes, and then the mapper will read the other parts of the file over the network. So what I do is I upload such files with a bigger HDFS blocksize so the mapper has "th...
Author: Niels Basjes, 2012-11-07, 12:47

Re: Hadoop Real time help - MapReduce - [mail # user]
...Thanks for the pointers, I have stuff to read now :) On Mon, Aug 20, 2012 at 9:37 AM, Bertrand Dechoux wrote: Best regards / Met vriendelijke groeten, Niels Ba...
Author: Niels Basjes, 2012-08-22, 18:21