Re: gz containing null chars? - MapReduce - [mail # user]
...My best guess is that at a low level a string is often terminated by having a null byte at the end. Perhaps that's where the difference lies. Perhaps the gz decompressor simply stops at the ...
   Author: Niels Basjes, 2013-06-10, 20:27
Re: Experimental Hadoop Cluster - Linux Windows machines - MapReduce - [mail # user]
...I've installed CentOS on several different types of old (originally Windows XP)  Dell desktops for the last 4 years (i.e. desktops as old as 7 years ago) and so far installing CentOS wa...
   Author: Niels Basjes, 2013-06-01, 20:27
Re: Experimental Hadoop Cluster - Linux Windows machines - MapReduce - [mail # user]
...My first suggestion is to go for CentOS as it is free and almost the same as RHEL. Also, having a 64-bit OS lets you use a bit more of the installed memory. Then if you can simply instal...
   Author: Niels Basjes, 2013-06-01, 19:52
Re: Configuring SSH - is it required? for a psedo distriburted mode? - MapReduce - [mail # user]
...I never configure the ssh feature. Not for running on a single node and not for a full size cluster. I simply start all the required daemons (name/data/job/task) and configure them on which ...
   Author: Niels Basjes, 2013-05-19, 09:03
[MAPREDUCE-2094] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. - MapReduce - [issue]
...When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 3...
http://issues.apache.org/jira/browse/MAPREDUCE-2094    Author: Niels Basjes, 2013-05-03, 19:26
Re: How to process only input files containing 100% valid rows - MapReduce - [mail # user]
...How about a different approach: If you use the multiple output option you can process the valid lines in a normal way and put the invalid lines in a special separate output file. On Apr 18, ...
   Author: Niels Basjes, 2013-04-19, 08:21
Re: how to find top N values using map-reduce ? - MapReduce - [mail # user]
...My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want to get the top N% you'll need an additional phase to determine how many ...
   Author: Niels Basjes, 2013-02-02, 12:44
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer - MapReduce - [mail # user]
...F. put a mongodb replica set on all hadoop workernodes and let the tasks query the mongodb at localhost.  (this is what I did recently with a multi GiB dataset)  Met vriendelijke g...
   Author: Niels Basjes, 2012-12-30, 19:38
Re: Doubts on compressed file - MapReduce - [mail # user]
...Hi,   Yes.   Yes, and then the mapper will read the other parts of the file over the network. So what I do is I upload such files with a bigger HDFS blocksize so the mapper has "th...
   Author: Niels Basjes, 2012-11-07, 12:47
Re: Hadoop Real time help - MapReduce - [mail # user]
...Thanks for the pointers, I have stuff to read now :)  On Mon, Aug 20, 2012 at 9:37 AM, Bertrand Dechoux  wrote:    Best regards / Met vriendelijke groeten,  Niels Ba...
   Author: Niels Basjes, 2012-08-22, 18:21