Re: stop generating these "part-XXXX" empty files when using MultipleOutputs in mapreduce job - MapReduce - [mail # user]
...Use the LazyOutputFormat.  Have a look at this: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html and http://stackoverflow.com/quest...
   Author: Niels Basjes, 2013-10-28, 19:31
Generating mysql or sqlite datafiles from Hadoop (Java)? - MapReduce - [mail # user]
...Hi,  I remember hearing a while ago that (if I remember correctly) Facebook had an outputformat that wrote the underlying MySQL database files directly from a MapReduce job.  For m...
   Author: Niels Basjes, 2013-09-17, 14:39
Re: Why LineRecordWriter.write(..) is synchronized - MapReduce - [mail # user]
...I expect the impact on the IO speed to be almost 0 because waiting for a single disk seek is longer than many thousands of calls to a synchronized method.  Niels On Aug 11, 2013 3:00 PM...
   Author: Niels Basjes, 2013-08-11, 16:02
Re: Is there any way to use a hdfs file as a Circular buffer? - MapReduce - [mail # user]
...A circular file on hdfs is not possible.  Some of the ways around this limitation: - Create a series of files and delete the oldest file when you have too many. - Put the data into an h...
   Author: Niels Basjes, 2013-07-24, 22:22
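The first workaround in the snippet above (a series of files, deleting the oldest) can be illustrated with a small sketch. It is not HDFS code: it assumes a local directory stands in for an HDFS directory, and the function name `append_segment`, the `segment-NNNNN` naming scheme, and the `max_segments` parameter are all hypothetical choices made for this example.

```python
import tempfile
from pathlib import Path
from typing import List


def append_segment(directory: Path, name: str, data: str, max_segments: int) -> List[str]:
    """Write a new segment file, then delete the oldest files beyond max_segments."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    (directory / name).write_text(data)
    # Assumption: segment names sort chronologically (zero-padded counters here).
    segments = sorted(p.name for p in directory.iterdir() if p.is_file())
    # Everything before the newest max_segments entries is considered too old.
    for old in segments[:-max_segments]:
        (directory / old).unlink()
    return segments[-max_segments:]


# Local demonstration: write five segments, keep only the newest three.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for i in range(5):
        kept = append_segment(directory, f"segment-{i:05d}", f"data {i}\n", max_segments=3)
    print(kept)
```

A reader of such a "buffer" would process the remaining segments in name order; on a real cluster the same bookkeeping would have to go through the HDFS client rather than `pathlib`.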
Re: gz containing null chars? - MapReduce - [mail # user]
...My best guess is that at a low level a string is often terminated by having a null byte at the end. Perhaps that's where the difference lies. Perhaps the gz decompressor simply stops at the ...
   Author: Niels Basjes, 2013-06-10, 20:27
Re: Experimental Hadoop Cluster - Linux Windows machines - MapReduce - [mail # user]
...I've installed CentOS on several different types of old (originally Windows XP) Dell desktops over the last 4 years (i.e. desktops up to 7 years old) and so far installing CentOS wa...
   Author: Niels Basjes, 2013-06-01, 20:27
Re: Configuring SSH - is it required? for a pseudo distributed mode? - MapReduce - [mail # user]
...I never configure the ssh feature. Not for running on a single node and not for a full size cluster. I simply start all the required daemons (name/data/job/task) and configure them on which ...
   Author: Niels Basjes, 2013-05-19, 09:03
[MAPREDUCE-2094] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. - MapReduce - [issue]
...When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A nearly 1GiB file would be processed around 3...
http://issues.apache.org/jira/browse/MAPREDUCE-2094    Author: Niels Basjes, 2013-05-03, 19:26
Re: How to process only input files containing 100% valid rows - MapReduce - [mail # user]
...How about a different approach: If you use the multiple output option you can process the valid lines in a normal way and put the invalid lines in a special separate output file. On Apr 18, ...
   Author: Niels Basjes, 2013-04-19, 08:21
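The approach in the snippet above, routing valid and invalid lines to separate outputs, can be sketched in plain Python. This does not use Hadoop's multiple-output classes: two in-memory lists stand in for the two output files, and `split_rows` and `has_three_fields` are hypothetical names, with the "three comma-separated fields" rule being an assumed example of what "valid" might mean.

```python
from typing import Callable, Iterable, List, Tuple


def split_rows(
    rows: Iterable[str], is_valid: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Route each row to the 'valid' or the 'invalid' output, preserving order."""
    valid: List[str] = []
    invalid: List[str] = []
    for row in rows:
        # One pass over the input: every row lands in exactly one output.
        (valid if is_valid(row) else invalid).append(row)
    return valid, invalid


def has_three_fields(row: str) -> bool:
    """Assumed validity rule for this example: exactly three comma-separated fields."""
    return len(row.split(",")) == 3


valid, invalid = split_rows(["a,b,c", "broken", "1,2,3"], has_three_fields)
print(valid)
print(invalid)
```

The point of the design is that no input file has to be rejected as a whole: the bad rows are kept aside for inspection while the good rows are processed normally.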
Re: how to find top N values using map-reduce ? - MapReduce - [mail # user]
...My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want to get the top N% you'll need an additional phase to determine how many ...
   Author: Niels Basjes, 2013-02-02, 12:44
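The idea in the snippet above can be illustrated outside Hadoop with a short Python sketch. It rests on assumptions: an in-memory sort stands in for the secondary sort that would deliver records to a single reducer in descending order of value, and the names `top_n` and `records` are hypothetical, not part of any Hadoop API.

```python
from typing import Iterable, List, Tuple


def top_n(records: Iterable[Tuple[str, int]], n: int) -> List[Tuple[str, int]]:
    """Return the n records with the largest values, largest first."""
    # Stand-in for the secondary sort: order records by value, descending.
    ordered = sorted(records, key=lambda kv: kv[1], reverse=True)
    # Stand-in for the single reducer: because its input is already sorted,
    # it only needs to emit the first n records and can ignore the rest.
    return ordered[:n]


records = [("a", 5), ("b", 42), ("c", 17), ("d", 8)]
print(top_n(records, 2))
```

For the top N% case mentioned in the snippet, `n` would first have to be derived from the total record count, which is why an additional counting phase is needed.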