Results 1 to 10 of 29 on search-hadoop.com (Sematext).
Re: How to sort in a WordCount - Hadoop - [mail # user]
...You need a second MapReduce job. Take the output of your WordCount job as its input and have the mapper swap keys and values, i.e. map(word, count) => (count, word); then your reducer will get the records sorted...
   Author: Kai Voigt, 2014-08-17, 04:51
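Below is a minimal in-memory sketch of that second job. It is written in Python rather than Hadoop's Java API so it runs without a cluster. The function names are hypothetical, and the explicit sort stands in for the framework's shuffle/sort phase, which orders map output by key. Hadoop sorts keys in ascending order by default. The `descending` flag is only a stand-in for configuring a custom sort comparator, for example via `Job.setSortComparatorClass`.

```python
from itertools import groupby
from operator import itemgetter

def swap_mapper(word, count):
    # Second job's mapper: make the count the key so the framework sorts on it
    yield count, word

def swap_back_reducer(count, words):
    # Keys arrive in sorted order; restore (word, count) pairs for the output
    for word in words:
        yield word, count

def sort_by_count(wordcount_output, descending=False):
    """Simulate the second MapReduce job on (word, count) pairs from WordCount."""
    mapped = [pair for word, count in wordcount_output
              for pair in swap_mapper(word, count)]
    # Shuffle/sort: order map output by key only (stable, so ties keep input order)
    mapped.sort(key=itemgetter(0), reverse=descending)
    output = []
    for count, group in groupby(mapped, key=itemgetter(0)):
        output.extend(swap_back_reducer(count, (word for _, word in group)))
    return output

print(sort_by_count([("apple", 3), ("pear", 1), ("fig", 2)]))
# [('pear', 1), ('fig', 2), ('apple', 3)]
```

With a single reducer, the result is one globally ordered file. With several reducers, each output file is sorted only within itself.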
Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode? - Hadoop - [mail # user]
...Point 3 seems a biased and incomplete statement. Cloudera's distribution, CDH, is fully open source. The proprietary "stuff" you refer to is most likely Cloudera Manager, an additional tool to make ...
   Author: Kai Voigt, 2014-08-12, 21:11
Re: about rack awareness - Hadoop - [mail # user]
...Rack Awareness should actually be called Switch Awareness, and that's what people typically do: nodes in a rack are on the same switch. Also, you should have balanced capacity across racks/s...
   Author: Kai Voigt, 2014-07-04, 03:44
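Hadoop learns the cluster topology from a script supplied by the administrator. In Hadoop 2 this script is configured with `net.topology.script.file.name`. Hadoop passes host names or IP addresses as arguments, and the script prints one rack path per argument. The Python sketch below assumes a hypothetical host-to-switch inventory, so the "rack" reported to Hadoop is the switch each node is cabled to. `rack_sizes` is a hypothetical helper for checking that capacity is balanced across those racks.

```python
import sys
from collections import Counter

# Hypothetical inventory: the top-of-rack switch each DataNode is cabled to
HOST_TO_SWITCH = {
    "10.0.1.11": "/dc1/switch1",
    "10.0.1.12": "/dc1/switch1",
    "10.0.2.11": "/dc1/switch2",
    "10.0.2.12": "/dc1/switch2",
}
DEFAULT_RACK = "/default-rack"  # Hadoop's rack name for hosts with no known location

def resolve(hosts):
    """Map each host to the switch-based rack path Hadoop should use."""
    return [HOST_TO_SWITCH.get(host, DEFAULT_RACK) for host in hosts]

def rack_sizes(hosts):
    """Count nodes per rack, a quick check that capacity is balanced."""
    return Counter(resolve(hosts))

if __name__ == "__main__":
    # Hadoop passes hosts as arguments and reads whitespace-separated racks from stdout
    print(" ".join(resolve(sys.argv[1:])))
```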
Re: Map  reduce Query - Hadoop - [mail # user]
...That's exactly what MapReduce does. The input is processed by the mapper function, and its output is automatically sent to the reducer function. Between mappers and reducers, we have t...
   Author: Kai Voigt, 2014-06-19, 10:07
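Here is a minimal single-process Python sketch of that flow, using WordCount. The hypothetical `run_job` driver collects everything the mapper emits and groups the values by key (the shuffle). It then visits the keys in sorted order and passes each key, with its list of values, to the reducer. On a real cluster the framework does this across machines. Nothing here uses Hadoop's API.

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word in an input line
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Receives one key together with all values the mappers emitted for it
    yield word, sum(counts)

def run_job(lines):
    # Shuffle: group every mapper output value under its key
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Sort: the reducer sees keys in sorted order
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

print(run_job(["a b a", "b c"]))  # [('a', 2), ('b', 2), ('c', 1)]
```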
Re: Counters in MapReduce - Hadoop - [mail # user]
...Like you said, just wrap your 3 jobs in a while loop and check the built-in counters, such as the number of reduce output records, to see whether the job output was empty. Unfortunately, Oozie can...
   Author: Kai Voigt, 2014-06-09, 09:47
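The Python sketch below shows that driver loop, with plain functions standing in for the three jobs. In Hadoop's Java API, the corresponding built-in counter is `TaskCounter.REDUCE_OUTPUT_RECORDS`, read from `job.getCounters()` after the job completes. Here, the hypothetical `with_counters` wrapper reports a counter under the same name in a dict. The `max_iterations` guard is an added assumption so that a chain which never converges cannot loop forever.

```python
def with_counters(step):
    """Wrap a record-transforming function so it also reports a counter."""
    def job(records):
        output = step(records)
        return output, {"REDUCE_OUTPUT_RECORDS": len(output)}
    return job

def run_until_empty(jobs, records, max_iterations=100):
    """Run the job chain repeatedly until the last job writes no records."""
    if not jobs:
        raise ValueError("need at least one job")
    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        for job in jobs:
            records, counters = job(records)
        # Check the last job's counter, as a driver would after it completes
        if counters["REDUCE_OUTPUT_RECORDS"] == 0:
            return records, iteration
    raise RuntimeError("output never became empty")

# Three toy jobs standing in for the real ones
jobs = [
    with_counters(lambda rs: [r - 1 for r in rs]),
    with_counters(lambda rs: [r for r in rs if r > 0]),
    with_counters(sorted),
]
print(run_until_empty(jobs, [1, 3, 2]))  # ([], 3)
```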
Re: - Hadoop - [mail # user]
...In my opinion, another 2782829 times, give or take a few. Or try reading and understanding http://hadoop.apache.org/mailing_lists.html, which tells you to send an email to [EM...
   Author: Kai Voigt, 2013-03-06, 13:04
Re: aggregation by time window - Hadoop - [mail # user]
...Hi again, the idea is that you emit every event multiple times. So your map input record (event1, 10:07) will be emitted seven times during the map() call. Like I said, (10:04,event1),...
   Author: Kai Voigt, 2013-01-28, 13:48
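The snippet is cut off before the window definition. This sketch therefore assumes one-minute window keys running from three minutes before to three minutes after each event. That choice matches the seven emissions and the first key (10:04) quoted above. `before` and `after` are hypothetical parameters to adjust, and `events_per_window` is a single-process stand-in for the shuffle and a counting reducer.

```python
from collections import Counter
from datetime import datetime, timedelta

def window_mapper(event, time_str, before=3, after=3):
    """Emit the event once for every one-minute window key it belongs to."""
    t = datetime.strptime(time_str, "%H:%M")
    for offset in range(-before, after + 1):
        key = (t + timedelta(minutes=offset)).strftime("%H:%M")
        yield key, event

def events_per_window(records, before=3, after=3):
    """Stand-in for shuffle + reduce: count the events emitted under each key."""
    counts = Counter()
    for event, time_str in records:
        for key, _ in window_mapper(event, time_str, before, after):
            counts[key] += 1
    return counts

print([key for key, _ in window_mapper("event1", "10:07")])
# ['10:04', '10:05', '10:06', '10:07', '10:08', '10:09', '10:10']
```

Because each event is duplicated into every window it belongs to, the reducer for one key sees all events of that window without needing to look at neighbouring keys.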
Re: Get the name of node where mapper is running - Hadoop - [mail # user]
...Hello, the JobTracker has a built-in Web UI (http://hostname_of_jobtracker:50030/) where you can get details for all completed and running jobs. For the map phase, it will tell you on ...
   Author: Kai Voigt, 2012-11-21, 18:06
Re: Transfer large file >50Gb with DistCp from s3 to cluster - Hadoop - [mail # user]
...Hi, my guess is that you are running "hadoop distcp" on one of the datanodes... In that case, that node will get the first replica of each block. But you should also see copies on m...
   Author: Kai Voigt, 2012-09-04, 20:09
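This behaviour follows from the default HDFS block placement for a replication factor of 3. The first replica goes to the writer's own node when the writer is a DataNode. The second goes to a node on a different rack, and the third to a different node on the same rack as the second. The deterministic Python sketch below illustrates that rule. It is not HDFS code: real HDFS picks randomly among eligible nodes and also checks whether candidates are suitable targets.

```python
def place_replicas(writer, topology):
    """Pick 3 replica hosts for one block following the default HDFS rack rule.

    topology maps DataNode host -> rack. Assumes at least two racks, with two
    or more nodes on the rack chosen for the second replica. Takes the first
    eligible host in sorted order (instead of a random one) so results are
    reproducible.
    """
    hosts = sorted(topology)
    # 1st replica: the writer itself if it is a DataNode, otherwise some node
    first = writer if writer in topology else hosts[0]
    # 2nd replica: a node on a different rack from the first
    second = next(h for h in hosts if topology[h] != topology[first])
    # 3rd replica: a different node on the same rack as the second
    third = next(h for h in hosts
                 if topology[h] == topology[second] and h != second)
    return [first, second, third]

topology = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
print(place_replicas("dn2", topology))  # ['dn2', 'dn3', 'dn4']
```

Running distcp on `dn2` makes `dn2` the writer for every block, so it ends up holding one replica of each block. In real HDFS, the random choice spreads the other two replicas over the rest of the cluster.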
Re: Hadoop or HBase - Hadoop - [mail # user]
...Having a distributed filesystem doesn't free you from needing backups. If someone deletes a file in HDFS, it's gone. What backend storage is supported by your CMS? ...
   Author: Kai Voigt, 2012-08-28, 10:18