Results 11 to 20 of 257
Re: Setting number of mappers according to number of TextInput lines - Hadoop - [mail # user]
...No. The number of lines is not known at planning time. All you know is the size of the blocks. You want to look at mapred.max.split.size.  On Sat, Jun 16, 2012 at 5:31 AM, Ondřej Klimp...
   Author: Edward Capriolo, 2012-06-16, 16:12
Re: Ideal file size - Hadoop - [mail # user]
...It does not matter what the file size is, because the file is split into blocks, which are what the NN tracks.  For larger deployments you can go with a large block size like 256MB or...
   Author: Edward Capriolo, 2012-06-06, 14:55
Re: Hadoop with Sharded MySql - Hadoop - [mail # user]
...Maybe you can do some VIEWs or unions or merge tables on the MySQL side to avoid launching so many Sqoop jobs.  On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani  ...
   Author: Edward Capriolo, 2012-06-01, 00:12
Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines - Hadoop - [mail # user]
...We were actually in an Amazon vs. host-it-yourself debate with someone, which prompted us to do some calculations:  http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/myth_busters_op...
   Author: Edward Capriolo, 2012-05-31, 19:22
Re: Problems with block compression using native codecs (Snappy, LZO) and MapFile.Reader.get() - Hadoop - [mail # user]
...If you are getting a SIGSEGV, it never hurts to try a more recent JVM. 21 has many bug fixes at this point.  On Tue, May 22, 2012 at 11:45 AM, Jason B wrote: ...
   Author: Edward Capriolo, 2012-05-22, 15:59
Re: Splunk + Hadoop - Hadoop - [mail # user]
...So a while back there was an article: http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data  I recently did my own take on full text searching you...
   Author: Edward Capriolo, 2012-05-22, 13:56
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3 - Hadoop - [mail # user]
...Honestly, that is a hassle. Going from 205 to cdh3u3 is probably more of a cross-grade than an upgrade or downgrade. I would just stick it out. But yes, like Michael said, two clusters on the s...
   Author: Edward Capriolo, 2012-05-03, 15:25
Re: hadoop.tmp.dir with multiple disks - Hadoop - [mail # user]
...Since each Hadoop task is isolated from the others, having more tmp directories allows you to isolate that disk bandwidth as well. By listing the disks you give more firepower to shuffle-sorting...
   Author: Edward Capriolo, 2012-04-22, 14:44
Re: Feedback on real world production experience with Flume - Hadoop - [mail # user]
...I think this is valid to talk about. For example, one does not need a decentralized collector if one can just write logs directly to decentralized files in a decentralized file system. In any ...
   Author: Edward Capriolo, 2012-04-22, 14:14
Re: Multiple data centre in Hadoop - Hadoop - [mail # user]
...Hive is beginning to implement Region support, where one metastore will manage multiple filesystems and jobtrackers. When a query creates a table, it will then be copied to one or more datace...
   Author: Edward Capriolo, 2012-04-19, 23:43