I'd recommend reading Eric Sammer's book "Hadoop Operations"
(O'Reilly). It goes over a lot of this stuff: building, monitoring,
and tuning a cluster.

If your goal is just speed and quicker results, and not retention or
safety, by all means set the replication factor to 1. Note that it's
difficult for us to suggest configs unless you also share your
use-case (in brief) or goals. While the software is highly tunable, a
lot of tweaks depend on what you are planning to do.
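
For reference, the replication factor is controlled by the
dfs.replication property in hdfs-site.xml. Below is a minimal sketch,
in Python, that just generates such a fragment. The helper name
build_hdfs_site is made up for illustration and is not part of Hadoop;
only the dfs.replication property name comes from HDFS itself.

```python
import xml.etree.ElementTree as ET


def build_hdfs_site(replication: int = 1) -> str:
    """Return hdfs-site.xml text that sets the default block replication.

    build_hdfs_site is a hypothetical helper for illustration only;
    dfs.replication is the actual HDFS property name.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(replication)
    # Serialize to a plain string so it can be written to hdfs-site.xml.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_hdfs_site(1))
```

Keep in mind that dfs.replication is only the default for newly
written files; files already in HDFS keep the factor they were written
with unless you change it (for example with "hadoop fs -setrep").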
On Fri, Sep 6, 2013 at 6:11 AM, Sundeep Kambhampati
<[EMAIL PROTECTED]> wrote:
> Hi all,
> I am looking for ways to configure Hadoop in order to speed up data
> processing. Assuming all my nodes are highly fault tolerant, will making
> data replication factor 1 speed up the processing? Is there some way to
> disable failure monitoring done by Hadoop?
> Thank you for your time.