Re: Preferred ways to specify input and output directories to Hadoop jobs
bejoy.hadoop@... 2012-02-08, 18:13
When you pass the arguments from the CLI to your driver class, you are assigning them to mapred.input.dir and mapred.output.dir yourself. I believe no such default exists in the MapReduce framework that would assign positional arguments to the input and output directories. If you don't want to do this assignment in your driver class, you can specify the same from the CLI as -D mapred.input.dir=myInputDir and -D mapred.output.dir=myOutputDir. In both cases you are doing the same thing; there is no difference.
Choose any that is comfortable for you.
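[A minimal sketch of the two equivalent invocations, assuming a hypothetical jar and driver class (myjob.jar, com.example.MyDriver) and hypothetical paths. Note the -D form is only honored if the driver goes through ToolRunner/GenericOptionsParser, and generic options must come before positional arguments:]

```shell
# Option 1: the driver itself assigns the positional args
hadoop jar myjob.jar com.example.MyDriver myInputDir myOutputDir

# Option 2: set the properties as generic options instead
# (driver must use ToolRunner for -D to be parsed)
hadoop jar myjob.jar com.example.MyDriver \
  -D mapred.input.dir=myInputDir \
  -D mapred.output.dir=myOutputDir
```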
Bejoy K S
From handheld, Please excuse typos.
From: "W.P. McNeill" <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2012 10:00:55
To: Hadoop Mailing List<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Preferred ways to specify input and output directories to Hadoop jobs
How do you like to specify input and output directories to your Hadoop jobs?
I have been using positional arguments. All but the last argument are input
directories and the last one is an output directory. These override
any mapred.output.dir configuration parameter and augment
any mapred.input.dir. I like positional arguments because it's a very
natural UNIXy way of doing things. However, the more I use this convention,
the more complex it seems to me. For instance, you have to decide what to
do when there's only one positional argument. Or maybe there are scenarios
in which you want the positional input directories to override the
configured ones. More generally, you have to figure out how to
reconcile positional and configuration-supplied arguments. Now I'm leaning towards
only using the mapred.input.dir and mapred.output.dir parameters.
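[The convention described above (all but the last argument are input directories, the last is the output directory) can be sketched as below. This is a plain-Java illustration, not Hadoop API; in a real driver you would then feed these values to FileInputFormat.setInputPaths and FileOutputFormat.setOutputPath:]

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper (hypothetical, not part of Hadoop): split
// positional arguments into N input directories and one output directory.
public class PositionalArgs {
    final List<String> inputDirs;
    final String outputDir;

    PositionalArgs(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "need at least one input dir and one output dir");
        }
        // All but the last argument are input directories...
        inputDirs = Arrays.asList(args).subList(0, args.length - 1);
        // ...and the last one is the output directory.
        outputDir = args[args.length - 1];
    }

    public static void main(String[] args) {
        PositionalArgs p = new PositionalArgs(
            new String[] {"in1", "in2", "out"});
        System.out.println(p.inputDirs + " -> " + p.outputDir);
    }
}
```

Even this small sketch forces the ambiguity mentioned above: with a single argument it must fail, since there is no rule for which role the lone argument plays.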
What do other people do?