Re: Preferred ways to specify input and output directories to Hadoop jobs
Hi
      When you pass the directories as arguments on the CLI, it is your driver class that assigns them to mapred.input.dir and mapred.output.dir. I believe the MapReduce framework has no default that maps positional arguments to the input and output directories. If you don't want your driver class to do this assignment, you can set the same properties from the CLI with -D mapred.input.dir=myInputDir and -D mapred.output.dir=myOutputDir (no spaces around the equals sign; this assumes the driver parses the generic options, for example by running through ToolRunner). Either way you end up setting the same two properties, so there is no difference.
Choose whichever is more comfortable for you.
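
To illustrate why the two styles are equivalent, here is a minimal sketch in plain Java. It deliberately uses no Hadoop classes: DirArgs and its parse method are hypothetical stand-ins for a driver's argument handling, and the "-D key=value" handling only imitates what Hadoop's generic option parsing would do. It assumes the convention that the last positional argument is the output directory and the rest are inputs, joined with commas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a driver's argument handling (no Hadoop classes used).
class DirArgs {
    static final String INPUT_KEY = "mapred.input.dir";
    static final String OUTPUT_KEY = "mapred.output.dir";

    // Returns the properties that would end up in the job configuration.
    static Map<String, String> parse(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        List<String> positional = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                // "-D key=value": copy the pair into the configuration.
                String[] kv = args[++i].split("=", 2);
                if (kv.length == 2) {
                    conf.put(kv[0], kv[1]);
                }
            } else {
                positional.add(args[i]);
            }
        }
        // Driver-side convention (an assumption of this sketch):
        // with two or more positionals, the last is the output, the rest are inputs.
        if (positional.size() >= 2) {
            int last = positional.size() - 1;
            conf.put(INPUT_KEY, String.join(",", positional.subList(0, last)));
            conf.put(OUTPUT_KEY, positional.get(last));
        }
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(parse(args));
    }
}
```

With this sketch, the arguments "myInputDir myOutputDir" and "-D mapred.input.dir=myInputDir -D mapred.output.dir=myOutputDir" produce the same two properties.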
Regards
Bejoy K S

From handheld, Please excuse typos.

-----Original Message-----
From: "W.P. McNeill" <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2012 10:00:55
To: Hadoop Mailing List<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Preferred ways to specify input and output directories to Hadoop jobs

How do you like to specify input and output directories to your Hadoop jobs?

I have been using positional arguments. All but the last argument are input
directories and the last one is an output directory. These override
any mapred.output.dir configuration parameter and augment
any mapred.input.dir. I like positional arguments because it's a very
natural UNIXy way of doing things. However, the more I use this convention,
the more complex it seems to me. For instance, you have to decide what to
do when there's only one positional argument. Or maybe there are scenarios
in which you want the positional input directories to overwrite the
configurational ones. More generally, you have to figure out how to
reconcile positional and configurational arguments. Now I'm leaning towards
only using the mapred.input.dir and mapred.output.dir parameters.
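
The reconciliation described above (positional inputs augment the configured inputs, the positional output overrides the configured one) could be sketched as follows. This is only one possible policy, written in plain Java without Hadoop classes; DirReconciler, Dirs, and reconcile are hypothetical names. In this sketch a single positional argument is treated as the output directory, so the inputs must then come from the configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: merges configured directories with positional arguments.
class DirReconciler {
    // Simple holder for the resolved directories.
    static final class Dirs {
        final List<String> inputs;
        final String output;

        Dirs(List<String> inputs, String output) {
            this.inputs = inputs;
            this.output = output;
        }
    }

    // confInput is a comma-separated list (or null); confOutput may be null.
    static Dirs reconcile(String confInput, String confOutput, List<String> positional) {
        List<String> inputs = new ArrayList<>();
        if (confInput != null && !confInput.isEmpty()) {
            inputs.addAll(Arrays.asList(confInput.split(",")));
        }
        String output = confOutput;
        if (!positional.isEmpty()) {
            int last = positional.size() - 1;
            inputs.addAll(positional.subList(0, last)); // positional inputs augment
            output = positional.get(last);              // positional output overrides
        }
        if (inputs.isEmpty() || output == null) {
            throw new IllegalArgumentException("need at least one input and an output directory");
        }
        return new Dirs(inputs, output);
    }

    public static void main(String[] args) {
        Dirs d = reconcile("confIn", "confOut", Arrays.asList(args));
        System.out.println(d.inputs + " -> " + d.output);
    }
}
```

Even in this small form, the single-argument case and the failure case need explicit decisions, which is the complexity that using only the configuration parameters avoids.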

What do other people do?
