i'm on windows using AWS EMR/EC2. i use the ruby client to manipulate AWS EMR.
1. spawn an EMR cluster. this should return a jobflow id (jobflow-id).
ruby elastic-mapreduce --create --name j-med --alive \
  --num-instances 10 --instance-type c1.medium
2. run a job. you need to describe the job parameters in a JSON file.
ruby elastic-mapreduce --jobflow <jobflow-id> --json job.json
the JSON file might look like this (fragment; it names the step to run):
"Name": "special word count job",
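to flesh that out, here's a sketch of a complete job.json for a streaming word-count step. the field names follow the EMR AddJobFlowSteps step format; the jar path and the s3n:// locations are placeholders you'd swap for your own bucket/scripts:

```json
[
  {
    "Name": "special word count job",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
      "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
      "Args": [
        "-input",   "s3n://my-bucket/input",
        "-output",  "s3n://my-bucket/output",
        "-mapper",  "s3n://my-bucket/wordSplitter.py",
        "-reducer", "aggregate"
      ]
    }
  }
]
```

the -input/-output/-mapper/-reducer args are just standard hadoop streaming parameters; anything your job needs goes in that Args list.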
two commands + a JSON configuration file specifying input/output
(and other mapred) parameter values. this assumes you have an AWS
account, downloaded and set up all the access keys, installed ruby,
installed the aws ruby client, etc.... there's a series of non-trivial
steps to get through before you can just keep reusing this 2-step
approach (spawn EMR cluster, then submit hadoop job).
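once that setup is done, the 2-step approach can be scripted. a sketch, assuming the create command prints a line ending in the jobflow id (something like "Created job flow j-XXXXXXXXXXXX"), which we grab with awk:

```shell
#!/bin/sh
# step 1: spawn the cluster. the client prints "Created job flow <id>",
# so take the last whitespace-separated field as the jobflow id
# (assumption: your client version prints that exact format)
JOBFLOW_ID=$(ruby elastic-mapreduce --create --name j-med --alive \
  --num-instances 10 --instance-type c1.medium | awk '{print $NF}')

# step 2: submit the hadoop job described in job.json to that cluster
ruby elastic-mapreduce --jobflow "$JOBFLOW_ID" --json job.json
```

if your client's output differs, adjust the awk accordingly (or just copy the id by hand like i usually do).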
imho, i'm not surprised you're asking, because the aws emr
documentation is scattered. the information is there, but you've got to
dig and make sense of it yourself. i'm not sure how amazon claims aws
is a billion+ dollar operation, yet their documentation is weak and
not cohesive (and their support forums are horrible and their aws
evangelists are anything but evangelists).
On Fri, Oct 5, 2012 at 7:15 AM, sudha sadhasivam
<[EMAIL PROTECTED]> wrote:
> We tried to setup hadoop on AWS. The procedure is given. We face problem with the parameters needed for input and output files. Can somebody provide us with a sample exercise with steps for working on hadoop in AWS?
> thanking you
> Dr G Sudha