Re: Remote connection bottleneck?
> In the new shell, I went to the hadoop/bin directory on my computer

Why didn't you issue the command from the window that had the ssh session?
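
The stack trace below shows RunJar opening the jar through java.util.jar.JarFile, that is, from the file system of the machine where "hadoop jar" is typed, so a path that exists only on the cluster is not found from the local Cygwin shell. A minimal sketch of running the job on the cluster side instead. The host name cluster.example.org is hypothetical, and the sketch assumes that "hadoop" is on the PATH of the remote account; neither detail comes from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical login and host: replace with your own cluster details.
CLUSTER="mariom@cluster.example.org"
# Path of the jar ON THE CLUSTER, as given in the thread.
JAR="/home/mariom/ITESMCEMdebug.jar"

# Build the command line that has to run on the cluster.
# Job arguments are appended as given (no quoting of spaces is attempted).
remote_cmd() {
  printf 'hadoop jar %s' "$JAR"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Dry run by default; set RUN=1 to execute the command over ssh.
if [ "${RUN:-0}" = "1" ]; then
  ssh "$CLUSTER" "$(remote_cmd "$@")"
else
  echo "would run on $CLUSTER: $(remote_cmd "$@")"
fi
```

Typing the same "hadoop jar /home/mariom/ITESMCEMdebug.jar" line directly in the window that already has the ssh session open has the same effect.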

On Sat, Sep 25, 2010 at 6:53 PM, Mario M <[EMAIL PROTECTED]> wrote:

> Hi,
> what I did was this:
>
> I am working with Cygwin in Windows 7.
>
> - I copied my jar file ITESMCEMdebug.jar to the cluster, into the directory
> /home/mariom. (I then connected over ssh and confirmed that it is
> there.)
>
> - I left the ssh window open and opened another cygwin shell.
>
> - In the new shell, I went to the hadoop/bin directory on my computer, and
> ran:
>
> "bash hadoop jar /home/mariom/ITESMCEMdebug.jar"
>
> (I omitted the arguments just to test, my program outputs the usage
> instructions when called without arguments)
>
> - I got this:
>
> Exception in thread "main" java.io.IOException: Error opening job jar:
> /home/mariom/ITESMCEMdebug.jar
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.io.FileNotFoundException: \home\mariom\ITESMCEMdebug.jar
> (El sistema no puede encontrar la ruta especificada)
>         at java.util.zip.ZipFile.open(Native Method)
>         at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>         at java.util.jar.JarFile.<init>(JarFile.java:133)
>         at java.util.jar.JarFile.<init>(JarFile.java:70)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
>
> - If I run my local jar file with "bash hadoop jar ITESMCEMdebug.jar", it
> works fine (it outputs the usage instructions).
>
> Also, is it OK that I have to write "bash" every time? The examples I have
> seen just use "hadoop jar etc". I guess this is Cygwin-specific; otherwise
> it says "bash: hadoop: command not found".
>
> Thanks again :) for your time.
>
> Mario Maqueo
> ITESM-CEM
>
>
>
> PS: "El sistema no puede encontrar la ruta especificada" = "The system
> cannot find the specified path", in case the Spanish text confuses you.
>
>
> 2010/9/25 Ted Yu <[EMAIL PROTECTED]>
>
>> Mario:
>> Can you show us the error when you run the following ?
>> "hadoop jar <path where I placed the file with the ssh connection>
>> <arguments>"
>>
>>
>>
>>>> Hello,
>>>> please excuse my ignorance, but how can I run it from there?
>>>> Up to now I've been running the programs with "hadoop jar <localfile>
>>>> <arguments>".
>>>>
>>>> I tried copying the jar to HDFS and using "hadoop jar <HDFS path>
>>>> <arguments>", but that didn't work (file not found). So I went to the ssh
>>>> connection and copied the jar to my directory there, but now I don't know
>>>> how to run it from there. "hadoop jar <path where I placed the file with
>>>> the ssh connection>" didn't work.
>>>>
>>>> I am not very experienced with ssh, so I am sorry if this is basic
>>>> stuff.
>>>>
>>>> Thanks,
>>>>
>>>> Mario Maqueo
>>>> ITESM-CEM
>>>>
>>>> 2010/9/25 Ted Yu <[EMAIL PROTECTED]>
>>>>
>>>>> Mario:
>>>>> Please produce a jar, place it on one of the servers in the cloud and
>>>>> run from there.
>>>>>
>>>>>
>>>>> On Sat, Sep 25, 2010 at 7:46 AM, Raja Thiruvathuru <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> MapReduce doesn't download the actual data, but it reads metadata
>>>>>> before it starts the MapReduce job.
>>>>>>
>>>>>>
>>>>>> On Sat, Sep 25, 2010 at 7:55 AM, Mario M <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I am having a problem that might be expected behaviour. I am using a
>>>>>>> cloud with Hadoop remotely through ssh. I have a program that runs for about
>>>>>>> a minute; it processes a 200 MB file using NLineInputFormat, and the user
>>>>>>> decides the number of lines used to divide the file. However, before the
>>>>>>> map-reduce phase starts, the part of the program that divides the input runs
>>>>>>> locally on my computer. With a 100 Mbps connection to the cloud this isn't
>>>>>>> much of a problem, but at home, with a 1 Mbps connection, the program takes
>>>>>>> 30 minutes or more to process this input.
>>>>>>> Apparently it is downloading the full 200 MB, processing them to