

Re: Remote connection bottleneck?
Mario:
Please produce a jar, place it on one of the servers in the cloud and run
from there.
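
A rough back-of-the-envelope check of why this should help, assuming the client really does pull the whole 200 MB input across the link before the job starts. The sizes and link speeds are the ones from the original message quoted below; the helper name is only illustrative, and protocol overhead is ignored:

```python
def transfer_minutes(size_mb: float, link_mbps: float) -> float:
    """Minutes needed to move size_mb megabytes over a link_mbps link."""
    megabits = size_mb * 8          # 1 byte = 8 bits
    return megabits / link_mbps / 60

home = transfer_minutes(200, 1)      # 1 Mbps home connection
office = transfer_minutes(200, 100)  # 100 Mbps connection

print(f"{home:.1f} min at 1 Mbps, {office * 60:.0f} s at 100 Mbps")
# prints: 26.7 min at 1 Mbps, 16 s at 100 Mbps
```

The 1 Mbps figure is close to the reported 30 minutes, which is consistent with the whole file crossing the slow link. Submitting from a machine inside the cluster keeps that read on the fast internal network.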

On Sat, Sep 25, 2010 at 7:46 AM, Raja Thiruvathuru
<[EMAIL PROTECTED]> wrote:

> MapReduce doesn't download the actual data, but it does read metadata
> before it starts the MapReduce job.
>
>
> On Sat, Sep 25, 2010 at 7:55 AM, Mario M <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>> I am having a problem that might be expected behaviour. I am using a cloud
>> with Hadoop remotely through ssh. I have a program that runs for about a
>> minute; it processes a 200 MB file using NLineInputFormat, and the user
>> decides the number of lines used to divide the file. However, before the
>> map-reduce phase starts, the part of the program that divides the input runs
>> locally on my computer. With a 100 Mbps connection to the cloud this isn't
>> much of a problem, but at home, with a 1 Mbps connection, the program takes
>> about 30 minutes or more to process this input. Apparently it is downloading
>> the full 200 MB, processing it to decide the byte offsets for dividing the
>> file, and sending those offsets to the cloud.
>>
>> This 30-minute startup time kills all the advantages of using MapReduce
>> for us. My question is: is this expected behaviour? Is the InputFormat phase
>> of the program supposed to run locally and not in the cloud? Or am I doing
>> something wrong? As a contrast, I ran the terasort Hadoop example for 100
>> GB; it took 3-4 minutes of startup and then started the map phase, which
>> clearly shows that it isn't downloading all the data. Terasort doesn't use
>> NLineInputFormat, but doesn't it still have to read the files to divide
>> them?
>>
>> Thank you in advance for your time. :)
>>
>> Mario Maqueo
>> ITESM-CEM
>>
>
>
>
> --
>
> Raja Thiruvathuru
>
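
To illustrate the difference Mario is asking about, here is a minimal sketch, not Hadoop code, of two ways a client could compute splits. The function names are hypothetical. It assumes that line-based splitting has to scan the data to find where every N-th line ends, while size-based splitting needs only the file length (metadata). Real input formats also deal with block locations and other details that are left out here:

```python
import io
from typing import BinaryIO, List, Tuple

def line_splits(stream: BinaryIO, lines_per_split: int) -> List[Tuple[int, int]]:
    """Return (start_offset, length) pairs, each covering lines_per_split lines."""
    splits = []
    start = 0   # byte offset where the current split begins
    pos = 0     # byte offset just past the last line read
    count = 0   # lines accumulated in the current split
    for line in stream:              # every byte of the input is read here
        pos += len(line)
        count += 1
        if count == lines_per_split:
            splits.append((start, pos - start))
            start, count = pos, 0
    if count:                        # trailing split with fewer lines
        splits.append((start, pos - start))
    return splits

def size_splits(file_length: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (start_offset, length) pairs from the file length alone."""
    return [(off, min(split_size, file_length - off))
            for off in range(0, file_length, split_size)]

data = b"a\nbb\nccc\ndddd\neeeee\n"   # 20 bytes, 5 lines
print(line_splits(io.BytesIO(data), 2))  # [(0, 5), (5, 9), (14, 6)]
print(size_splits(len(data), 8))         # [(0, 8), (8, 8), (16, 4)]
```

If the split computation runs on the submitting machine, the first approach pulls the whole file over whatever link that machine has. The second never touches the data, which would explain why a 100 GB terasort can start in a few minutes.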