HDFS, mail # user - Re: Hadoop noob question


Re: Hadoop noob question
Rahul Bhattacharjee 2013-05-12, 13:05
Soon after replying I realized something else related to this.

Say we have a single 1 GB file in HDFS (configured with the default block
size of 64 MB). If we use distcp to copy it from the current HDFS to another
one, would there be any parallelism, or would just a single map task be
launched?

As per what I have read, a mapper is launched for a complete file or a set
of files. It doesn't operate at the block level, so there would be no
parallelism even though the file resides in HDFS.
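
To make the numbers concrete, here is a minimal sketch of the arithmetic. It is not distcp's actual code: the function names are hypothetical, and the "one whole file per mapper" rule is simply the assumption stated above, not something confirmed against the distcp source.

```python
MB = 1024 * 1024
GB = 1024 * MB


def block_count(file_size, block_size=64 * MB):
    """Number of HDFS blocks needed to store a file of file_size bytes."""
    # Ceiling division using integers only.
    return -(-file_size // block_size)


def useful_map_tasks(file_sizes, requested_maps):
    """Upper bound on map tasks that can do work in parallel.

    ASSUMPTION (from the message above): each file is copied whole by
    exactly one mapper, so there can never be more busy mappers than files.
    """
    return min(len(file_sizes), requested_maps)


if __name__ == "__main__":
    one_file = [1 * GB]
    print("blocks in the file:", block_count(one_file[0]))
    print("useful map tasks:  ", useful_map_tasks(one_file, requested_maps=20))
```

Under that assumption, the file spans 16 blocks, but the copy is still bounded by a single map task, however many maps are requested.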

Thanks,
Rahul
On Sun, May 12, 2013 at 6:28 PM, Rahul Bhattacharjee <
[EMAIL PROTECTED]> wrote:

> Yeah, you are right. I misread your earlier post.
>
> Thanks,
> Rahul
>
>
> On Sun, May 12, 2013 at 6:25 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> I had said that if you use distcp to copy data *from localFS to HDFS*, then
>> you won't be able to exploit parallelism, as the entire file is present on
>> a single machine. So no multiple TTs.
>>
>> Please comment if you think I am wrong somewhere.
>>
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>>
>> On Sun, May 12, 2013 at 6:15 PM, Rahul Bhattacharjee <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Yes, it's an MR job under the hood. My question was about what you wrote:
>>> that by using distcp you lose the benefits of Hadoop's parallel processing.
>>> I think the MR job of distcp divides files into individual map tasks based
>>> on the total size of the transfer, so multiple mappers would still be
>>> spawned if the transfer is huge, and they would work in parallel.
>>>
>>> Correct me if there is anything wrong!
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Sun, May 12, 2013 at 6:07 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>
>>>> No. distcp is actually a mapreduce job under the hood.
>>>>
>>>> Warm Regards,
>>>> Tariq
>>>> cloudfront.blogspot.com
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> Thanks to both of you!
>>>>>
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> You can do that using file:///
>>>>>>
>>>>>> example:
>>>>>>
>>>>>>
>>>>>> hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
>>>>>>
>>>>>> On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee <
>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> @Tariq, can you point me to some resource which shows how distcp is
>>>>>>> used to upload files from local to HDFS?
>>>>>>>
>>>>>>> Isn't distcp an MR job? Wouldn't it need the data to be already
>>>>>>> present in Hadoop's FS?
>>>>>>>
>>>>>>>  Rahul
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> You're welcome :)
>>>>>>>>
>>>>>>>> Warm Regards,
>>>>>>>> Tariq
>>>>>>>> cloudfront.blogspot.com
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 11, 2013 at 10:46 PM, Rahul Bhattacharjee <
>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Tariq!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq <
>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>>> @Rahul : Yes. distcp can do that.
>>>>>>>>>>
>>>>>>>>>> And the bigger the files, the less metadata there is, hence lower
>>>>>>>>>> memory consumption on the NN.
>>>>>>>>>>
>>>>>>>>>> Warm Regards,
>>>>>>>>>> Tariq
>>>>>>>>>> cloudfront.blogspot.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, May 11, 2013 at 9:40 PM, Rahul Bhattacharjee <
>>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>>
>>>>>>>>>>> IMHO, the statement about the NN with regard to block
>>>>>>>>>>> metadata is more of a general statement. Even if you put lots of small
>>>>>>>>>>> files with a combined size of 10 TB, you need to have a capable NN.
>>>>>>>>>>>
>>>>>>>>>>> Can distcp be used to copy local-to-HDFS?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Rahul
>>>>>>>>>>>
>>>>>>>>>>>