Re: Hadoop noob question
Yeah, you are right. I misread your earlier post.

Thanks,
Rahul
On Sun, May 12, 2013 at 6:25 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> I had said that if you use distcp to copy data *from localFS to HDFS*, then
> you won't be able to exploit parallelism, as the entire file is present on
> a single machine. So no multiple TTs (TaskTrackers).
>
> Please comment if you think I am wrong somewhere.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sun, May 12, 2013 at 6:15 PM, Rahul Bhattacharjee <
> [EMAIL PROTECTED]> wrote:
>
>> Yes, it's an MR job under the hood. My question was about your statement that
>> by using distcp you lose the benefits of Hadoop's parallel processing. I
>> think the MR job of distcp divides files into individual map tasks based on
>> the total size of the transfer, so multiple mappers would still be spawned
>> if the transfer is huge, and they would work in parallel.
>>
>> Correct me if there is anything wrong!
>>
>> Thanks,
>> Rahul
>>
>>
>> On Sun, May 12, 2013 at 6:07 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> No. distcp is actually a MapReduce job under the hood.
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks to both of you!
>>>>
>>>> Rahul
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You can do that using file:///
>>>>>
>>>>> example:
>>>>>
>>>>>
>>>>> hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
>>>>>
>>>>> On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> @Tariq, can you point me to a resource that shows how distcp is
>>>>>> used to upload files from local to HDFS?
>>>>>>
>>>>>> Isn't distcp an MR job? Wouldn't it need the data to be already
>>>>>> present in Hadoop's FS?
>>>>>>
>>>>>> Rahul
>>>>>>
>>>>>>
>>>>>> On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> You're welcome :)
>>>>>>>
>>>>>>> Warm Regards,
>>>>>>> Tariq
>>>>>>> cloudfront.blogspot.com
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 11, 2013 at 10:46 PM, Rahul Bhattacharjee <
>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Thanks Tariq!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq <
>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> @Rahul: Yes, distcp can do that.
>>>>>>>>>
>>>>>>>>> And for the same amount of data, the bigger the files, the less
>>>>>>>>> metadata there is, hence lower memory consumption on the NN.
>>>>>>>>>
>>>>>>>>> Warm Regards,
>>>>>>>>> Tariq
>>>>>>>>> cloudfront.blogspot.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, May 11, 2013 at 9:40 PM, Rahul Bhattacharjee <
>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>>> IMHO, the statement about the NN with regard to block metadata
>>>>>>>>>> is more of a general statement. Even if you put lots of small files with
>>>>>>>>>> a combined size of 10 TB, you need to have a capable NN.
>>>>>>>>>>
>>>>>>>>>> Can distcp be used to copy from local to HDFS?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rahul
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, May 11, 2013 at 9:35 PM, Nitin Pawar <
>>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Absolutely right, Mohammad.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, May 11, 2013 at 9:33 PM, Mohammad Tariq <
>>>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sorry for barging in, guys. I think Nitin is talking about this:
>>>>>>>>>>>>
>>>>>>>>>>>> Every file and block in HDFS is treated as an object, and for
>>>>>>>>>>>> each object around 200B of metadata gets created. So the NN should be
>>>>>>>>>>>> powerful enough to handle that much metadata, since it is all held
>>>>>>>>>>>> in memory. Actually, memory is the most important metric when it comes to
>>>>>>>>>>>> the NN.
>>>>>>>>>>>>
>>>>>>>>>>>> Am I correct @Nitin?
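
To put rough numbers on the metadata argument in this thread, here is a minimal back-of-the-envelope sketch in Python. It assumes the "around 200B per object" figure quoted above, one object per file plus one per block, and a 128 MB block size; the block size, the function name, and the two example file sizes are illustrative assumptions, not values from the thread, and real NameNode heap usage depends on the Hadoop version and configuration.

```python
# Rough NameNode metadata estimate. All constants are assumptions for illustration.
BYTES_PER_OBJECT = 200        # approximate per-object figure quoted in the thread
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def namenode_metadata_bytes(total_bytes, file_size, block_size=128 * MB):
    """Estimate metadata size: one object per file plus one object per block.

    Assumes every file has the same size (a simplification).
    """
    num_files = ceil_div(total_bytes, file_size)
    blocks_per_file = ceil_div(file_size, block_size)
    num_objects = num_files * (1 + blocks_per_file)
    return num_objects * BYTES_PER_OBJECT


# Same 10 TB of data, stored as many small files vs. fewer large files.
small_files = namenode_metadata_bytes(10 * TB, 1 * MB)
large_files = namenode_metadata_bytes(10 * TB, 1 * GB)

print(f"10 TB as 1 MB files: {small_files / GB:.2f} GiB of metadata")
print(f"10 TB as 1 GB files: {large_files / MB:.2f} MiB of metadata")
```

Under these assumptions the same 10 TB needs a couple of hundred times more NameNode memory when stored as 1 MB files than as 1 GB files, which is the point made above about bigger files meaning less metadata.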