Re: Hadoop noob question
Mohammad Tariq 2013-05-12, 12:10
@Rahul : I'm sorry, I answered this in the wrong thread by mistake. You could
do that as Nitin has shown.
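
For the local-to-HDFS direction, a minimal sketch along the same lines (the
paths and the NameNode address are illustrative):

hadoop distcp file:///home/rahul/data hdfs://localhost:8020/user/rahul/data

Keep in mind that distcp runs as an MR job, so the map tasks must be able to
read the file:/// source path on whichever nodes they run. For data sitting
on a single machine, a plain "hadoop fs -put /home/rahul/data /user/rahul/data"
is usually the simpler choice.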
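
On the NameNode sizing discussed below, a rough worked example using the
~200 B-per-object figure from this thread: 10 TB stored as 128 MB files is
about 80,000 files plus 80,000 blocks, i.e. on the order of 30 MB of NN heap,
while the same 10 TB as 1 MB files is about 10 million files plus 10 million
blocks, i.e. around 4 GB of heap. That is why bigger files mean less metadata
and less NN memory pressure.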

Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> you can do that using file:///
>
> example:
>
> hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
>
>
>
> On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee <
> [EMAIL PROTECTED]> wrote:
>
>> @Tariq, can you point me to some resource that shows how distcp is used
>> to upload files from local to HDFS?
>>
>> Isn't distcp an MR job? Wouldn't it need the data to already be present
>> in Hadoop's fs?
>>
>>  Rahul
>>
>>
>> On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> You're welcome :)
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Sat, May 11, 2013 at 10:46 PM, Rahul Bhattacharjee <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks Tariq!
>>>>
>>>>
>>>> On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> @Rahul : Yes. distcp can do that.
>>>>>
>>>>> And the bigger the files, the less metadata, and hence the less memory
>>>>> consumption.
>>>>>
>>>>> Warm Regards,
>>>>> Tariq
>>>>> cloudfront.blogspot.com
>>>>>
>>>>>
>>>>> On Sat, May 11, 2013 at 9:40 PM, Rahul Bhattacharjee <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> IMHO, the statement about the NN with regard to block metadata is
>>>>>> more of a general statement. Even if you put lots of small files with
>>>>>> a combined size of 10 TB, you need to have a capable NN.
>>>>>>
>>>>>> Can distcp be used to copy local-to-HDFS?
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>
>>>>>> On Sat, May 11, 2013 at 9:35 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Absolutely right, Mohammad.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 11, 2013 at 9:33 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Sorry for barging in, guys. I think Nitin is talking about this:
>>>>>>>>
>>>>>>>> Every file and block in HDFS is treated as an object, and for each
>>>>>>>> object around 200 B of metadata gets created. So the NN should be powerful
>>>>>>>> enough to handle that much metadata, since it is all going to be in memory.
>>>>>>>> Actually, memory is the most important metric when it comes to the NN.
>>>>>>>>
>>>>>>>> Am I correct @Nitin?
>>>>>>>>
>>>>>>>> @Thoihen : As Nitin has said, when you talk about that much data
>>>>>>>> you don't actually just do a "put". You could use something like "distcp"
>>>>>>>> for parallel copying. A better approach would be to use a data aggregation
>>>>>>>> tool like Flume or Chukwa, as Nitin has already pointed out. Facebook uses
>>>>>>>> its own data aggregation tool, called Scribe, for this purpose.
>>>>>>>>
>>>>>>>> Warm Regards,
>>>>>>>> Tariq
>>>>>>>> cloudfront.blogspot.com
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 11, 2013 at 9:20 PM, Nitin Pawar <
>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> The NN would still be in the picture because it will be writing a lot
>>>>>>>>> of metadata for each individual file. So you will need an NN capable of
>>>>>>>>> storing the metadata for your entire dataset. The data itself never goes
>>>>>>>>> to the NN, but a lot of metadata about the data will be on the NN, so
>>>>>>>>> it's always a good idea to have a strong NN.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, May 11, 2013 at 9:11 PM, Rahul Bhattacharjee <
>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>>> @Nitin, parallel dfs writes to HDFS are great, but I could not
>>>>>>>>>> understand the meaning of a capable NN. As far as I know, the NN is not
>>>>>>>>>> part of the actual data write pipeline, meaning that the data does not
>>>>>>>>>> travel through the NN; the dfs client contacts the NN from time to time
>>>>>>>>>> to get the locations of the DNs where the data blocks should be stored.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rahul
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, May 11, 2013 at 4:54 PM, Nitin Pawar <