MapReduce >> mail # user >> Re: Hadoop noob question


Rahul Bhattacharjee 2013-05-11, 16:10
Thoihen Maibam 2013-05-11, 10:49
Nitin Pawar 2013-05-11, 10:54
maisnam ns 2013-05-11, 11:08
Nitin Pawar 2013-05-11, 11:24
Mohammad Tariq 2013-05-12, 13:42
Rahul Bhattacharjee 2013-05-12, 11:53
Nitin Pawar 2013-05-12, 12:06
Mohammad Tariq 2013-05-12, 12:37
Rahul Bhattacharjee 2013-05-12, 12:45
Mohammad Tariq 2013-05-12, 12:55
Chris Mawata 2013-05-12, 14:21

Re: Hadoop noob question
Just wanted to bring one thing up.

Using distcp to upload a local file to HDFS might not work if launched from a
gateway host. Gateway hosts are typically configured only to submit jobs and
are only aware of the NN and JT, so mappers running on the various data nodes
might not have access to the local fs of the gateway host, where the file
resides.

distcp is possible when the data is loaded onto the local fs of one of the
datanodes and distcp is then run from there.
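
As a rough sketch of the two upload routes discussed in this thread: the functions below only print the commands rather than run them. LOCAL_SRC and HDFS_DEST are hypothetical placeholders, and the NameNode address is borrowed from the example quoted further down. Whether the distcp variant works depends on the cluster setup described above.

```shell
#!/bin/sh
# Dry-run sketch: the functions only print the commands, nothing is executed.
# LOCAL_SRC and HDFS_DEST are hypothetical placeholders; the NameNode address
# is the one used in the example quoted further down in this thread.
NAMENODE="hdfs://localhost:8020"
LOCAL_SRC="/data/incoming/bigfile"
HDFS_DEST="/user/someuser/incoming"

# Client-side upload: the local file is read by the hadoop client process on
# this host, so no map task ever needs to see this host's local fs.
put_cmd() {
    printf 'hadoop fs -put %s %s%s\n' "$LOCAL_SRC" "$NAMENODE" "$HDFS_DEST"
}

# distcp with a file:// source: the copy runs as map tasks, so LOCAL_SRC has
# to be readable on whichever nodes those tasks land on.
distcp_cmd() {
    printf 'hadoop distcp file://%s %s%s\n' "$LOCAL_SRC" "$NAMENODE" "$HDFS_DEST"
}

put_cmd
distcp_cmd
```

The first form assumes the host has ordinary HDFS client access; the second is the one that can fail from a gateway host, since the map tasks run elsewhere.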

Thanks,
Rahul
On Sun, May 12, 2013 at 7:51 PM, Chris Mawata <[EMAIL PROTECTED]> wrote:

>  It is being read sequentially, but is it not potentially being written to
> multiple drives? And since reading is typically faster than writing, don't
> you still get a little benefit from parallelism?
>
>
> On 5/12/2013 8:55 AM, Mohammad Tariq wrote:
>
> I had said that if you use distcp to copy data *from localFS to HDFS* then
> you won't be able to exploit parallelism, as the entire file is present on
> a single machine. So no multiple TTs.
>
>  Please comment if you think I am wrong somewhere.
>
>  Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sun, May 12, 2013 at 6:15 PM, Rahul Bhattacharjee <
> [EMAIL PROTECTED]> wrote:
>
>>  Yes, it's an MR job under the hood. My question was about what you wrote:
>> that using distcp you lose the benefits of Hadoop's parallel processing.
>> I think the distcp MR job divides the files into individual map tasks based
>> on the total size of the transfer, so multiple mappers would still be
>> spawned if the transfer is huge, and they would work in parallel.
>>
>>  Correct me if there is anything wrong!
>>
>> Thanks,
>> Rahul
>>
>>
>>  On Sun, May 12, 2013 at 6:07 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> No. distcp is actually a mapreduce job under the hood.
>>>
>>>  Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>>  Thanks to both of you!
>>>>
>>>>   Rahul
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You can do that using file:///
>>>>>
>>>>>  example:
>>>>>
>>>>>  hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
>>>>>
>>>>> On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>>  @Tariq, can you point me to some resource which shows how distcp is
>>>>>> used to upload files from local to HDFS?
>>>>>>
>>>>>>  Isn't distcp an MR job? Wouldn't it need the data to be already
>>>>>> present in Hadoop's fs?
>>>>>>
>>>>>>   Rahul
>>>>>>
>>>>>>
>>>>>> On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> You're welcome :)
>>>>>>>
>>>>>>>  Warm Regards,
>>>>>>> Tariq
>>>>>>> cloudfront.blogspot.com
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 11, 2013 at 10:46 PM, Rahul Bhattacharjee <
>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>>  Thanks Tariq!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq <
>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> @Rahul : Yes. distcp can do that.
>>>>>>>>>
>>>>>>>>>  And the bigger the files, the less metadata there is, and hence
>>>>>>>>> the lower the memory consumption.
>>>>>>>>>
>>>>>>>>>  Warm Regards,
>>>>>>>>> Tariq
>>>>>>>>> cloudfront.blogspot.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, May 11, 2013 at 9:40 PM, Rahul Bhattacharjee <
>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>>>  IMHO, the statement about the NN with regard to block
>>>>>>>>>> metadata is more of a general statement. Even if you put lots of small
>>>>>>>>>> files with a combined size of 10 TB, you need to have a capable NN.
>>>>>>>>>>
>>>>>>>>>> Can distcp be used to copy from local to HDFS?
>>>>>>>>>>
>>>>>>>>>>  Thanks,
>>>>>>>>>>  Rahul
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, May 11, 2013 at 9:35 PM, Nitin Pawar <
>>>>>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>>>>>
>>>>>>>>>