MapReduce, mail # user - Re: Hadoop noob question


Rahul Bhattacharjee 2013-05-11, 16:10
Thoihen Maibam 2013-05-11, 10:49
Nitin Pawar 2013-05-11, 10:54
maisnam ns 2013-05-11, 11:08
Nitin Pawar 2013-05-11, 11:24
Mohammad Tariq 2013-05-12, 13:42
Rahul Bhattacharjee 2013-05-12, 11:53
Nitin Pawar 2013-05-12, 12:06
Mohammad Tariq 2013-05-12, 12:37
Rahul Bhattacharjee 2013-05-12, 12:45
Mohammad Tariq 2013-05-12, 12:55
Re: Hadoop noob question
Chris Mawata 2013-05-12, 14:21
The file is being read sequentially, but isn't it potentially being
written to multiple drives? And since reading is typically faster than
writing, don't you still get a little benefit from parallelism?
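
To make the point under discussion concrete, here is a toy sketch in Python of how a copy job might spread files across map tasks. It is not DistCp's actual code: the function name `assign_files_to_maps`, the greedy balancing strategy, and the sample file names are all made up for illustration. It assumes each file is copied whole by a single map task, which may not hold for every Hadoop version.

```python
from typing import Dict, List


def assign_files_to_maps(file_sizes: Dict[str, int], num_maps: int) -> List[List[str]]:
    """Toy model: spread whole files across map tasks, balancing bytes.

    Assumption (not taken from DistCp source): a file is never split
    between map tasks, so one file always lands in exactly one task.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    groups: List[List[str]] = [[] for _ in range(num_maps)]
    loads = [0] * num_maps
    # Place the largest files first, each on the currently lightest task.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        target = loads.index(min(loads))
        groups[target].append(name)
        loads[target] += size
    # Tasks that received no file would have nothing to copy.
    return [group for group in groups if group]


# Hypothetical inputs: one 10 GiB file versus eight 256 MiB files.
one_big_file = {"big.log": 10 * 1024**3}
many_files = {f"part-{i:03d}": 256 * 1024**2 for i in range(8)}

print(len(assign_files_to_maps(one_big_file, 4)))  # 1 busy task
print(len(assign_files_to_maps(many_files, 4)))    # 4 busy tasks
```

Under that assumption, a transfer made of many files can keep several mappers busy, while a single large file ends up with one mapper no matter how many are requested.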

On 5/12/2013 8:55 AM, Mohammad Tariq wrote:
> I had said that if you use distcp to copy data *from localFS to HDFS*
> then you won't be able to exploit parallelism, as the entire file is
> present on a single machine. So no multiple TTs.
>
> Please comment if you think I am wrong somewhere.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com <http://cloudfront.blogspot.com>
>
>
> On Sun, May 12, 2013 at 6:15 PM, Rahul Bhattacharjee
> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Yes, it's an MR job under the hood. My question was about what you
>     wrote: that by using distcp you lose the benefits of parallel
>     processing in Hadoop. I think the MR job of distcp divides files
>     into individual map tasks based on the total size of the transfer,
>     so multiple mappers would still be spawned if the size of the
>     transfer is huge, and they would work in parallel.
>
>     Correct me if there is anything wrong!
>
>     Thanks,
>     Rahul
>
>
>     On Sun, May 12, 2013 at 6:07 PM, Mohammad Tariq
>     <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>
>         No. distcp is actually a MapReduce job under the hood.
>
>         Warm Regards,
>         Tariq
>         cloudfront.blogspot.com <http://cloudfront.blogspot.com>
>
>
>         On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee
>         <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>
>             Thanks to both of you!
>
>             Rahul
>
>
>             On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar
>             <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>
>             wrote:
>
>                 you can do that using file:///
>
>                 example:
>
>                 hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
>
>
>
>                 On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee
>                 <[EMAIL PROTECTED]
>                 <mailto:[EMAIL PROTECTED]>> wrote:
>
>                     @Tariq can you point me to some resource which
>                     shows how distcp is used to upload files from
>                     local to HDFS?
>
>                     Isn't distcp an MR job? Wouldn't it need the data
>                     to be already present in Hadoop's FS?
>
>                     Rahul
>
>
>                     On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq
>                     <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>
>                     wrote:
>
>                         You're welcome :)
>
>                         Warm Regards,
>                         Tariq
>                         cloudfront.blogspot.com
>                         <http://cloudfront.blogspot.com>
>
>
>                         On Sat, May 11, 2013 at 10:46 PM, Rahul
>                         Bhattacharjee <[EMAIL PROTECTED]
>                         <mailto:[EMAIL PROTECTED]>> wrote:
>
>                             Thanks Tariq!
>
>
>                             On Sat, May 11, 2013 at 10:34 PM, Mohammad
>                             Tariq <[EMAIL PROTECTED]
>                             <mailto:[EMAIL PROTECTED]>> wrote:
>
>                                 @Rahul : Yes. distcp can do that.
>
>                                 And the bigger the files, the less
>                                 metadata, hence lower memory consumption.
>
>                                 Warm Regards,
>                                 Tariq
>                                 cloudfront.blogspot.com
>                                 <http://cloudfront.blogspot.com>
>
>
>                                 On Sat, May 11, 2013 at 9:40 PM, Rahul
>                                 Bhattacharjee <[EMAIL PROTECTED]
Rahul Bhattacharjee 2013-05-16, 14:18