Mohit Anchlia 2012-10-31, 00:10
With respect to replication: if I run a Pig job from one of the nodes within the Hadoop cluster, do I always end up writing one replica to that client node and the remaining two replicas to other nodes?
ranjith raghunath 2012-10-31, 00:36
If your client node is a datanode in your cluster, then the first copy does get written to that datanode.
Experts, please feel free to correct me here.
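A minimal Python sketch of the first-replica rule described above, modelled loosely on HDFS's default placement behaviour. The function name `choose_first_replica` and its set-of-hostnames input are hypothetical; this is not a Hadoop API, and the real namenode also skips nodes that are full, decommissioning, or overloaded, which the sketch ignores.

```python
import random


def choose_first_replica(writer_host, datanodes, rng=random):
    """Pick the host for the first replica of a new block (simplified).

    writer_host: hostname of the machine running the client (e.g. a Pig job).
    datanodes:   set of hostnames running a datanode (hypothetical input).
    """
    if writer_host in datanodes:
        # The writer is itself a datanode: keep the first replica local.
        return writer_host
    # Plain client (non-DN box): any datanode, chosen at random.
    return rng.choice(sorted(datanodes))
```

For example, `choose_first_replica("dn2", {"dn1", "dn2", "dn3"})` returns `"dn2"`, while a writer on a hypothetical edge host `"edge1"` gets one of the three datanodes at random.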
Mohit Anchlia 2012-10-31, 01:13
Thanks. And if it is not a datanode, then I am guessing the namenode decides the nodes in the replication pipeline?
Harsh J 2012-10-31, 05:20
Hi,
Yes, if you are purely a regular client (a non-DN box) writing to HDFS, then the DNs are selected at random (while still fitting within the cross-rack write policy, if that applies to your environment).
-- Harsh J
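To illustrate the cross-rack part of the policy, here is a hedged sketch of how the remaining two replicas of a replication-factor-3 block are placed by default: the second on a node in a different rack from the first, and the third on a different node in that same remote rack. `choose_targets` and the rack-to-hosts `topology` dict are hypothetical, and the fallback branch is a simplification of what HDFS actually does when no suitable remote rack exists.

```python
import random


def choose_targets(first, topology, rng=random):
    """Return [first, second, third] for a replication-factor-3 block.

    first:    host already chosen for replica 1.
    topology: hypothetical dict mapping rack name -> list of datanode hosts.
    """
    rack_of = {host: rack for rack, hosts in topology.items() for host in hosts}
    local_rack = rack_of[first]
    # Remote racks with at least two nodes can hold replicas 2 and 3.
    remote_racks = sorted(
        rack for rack, hosts in topology.items()
        if rack != local_rack and len(hosts) >= 2
    )
    if remote_racks:
        rack = rng.choice(remote_racks)
        # Two distinct nodes on the same remote rack.
        second, third = rng.sample(sorted(topology[rack]), 2)
    else:
        # Simplified fallback (e.g. every node on one default rack):
        # any two other nodes, chosen at random.
        others = sorted(h for h in rack_of if h != first)
        second, third = rng.sample(others, 2)
    return [first, second, third]
```

In a two-rack layout such as `{"/rack1": ["dn1", "dn2"], "/rack2": ["dn3", "dn4"]}` with `first="dn1"`, replicas two and three always land on `dn3` and `dn4`, so losing either whole rack still leaves at least one copy.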
ranjith raghunath 2012-10-31, 04:19
The namenode decides the replica locations in either case. It just so happens that when the client runs on a datanode, the first replica is placed on that same node. Hope this makes sense.
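Since the namenode only picks the ordered list of target datanodes, the block data itself travels along the replication pipeline: the client writes to the first datanode, which forwards to the next, and acknowledgements flow back upstream. The sketch below is a simplified, synchronous model with hypothetical names (`DataNode`, `write_packet`); real HDFS streams packets asynchronously and also handles pipeline failures.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for one datanode in the pipeline."""
    host: str
    stored: list = field(default_factory=list)


def write_packet(packet, pipeline):
    """Send one packet through an ordered pipeline of DataNode objects.

    The client hands the packet only to pipeline[0]; each node stores it
    and forwards it downstream. Returns hosts in the order they acknowledge,
    which is last-node-first because each node acks only after the node
    downstream of it has.
    """
    acks = []

    def forward(i):
        node = pipeline[i]
        node.stored.append(packet)   # persist locally
        if i + 1 < len(pipeline):
            forward(i + 1)           # forward to the next datanode
        acks.append(node.host)       # ack travels back upstream

    forward(0)
    return acks
```

With a pipeline of `DataNode("dn1")`, `DataNode("dn2")`, `DataNode("dn3")`, every node ends up holding the packet and the acks arrive as `["dn3", "dn2", "dn1"]`.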