-
Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-03, 06:11
Hi,
I am migrating from Apache Hadoop 0.20.205 to CDH3u3, and I don't want to lose the data that is in HDFS on the 0.20.205 cluster. How do I migrate to CDH3u3 while keeping that data? What are the best practices or techniques for this?

Thanks & Regards,
Austin
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Nitin Pawar 2012-05-03, 06:53
I can think of the following options:
1) Write simple get-and-put code that reads the data from the old DFS and loads it into the new DFS.
2) See whether distcp is compatible between the two versions.
3) This is what I did (and my data was hardly a few hundred GB): a dfs -copyToLocal, and then a copyFromLocal on the new grid.

-- Nitin Pawar
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-03, 07:21
Thanks for the suggestions.
My concern is that I can't actually copyToLocal from the DFS because the data is huge.

If my Hadoop were 0.20 and I were upgrading to 0.20.205, I could do a namenode upgrade and would not have to copy data out of DFS.

But here I have Apache Hadoop 0.20.205 and I want to use CDH3, which is based on 0.20. So it is actually a downgrade, since 0.20.205's namenode info has to be used by 0.20's namenode.

Any idea how I can achieve this?

Thanks.
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Nitin Pawar 2012-05-03, 09:42
You can actually look at distcp:
http://hadoop.apache.org/common/docs/r0.20.0/distcp.html

But this means that you need two separate clusters available to do the migration.

-- Nitin Pawar
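
For copies between clusters running different Hadoop versions, the DistCp guide linked above describes reading the source over the read-only HFTP filesystem and running the job on the destination cluster. A minimal sketch of how such a command line might be assembled follows. The host names and paths are placeholders, and the ports (50070 for the source namenode's HTTP address, 8020 for the destination namenode's RPC address) are common defaults assumed here, not values taken from this thread.

```python
def build_distcp_command(src_namenode, dst_namenode, path,
                         src_http_port=50070, dst_rpc_port=8020):
    """Return the argv list for a cross-version DistCp run.

    The source is addressed over hftp:// (read-only), the target over
    hdfs://, so the job has to be launched on the destination cluster.
    Hosts, ports and path are illustrative assumptions.
    """
    source = f"hftp://{src_namenode}:{src_http_port}{path}"
    target = f"hdfs://{dst_namenode}:{dst_rpc_port}{path}"
    return ["hadoop", "distcp", source, target]


if __name__ == "__main__":
    # Hypothetical host names; substitute the real namenodes.
    cmd = build_distcp_command("old-nn.example.com", "new-nn.example.com",
                               "/user/data")
    print(" ".join(cmd))
```

The function only builds the argument list; it does not run anything, so the command can be reviewed before it is handed to a shell or to `subprocess.run`.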
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-03, 09:51
There is only one cluster. I am not copying between clusters.

Say I have a cluster running Apache 0.20.205 with 10 TB of storage capacity that holds about 8 TB of data. How can I migrate that same cluster to CDH3 and keep using the same 8 TB of data?

I can't copy 8 TB of data using distcp because I have only 2 TB of free space.
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Prashant Kommireddi 2012-05-03, 09:55
Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help by moving this to the Cloudera mailing list.
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-03, 10:25
Yes. This was first posted on the Cloudera mailing list. There were no responses.

But this is not related to Cloudera as such.

CDH3 uses Apache Hadoop 0.20 as its base. My data is in Apache Hadoop 0.20.205.

There is an upgrade-namenode option when migrating to a higher version, say from 0.20 to 0.20.205, but here I am downgrading from 0.20.205 to 0.20 (CDH3). Is this possible?
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Michel Segel 2012-05-03, 10:40
Well, you've kind of painted yourself into a corner...
Not sure why you didn't get a response from the Cloudera lists, but it's a generic question...

8 out of 10 TB. Are you talking effective storage or actual disks?
And please tell me you've already ordered more hardware... Right?

And please tell me this isn't your production cluster...

(Strong hint to Strata and Cloudera... You really want to accept my upcoming proposal talk... ;-)

Sent from a remote device. Please excuse any typos...

Mike Segel
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-03, 10:46
Yeah I know :-)
And this is not a production cluster ;-) and yes, there is more hardware coming :-)
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Michel Segel 2012-05-03, 11:25
Ok... When you get your new hardware...

Set up one server as your new NN, JT, SN.
Set up the others as DNs.
(Cloudera CDH3u3)

On your existing cluster...
Remove your old log files, temp files on HDFS, anything you don't need.
This should give you some more space.
Start copying some of the directories/files to the new cluster.
As you gain space, decommission a node, rebalance, add the node to the new cluster...

It's a slow process.

Should I remind you to make sure you up your bandwidth setting, and to clean up the HDFS directories when you repurpose the nodes?

Does this make sense?

Mike Segel
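
The copy, delete, decommission loop described in this message can be sketched as a small simulation. It is only a rough model under stated assumptions: every node has the same capacity, both clusters use the same replication factor (so all sizes are raw disk units), and a node can be moved as soon as the remaining data fits on the nodes left behind. The five 2 TB nodes and the single starting datanode in the example are hypothetical numbers chosen to match the 10 TB / 8 TB figures in the thread.

```python
def simulate_migration(old_nodes, new_nodes, node_capacity, data):
    """Simulate moving `data` units from the old cluster to the new one.

    Returns a list of (copied, nodes_moved) tuples, one per round.
    All arguments are hypothetical planning inputs, not measured values.
    """
    old_used, new_used = data, 0
    rounds = []
    while old_used > 0:
        new_free = new_nodes * node_capacity - new_used
        copied = min(old_used, new_free)
        if copied == 0:
            raise RuntimeError("new cluster is full; add hardware first")
        # Copy a batch, verify it, then delete it from the old cluster.
        old_used -= copied
        new_used += copied
        # Move every old node that the remaining data no longer needs.
        moved = 0
        while old_nodes > 0 and old_used <= (old_nodes - 1) * node_capacity:
            old_nodes -= 1
            new_nodes += 1
            moved += 1
        rounds.append((copied, moved))
    return rounds


if __name__ == "__main__":
    # 5 old nodes of 2 TB each holding 8 TB, 1 new datanode to start with.
    for number, (copied, moved) in enumerate(simulate_migration(5, 1, 2, 8), 1):
        print(f"round {number}: copied {copied} TB, moved {moved} node(s)")
```

The model ignores rebalancing time, headroom for running jobs, and the bandwidth setting mentioned above, so it shows only how many rounds the shuffle takes, not how long each one lasts.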
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Edward Capriolo 2012-05-03, 15:25
Honestly that is a hassle; going from 205 to cdh3u3 is probably more of a cross-grade than an upgrade or downgrade. I would just stick it out. But yes, like Michel said: two clusters on the same gear and distcp. If you are using RF=3 you could also lower your replication to RF=2 ('hadoop dfs -setrep -R 2 <path>') to clear headroom as you are moving stuff.
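
How much headroom a lower replication factor buys can be worked out with a little arithmetic. The sketch below assumes the 8 TB figure from earlier in the thread is raw disk usage at RF=3, which the thread does not confirm; the function names are made up for illustration.

```python
from fractions import Fraction


def raw_usage(logical_tb, replication):
    """Raw disk consumed by `logical_tb` of data at a given replication factor."""
    return logical_tb * replication


def headroom_gained(raw_used_tb, old_rf, new_rf):
    """Raw space released when existing data drops from old_rf to new_rf."""
    logical = Fraction(raw_used_tb) / old_rf
    return raw_used_tb - raw_usage(logical, new_rf)


if __name__ == "__main__":
    # Assumption: 8 TB raw in use at RF=3.
    freed = headroom_gained(8, 3, 2)
    print(f"freed: {float(freed):.2f} TB raw")
```

Under that assumption the cluster holds 8/3 TB of actual data, so going from three replicas to two releases 8/3 TB (about 2.67 TB) of raw space, at the cost of tolerating one fewer lost replica per block while the migration runs.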
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3Suresh Srinivas 2012-05-03, 16:26
This probably is a more relevant question in CDH mailing lists. That said,
what Edward is suggesting seems reasonable. Reduce replication factor, decommission some of the nodes and create a new cluster with those nodes and do distcp. Could you share with us the reasons you want to migrate from Apache 205? Regards, Suresh On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo <[EMAIL PROTECTED]>wrote: > Honestly that is a hassle, going from 205 to cdh3u3 is probably more > or a cross-grade then an upgrade or downgrade. I would just stick it > out. But yes like Michael said two clusters on the same gear and > distcp. If you are using RF=3 you could also lower your replication to > rf=2 'hadoop dfs -setrepl 2' to clear headroom as you are moving > stuff. > > > On Thu, May 3, 2012 at 7:25 AM, Michel Segel <[EMAIL PROTECTED]> > wrote: > > Ok... When you get your new hardware... > > > > Set up one server as your new NN, JT, SN. > > Set up the others as a DN. > > (Cloudera CDH3u3) > > > > On your existing cluster... > > Remove your old log files, temp files on HDFS anything you don't need. > > This should give you some more space. > > Start copying some of the directories/files to the new cluster. > > As you gain space, decommission a node, rebalance, add node to new > cluster... > > > > It's a slow process. > > > > Should I remind you to make sure you up you bandwidth setting, and to > clean up the hdfs directories when you repurpose the nodes? > > > > Does this make sense? > > > > Sent from a remote device. Please excuse any typos... > > > > Mike Segel > > > > On May 3, 2012, at 5:46 AM, Austin Chungath <[EMAIL PROTECTED]> wrote: > > > >> Yeah I know :-) > >> and this is not a production cluster ;-) and yes there is more hardware > >> coming :-) > >> > >> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <[EMAIL PROTECTED] > >wrote: > >> > >>> Well, you've kind of painted yourself in to a corner... > >>> Not sure why you didn't get a response from the Cloudera lists, but > it's a > >>> generic question... > >>> > >>> 8 out of 10 TB. 
Are you talking effective storage or actual disks? > >>> And please tell me you've already ordered more hardware.. Right? > >>> > >>> And please tell me this isn't your production cluster... > >>> > >>> (Strong hint to Strata and Cloudea... You really want to accept my > >>> upcoming proposal talk... ;-) > >>> > >>> > >>> Sent from a remote device. Please excuse any typos... > >>> > >>> Mike Segel > >>> > >>> On May 3, 2012, at 5:25 AM, Austin Chungath <[EMAIL PROTECTED]> > wrote: > >>> > >>>> Yes. This was first posted on the cloudera mailing list. There were no > >>>> responses. > >>>> > >>>> But this is not related to cloudera as such. > >>>> > >>>> cdh3 is based on apache hadoop 0.20 as the base. My data is in apache > >>>> hadoop 0.20.205 > >>>> > >>>> There is an upgrade namenode option when we are migrating to a higher > >>>> version say from 0.20 to 0.20.205 > >>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3) > >>>> Is this possible? > >>>> > >>>> > >>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi < > [EMAIL PROTECTED] > >>>> wrote: > >>>> > >>>>> Seems like a matter of upgrade. I am not a Cloudera user so would not > >>> know > >>>>> much, but you might find some help moving this to Cloudera mailing > list. > >>>>> > >>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <[EMAIL PROTECTED]> > >>>>> wrote: > >>>>> > >>>>>> There is only one cluster. I am not copying between clusters. > >>>>>> > >>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage > >>> capacity > >>>>>> and has about 8 TB of data. > >>>>>> Now how can I migrate the same cluster to use cdh3 and use that > same 8 > >>> TB > >>>>>> of data. > >>>>>> > >>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB of > free > >>>>>> space > >>>>>> > >>>>>> > >>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar < > [EMAIL PROTECTED]> > >>>>>> wrote: > >>>>>> > >>>>>>> you can actually look at the distcp +
Suresh Srinivas 2012-05-03, 16:26
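For readers following along, here is a rough dry-run sketch of what "reduce replication factor, decommission some of the nodes" could look like on the command line. As far as I know the FsShell option is spelled `-setrep` (not `-setrepl`). The host name and exclude-file path are placeholders I made up; the exclude file has to be the one your `dfs.hosts.exclude` setting points to. With `DRY_RUN=1` (the default here) the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: lower replication to gain headroom, then decommission a node.
# ASSUMPTIONS (not from the thread): the host name and exclude-file path are
# placeholders; the exclude file must be the one named by dfs.hosts.exclude.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lower_replication() {
    # $1 = target replication factor, $2 = path; -R applies it recursively.
    run hadoop dfs -setrep -R "$1" "$2"
}

decommission_node() {
    host="$1"; exclude_file="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ echo $host >> $exclude_file"
    else
        echo "$host" >> "$exclude_file"
    fi
    # Ask the NameNode to re-read its include/exclude lists.
    run hadoop dfsadmin -refreshNodes
    # Per-node status, useful for watching decommissioning progress.
    run hadoop dfsadmin -report
}

lower_replication 2 /
decommission_node datanode5.example.com /etc/hadoop/conf/dfs.exclude
```

Set `DRY_RUN=0` only after checking the printed commands against your own cluster layout.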
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Michel Segel 2012-05-03, 23:00
Ok... So riddle me this...
I currently have a replication factor of 3.
I reset it to two.

What do you have to do to get the replication factor of 3 down to 2?
Do I just try to rebalance the nodes?

The point is that you are looking at a very small cluster.
You may want to start the new cluster with a replication factor of 2 and then, when the data is moved over, increase it to a factor of 3. Or maybe not.

I do a distcp to copy the data, and after each distcp I do an fsck for a sanity check and then remove the files I copied. As I gain more room, I can then slowly drop nodes, do an fsck, rebalance and then repeat.

Even though this is a dev cluster, the OP wants to retain the data.

There are other options depending on the amount and size of new hardware.
I mean make one machine a RAID 5 machine, copy data to it, clearing off the cluster.

If 8TB was the amount of raw disk used, that would be 2.6666 TB of actual data at a replication factor of 3.
Let's say 3TB. Going RAID 5, how much disk is that? So you could fit it on one machine, depending on hardware, or maybe 2 machines... Now you can rebuild the initial cluster and then move the data back. Then rebuild those machines. Lots of options... ;-)

Sent from a remote device. Please excuse any typos...

Mike Segel
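The copy, fsck, delete cycle described above can be sketched roughly as follows. This is only an illustration under assumptions of mine: the NameNode host names, ports and directory names are placeholders, the source is read over hftp because the two clusters run different versions, and the script is assumed to run on the new cluster. By default it just prints what it would do.

```shell
#!/bin/sh
# Dry-run sketch of the copy / verify / delete cycle.
# ASSUMPTIONS: hosts, ports and directories are placeholders; run on the new cluster.
DRY_RUN="${DRY_RUN:-1}"
SRC="hftp://old-nn.example.com:50070"   # hypothetical old cluster, read over hftp
DST="hdfs://new-nn.example.com:8020"    # hypothetical new cluster

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_dir() {
    dir="$1"
    # 1. Copy one directory to the new cluster.
    run hadoop distcp "$SRC$dir" "$DST$dir" || return 1
    # 2. Sanity-check the copy on the new cluster before deleting anything.
    run hadoop fsck "$dir" || return 1
    # 3. Reminder only: the source copy is removed by hand on the old cluster.
    echo "# then, on the old cluster: hadoop dfs -rmr $dir"
}

for d in /logs /data/2011 /data/2012; do
    migrate_dir "$d" || { echo "stopping at $d" >&2; break; }
done
```

Working one directory at a time keeps the extra space needed at any moment small, which matters when only about 2 TB is free.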
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-07, 10:27
Thanks,
So I decided to try and move using distcp.

$ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client 63, server = 61)

I found that we can do distcp like above only if both are of the same hadoop version.
So I tried:

$ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy

But this process seems to hang at this stage. What might I be doing wrong?

hftp://<dfs.http.address>/<path>
hftp://localhost:50070 is dfs.http.address of 0.20.205
hdfs://localhost:60070 is dfs.http.address of cdh3u3

Thanks and regards,
Austin
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-07, 11:14
ok that was a lame mistake.
$ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
I had typed hdfs instead of "hftp"

$ hadoop distcp hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop
12/05/07 16:38:09 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/docs/index.html]
12/05/07 16:38:09 INFO tools.DistCp: destPath=hftp://localhost:60070/user/hadoop
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Not supported
at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

Any idea why this error is occurring?
I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3 (/user/hadoop)

Thanks & Regards,
Austin
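A note on the stack trace above: it ends in `HftpFileSystem.delete`, which is consistent with the DistCp guide's description of hftp as a read-only filesystem. It can serve as a source, but the destination has to be writable (normally `hdfs://`), and the job is normally launched on the destination cluster. A small pre-check along these lines could catch the problem before submitting. `check_distcp_uris` is a helper name I made up, not part of Hadoop, and port 8020 is only a placeholder for the destination NameNode's RPC port.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reject distcp URI pairs whose
# destination uses the read-only hftp scheme.
check_distcp_uris() {
    src="$1"; dst="$2"
    case "$dst" in
        hftp://*)
            echo "destination must be writable (e.g. hdfs://), got: $dst" >&2
            return 1
            ;;
    esac
    case "$src" in
        hftp://*|hdfs://*)
            return 0
            ;;
        *)
            echo "unexpected source scheme: $src" >&2
            return 1
            ;;
    esac
}

# The failing pair from the message above, then a pair with a writable destination.
check_distcp_uris hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop \
    || echo "rejected"
check_distcp_uris hftp://localhost:50070/docs/index.html hdfs://localhost:8020/user/hadoop \
    && echo "ok"
```

The check only looks at URI schemes; it does not verify that the ports actually belong to the HTTP or RPC endpoints of the NameNodes.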
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Nitin Pawar 2012-05-07, 11:29
things to check

1) when you launch distcp jobs, all the datanodes of the older hdfs are live and connected
2) when you launch distcp, no data is being written/moved/deleted in hdfs
3) you can use the option -log to log errors into a directory and use -i to ignore errors

also you can try using distcp with the hdfs protocol instead of hftp ... for more you can refer to
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd

if it failed there should be some error

Nitin Pawar
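The checklist above might translate into something like the following dry-run sketch. The host names, ports and log directory are placeholders of mine. The first two commands are meant for the old cluster (to confirm its datanodes are live and the filesystem is healthy), while the distcp itself would be launched on the new one.

```shell
#!/bin/sh
# Dry-run sketch of pre-flight checks and a logged copy.
# ASSUMPTIONS: hosts, ports and the log directory are placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the OLD cluster: list datanodes and check filesystem health.
preflight() {
    run hadoop dfsadmin -report
    run hadoop fsck /
}

# Run on the NEW cluster: $1 = source, $2 = destination, $3 = log directory.
# -i keeps going past individual failures, -log records what happened.
copy_with_log() {
    run hadoop distcp -i -log "$3" "$1" "$2"
}

preflight
copy_with_log hftp://old-nn.example.com:50070/tmp \
    hdfs://new-nn.example.com:8020/tmp_copy /user/hadoop/distcp-logs
```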
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Adam Faris 2012-05-07, 14:37
Hi Austin,
I don't know about using CDH3, but we use distcp for moving data between different versions of apache grids and several things come to mind.

1) you should use the -i flag to ignore checksum differences on the blocks. I'm not 100% sure, but I want to say hftp doesn't support checksums on the blocks as they go across the wire.

2) you should read from hftp but write to hdfs. Also make sure to check your port numbers. For example I can read from hftp on port 50070 and write to hdfs on port 9000. You'll find the hftp port in hdfs-site.xml and the hdfs port in core-site.xml on apache releases.

3) Do you have security (kerberos) enabled on 0.20.205? Does CDH3 support security? If security is enabled on 0.20.205 and CDH3 does not support security, you will need to disable security on 0.20.205. This is because you are unable to write from a secure to an unsecured grid.

4) use the -m flag to limit your mappers so you don't DDOS your network backbone.

5) why isn't your vendor helping you with the data migration? :)

Otherwise something like this should get you going.

hadoop distcp -i -ppgu -log /tmp/mylog -m 20 hftp://mynamenode.grid.one:50070/path/to/my/src/data hdfs://mynamenode.grid.two:9000/path/to/my/dst

-- Adam
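To make point 2 concrete, here is a small sketch that pulls the two addresses out of the config files and assembles the distcp command. `get_hadoop_property` is a helper of my own, not a Hadoop tool, and it only handles the usual one-tag-per-line layout. The sample files are made up for the demo; `dfs.http.address` and `fs.default.name` are the property names I would expect on 0.20-era releases, so check your own configs. For what it's worth, the DistCp option index describes `-i` as "ignore failures".

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the <value> that follows
# <name>PROPERTY</name> in a Hadoop XML config file.
get_hadoop_property() {
    # $1 = property name, $2 = config file
    awk -v name="$1" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Skip the 7 chars of <value> and drop the 8 chars of </value>.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$2"
}

# Made-up sample configs, standing in for the two clusters' real files.
tmp="$(mktemp -d)"
cat > "$tmp/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>mynamenode.grid.one:50070</value>
  </property>
</configuration>
EOF
cat > "$tmp/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://mynamenode.grid.two:9000</value>
  </property>
</configuration>
EOF

# Source: HTTP address of the old NameNode. Destination: RPC URI of the new one.
src="hftp://$(get_hadoop_property dfs.http.address "$tmp/hdfs-site.xml")"
dst="$(get_hadoop_property fs.default.name "$tmp/core-site.xml")"
echo "hadoop distcp -i -ppgu -log /tmp/mylog -m 20 $src/path/to/my/src/data $dst/path/to/my/dst"
rm -rf "$tmp"
```

Mixing up the two ports, for example using the HTTP port in an `hdfs://` URI, is an easy mistake to make, which is why reading them from the files can be safer than typing them from memory.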
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-08, 05:55
Thanks Adam,
That was very helpful. Your second point solved my problems :-)
The hdfs port number was wrong.
I didn't use the option -ppgu what does it do?
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Adam Faris 2012-05-08, 18:22
Hi Austin,
I'm glad that helped out. Regarding the -p flag for distcp, here's the online documentation
http://hadoop.apache.org/common/docs/current/distcp.html#Option+Index

You can also get this info from running 'hadoop distcp' without any flags.
--------
-p[rbugp] Preserve
  r: replication number
  b: block size
  u: user
  g: group
  p: permission
--------

-- Adam
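As a quick illustration of how a combined flag such as `-ppgu` reads against the table above, here is a tiny decoder. `describe_preserve` is a made-up helper for this note, not something distcp provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): spell out the letters of a distcp -p flag.
describe_preserve() {
    flags="${1#-p}"                  # strip the leading "-p"
    while [ -n "$flags" ]; do
        rest="${flags#?}"            # everything after the first letter
        letter="${flags%"$rest"}"    # the first letter itself
        case "$letter" in
            r) echo "r: replication number" ;;
            b) echo "b: block size" ;;
            u) echo "u: user" ;;
            g) echo "g: group" ;;
            p) echo "p: permission" ;;
            *) echo "$letter: unknown" ;;
        esac
        flags="$rest"
    done
}

describe_preserve -ppgu
```

So `-ppgu` asks distcp to preserve permission, group and user, but not replication or block size.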
-
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Austin Chungath 2012-05-09, 11:25
$DuplicationException: Invalid input, there are duplicated files in the sources: hftp://ub13:50070/tmp/Rtmp1BU9Kb/file6abc6ccb6551/_logs/history, hftp://ub13:50070/tmp/Rtmp3yCJhu/file1ca96d9331/_logs/history

Any idea what the problem is here?
They are different files, so how are they conflicting?

Thanks & Regards
|