-
hdfs system crashes when loading files bigger than local space left
Vitaliy Semochkin 2010-07-14, 11:16
Hi,
I have a small cluster of 5 nodes, and one node works as both NameNode and DataNode at the same time. On the NameNode I load an amount of data (100 GB) into HDFS that is bigger than the local space left on that node.
Sometimes Hadoop allows me to load more data than the local space left on the node. Sometimes Hadoop crashes and I have to reformat HDFS.
The only robust solution I have found for this problem is to remove the namenode from the slaves list before loading data, restart the cluster, upload the data, add the namenode back to the slaves list, and restart the cluster again.
In this case I have never had HDFS crash.
Has anyone found a more elegant solution to my problem?
Thanks in Advance, Vitaliy
-
Re: hdfs system crashes when loading files bigger than local space left
Allen Wittenauer 2010-07-14, 15:23
On Jul 14, 2010, at 4:16 AM, Vitaliy Semochkin wrote:
> Sometimes Hadoop allows me to load more data than the local space left on the node.
> Sometimes Hadoop crashes and I have to reformat HDFS.
a) Have you set a reserved size for hdfs?
b) Are you loading data from the datanode?
-
Re: hdfs system crashes when loading files bigger than local space left
Vitaliy Semochkin 2010-07-15, 08:11
> a) Have you set a reserved size for hdfs?
Yes. I set 128 MB as the reserved size.
> b) Are you loading data from the datanode?
Yes, but the datanode is running on the same node as the namenode (I have a very small cluster, only 5 servers, and dedicating one node to the namenode/jobtracker alone seemed unreasonable to me).
-
Re: hdfs system crashes when loading files bigger than local space left
Allen Wittenauer 2010-07-15, 17:26
On Jul 15, 2010, at 1:11 AM, Vitaliy Semochkin wrote:
> > a) Have you set a reserved size for hdfs?
> Yes. I set 128 MB as the reserved size.
That is likely way too small.
> > b) Are you loading data from the datanode?
> Yes, but the datanode is running on the same node as the namenode (I have a very small cluster, only 5 servers, and dedicating one node to the namenode/jobtracker alone seemed unreasonable to me).
Where the NN is running is irrelevant to this particular problem.
The problem is that if you start your data load on a machine also running a datanode process, the data will get put onto that node first. This will cause your DFS to be majorly unbalanced.
It is much better to load the data from another host outside the grid.
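The imbalance described above can be illustrated with a toy simulation (not part of the original reply). It is a minimal sketch that assumes a replication factor of 3, 64 MB blocks, a single rack, and a simplified placement rule: the first replica goes to the writer's own datanode when the writer is one, otherwise to a random datanode, and the remaining replicas go to random other nodes. The node names and the `simulate_put` helper are hypothetical, and real HDFS placement also takes racks and free space into account.

```python
import random
from collections import Counter


def simulate_put(nodes, num_blocks, replication=3, writer=None, seed=0):
    """Count block replicas per node under a simplified placement rule."""
    rng = random.Random(seed)
    counts = Counter({node: 0 for node in nodes})
    for _ in range(num_blocks):
        # First replica: the writer's own node if it is a datanode,
        # otherwise a randomly chosen datanode.
        first = writer if writer in nodes else rng.choice(nodes)
        # Remaining replicas: distinct random nodes other than the first.
        rest = rng.sample([n for n in nodes if n != first], replication - 1)
        for node in [first] + rest:
            counts[node] += 1
    return counts


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical host names
num_blocks = 100 * 1024 // 64  # 100 GB split into 64 MB blocks (assumed size)

local = simulate_put(nodes, num_blocks, writer="dn1")   # put run on a datanode
remote = simulate_put(nodes, num_blocks, writer=None)   # put run off the grid

for name, counts in (("put from dn1", local), ("put from outside", remote)):
    per_node_gb = {n: counts[n] * 64 // 1024 for n in nodes}
    print(name, per_node_gb, "GB per node")
```

Under these assumptions, the run started on dn1 leaves a replica of every block on dn1, so that one node has to hold the full 100 GB, while the run started from outside spreads the replicas roughly evenly across all five nodes.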
-
Re: hdfs system crashes when loading files bigger than local space left
Vitaliy Semochkin 2010-07-16, 10:15
On Thu, Jul 15, 2010 at 9:26 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
> On Jul 15, 2010, at 1:11 AM, Vitaliy Semochkin wrote:
> > > a) Have you set a reserved size for hdfs?
> > Yes. I set 128 MB as the reserved size.
> That is likely way too small.
Will setting 512 MB be better, given that the whole volume is only 190 GB?
> > > b) Are you loading data from the datanode?
> > Yes, but the datanode is running on the same node as the namenode (I have a very small cluster, only 5 servers, and dedicating one node to the namenode/jobtracker alone seemed unreasonable to me).
> Where the NN is running is irrelevant to this particular problem.
> The problem is that if you start your data load on a machine also running a datanode process, the data will get put onto that node first. This will cause your DFS to be majorly unbalanced.
> It is much better to load the data from another host outside the grid.
Does Hadoop distinguish between a client that uploads data from a datanode and one that uploads from a host that is not a datanode? Let's say I execute
hadoop fs -put someFile hdfs://namenode.mycompany.com/
from namenode.mycompany.com and from some other PC. Will it make any difference to Hadoop, and will Hadoop organize the data in a more balanced way in the latter case?
Thank you very much for the replies, Vitaliy S
-
Re: hdfs system crashes when loading files bigger than local space left
Allen Wittenauer 2010-07-16, 18:07
On Jul 16, 2010, at 3:15 AM, Vitaliy Semochkin wrote:
> > That is likely way too small.
> Will setting 512 MB be better, given that the whole volume is only 190 GB?
I'd recommend at least 5 GB. I'm also assuming this same disk space isn't getting used for MapReduce.
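For reference, the sizes discussed in this thread can be converted to raw byte counts with a short script (an illustration, not part of the original reply). It assumes the "reserved size" refers to the `dfs.datanode.du.reserved` property, which the thread never names and which takes a value in bytes per volume. The helper names are hypothetical, and the 190 GB volume size comes from the question above.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def reserved_bytes(gib):
    """Byte value for a reserve expressed in GiB."""
    return int(gib * GIB)


def reserved_fraction(reserved_gib, volume_gib):
    """Share of the volume that the reserve keeps out of HDFS."""
    return reserved_gib / volume_gib


volume_gib = 190  # volume size mentioned in the thread
for label, gib in (("128 MB", 0.125), ("512 MB", 0.5), ("5 GB", 5), ("6 GB", 6)):
    share = reserved_fraction(gib, volume_gib)
    print(f"{label}: {reserved_bytes(gib)} bytes, {share:.2%} of the volume")
```

Even the recommended 5 GB reserve is only a few percent of a 190 GB volume, and it would need to be larger if MapReduce temporary data shares the same disk.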
> Does Hadoop distinguish between a client that uploads data from a datanode and one that uploads from a host that is not a datanode?
> Let's say I execute
Yes.
> hadoop fs -put someFile hdfs://namenode.mycompany.com/
> from namenode.mycompany.com and from some other PC. Will it make any difference to Hadoop, and will Hadoop organize the data in a more balanced way in the latter case?
Yes.
Again, namenode is irrelevant. Do not do puts from a datanode if you want the data to be reasonably balanced.
-
Re: hdfs system crashes when loading files bigger than local space left
Vitaliy Semochkin 2010-07-21, 10:02
On Fri, Jul 16, 2010 at 10:07 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
> On Jul 16, 2010, at 3:15 AM, Vitaliy Semochkin wrote:
> > > That is likely way too small.
> > Will setting 512 MB be better, given that the whole volume is only 190 GB?
> I'd recommend at least 5 GB. I'm also assuming this same disk space isn't getting used for MapReduce.
Thank you for the advice. I'll increase the amount to 6 GB (I hope it will be enough). The same disk is used for MapReduce, but M/R is not executed during loading.
> > Does Hadoop distinguish between a client that uploads data from a datanode and one that uploads from a host that is not a datanode?
> > Let's say I execute
> Yes.
> > hadoop fs -put someFile hdfs://namenode.mycompany.com/
> > from namenode.mycompany.com and from some other PC. Will it make any difference to Hadoop, and will Hadoop organize the data in a more balanced way in the latter case?
> Yes.
> Again, namenode is irrelevant.
I was doing it from the namenode, which was acting as a datanode as well.
> Do not do puts from a datanode if you want the data to be reasonably balanced.
Thank you very much. I will do the put from a PC outside the Hadoop cluster.
Regards, Vitaliy S