In my experience, if you define multiple data dirs, HDFS does something
similar to "df <dir>" for each dir and sums the results. If those dirs happen
to be on the same partition, the reported capacity is the size of the
partition multiplied by the number of dirs you listed. So if you have a ~.9TB
drive and HDFS shows ~2.6TB of capacity, I'd imagine you have three DFS data
dirs defined (878G x 3 = 2634G, or about 2.57TB).
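To make the arithmetic concrete, here is a rough Python sketch of the two ways of counting. It is not taken from the DataNode source; the function names are made up for illustration, and it only assumes that capacity is computed per directory the way "df" would report it.

```python
import os
import shutil


def naive_capacity(data_dirs):
    """Sum the filesystem size for every dir, like running df on each one."""
    return sum(shutil.disk_usage(d).total for d in data_dirs)


def deduplicated_capacity(data_dirs):
    """Count each underlying filesystem only once, keyed by its device id."""
    totals = {}
    for d in data_dirs:
        # Dirs on the same partition share st_dev, so they overwrite one entry.
        totals[os.stat(d).st_dev] = shutil.disk_usage(d).total
    return sum(totals.values())
```

With three dirs on one partition, naive_capacity() comes out at three times the partition size, while deduplicated_capacity() returns the partition size once.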
On another note:
> I'm assuming eventually HDFS will attempt to put too much data on this
> node, and things will go Very Badly.
Don't use space reported in the GUI as an indicator of cluster health. The
situation you are referencing can happen even when the correct capacity is
reported for a node. You have to keep in mind that balancing load/data
between nodes is more of a manual process (via running the balancer). So
just because the namenode knows how much space is on each node, that doesn't
mean that data will be evenly distributed.
So even if what's reported in the GUI is right, you should still be
monitoring things at a finer-grained level than what is shown there.
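As one example of what finer-grained monitoring could look like, here is a small Python sketch that flags nodes whose utilization is far from the cluster average. It is only an illustration: flag_unbalanced() is a made-up helper, the 10 percent default simply mirrors the idea behind the balancer's threshold, the 0.86TB capacity is my assumption of the true disk size, and node1, node3 and node4 are placeholder names for the other three nodes.

```python
def flag_unbalanced(nodes, threshold_pct=10.0):
    """Return nodes whose utilization differs from the cluster average.

    nodes maps a node name to (used, capacity) in the same unit.
    The result maps each flagged node to its deviation in percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    average_pct = 100.0 * total_used / total_capacity

    flagged = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        deviation = node_pct - average_pct
        if abs(deviation) > threshold_pct:
            flagged[name] = round(deviation, 1)
    return flagged


# Used space from this thread; capacities are assumed (about 0.86TB per disk).
cluster = {
    "node1": (0.56, 0.86),        # hypothetical name
    "hadoopnode2": (0.37, 0.86),
    "node3": (0.56, 0.86),        # hypothetical name
    "node4": (0.56, 0.86),        # hypothetical name
}
print(flag_unbalanced(cluster))   # {'hadoopnode2': -16.6}
```

In practice the input would come from whatever per-node numbers you already collect, and a flagged node is a hint to run the balancer rather than proof that something is broken.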
On Mon, Jun 13, 2011 at 12:01 PM, Time Less <[EMAIL PROTECTED]> wrote:
> I have a datanode with a ~900GB hard drive in it:
> Filesystem  Size  Used  Avail  Use%  Mounted on
> /dev/hda1   878G  384G  450G   47%   /
> But the NameNode GUI shows 2.57TB:
> Node         Last Contact  Admin State  Configured Capacity (TB)  Used (TB)  Non DFS Used (TB)  Remaining (TB)  Used (%)  Remaining (%)  Blocks
> hadoopnode2                In Service
> I have three other nodes that are identical to this one, but they are all
> correctly defined in size. Does anyone know what would cause this? I'm
> assuming eventually HDFS will attempt to put too much data on this node, and
> things will go Very Badly.
> Tim Ellis
> Data Architect, Riot Games
> ps - Another mystery: the other three nodes have 0.56TB data each on them,
> but this one has only 0.37TB.