Re: Discrepancy in the values of consumed disk space by hadoop
Harsh J 2013-08-09, 14:04
There isn't a "discrepancy", but read on: DFS Used counts the disk space
consumed across the DNs, while fsck counts the logical file lengths on
HDFS. The former includes the space taken by all replicas plus the block
checksum metadata; the latter does not.
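
As a rough back-of-the-envelope sketch (assuming the default 512-byte
checksum chunk with a 4-byte CRC, i.e. dfs.bytes-per-checksum=512), you
can estimate the raw usage implied by your fsck numbers and compare it
with DFS Used:

awk 'BEGIN {
  logical  = 75245213337     # fsck "Total size"
  avg_repl = 2.0024862       # fsck "Average block replication"
  data = logical * avg_repl  # replicated block data on disk
  meta = data * 4 / 512      # rough checksum metadata in the .meta files
  printf "expected DFS Used ~= %.1f GB\n", (data + meta) / 1024^3
}'

That works out to roughly 141 GB, noticeably less than the 214.48 GB the
report shows, which is why the two points below are worth checking.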

A small (but probably significant) percentage of your files use a
replication factor higher than your default of 2, so simply dividing
DFS Used by 2 will probably not show the relationship cleanly.
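
If you want to see which paths carry the higher replication factors, one
quick way (assuming the usual hadoop fs -ls output layout, where the
second column is a file's replication factor and "-" for directories):

sudo -u hdfs hadoop fs -ls -R / | awk '$2 != "-" && $2+0 > 2'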

It is also worth checking whether the data directories configured on
your DNs have older subdirectories left over from past installs lying
under them, if you are sure the few files with higher replication
factors are small enough that they should not account for that much
extra space.
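
For example, on each datanode you can compare what is physically on disk
under the directories configured in dfs.datanode.data.dir against what
sits inside the current block pool. This is only a sketch: /data/*/dfs/dn
is a placeholder for your actual setting, and it assumes the usual CDH4
layout with a current/BP-* block pool directory:

du -sh /data/*/dfs/dn                # everything under each data dir
du -sh /data/*/dfs/dn/current/BP-*   # just the current block pool

If the first figure is much larger than the second, leftovers from an
older install account for the difference.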

On Fri, Aug 9, 2013 at 2:28 PM, Yogini Gulkotwar
<[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I have a CDH4 Hadoop cluster set up with 3 datanodes and a data
> replication factor of 2.
>
> When I try to check the consumed DFS space, I get different values from
> the "hdfs dfsadmin -report" and "hdfs fsck" commands.
> Could anyone please help me understand the reason behind the discrepancy in
> the values?
>
>  I get the following output:
>
> # sudo -u hdfs hdfs dfsadmin -report
>
>
> Configured Capacity: 321252989337600 (292.18 TB)
> Present Capacity: 264896108259328 (240.92 TB)
> DFS Remaining: 264665811648512 (240.71 TB)
> DFS Used: 230296610816 (214.48 GB)
> DFS Used%: 0.09%
> Under replicated blocks: 19
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 3 (3 total, 0 dead)
>
> Live datanodes:
> Name: (slave1)
> Hostname: localhost
> Decommission Status : Normal
> Configured Capacity: 107084329779200 (97.39 TB)
> DFS Used: 77728510976 (72.39 GB)
> Non DFS Used: 18784664751104 (17.08 TB)
> DFS Remaining: 88221936517120 (80.24 TB)
> DFS Used%: 0.07%
> DFS Remaining%: 82.39%
> Last contact: Fri Aug 09 13:26:38 IST 2013
>
>
> Name: (slave3)
> Hostname: localhost
> Decommission Status : Normal
> Configured Capacity: 107084329779200 (97.39 TB)
> DFS Used: 76206287872 (70.97 GB)
> Non DFS Used: 18786185925632 (17.09 TB)
> DFS Remaining: 88221937565696 (80.24 TB)
> DFS Used%: 0.07%
> DFS Remaining%: 82.39%
> Last contact: Fri Aug 09 13:26:37 IST 2013
>
>
> Name: (slave2)
> Hostname: localhost
> Decommission Status : Normal
> Configured Capacity: 107084329779200 (97.39 TB)
> DFS Used: 76361811968 (71.12 GB)
> Non DFS Used: 18786030401536 (17.09 TB)
> DFS Remaining: 88221937565696 (80.24 TB)
> DFS Used%: 0.07%
> DFS Remaining%: 82.39%
>
> --------------------------------------------------------------------------------------------------------------------------
> # sudo -u hdfs hadoop fsck /
>
>
> Connecting to namenode via http://master1:50070
>
>
> Status: HEALTHY
>  Total size: 75245213337 B
>  Total dirs: 3203
>  Total files: 7893
>  Total blocks (validated): 7642 (avg. block size 9846272 B)
>  Minimally replicated blocks: 7642 (100.0 %)
>  Over-replicated blocks: 0 (0.0 %)
>  Under-replicated blocks: 19 (0.24862601 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 2
>  Average block replication: 2.0024862
>  Corrupt blocks: 0
>  Missing replicas: 133 (0.86162215 %)
>  Number of data-nodes: 3
>  Number of racks: 1
> FSCK ended at Fri Aug 09 14:01:47 IST 2013 in 266 milliseconds
>
>
> The filesystem under path '/' is HEALTHY
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> # sudo -u hdfs hadoop fs -count -q /
>   2147483647      2147472547            none             inf         3203         7897        75245470999 /
>
>
>
> Thanks & Regards,
> Yogini Gulkotwar
> Flutura Decision Sciences & Analytics, Bangalore
> Email: [EMAIL PROTECTED]
> Website: www.fluturasolutions.com

--
Harsh J