Hadoop >> mail # user >> Zero Byte file in HDFS


Re: Zero Byte file in HDFS
Hi Abhishek,
       I can propose a better solution: enable merge in Hive, so that the smaller files are merged up to at least the HDFS block size (or a size of your choice). This would also benefit subsequent Hive jobs on the same data.
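
The reply does not name the settings involved. As a minimal sketch, assuming it refers to Hive's standard merge properties (hive.merge.mapfiles, hive.merge.mapredfiles, hive.merge.size.per.task, hive.merge.smallfiles.avgsize), the Python helper below renders them as SET statements that could be placed ahead of the INSERT OVERWRITE query. The helper name and the sizes are illustrative assumptions, not values from this thread.

```python
# Hypothetical helper: renders Hive SET statements that enable small-file merging.
# The sizes are illustrative assumptions; tune them to your HDFS block size.
MERGE_SETTINGS = {
    # Merge small files at the end of a map-only job.
    "hive.merge.mapfiles": "true",
    # Merge small files at the end of a map-reduce job.
    "hive.merge.mapredfiles": "true",
    # Target size in bytes of the merged files.
    "hive.merge.size.per.task": str(256 * 1000 * 1000),
    # Hive starts the extra merge step when the average output
    # file size of a job is below this many bytes.
    "hive.merge.smallfiles.avgsize": str(128 * 1000 * 1000),
}


def render_set_statements(settings):
    """Return one 'SET key=value;' line per setting, in insertion order."""
    return "\n".join(f"SET {key}={value};" for key, value in settings.items())


if __name__ == "__main__":
    print(render_set_statements(MERGE_SETTINGS))
```

The printed lines can be pasted at the top of the Hive script that runs the INSERT OVERWRITE. Whether merging also removes files that are already empty is worth verifying on your Hive version.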

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Abhishek Pratap Singh <[EMAIL PROTECTED]>
Date: Mon, 26 Mar 2012 14:20:18
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Zero Byte file in HDFS

Hi All,

I was just going through implementation scenarios for avoiding or
deleting zero-byte files in HDFS. I'm using a Hive partitioned table where the
data in each partition comes from an INSERT OVERWRITE command using a SELECT from
a few other tables.
Sometimes 0-byte files are generated in those partitions, and over
time the number of these files in HDFS will increase
enormously, decreasing the performance of Hadoop jobs on that table /
folder. I'm looking for the best way to avoid generating zero-byte files
or to delete them.

I can think of a few ways to implement this:

1) Programmatically, using the FileSystem object to clean up the zero-byte
files.
2) Using a combination of hadoop fs and Linux commands to identify the
zero-byte files and delete them.
3) LazyOutputFormat (applicable in Hadoop-based custom jobs).
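
As an illustration of option 2, here is a minimal Python sketch that picks zero-byte files out of the text printed by a recursive listing such as `hadoop fs -ls -R <dir>` (`hadoop fs -lsr` on older releases). It assumes the usual eight-column listing layout (permissions, replication, owner, group, size, date, time, path); the function name and the sample paths are hypothetical.

```python
def zero_byte_files(ls_output, skip_hidden=True):
    """Return paths of zero-length regular files found in recursive
    'hadoop fs -ls' output (assumed eight whitespace-separated columns)."""
    paths = []
    for line in ls_output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips "Found N items" headers and blank lines
        perms, size, path = fields[0], fields[4], fields[7]
        if perms.startswith("d"):
            continue  # directories are also listed with size 0
        name = path.rsplit("/", 1)[-1]
        if skip_hidden and name.startswith(("_", ".")):
            continue  # marker files such as _SUCCESS are empty on purpose
        if size == "0":
            paths.append(path)
    return paths


# Hypothetical sample listing, for illustration only.
SAMPLE = """Found 4 items
drwxr-xr-x   - hive hive          0 2012-03-26 14:20 /warehouse/t/dt=2012-03-26
-rw-r--r--   3 hive hive          0 2012-03-26 14:20 /warehouse/t/dt=2012-03-26/000000_0
-rw-r--r--   3 hive hive       1024 2012-03-26 14:20 /warehouse/t/dt=2012-03-26/000001_0
-rw-r--r--   3 hive hive          0 2012-03-26 14:20 /warehouse/t/dt=2012-03-26/_SUCCESS
"""

if __name__ == "__main__":
    for p in zero_byte_files(SAMPLE):
        print(p)
```

Each returned path could then be passed to `hadoop fs -rm`. Skipping names that start with `_` or `.` is a design choice here: such files are treated as hidden by Hadoop input formats, so they are left alone by default.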

Kindly advise on efficient ways to achieve this.

Regards,
Abhishek
