Re: read a changing hdfs file
As far as I understand (and experts can correct me), a file that is being
written becomes visible to readers one HDFS block at a time: readers can see
data only once a full block's worth has been written. The same applies to
subsequent writes. Essentially, the block is the unit of coherency, the
granularity at which visibility and durability are guaranteed. You can
explicitly call the sync (*hsync/hflush) methods to flush your writes to the
file system so they become visible as you write them, but that comes at a
performance cost. So it ultimately depends on your application and
requirements, i.e. the trade-off between performance and data
visibility/durability.

*Read more about the definition, differences and use of the appropriate
method here:
http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html
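To illustrate the trade-off above, here is a minimal sketch of a writer that calls hflush() after each record so a concurrent reader can see the data before a full block is written. It reuses the cluster URI and path from the question's example purely for illustration; the class name and record format are my own assumptions, and running it requires a reachable HDFS cluster on the classpath-configured Hadoop libraries.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example, not the poster's code: a writer that makes each
// appended line visible to concurrent readers by flushing explicitly.
public class HflushWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://MyCluster/"), conf);
        try (FSDataOutputStream out = fs.create(new Path("/tmp/test.txt"))) {
            for (int i = 0; i < 10; i++) {
                out.writeBytes("line " + i + "\n");
                // hflush() pushes buffered bytes out to the datanodes so a
                // reader opening the file now can see them; hsync() is the
                // stronger (and slower) variant that also asks the datanodes
                // to persist the data to disk.
                out.hflush();
            }
        }
        fs.close();
    }
}
```

Calling hflush() per line is the extreme end of the trade-off; batching several records between flushes recovers most of the write throughput while bounding how stale a reader can be.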

Regards,
Shahab
On Tue, Aug 20, 2013 at 5:36 PM, Wu, Jiang2 <[EMAIL PROTECTED]> wrote:

>  Hi,
>
> I did some experiments to read a changing hdfs file. It seems that the
> reading takes a snapshot at the file opening moment, and will not read any
> data appended to the file afterwards. It’s different from what happens when
> reading a changing local file. My code is as follows:
>
>     Configuration conf = new Configuration();
>     InputStream in = null;
>     try {
>         FileSystem fs = FileSystem.get(URI.create("hdfs://MyCluster/"), conf);
>         in = fs.open(new Path("/tmp/test.txt"));
>         Scanner scanner = new Scanner(in);
>         while (scanner.hasNextLine()) {
>             System.out.println("+++++++++++++++++++++++++++++++ read " + scanner.nextLine());
>         }
>         System.out.println("+++++++++++++++++++++++++++++++ reader finished");
>     } catch (IOException e) {
>         e.printStackTrace();
>     } finally {
>         IOUtils.closeStream(in);
>     }
>
> I’m wondering if this is the designed hdfs reading behavior, or can be
> changed by using different API or configuration? What I expect is the same
> behavior as a local file reading: when a reader reads a file while another
> writer is writing to the file, the reader will receive all data written by
> the writer.
>
> Thanks,
> Jiang
>
>
>