HBase >> mail # user >> Calling sync for every record in sequencefile.writer


Calling sync for every record in sequencefile.writer
Hello,

For a given part file (e.g. part-m-0000), I would like to record the
position of each key written to this file.

To get this position, I wrote something like:
// out.sync();
long currentPosition = out.getLength();
record_current_position(key, currentPosition);
out.append(key, value);

where out is a SequenceFile.Writer.

Now, if I leave the first line commented out, then for small files
getLength() does not change from key to key.
If I call sync() for every key, it changes to accurately reflect the
position.
Is there some other function I can use to get the current position (like a
file's 'tell' function)?

But wouldn't calling sync() for every record be costly?

How much? (I don't expect an answer to the last question.)
If it makes a difference, I have block compression turned on.
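
To make the symptom concrete, here is a minimal sketch in plain Java of why the reported length can stay flat between records. It assumes (my understanding, not something I have checked in the source) that a block-compressed writer buffers records in memory and only writes them to the underlying stream when the block fills up or sync() is called. BufferingWriter, its blockSize parameter and the positions() helper are hypothetical stand-ins for illustration, not Hadoop's SequenceFile.Writer, and no actual compression is done.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BlockBufferSketch {

    /** Hypothetical stand-in for a block-buffering writer (NOT Hadoop's API). */
    static final class BufferingWriter {
        // Bytes that have actually reached the "file".
        private final ByteArrayOutputStream file = new ByteArrayOutputStream();
        // Records held in memory until the block is flushed.
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private final int blockSize;

        BufferingWriter(int blockSize) {
            this.blockSize = blockSize;
        }

        void append(String key, String value) {
            byte[] record = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
            pending.write(record, 0, record.length);
            if (pending.size() >= blockSize) {
                sync(); // block is full: flush it
            }
        }

        /** Flush buffered records to the file (assumed behaviour of sync()). */
        void sync() {
            if (pending.size() > 0) {
                byte[] block = pending.toByteArray();
                file.write(block, 0, block.length);
                pending.reset();
            }
        }

        /** Reports only what has been flushed, like a position in the file. */
        long getLength() {
            return file.size();
        }
    }

    /** Records the reported position before each of three small appends. */
    static List<Long> positions(boolean syncEachRecord) {
        BufferingWriter out = new BufferingWriter(1024); // block far larger than the data
        List<Long> recorded = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            if (syncEachRecord) {
                out.sync();
            }
            recorded.add(out.getLength());
            out.append("k" + i, "v" + i); // each record is 6 bytes
        }
        return recorded;
    }

    public static void main(String[] args) {
        System.out.println(positions(false)); // prints [0, 0, 0]
        System.out.println(positions(true));  // prints [0, 6, 12]
    }
}
```

If that assumption holds for the real writer, then with block compression several keys would share one block start position unless every record is forced into its own block, which is presumably where the cost of calling sync() per record comes from.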

I noticed that MapFile.Writer does something similar (it calls getLength())
and would reduce to the above operation, i.e. call getLength() for every
key-value pair, if I set the index interval to 1. So would this impact
MapFile.Writer?

Cheers
Sapsi