I have a setup where data is fed into HBase using Flume. When performing inserts in blocks of 1 million, I have noticed that consistently fewer than 1 million rows end up in the database, usually around 33k rows short.
I'm using the Flume RegexHbase sink, and neither the Flume logs nor the HBase logs show any errors.
Any idea how best to track down how the rows are going missing?
As far as I know, I'm using SimpleRowKeyGenerator.getUUIDKey to generate the row key.
I tried changing that to build the row key from a combination of SimpleRowKeyGenerator.getUUIDKey and SimpleRowKeyGenerator.getNanoTimestampKey, but that made no difference either.
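One way to rule the row key out as the cause is to generate the same number of keys outside Flume and count how many are distinct. The sketch below uses only the JDK: RowKeyCheck, buildKey and countDistinct are hypothetical names, and the key layout (prefix + UUID + nano timestamp) is an assumption that only approximates what the SimpleRowKeyGenerator methods produce. It holds all keys in memory, so a run of 1 million keys needs a few hundred MB of heap.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Flume or HBase.
final class RowKeyCheck {

    // Builds a key from a prefix, a random UUID and System.nanoTime().
    // Assumed layout, loosely modelled on a UUID key plus a nano timestamp key.
    static byte[] buildKey(String prefix) {
        String key = prefix + UUID.randomUUID() + "-" + System.nanoTime();
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Generates `count` keys and returns how many distinct keys were produced.
    static int countDistinct(int count) {
        Set<String> seen = new HashSet<>(count * 2);
        for (int i = 0; i < count; i++) {
            seen.add(new String(buildKey("test-"), StandardCharsets.UTF_8));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int total = 1_000_000;
        int distinct = countDistinct(total);
        // If distinct equals total, key collisions are unlikely to explain the missing rows.
        System.out.println("generated=" + total + " distinct=" + distinct);
    }
}
```

If the distinct count matches the number generated, duplicate row keys are probably not what is overwriting rows, and the loss is more likely happening before the put reaches HBase.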
On Wednesday 26 Mar 2014 08:16:05 Stack wrote:
-Ian Brooks
Senior server administrator - Sensewhere
As this is currently just test data, the data I'm entering is basically the same for all rows, with the exception of the last 3 digits of a static number, which are set to rand(100,255).
I think the timestamp is being set by HBase (I can't see anything in the RegexHbaseEventSerializer that sets the time in the put call). All the machines are running ntpd, so time should be fairly accurate, though they are VMs, so they will probably be a second or two out.
I haven't tried a basic load into HBase without Flume yet; I'll try that tomorrow.
On Wednesday 26 Mar 2014 09:37:09 Stack wrote: