TableOutputFormat less efficient than direct HBase API calls?
edward choi 2011-06-22, 02:22
I am writing a Hadoop application that uses HBase as both source and sink.
It is a map-only job; there is no reduce phase.
I am using TableOutputFormat as the OutputFormatClass.
I read online that, in people's tests, it is faster to instantiate HTable
directly and call HTable.batch() in the mapper
than to use TableOutputFormat as the job's output format.
So I looked into the source code.
It looks like TableRecordWriter does not support batch updates, since
TableRecordWriter.write() calls HTable.put(new Put()).
Am I right about this? Or does TableOutputFormat automatically do batch updates?
Or is there a specific way to do batch updates with TableOutputFormat?
Any explanation is greatly appreciated.
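
To make the question concrete, here is a minimal sketch of the distinction I am asking about: a put() call per record does not necessarily mean one round trip per record if the client buffers writes. This is a toy model, not HBase code. BufferedTableSketch, its row-count threshold, and the round-trip counter are all hypothetical names invented for illustration (I believe the real client sizes its write buffer in bytes, not rows).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table client. This is NOT HBase's HTable;
// it only models the difference between flushing on every put and
// flushing when a client-side buffer fills up.
class BufferedTableSketch {
    private final List<String> buffer = new ArrayList<>();
    private final boolean autoFlush;    // true: send every put immediately
    private final int flushThreshold;   // assumed: buffered rows that trigger a flush
    private int roundTrips = 0;         // simulated calls to the server
    private int rowsSent = 0;           // total rows delivered

    BufferedTableSketch(boolean autoFlush, int flushThreshold) {
        this.autoFlush = autoFlush;
        this.flushThreshold = flushThreshold;
    }

    // Called once per record, like a record writer's write() would do.
    void put(String row) {
        buffer.add(row);
        if (autoFlush || buffer.size() >= flushThreshold) {
            flush();
        }
    }

    // Sends everything in the buffer as one simulated round trip.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        roundTrips++;
        rowsSent += buffer.size();
        buffer.clear();
    }

    int roundTrips() { return roundTrips; }
    int rowsSent() { return rowsSent; }

    public static void main(String[] args) {
        BufferedTableSketch unbuffered = new BufferedTableSketch(true, 100);
        BufferedTableSketch buffered = new BufferedTableSketch(false, 100);
        for (int i = 0; i < 1000; i++) {
            unbuffered.put("row-" + i);
            buffered.put("row-" + i);
        }
        // Flush whatever is left, as a writer would do when it is closed.
        unbuffered.flush();
        buffered.flush();
        System.out.println("unbuffered round trips: " + unbuffered.roundTrips()); // 1000
        System.out.println("buffered round trips: " + buffered.roundTrips());     // 10
    }
}
```

So what I would like to know is which of these two cases TableOutputFormat falls into. If it sets up its HTable with auto-flush disabled (something I could not confirm, for example a setAutoFlush(false) call somewhere in the version in use), then the per-record put() calls might already be batched on the client side.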