Re: best approach for write and immediate read use case
Ted Yu 2013-08-23, 10:20
Can you tell us the average size of your records and how much heap is given to the region servers?
On Aug 23, 2013, at 12:11 AM, Gautam Borah <[EMAIL PROTECTED]> wrote:
> Hello all,
> I have a use case where I need to write 1 million to 10 million records
> periodically (with intervals of 1 minute to 10 minutes) into an HBase table.
> Once the insert is completed, these records are queried immediately from
> another program - multiple reads.
> So, this is one massive write followed by many reads.
> I have two approaches to insert these records into the HBase table:
> 1. Use HTable or HTableMultiplexer to stream the data to the HBase table.
> 2. Write the data to HDFS as a sequence file (Avro in my case), run a map
> reduce job using HFileOutputFormat, and then load the output files into the
> HBase cluster.
> Something like,
> LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> loader.doBulkLoad(new Path(outputDir), hTable);
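A minimal sketch of the first approach (streaming puts through HTable), in the client API style of the HBase versions current at the time of this thread. The table name "records", column family "d", qualifier "v", and the row keys are placeholders, not names from the original question, and a running HBase cluster is assumed:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class StreamingWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // "records", "d", "v" are illustrative names only
        HTable table = new HTable(conf, "records");
        try {
            // Buffer puts client-side instead of one RPC per put
            table.setAutoFlush(false);
            table.setWriteBufferSize(8 * 1024 * 1024); // 8 MB buffer
            for (long i = 0; i < 1000000; i++) {
                Put put = new Put(Bytes.toBytes(i));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("v"),
                        Bytes.toBytes("value-" + i));
                table.put(put);
            }
            // Push any still-buffered puts before readers start querying
            table.flushCommits();
        } finally {
            table.close();
        }
    }
}
```

With this path the writes pass through the region servers' memstores, so they are immediately visible to readers without waiting for a flush to HFiles, which is the property the question below is asking about.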
> In my use case which approach would be better?
> If I use HTable interface, would the inserted data be in the HBase cache,
> before flushing to the files, for immediate read queries?
> If I use the map reduce job to insert, would the data be loaded into the
> HBase cache immediately? Or would only the output files be copied to the
> respective hbase table specific directories?
> So, which approach is better for a write followed by immediate multiple reads?