HBase >> mail # user >> best approach for write and immediate read use case

Re: best approach for write and immediate read use case
Can you tell us the average size of your records and how much heap is given to the region servers?


On Aug 23, 2013, at 12:11 AM, Gautam Borah <[EMAIL PROTECTED]> wrote:

> Hello all,
> I have a use case where I need to periodically write 1 million to 10
> million records (at intervals of 1 to 10 minutes) into an HBase table.
> Once the insert is completed, these records are queried immediately from
> another program - multiple reads.
> So, this is one massive write followed by many reads.
> I have two approaches to insert these records into the HBase table -
> Use HTable or HTableMultiplexer to stream the data to HBase table.
> or
> Write the data to HDFS as a sequence file (Avro in my case), run a
> MapReduce job using HFileOutputFormat, and then load the output files into
> the HBase cluster.
> Something like,
>  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
>  loader.doBulkLoad(new Path(outputDir), hTable);
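The first approach above (streaming writes through the client API) can be sketched as follows. This is a minimal illustration against the HBase 0.94-era client API that was current when this thread was written; the table name, column family, and row layout are placeholders, not anything from the original post, and running it requires a live HBase cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class StreamingWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "records");   // "records" is a hypothetical table name
        table.setAutoFlush(false);                    // buffer Puts client-side instead of one RPC per Put
        table.setWriteBufferSize(8 * 1024 * 1024);    // flush to region servers every ~8 MB

        for (long i = 0; i < 1000000; i++) {
            Put put = new Put(Bytes.toBytes(String.format("row-%012d", i)));
            // column family "d" and qualifier "value" are assumptions for this sketch
            put.add(Bytes.toBytes("d"),
                    Bytes.toBytes("value"),
                    Bytes.toBytes("payload-" + i));
            table.put(put);                           // buffered until the write buffer fills
        }
        table.flushCommits();                         // push any remaining buffered Puts
        table.close();
    }
}
```

With this path the data lands in the region servers' MemStore on write, so it is immediately readable; the bulk-load path below skips the MemStore and moves finished HFiles into place instead.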
> In my use case which approach would be better?
> If I use HTable interface, would the inserted data be in the HBase cache,
> before flushing to the files, for immediate read queries?
> If I use a MapReduce job to insert, would the data be loaded into the
> HBase cache immediately? Or would the output files only be copied into the
> respective HBase table directories?
> So, which approach is better for write and then immediate multiple read
> operations?
> Thanks,
> Gautam