Re: best approach for write and immediate read use case
Can you tell us the average size of your records and how much heap is given to the region servers?

Thanks

On Aug 23, 2013, at 12:11 AM, Gautam Borah <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I have a use case where I need to write 1 million to 10 million records
> periodically (at intervals of 1 to 10 minutes) into an HBase table.
>
> Once the insert completes, these records are immediately queried by
> another program - multiple reads.
>
> So, this is one massive write followed by many reads.
>
> I have two approaches to insert these records into the HBase table -
>
> Use HTable or HTableMultiplexer to stream the data to the HBase table
> (sketched below, after the second option).
>
> or
>
> Write the data to HDFS as a sequence file (Avro in my case), run a
> MapReduce job using HFileOutputFormat, and then load the output files
> into the HBase cluster.
> Something like,
>
>  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
>  loader.doBulkLoad(new Path(outputDir), hTable);
>
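> To complete the picture, the driver for that MapReduce job would look
> roughly like the sketch below; the table name, driver and mapper classes,
> and the input path are placeholders, and the mapper is assumed to emit
> ImmutableBytesWritable row keys with Put values:
>
>  Configuration conf = HBaseConfiguration.create();
>  HTable hTable = new HTable(conf, "my_table");       // placeholder table name
>  Job job = new Job(conf, "hbase bulk load");
>  job.setJarByClass(BulkLoadDriver.class);            // placeholder driver class
>  job.setMapperClass(RecordToPutMapper.class);        // placeholder mapper
>  job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>  job.setMapOutputValueClass(Put.class);
>  FileInputFormat.addInputPath(job, new Path(inputDir));
>  FileOutputFormat.setOutputPath(job, new Path(outputDir));
>  // wires in the reducer, partitioner and HFileOutputFormat so the
>  // generated HFiles line up with the table's region boundaries
>  HFileOutputFormat.configureIncrementalLoad(job, hTable);
>  job.waitForCompletion(true);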
>
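> For the first, streaming approach, a minimal sketch of what I have in
> mind; the table name, column family/qualifier and the record type are
> placeholders:
>
>  HTable table = new HTable(conf, "my_table");        // placeholder table name
>  table.setAutoFlush(false);                          // buffer puts on the client
>  table.setWriteBufferSize(8 * 1024 * 1024);          // e.g. 8 MB write buffer
>  for (MyRecord r : records) {                        // placeholder record type
>      Put put = new Put(r.getRowKey());
>      put.add(FAMILY, QUALIFIER, r.getValue());       // placeholder family/qualifier
>      table.put(put);                                 // lands in the client-side buffer
>  }
>  table.flushCommits();                               // push any remaining buffered puts
>  table.close();
>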
> In my use case, which approach would be better?
>
> If I use the HTable interface, would the inserted data be in the HBase
> cache before being flushed to files, and available for immediate read
> queries?
>
> If I use the MapReduce job to insert, would the data be loaded into the
> HBase cache immediately, or would only the output files be copied to the
> respective HBase table-specific directories?
>
> So, which approach is better for a massive write followed by multiple
> immediate read operations?
>
> Thanks,
> Gautam