Re: best approach for write and immediate read use case
Can you tell us the average size of your records and how much heap is given to the region servers?

Thanks

On Aug 23, 2013, at 12:11 AM, Gautam Borah <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I have a use case where I need to write 1 million to 10 million records
> periodically (at intervals of 1 to 10 minutes) into an HBase table.
>
> Once the insert is complete, these records are immediately queried by
> another program - multiple reads.
>
> So, this is one massive write followed by many reads.
>
> I have two approaches to insert these records into the HBase table -
>
> Use HTable or HTableMultiplexer to stream the data to the HBase table.
>
> or
>
> Write the data to HDFS as a sequence file (Avro in my case), run a
> MapReduce job using HFileOutputFormat, and then load the output files
> into the HBase cluster.
> Something like,
>
>  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
>  loader.doBulkLoad(new Path(outputDir), hTable);
>
>
> In my use case, which approach would be better?
>
> If I use the HTable interface, would the inserted data be in the HBase
> cache, before being flushed to files, for immediate read queries?
>
> If I use a MapReduce job to insert, would the data be loaded into the
> HBase cache immediately, or would only the output files be copied to the
> respective HBase table-specific directories?
>
> So, which approach is better for a write followed immediately by multiple
> read operations?
>
> Thanks,
> Gautam
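
For reference, a minimal sketch of the first approach described above (streaming puts through HTable), assuming an HBase 0.94-era client API; the table name "records" and the column family/qualifier "d"/"value" are placeholders, not names from the original post. Rows written this way go through the region servers' MemStore, so they are readable as soon as the puts are flushed; the bulk-load path, by contrast, only moves finished HFiles into the table's directories and the data is served from disk (and the block cache) on first read.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class StreamingPutSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // "records", "d" and "value" are placeholder table/family/qualifier names.
      HTable table = new HTable(conf, "records");
      try {
        // Buffer puts client-side and send them in batches instead of one RPC per row.
        table.setAutoFlush(false);
        table.setWriteBufferSize(8 * 1024 * 1024); // 8 MB client write buffer
        for (int i = 0; i < 1000000; i++) {
          Put put = new Put(Bytes.toBytes(String.format("row-%09d", i)));
          put.add(Bytes.toBytes("d"), Bytes.toBytes("value"),
                  Bytes.toBytes("payload-" + i));
          table.put(put); // queued in the write buffer, flushed when it fills
        }
        table.flushCommits(); // push any remaining buffered puts
      } finally {
        table.close();
      }
    }
  }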