HBase, mail # user - Bulk Load question.


Re: Bulk Load question.
Suraj Varma 2011-03-20, 21:40
> Is there a way to split this
> across regions in the beginning?

Since you didn't mention your HBase version, I'm assuming you are
using 0.90.1 or later.
If so, yes, there is a way to pre-split the regions. See this:
http://hbase.apache.org/book/important_configurations.html#d0e1975
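
To make the pre-split idea concrete, here is a minimal sketch of how the split points could be computed. It assumes (my assumption, not something from your mail) that your row keys start with an 8-character lowercase hex prefix that is roughly uniformly distributed, e.g. a hash of the natural key. The class and method names (SplitKeys, evenHexSplits) are made up for illustration. As far as I know, the 0.90 client then accepts such keys through HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys); that call is only shown as a comment so the snippet runs without HBase on the classpath.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes evenly spaced region boundaries for
// row keys that begin with an 8-character lowercase hex prefix.
final class SplitKeys {
    // Size of the assumed key space: 16^8 = 2^32 possible prefixes.
    private static final long KEY_SPACE = 1L << 32;

    // Returns numRegions - 1 split points, in ascending order.
    static byte[][] evenHexSplits(int numRegions) {
        if (numRegions < 1) {
            throw new IllegalArgumentException("numRegions must be >= 1");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // i-th boundary of the key space, rounded down.
            long boundary = KEY_SPACE * i / numRegions;
            splits[i - 1] = String.format("%08x", boundary)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splitKeys = evenHexSplits(4);
        for (byte[] key : splitKeys) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
        // With the HBase client available, the keys would be used roughly as:
        //   new HBaseAdmin(conf).createTable(tableDescriptor, splitKeys);
    }
}
```

If your keys are not uniformly distributed (sequential IDs or timestamps, for example), evenly spaced boundaries will not spread the load; in that case pick the split points from a sample of the real keys instead.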

Also - as Harsh mentioned, the bulkload tool might be even better, so
take a look at that as well:
http://hbase.apache.org/bulk-loads.html

--Suraj

On Sat, Mar 19, 2011 at 8:48 AM, Vivek Krishna <[EMAIL PROTECTED]> wrote:
> I have around 20 GB of data to be dumped into a hbase table.
>
> Initially, I had a simple Java program to put the values in batches of
> 5000-10000 records.  I tried concurrent inserts, and each insert took about
> 15 seconds to write, which is very slow and was taking ages.
>
> The next approach was to use importtsv. This started off with a set of maps,
> and after a few minutes I started getting RetriesException, and the job
> errored out after a while.
>
> From these experiments, I noticed that the master node was handling all the
> traffic.  I understand that initially it dumps data in one node and then
> splits across multiple nodes as data comes in.  Is there a way to split this
> across regions in the beginning?
>
> Or any other thoughts on how to handle inserts of large amounts of data?
> Viv
>