Re: Bulk Load question.
Suraj Varma 2011-03-20, 21:40
> Is there a way to split this
> across regions in the beginning?

Since you didn't mention your HBase version, I'm assuming you are using 0.90.1 or later. If so, yes, there is a way to pre-split the regions. See this:
http://hbase.apache.org/book/important_configurations.html#d0e1975

Also, as Harsh mentioned, the bulkload tool might be even better, so take a look at that as well:
http://hbase.apache.org/bulk-loads.html

--Suraj

On Sat, Mar 19, 2011 at 8:48 AM, Vivek Krishna <[EMAIL PROTECTED]> wrote:
> I have around 20 GB of data to be dumped into an HBase table.
>
> Initially, I had a simple Java program to put the values in batches of
> (5000-10000) records. I tried concurrent inserts and each insert took about
> 15 seconds to write, which is very slow and was taking ages.
>
> Next approach was to use importtsv. This started off with a set of maps and
> after a few minutes I started getting RetriesException, and it errors out in a
> while.
>
> Of these experiments, I noticed that the master node was handling all the
> traffic. I understand that initially it dumps data in one node and then
> splits across multiple nodes as data comes in. Is there a way to split this
> across regions in the beginning?
>
> Or any other thoughts on how to handle inserts of large amounts of data?
> Viv
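
For illustration, here is a rough sketch of how split keys for the pre-splitting mentioned in the reply might be computed. It is not taken from the thread: the class name PreSplitKeys and the method evenSplits are hypothetical, and it assumes row keys whose first byte is spread roughly evenly over 0x00..0xFF (for example, hashed or salted keys). If your keys are skewed, the split points should come from the actual key distribution instead.

```java
// Hypothetical helper, not part of HBase: computes evenly spaced split keys.
class PreSplitKeys {

    /**
     * Returns numRegions - 1 single-byte split keys that divide the
     * first-byte range 0x00..0xFF into numRegions roughly equal parts.
     * Assumes row keys are uniformly distributed over their first byte.
     */
    static byte[][] evenSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary between region i-1 and region i.
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the boundaries for a 4-region table as hex.
        for (byte[] key : evenSplits(4)) {
            System.out.printf("%02x%n", key[0] & 0xff);
        }
    }
}
```

With the HBase client on the classpath, an array like this would be handed to HBaseAdmin.createTable(HTableDescriptor, byte[][]) when creating the table, so that writes are spread over several regions from the start rather than all landing in a single initial region.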