Re: Constant error when putting large data into HBase
Hi Ed,

Without having looked at the logs, this sounds like the common case of overloading a single region due to your sequential row keys. Either hash the keys, or salt them - but the best bet here is to use the bulk loading feature of HBase (http://hbase.apache.org/bulk-loads.html). That bypasses this problem and lets you continue to use sequential keys.
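For reference, here is a minimal sketch of the salting idea. The table handle, family/qualifier names, and bucket count are illustrative, not prescriptive:

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Prefix a sequential key with a hash-derived bucket so consecutive
    // keys spread across regions instead of hammering a single one.
    public class SaltedWrite {
        private static final int NUM_BUCKETS = 16; // e.g. roughly the region count

        static byte[] salt(String sequentialKey) {
            // mask keeps hashCode non-negative before taking the bucket
            int bucket = (sequentialKey.hashCode() & 0x7fffffff) % NUM_BUCKETS;
            return Bytes.toBytes(String.format("%02d-%s", bucket, sequentialKey));
        }

        static void write(HTable table, String day, byte[] docIds) throws Exception {
            Put put = new Put(salt(day)); // e.g. "07-20111122"
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("ids"), docIds);
            table.put(put);
        }
    }

The trade-off is that readers must then query all buckets for a given day. Bulk loading avoids both problems: the job writes HFiles directly (via HFileOutputFormat) and the completebulkload tool hands them to the cluster, so the regionservers never take the write load.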

Lars
On Dec 1, 2011, at 12:21 PM, edward choi wrote:

> Hi Lars,
>
> Okay here goes some details.
> There are 21 tasktrackers/datanodes/regionservers
> There is one Jobtracker/namenode/master
> Three zookeepers.
>
> There are about 200 million tweets in HBase.
> My MapReduce job aggregates tweets by the date they were generated.
> So in the map stage, I write out the tweet date as the key and the
> document id as the value (document ids are randomly generated by a hash
> algorithm).
> In the reduce stage, I put the data into a table. The key (which is the
> tweet date) is the table row id, and the values (which are document ids)
> are the column values.
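> For concreteness, here is a minimal sketch of what my reduce step does
> (class, family, and qualifier names are illustrative, not my actual code):
>
>     import java.io.IOException;
>     import org.apache.hadoop.hbase.client.Put;
>     import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>     import org.apache.hadoop.hbase.mapreduce.TableReducer;
>     import org.apache.hadoop.hbase.util.Bytes;
>     import org.apache.hadoop.io.Text;
>
>     // Every tweet date becomes one row; document ids arrive as values
>     // and are packed 1000 at a time into successive columns.
>     public class DateAggregateReducer
>             extends TableReducer<Text, Text, ImmutableBytesWritable> {
>         @Override
>         protected void reduce(Text date, Iterable<Text> docIds, Context ctx)
>                 throws IOException, InterruptedException {
>             Put put = new Put(Bytes.toBytes(date.toString())); // e.g. "20111122"
>             StringBuilder batch = new StringBuilder();
>             int count = 0, column = 0;
>             for (Text id : docIds) {
>                 batch.append(id.toString()); // 32-byte document id
>                 if (++count == 1000) {       // 1000 ids per column
>                     put.add(Bytes.toBytes("cf"), Bytes.toBytes("ids" + column++),
>                             Bytes.toBytes(batch.toString()));
>                     batch.setLength(0);
>                     count = 0;
>                 }
>             }
>             if (count > 0) {
>                 put.add(Bytes.toBytes("cf"), Bytes.toBytes("ids" + column),
>                         Bytes.toBytes(batch.toString()));
>             }
>             ctx.write(null, put); // TableOutputFormat ignores the key
>         }
>     }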
>
> Now, the map stage is fine. I get to 100% map. But during the reduce
> stage, one of my regionservers fails.
> I don't know what the exact symptom is. I just get:
>
>     org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>     Failed 1 action: servers with issues: lp171.etri.re.kr:60020,
> About "some node always die" <== scratch this.
>
> To be precise,
> I narrowed down the range of data that I wanted to process.
> I tried to put only the tweets that were generated on 2011/11/22.
> Now the reduce code will produce a row with "20111122" as the row id, and
> a bunch of document ids as the column values. (I use a 32-byte string as
> the document id, and I append 1000 document ids to a single column.)
> So the region that my data will be inserted into will have "20111122"
> between its start key and end key.
> The regionserver that hosts that specific region fails. That is the
> point. If I move that region to another regionserver using the hbase
> shell, then the new regionserver fails,
> with the same log output.
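> (For reference, I moved the region with the hbase shell "move" command;
> the encoded region name and target server here are placeholders:
>
>     hbase> move 'ENCODED_REGION_NAME', 'host,port,startcode'
> )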
> After 4 failures, the job is force-cancelled and the put operation is
> not completed.
>
> Now, even with the failure, the regionserver is still online. It is not
> dead (sorry for my earlier use of the word 'die').
>
> I have pasted the Jobtracker log, the tasktracker log (the one that
> failed), and the regionserver log (the one that failed) using PasteBin.
> The job started at 2011-12-01 17:14:43 and was killed at 2011-12-01
> 17:20:07.
>
> JobTracker Log
> http://pastebin.com/n6sp8Fyi
>
> TaskTracker Log
> http://pastebin.com/RMFc41D5
>
> RegionServer Log
> http://pastebin.com/UpKF8HwN
>
> And finally, in the logs I pasted I also see lines at DEBUG and INFO
> level, so I thought this was okay.
> Is there a way to change the WARN-level logging to some other level? If
> you let me know, I will paste another set of logs.
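> (I assume the level is controlled per server through
> conf/log4j.properties, e.g.
>
>     # illustrative: more verbose HBase logging on a regionserver
>     log4j.logger.org.apache.hadoop.hbase=DEBUG
>
> followed by a restart - please correct me if there is a better way.)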
>
> Thanks,
> Ed
>
> 2011/12/1 Lars George <[EMAIL PROTECTED]>
>
>> Hi Ed,
>>
>> You need to be more precise, I am afraid. First of all, what does "some
>> node always dies" mean? Is the process gone? Which process is gone?
>> And the "error" you pasted is a WARN level log that *might* indicate some
>> trouble, but is *not* the reason the "node has died". Please elaborate.
>>
>> Also consider posting the last few hundred lines of the process logs to
>> pastebin so that someone can look at it.
>>
>> Thanks,
>> Lars
>>
>>
>> On Dec 1, 2011, at 9:48 AM, edward choi wrote:
>>
>>> Hi,
>>> I've had a problem that has been killing me for some days now.
>>> I am using the CDH3 update 2 version of Hadoop and HBase.
>>> When I do a large amount of bulk loading into HBase, some node always
>>> dies.
>>> It's not just one particular node,
>>> but one of many nodes eventually fails to serve.
>>>
>>> I set 4 gigs of heap space for the master and the regionservers. I
>>> monitored the processes, and when a node fails it has not used all of
>>> its heap yet.
>>> So it is not a heap space problem.
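>>> (The heap was set through conf/hbase-env.sh, assuming the standard
>>> setting; the value is in MB:
>>>
>>>     # illustrative: 4 GB heap for the HBase daemons
>>>     export HBASE_HEAPSIZE=4000
>>> )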