Accumulo user mailing list: Efficient Tablet Merging [SEC=UNOFFICIAL]


Thread:
Dickson, Matt MR    2013-10-02, 03:58
Eric Newton         2013-10-02, 13:05
Dickson, Matt MR    2013-10-02, 22:35
Eric Newton         2013-10-03, 03:28
Adam Fuchs          2013-10-03, 12:07
Dickson, Matt MR    2013-10-03, 06:29
Eric Newton         2013-10-03, 13:51
Dickson, Matt MR    2013-10-04, 00:43
Eric Newton         2013-10-04, 01:20
Dickson, Matt MR    2013-10-04, 03:20
Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
Great details... but I need to sleep.  I'll dig in more tomorrow.  Sorry!

On Thu, Oct 3, 2013 at 11:20 PM, Dickson, Matt MR
<[EMAIL PROTECTED]> wrote:
> UNOFFICIAL
>
> Hi Eric,
> Our answers are in blue. Just a note that we do have the write ahead log
> disabled for ingest performance.
> We have a public holiday on Monday, so we may be delayed in our response.
>
> Cheers
> Matt
>
> ________________________________
> From: Eric Newton [mailto:[EMAIL PROTECTED]]
> Sent: Friday, 4 October 2013 11:20
>
> To: [EMAIL PROTECTED]
> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>
> Any errors on those servers?  Each server should be checking periodically
> for compactions, some crazy errors might escape error handling, though that
> is rare these days.
> In the tserver debug log there is a repeating error of  "Internal error
> processing applyUpdates
> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are
> held"
>
> Also found in the tserver log:
> ERROR: Failed to find midpoint: Filesystem closed
> WARN: Tablet .... has too many files, batch lookup can not run
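
The HoldTimeoutException and the too-many-files warning both suggest compactions are not keeping up with ingest. A minimal Accumulo shell sketch for inspecting the related settings (property names are standard Accumulo configuration; "mytable" is a placeholder):

    config -f tserver.hold.time          # limit on how long commits may be held before that exception
    config -t mytable -f compaction      # per-table compaction settings (ratio, etc.)
    config -f tserver.compaction         # per-server major/minor compaction thread counts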
>
> Are you experiencing any table level errors?  Unable to read or write files?
> No table level errors or read errors
>
>
> How full is HDFS?
> 32%
>
> If you scan the !METADATA table, are you seeing any trend in the tablets
> that have problems?
> By getting the extent id of the tablets that are large and then finding the
> range of that tablet using 'getsplits -v', I have scanned the !METADATA
> table and can see a massive number of *.rf files associated with the range.
> Is there anything in particular I should look at?
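
For reference, a minimal shell sketch of that kind of !METADATA inspection (the table id "3", the row values, and the table name are placeholders; file entries sit under the "file" column family, with size and entry count in the value):

    tables -l                                               # find the numeric id of the table
    scan -t !METADATA -b "3;" -e "3<" -c file               # all rfile entries for table id 3
    scan -t !METADATA -b "3;endrow" -e "3;endrow" -c file   # just the tablet whose end row is "endrow"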
>
> At this point, we're looking for logged anomalies, the earlier the better.
> Anything red or yellow on the monitor pages.
> I ran one of the scans that hang and then see the following:
>
> Several "WARN Exception sying java.lang.reflect.InvocationTargetException"
>
> Several "ERROR  Unexpected error writing to log, retrying attempt 1
>     InvocationTargetException
>     Caused by LeaseExpiredException: Lease mismatch on /accumulo/wal/...
> owned by DFSClient_NONMAPREDUCE_56390516_13 but is accessed by
> DFSClient_NONMAPREDUCE_1080760417_13"
>
> "ERROR TTransportException: javav.net.SocketTimeoutException: ... while
> waiting for channel to be ready for write. ...."
>
> Bunch of "WARN Tablet 234234234 has too many files..."
>
> On Thu, Oct 3, 2013 at 8:43 PM, Dickson, Matt MR
> <[EMAIL PROTECTED]> wrote:
>>
>> UNOFFICIAL
>>
>> We have restarted the tablet servers that contain tablets with high
>> volumes of files and did not see any majc's run.
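
In case it is useful, a minimal shell sketch for queuing a major compaction on just the affected range instead of the whole table (the table name and row bounds are placeholders based on the date rowids mentioned in this thread):

    compact -t mytable -b 20131001 -e 20131004    # queue a major compaction for tablets in this row range
    config -t mytable -f table.file.max           # per-tablet file threshold that triggers merging minor compactions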
>>
>> Some more details are:
>> On 3 of our nodes we have 10-15 times the number of entries that are on
>> the other nodes.  When I view the tablets for one of these nodes there are 2
>> tablets with almost 10 times as many entries as the others.
>>
>> When we query on the date rowids, the queries now hang. There are several
>> scans running on the 3 nodes with the higher entry counts and they are not
>> completing. Can I cancel these?
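
A minimal shell sketch for looking at the stuck scans (as far as I know there is no shell command to cancel a scan directly; the usual approach is to stop the client and let the scan session time out):

    listscans                                    # list active scans across all tablet servers
    listscans -ts tserver1.example.com:9997      # hypothetical address, to narrow to one tablet server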
>>
>> In the logs we are getting "tablet ..... has too many files, batch lookup
>> can not run"
>>
>> At this point I'm stuck for ideas, so any suggestions would be great.
>>
>> ________________________________
>> From: Eric Newton [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, 3 October 2013 23:52
>>
>> To: [EMAIL PROTECTED]
>> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>
>> You should have a major compaction running if your tablet has too many
>> files.  If you don't, something is wrong. It does take some time to re-write
>> 10G of data.
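
For what it's worth, a minimal shell sketch of the knob that controls how eagerly major compactions rewrite files (a standard table property; lowering it toward 1 makes compactions more aggressive at the cost of extra rewriting, and the value 2 is only an example):

    config -t mytable -f table.compaction.major.ratio      # default is 3
    config -t mytable -s table.compaction.major.ratio=2    # hypothetical: compact more aggressively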
>>
>> If many merges occurred on a single tablet server, you may have these
>> many-file tablets on the same server, and there are not enough major
>> compaction threads to re-write those files right away.  If that's true, you
>> may wish to restart the tablet server in order to get the tablets pushed to
>> other idle servers.
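
If the limit really is the number of compaction threads on that one server, a minimal sketch of the relevant property (system-wide, and it may require a tablet server restart to take effect; the value 6 is only an example):

    config -f tserver.compaction.major.concurrent.max      # default is 3
    config -s tserver.compaction.major.concurrent.max=6    # hypothetical: more concurrent majc's per server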
>>
>> Again, if you don't have major compactions running, you will want to start
Kristopher Kane     2013-10-04, 02:02
Dickson, Matt MR    2013-10-03, 03:45