From:   "Durity, Sean R" <[EMAIL PROTECTED]>
To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date:   09/19/2017 09:25 AM
Subject:        RE: Multi-node repair fails after upgrading to 3.0.14

Required maintenance for a cluster should not be this complicated and
should not be changing so often. To me, this is a major flaw in Cassandra.
 
 
Sean Durity
 
From: Steinmaurer, Thomas [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 19, 2017 2:33 AM
To: [EMAIL PROTECTED]
Subject: RE: Multi-node repair fails after upgrading to 3.0.14
 
Hi Kurt,
 
thanks for the link!
 
Honestly, it's a pity that in 3.0 we can't get back the simple, reliable
and predictable way of running a full repair for very low data volume CFs,
kicked off on all nodes in parallel, without all the magic behind the
scenes introduced by incremental repairs (even when they aren't used),
since anticompaction is triggered even with --full as of 2.2+. :-)
 
 
Regards,
Thomas
 
From: kurt greaves [mailto:[EMAIL PROTECTED]]
Sent: Dienstag, 19. September 2017 06:24
To: User <[EMAIL PROTECTED]>
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full repairs
still trigger anti-compaction on non-repaired SSTables (if I'm reading
that right), so you might need to make sure you don't run multiple repairs
at the same time across your nodes (if you're using vnodes); otherwise you
could still end up trying to run anti-compaction on the same SSTable from
two repairs.
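
Something like the following (a rough, untested sketch; the host list and
SSH access are just placeholders for however you reach your nodes) would
serialise the repairs so only one session anticompacts at a time:

    #!/bin/bash
    # Run a full primary-range repair node by node, waiting for each one
    # to finish before starting the next, so two repair sessions never
    # try to anticompact the same SSTable.
    HOSTS="node1 node2 node3"          # placeholder host names
    for h in $HOSTS; do
        echo "Repairing $h ..."
        ssh "$h" nodetool repair --full -pr keyspace cfs \
            || { echo "Repair failed on $h"; exit 1; }
    done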
 
Anyone else feel free to jump in and correct me if my interpretation is
wrong.
 
On 18 September 2017 at 17:11, Steinmaurer, Thomas <
[EMAIL PROTECTED]> wrote:
Jeff,
 
what should be the expected outcome when running with 3.0.14:
 
nodetool repair --full -pr keyspace cfs
 
- Should --full trigger anti-compaction?
- Should this be the same operation as nodetool repair -pr keyspace cfs in 2.1?
- Should I be able to run this on several nodes in parallel without
trouble, as in 2.1, where incremental repair was not the default?
 
Still confused as to whether I'm missing something obvious. Sorry about that. :-)
 
Thanks,
Thomas
 
From: Jeff Jirsa [mailto:[EMAIL PROTECTED]]
Sent: Montag, 18. September 2017 16:10

To: [EMAIL PROTECTED]
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
Sorry I may be wrong about the cause - didn't see -full
 
Mea culpa, it's early here and I'm not awake
--
Jeff Jirsa
 

On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas <
[EMAIL PROTECTED]> wrote:
Hi Jeff,
 
Understood. That's quite a change coming from 2.1, from an operational
POV.
 
Thanks again.
 
Thomas
 
From: Jeff Jirsa [mailto:[EMAIL PROTECTED]]
Sent: Montag, 18. September 2017 15:56
To: [EMAIL PROTECTED]
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
The command you're running will cause anticompaction at the range borders
for all instances at the same time
 
Since only one repair session can anticompact any given sstable, it's
almost guaranteed to fail
 
Run it on one instance at a time
--
Jeff Jirsa
 

On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas <
[EMAIL PROTECTED]> wrote:
Hi Alex,
 
I now ran nodetool repair --full -pr keyspace cfs on all nodes in parallel,
and this now pops up:
 
0.176.38.128 (progress: 1%)
[2017-09-18 07:59:17,145] Some repair failed
[2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
error: Repair job has failed with the error message: [2017-09-18
07:59:17,145] Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message:
[2017-09-18 07:59:17,145] Some repair failed
        at
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
        at
org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
        at
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
        at
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
        at
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
        at
com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
 
2017-09-18 07:59:17 repair finished
 
 
When running the above nodetool call sequentially on all nodes, repair
finishes without printing a stack trace.
 
The error message and stack trace aren't really useful here. Any further
ideas/experiences?
 
Thanks,
Thomas
 
From: Alexander Dejanovski [mailto:[EMAIL PROTECTED]]
Sent: Freitag, 15. September 2017 11:30
To: [EMAIL PROTECTED]
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
Right, you should indeed add the "--full" flag to perform full repairs,
and you can then keep the "-pr" flag.
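
For example (with your own keyspace and table names):

    nodetool repair --full -pr keyspace cfs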
 
I'd advise monitoring the status of your SSTables, as you'll probably end
up with one pool of SSTables marked as repaired and another pool marked as
unrepaired, which won't be compacted together (hence the suggestion of
running subrange repairs).
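
A subrange repair just bounds the repair to an explicit token range,
something like the following (the token values are placeholders; in
practice you'd loop over the ranges each node owns). As far as I know,
subrange repairs also skip anticompaction, so SSTables stay in the
unrepaired pool:

    nodetool repair --full -st -9223372036854775808 -et -3074457345618258603 keyspace cfs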
Use sstablemetadata to check on the "Repaired at" value for each. 0 means
unrepaired and any other value (a timestamp) means the SSTable has been
repaired.
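
Something along these lines works as a quick check (the data directory
layout below is an assumption, adjust it to your setup):

    # Print the "Repaired at" value for every SSTable of the table.
    for f in /var/lib/cassandra/data/keyspace/cfs-*/*-Data.db; do
        echo "$f: $(sstablemetadata "$f" | grep 'Repaired at')"
    done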
I've seen behavior in the past where running "-pr" on the whole cluster
would still not mark all SSTables as repaired, but I can't say whether
that has changed in the latest versions.
 
Having separate pools of SSTables that cannot be compacted together means
that you might have tombstones that don't get purged.