
I have created the following structure:

  • Two hardware servers with Windows Server 2012 R2 and SQL Server 2012 installed. Windows was then fully updated and SQL Server was patched to SP4; the current version is reported as 11.0.7493.
  • WSFC formed on both servers, added a file share witness elsewhere
  • Two standalone SQL Server instances, one on each server, with AlwaysOn enabled
  • One AlwaysOn availability group with one database and one listener

This works as intended when it comes to connecting to the database and to manual failover driven from SSMS (T-SQL). (I had to resolve an issue with local SQL Server logins having different SIDs, since the app uses SQL Server authentication, but it works.)

Then I tried to simulate a SQL Server crash by stopping the server - BAM, the AAG failed completely. Investigation with Get-ClusterLog showed that the WSFC said "Not failing over group XXX, failoverCount 3, failoverThresholdSetting 1, lastFailover 1601/01/01-00:00:00.000". Okay, I said, let's wait 6 hours (the default period after which a WSFC resource clears its failover count) and try again - BAM, failoverCount rose to 4. I then tried setting the failover period to 1 hour and the threshold to 5 - again nothing, and the failover count rose again, beyond the threshold. I went Googling and found claims that this period can be lowered to zero, effectively resetting the failover count instantly - NO WAY, the count still grows whenever I simulate a failover.

However, when I simply restart the current primary cluster node together with its SQL Server, the AAG properly moves to the remaining node and the local database replica becomes primary.
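To make the numbers in that cluster log message easier to track across attempts, here is a minimal Python sketch that pulls the three fields out of the quoted line and compares the count with the threshold. The function names are hypothetical, and the rule that a failover is refused once the count has reached the threshold is my assumption about how WSFC evaluates these values, not something the log itself states.

```python
import re

# Message text exactly as quoted from Get-ClusterLog (group name left as XXX).
LOG_LINE = ('Not failing over group XXX, failoverCount 3, '
            'failoverThresholdSetting 1, lastFailover 1601/01/01-00:00:00.000')

PATTERN = re.compile(
    r'failoverCount (?P<count>\d+), '
    r'failoverThresholdSetting (?P<threshold>\d+), '
    r'lastFailover (?P<last>\S+)'
)


def parse_failover_line(line):
    """Return (count, threshold, last_failover_text) from the message."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError('line does not look like the quoted message')
    return int(match['count']), int(match['threshold']), match['last']


def failover_blocked(count, threshold):
    # ASSUMPTION: the cluster refuses to fail the group over once the
    # failover count has reached the threshold setting.
    return count >= threshold


count, threshold, last = parse_failover_line(LOG_LINE)
print(count, threshold, last, failover_blocked(count, threshold))
```

Under that assumption, a count of 3 against a threshold of 1 is already enough to block the failover, which matches the "Not failing over" wording of the message.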

So, how do I make a SQL Server 2012 AAG fail over to the other node when SQL Server goes down while the host remains operational?

As a side note, why does the last failover time show zeroes? Maybe this is the cause, or a part of the symptoms that shows where to look?
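On the zeroed timestamp: 1601-01-01 00:00:00 is the zero point of the Windows FILETIME format (100-nanosecond ticks since that date). One possible reading, which is an assumption and not confirmed for this particular log field, is that lastFailover is stored as a FILETIME and was never set, so the log prints the epoch. The Python sketch below only shows that a FILETIME of 0 renders as the value seen in the log; the formatter is a hypothetical helper that mimics the quoted layout.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a FILETIME tick count to a UTC datetime (microsecond precision)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def format_like_cluster_log(moment):
    # Hypothetical formatter that mimics the layout of the quoted log value.
    return (f'{moment.year:04d}/{moment.month:02d}/{moment.day:02d}-'
            f'{moment.hour:02d}:{moment.minute:02d}:{moment.second:02d}.'
            f'{moment.microsecond // 1000:03d}')


# ASSUMPTION: an unset lastFailover would be stored as FILETIME 0.
print(format_like_cluster_log(filetime_to_datetime(0)))
```

If that reading is right, the zeroes would mean "no failover time recorded" and would not be a real date.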

Vesper
  • Off topic: belongs on dba.stackexchange.com Q&A for database professionals who wish to improve their database skills and learn from others in the community – TomTom Sep 04 '20 at 06:21
  • @TomTom I object - DBAs care about what the database does, system administrators care about HA, which is the issue here. – Vesper Sep 04 '20 at 06:59
  • Ah, so in your world DBAs are not responsible for maintaining SQL-level features like this? Interesting. SAs generally have ZERO idea about how SQL databases work, and this is a purely SQL feature that is NOT ONLY used for HA (scale-out). – TomTom Sep 04 '20 at 07:17
  • @TomTom Yes, in my world, which is small-load production, this feature is not required by the DBA or the application; however, eventually I'll be adding a read-only replica for resource offload (even the current config allows RO requests on the secondary replica). So I'm using AlwaysOn for HA&DR. And this company does not have a DBA as a role; it's split in two: architecture is on the devs, the system part is on me. – Vesper Sep 04 '20 at 07:24
  • Well, whatever you think - I see all those dozens of answers here and your refusal to move the question to the DBA site, stating it does not belong there, and somehow... I just miss all the answers here. See the problem? It does not belong here because the people HERE are not the DB specialists, and if you want help you go to the DBA site. – TomTom Sep 04 '20 at 07:44
  • @TomTom See, I have just visited DBA for [sql-server-2012] and [availability-groups]; they clearly have less knowledge about the failover process than me, because it's out of their scope (WSFC is not something an average DBA has experience with). Therefore I assumed that whoever has more experience with WSFC than me can shed light on why the failover count is not being reset after the failover period expires, even if that means delving into the specifics of the SQL Server 2012 AAG resource group type. – Vesper Sep 04 '20 at 08:16

0 Answers