
Overview

We have a DAG with two Exchange 2016 CU12 nodes running on fully patched Windows Server 2016.

Search fails for every mailbox in every database mounted on one of the nodes. The other node works properly.

Event 1012 is reported repeatedly by MSExchangeIS on the affected host, with the following content (this particular instance is generated repeatedly by the SearchQueryStxProbe health monitor):

Exchange Server Information Store has encountered an error while executing a full-text index query ("and(subject:string("SearchQueryStxProbe*", mode="and"), folderid:string("3753F38349D8A943AE346EACDBD8B91300000000010C0000"))"). Error information: System.ServiceModel.EndpointNotFoundException: The message could not be dispatched because the service at the endpoint address 'net.pipe://localhost/3867' is unavailable for the protocol of the address.

The problem appears to be completely unrelated to the content index. The event itself and further diagnostics suggest that some part of the search service is not running properly on this specific host.
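When many 1012 events pile up, the endpoint they complain about can be pulled out of the message text instead of being read by eye. A minimal sketch, assuming the event message has already been exported as plain text (for example with Get-WinEvent or wevtutil); parse_1012_endpoint is a hypothetical helper, and the regex assumes the endpoint is wrapped in single quotes exactly as in the event above. Note that the address uses the net.pipe scheme, which WCF uses for named-pipe endpoints.

```python
import re
from urllib.parse import urlparse

# Matches the quoted endpoint URI in an EndpointNotFoundException message,
# e.g. "... endpoint address 'net.pipe://localhost/3867' is unavailable ...".
ENDPOINT_RE = re.compile(r"endpoint address '([^']+)'")


def parse_1012_endpoint(message: str):
    """Return (scheme, host, path) of the endpoint named in an event 1012 message, or None."""
    match = ENDPOINT_RE.search(message)
    if match is None:
        return None
    uri = urlparse(match.group(1))
    # For 'net.pipe://localhost/3867' this yields ('net.pipe', 'localhost', '3867').
    return uri.scheme, uri.hostname, uri.path.lstrip("/")


if __name__ == "__main__":
    sample = (
        "Error information: System.ServiceModel.EndpointNotFoundException: "
        "The message could not be dispatched because the service at the endpoint "
        "address 'net.pipe://localhost/3867' is unavailable for the protocol of the address."
    )
    print(parse_1012_endpoint(sample))
```

Messages without an endpoint address (other 1012 variants) simply return None, so the helper can be run over a whole export.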

Checklist

  • There are 48 databases. The symptoms appear equally on all of them, as long as they are mounted on the affected host.
  • ContentIndexState is reported as Healthy for all databases on both hosts.
  • The search monitors SearchQueryFailureMonitor and SearchQueryStxMonitor report an Unhealthy state on the affected host.
  • Test-ExchangeSearch returns literally nothing on either host: no result objects, no errors, just a progress bar for a while. I have never used this cmdlet before, so I don't really know what to expect.
  • Microsoft's knowledge base on the Search health set is, to put it mildly, a joke.
  • The problem persists through service and server restarts.
  • Search works for all databases once they are moved to the second DAG node.

Google returns numerous posts on a wide variety of issues resulting in Event 1012; unfortunately, event 1012 apparently covers a broad range of problems. None of them matches my event details or shows similar side symptoms while also offering a solution or a clue as to what to look for.

Comparative analysis

Lacking any reasonable documentation, I limited further steps to a comparative analysis of the two hosts: the healthy one and the failing one.

Following the event data, I checked for the TCP 3867 binding. On the failing host, the port is unbound. On the healthy host, it is bound by an instance of the service-run noderunner.exe process, the one with the following arguments:

"C:\Program Files\Microsoft\Exchange Server\V15\Bin\Search\Ceres\Runtime\1.0\NodeRunner.exe" 
--noderoot "C:\Program Files\Microsoft\Exchange Server\V15\Bin\Search\Ceres\HostController\Data\Nodes\Fsis\IndexNode1" 
--addfrom "C:\Program Files\Microsoft\Exchange Server\V15\Bin\Search\Ceres\HostController\Data\Nodes\Fsis\IndexNode1\Configuration\Local\Node.ini" 
--tracelog "C:\Program Files\Microsoft\Exchange Server\V15\Bin\Search\Ceres\HostController\Data\Nodes\Fsis\IndexNode1\Logs\NodeRunner.log"
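Since the relevant NodeRunner.exe instance is identified by its arguments, it can help to split the command line into flag/value pairs and diff what each host reports. A minimal sketch, assuming the full command line has been copied as one string (for example from Process Explorer) and that every flag value is double-quoted as above; parse_noderunner_args and diff_args are hypothetical helpers.

```python
import re

# Matches --flag "value" pairs as they appear in the NodeRunner.exe command line.
FLAG_RE = re.compile(r'--(\w+)\s+"([^"]*)"')


def parse_noderunner_args(cmdline: str) -> dict:
    """Map each --flag in a NodeRunner.exe command line to its quoted value."""
    return dict(FLAG_RE.findall(cmdline))


def diff_args(healthy: dict, failing: dict) -> dict:
    """Return {flag: (healthy_value, failing_value)} for flags that differ or exist on one side only."""
    flags = sorted(healthy.keys() | failing.keys())
    return {f: (healthy.get(f), failing.get(f)) for f in flags if healthy.get(f) != failing.get(f)}


if __name__ == "__main__":
    cmd = (
        r'"C:\Program Files\Microsoft\Exchange Server\V15\Bin\Search\Ceres\Runtime\1.0\NodeRunner.exe" '
        r'--noderoot "C:\Program Files\Microsoft\Exchange Server\V15\Bin\Search\Ceres\HostController\Data\Nodes\Fsis\IndexNode1"'
    )
    print(parse_noderunner_args(cmd))
```

An empty diff means the two instances were started identically; a flag missing on one side points at which node directory to inspect next.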

I compared the referenced files and paths on both hosts:

  • The NodeRunner.log file is not being generated on either node.
  • File structure is identical and average file sizes are similar.
  • All plain-text files have identical content, barring host name references.
  • File permissions are identical.

So there are no obvious differences. There is also no significant difference between the search catalogs of the replicated databases.
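A file-by-file comparison like the one above can be scripted so that nothing is missed. A minimal sketch, assuming copies of the IndexNode1 folder from both hosts have been placed side by side on one machine; tree_differences is a hypothetical helper built on Python's standard filecmp module, and plain-text files that differ only in host name references will still be reported as differing.

```python
import filecmp
import os


def tree_differences(left: str, right: str) -> list:
    """List relative paths that exist on one side only or whose bytes differ."""
    diffs = []

    def walk(cmp: filecmp.dircmp, rel: str) -> None:
        for name in cmp.left_only:
            diffs.append("only in left: " + os.path.join(rel, name))
        for name in cmp.right_only:
            diffs.append("only in right: " + os.path.join(rel, name))
        for name in cmp.common_files:
            # shallow=False forces a byte-by-byte comparison instead of trusting size and mtime.
            if not filecmp.cmp(os.path.join(cmp.left, name), os.path.join(cmp.right, name), shallow=False):
                diffs.append("differs: " + os.path.join(rel, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    # ignore=[] disables filecmp's default ignore list so every entry is compared.
    walk(filecmp.dircmp(left, right, ignore=[]), "")
    return sorted(diffs)
```

Usage would be along the lines of tree_differences(r"D:\compare\healthy\IndexNode1", r"D:\compare\failing\IndexNode1"), where both paths are hypothetical copy locations.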

Question

Has anyone had a similar problem? Has anyone solved it? Does anyone have a hint on where to look, or know of any log files or diagnostic tools?

  • noderunner.exe is part of Exchange's search functionality, as you probably already know. By the sounds of it, the search component is either missing or corrupted in one shape or form. In my opinion, troubleshooting this issue might be a waste of time unless you really want to. Have you considered setting up a new server to replace this one? Another option is to update to CU13 and see if that fixes the issue. – Vick Vega Sep 29 '19 at 20:47
  • Well, it did happen once for some reason. It would be a waste not to diagnose it, especially if it happens again later. The server is at a fairly early stage of the production switchover and only a few users are affected. – Michał Sacharewicz Sep 30 '19 at 06:31

Answers


I recommend following the steps in this article: Exchange 2013: FAST Search Technology Failed.

Alternatively, what about creating a new database and moving the mailbox to it? Are there any errors? If you can move the mailbox, test the moved mailbox in OWA.

Hope it works.

Beverly Gao
  • This is one of the articles that doesn't fit my profile. ContentIndexState at my site returns Healthy for all databases. Also, it mentions tcp:3847 actively refusing connections, whereas in my case tcp:3867 is not responding. – Michał Sacharewicz Sep 30 '19 at 06:38
  • I am considering trying this one, though only if nothing else helps. As for creating a new database, it won't help much when the service already fails to bind the TCP port with the existing 24 mounted databases. – Michał Sacharewicz Sep 30 '19 at 06:40

I know this is old, but in case it helps someone: I ran into this exact problem on Exchange 2016 after updating to CU18. On my four servers, noderunner.exe was likewise listening on only a couple of ports, but not on 3867. It turned out that having Sophos antivirus running during the upgrade had broken noderunner.exe's startup or configuration in some way. The solution was to uninstall Sophos antivirus (uninstalling the AV alone did not fix it) and then re-run setup /mode:upgrade /iacceptexchangeserverlicenseterms to re-upgrade the server. After that, the search indexes of all databases became functional again and noderunner.exe was listening on all the appropriate ports. I hope this helps someone!
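To confirm after a re-upgrade that the expected ports accept connections again, a quick local TCP probe can stand in for reading netstat output by eye. A minimal sketch, assuming Python is available on the Exchange server; closed_ports is a hypothetical helper, and only port 3867 comes from this thread, so the full list of "appropriate" ports is left to the caller. It probes TCP only and does not tell you which process owns a port (netstat -ano shows the owning PID).

```python
import socket


def closed_ports(ports, host="127.0.0.1", timeout=1.0):
    """Return the ports in `ports` that do not accept a TCP connection on `host`."""
    closed = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                # connect_ex returns 0 on success and an error code otherwise.
                ok = s.connect_ex((host, port)) == 0
            except OSError:  # timeouts or other socket errors count as closed
                ok = False
        if not ok:
            closed.append(port)
    return closed
```

On a healthy node, closed_ports([3867]) should return an empty list; on the failing node described in the question it would return [3867].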

Steve