Managed Availability – an administrative overview

1. Configuration of Managed Availability

After installing Exchange 2013 in production, there might be some HealthSets in an Unhealthy state.

2. AlertValue “Unhealthy”

The first step is to run a health report for the entire server and filter out everything that is Healthy:
Get-HealthReport -Server ServerName | where {$_.alertvalue -ne "Healthy"}

Note: The property “NotApplicable” shows whether Monitors have been disabled by Set-ServerComponentState for their component. Most Monitors are not dependent on this, and report “NotApplicable”.

Let’s take a look at HealthSet “Autodiscover.Protocol” and why it’s in an Unhealthy state.
To get all information about the Autodiscover.Protocol HealthSet, we have to analyze the Monitoring Item Identity:

Get-MonitoringItemIdentity -Identity Autodiscover.Protocol -Server ServerName | ft name,itemtype,targetresource -AutoSize

The output lists all Probes, Monitors, and Responders that belong to the Autodiscover.Protocol HealthSet.

Because the Autodiscover.Protocol HealthSet has 9 Monitors, we check which Monitors are in an Unhealthy state:

Get-ServerHealth -Identity ServerName -HealthSet Autodiscover.Protocol

The Monitor AutodiscoverSelfTestMonitor is in an Unhealthy state.
To get all the information about this Monitor (AutodiscoverSelfTestMonitor), we read its definition directly from the event log and convert it into readable output:

(Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/MonitorDefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ? {$_.Name -like "AutodiscoverSelfTestMonitor"}

This output has two important values for us:

The Monitor transitions to the Unhealthy state within 600 seconds. The ScenarioDescription property gives the administrator more detail about the issue: “Validate EWS health is not impacted by any issues”.
In most cases you can rerun the test with the following cmdlet:

Invoke-MonitoringProbe Autodiscover.Protocol\AutodiscoverSelfTestProbe -Server ServerName | fl

Currently, not every Probe can be invoked manually with the Invoke-MonitoringProbe cmdlet. We hope Microsoft will fix this in the future.
We now know that EWS has an issue, so the next step is to look at the definition of the Probe behind the AutodiscoverSelfTestMonitor:

(Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ProbeDefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ? {$_.Name -like "AutodiscoverSelfTestProbe"}

The output shows how the Probe is configured to test the Autodiscover.Protocol HealthSet.
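
Both definition queries above rely on the same pattern: every event is converted to XML, and the child elements under UserData/EventXML then behave like ordinary properties. The following is a minimal sketch of that pattern which runs without an Exchange server. The event XML is a hypothetical, heavily shortened stand-in (the namespace urn:example:sample and the Channel element are assumptions for illustration); only Name and ScenarioDescription are taken from the text above.

```powershell
# Hypothetical, shortened event XML standing in for one Get-WinEvent record.
$sampleEventXml = @'
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Channel>Microsoft-Exchange-ActiveMonitoring/MonitorDefinition</Channel>
  </System>
  <UserData>
    <EventXML xmlns="urn:example:sample">
      <Name>AutodiscoverSelfTestMonitor</Name>
      <ScenarioDescription>Validate EWS health is not impacted by any issues</ScenarioDescription>
    </EventXML>
  </UserData>
</Event>
'@

# The [xml] cast is the same conversion that [XML]$_.toXml() applies to each event.
$definition = ([xml]$sampleEventXml).Event.UserData.EventXML

# Child elements are now properties that can be filtered and selected.
$match = @($definition | Where-Object { $_.Name -like 'AutodiscoverSelfTestMonitor' })
$match | Format-List Name, ScenarioDescription
```

Because -like is used without wildcards, it only matches the exact name; add asterisks (for example "*SelfTest*") to search more broadly.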
The next step is to collect the events from the ProbeResult event log (a crimson channel) and filter them. In this example, you look for failed results related to AutodiscoverSelfTestProbe:

$Errors = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath "*[UserData[EventXML[ResultName='AutodiscoverSelfTestProbe/'][ResultType='4']]]" | % {[XML]$_.toXml()}).event.userData.eventXml
$Errors | select -Property *time,result*,error*,*context

ResultTypes:

1 = Timeout
2 = Poisoned
3 = Succeeded
4 = Failed
5 = Quarantined
6 = Rejected

You can see the complete error message in the “ExecutionContext” property.
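
When you drop the ResultType filter and look at all results, the numeric codes are hard to read. The following is a small sketch that translates them using the list above. The function name Get-ResultTypeName and the sample rows are hypothetical and only there so the snippet runs on its own; in practice you would pipe the objects from the ProbeResult query instead.

```powershell
# Lookup table built from the ResultTypes list above.
$ResultTypeNames = @{
    1 = 'Timeout'
    2 = 'Poisoned'
    3 = 'Succeeded'
    4 = 'Failed'
    5 = 'Quarantined'
    6 = 'Rejected'
}

# Hypothetical helper: returns the readable name for a ResultType code.
function Get-ResultTypeName {
    param([int]$ResultType)
    if ($ResultTypeNames.ContainsKey($ResultType)) {
        $ResultTypeNames[$ResultType]
    }
    else {
        "Unknown ($ResultType)"
    }
}

# Hypothetical sample rows; values read from event XML arrive as strings.
$sample = @(
    [pscustomobject]@{ ResultName = 'AutodiscoverSelfTestProbe/'; ResultType = '4' },
    [pscustomobject]@{ ResultName = 'AutodiscoverSelfTestProbe/'; ResultType = '3' }
)

# Add a readable column next to the numeric code.
$sample | Select-Object ResultName, ResultType,
    @{ Name = 'ResultTypeName'; Expression = { Get-ResultTypeName ([int]$_.ResultType) } }
```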

Note: to identify the correct ResultName value, you can look at the event log directly:

Expand Applications and Services Logs – Microsoft – Exchange – ActiveMonitoring – ProbeResult, search for “AutodiscoverSelfTestMonitor”, and filter for Error level events only. On the matching error event, click Details and look at the Error line:

System.Exception: Autodiscover Service failed to return the ExternalEWSUrl at Microsoft.Exchange.Monitoring.ActiveMonitoringProbe

Result:

The ExternalEWSUrl value is empty. You have to set the value (it does not have to be reachable from the Internet) to avoid Managed Availability error messages.
It’s not recommended to disable the AutodiscoverSelfTestProbe and its associated Monitors and Responders, because the HealthSet covers many other important tests. So don’t set a global or local override against the Probes, Monitors, and Responders of the Autodiscover.Protocol HealthSet.

Update 08/26/2014: Microsoft fixed this "issue" with Exchange Server 2013 Cumulative Update 6 (CU6); it no longer matters whether ExternalEWSUrl is set.

3. Local Managed Availability .xml monitoring files

Some HealthSets, such as the FEP HealthSet, are defined in local .xml files. FEP is the Forefront service, and some of you may want to disable this HealthSet on your servers.
Browse to %ExchangeInstallationPath%\Microsoft\Exchange\V15\Bin\Monitoring\Config, search for FEPActiveMonitoringContext.xml, and open the file with an editor such as Notepad.
In line 12, change Enabled = True to Enabled = False.
Restart the Microsoft Exchange Health Management service on the server where you modified the .xml file.

4. Inform Managed Availability about the repairing process

To inform MA (and, for example, SCOM) that a repair is in progress, use the following cmdlet and specify the affected Monitor with the -Name parameter:

Set-ServerMonitor -Server ServerName -Name Maintenance -Repairing $true

After repairing:

Set-ServerMonitor -Server ServerName -Name Maintenance -Repairing $false

To prevent automatic recovery actions during the repair, set the RecoveryActionsEnabled component to Inactive using Set-ServerComponentState:

Set-ServerComponentState -Component RecoveryActionsEnabled -Identity ServerName -State Inactive -Requester Functional

After finishing the repair, enable the RecoveryActionsEnabled component again with the following cmdlet:

Set-ServerComponentState -Component RecoveryActionsEnabled -Identity ServerName -State Active -Requester Functional

5. Overrides

There are two kinds of overrides: server overrides, which apply to a single server, and global overrides, which apply to all Exchange servers within your organization. You apply overrides using the Add-ServerMonitoringOverride or Add-GlobalMonitoringOverride cmdlet.

Important: to deactivate a check, you have to disable the associated Probe, Monitor, and Responder together!

Example from the Microsoft Knowledge Base for deactivating the Public Folder monitoring (Exchange 2013 CU3):

Add-GlobalMonitoringOverride -Identity "Publicfolders\PublicFolderLocalEWSLogonEscalate" -ItemType "Responder" -PropertyName Enabled -PropertyValue 0 -ApplyVersion "15.0.775.38"

Add-GlobalMonitoringOverride -Identity "Publicfolders\PublicFolderLocalEWSLogonMonitor" -ItemType "Monitor" -PropertyName Enabled -PropertyValue 0 -ApplyVersion "15.0.775.38"

Add-GlobalMonitoringOverride -Identity "Publicfolders\PublicFolderLocalEWSLogonProbe" -ItemType "Probe" -PropertyName Enabled -PropertyValue 0 -ApplyVersion "15.0.775.38"
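
Since the Responder, Monitor, and Probe always have to be disabled together, it can help to build all three parameter sets from one list so none is forgotten. The following is only a sketch of that idea: it prepares the parameters from the example above as hashtables for splatting and prints them. The Add-GlobalMonitoringOverride call itself is left commented out, so the snippet is safe to run anywhere; uncomment it in the Exchange Management Shell once you have checked the output.

```powershell
# The three items of the Public Folder example, taken from the commands above.
$items = @(
    @{ Identity = 'Publicfolders\PublicFolderLocalEWSLogonEscalate'; ItemType = 'Responder' },
    @{ Identity = 'Publicfolders\PublicFolderLocalEWSLogonMonitor';  ItemType = 'Monitor' },
    @{ Identity = 'Publicfolders\PublicFolderLocalEWSLogonProbe';    ItemType = 'Probe' }
)

# Build one parameter hashtable per item; the shared values are set once here.
$overrides = foreach ($item in $items) {
    @{
        Identity      = $item.Identity
        ItemType      = $item.ItemType
        PropertyName  = 'Enabled'
        PropertyValue = '0'
        ApplyVersion  = '15.0.775.38'
    }
}

foreach ($params in $overrides) {
    # Add-GlobalMonitoringOverride @params   # uncomment in the Exchange Management Shell
    '{0,-10} {1}' -f $params.ItemType, $params.Identity
}
```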

You can also change the ExtensionAttributes of any Probe:

ExtensionAttributes:

127.0.0.125InboundProxyProbe<Data AddAttributions="false">X-Exchange-Probe-Drop-Message:FrontEnd-CAT-250 Subject:Inbound proxy probeNone

Local override:

Add-ServerMonitoringOverride -Server ServerName -Identity FrontEndTransport\OnPremisesInboundProxy -ItemType Probe -PropertyName ExtensionAttributes -PropertyValue 'Exch1.contoso.com25InboundProxyProbeX-Exchange-Probe-Drop-Message:FrontEnd-CAT-250Subject:Inbound proxy probeNone' -Duration 45.00:00:00
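
The -Duration value in the local override follows the .NET TimeSpan layout days.hours:minutes:seconds, so 45.00:00:00 means 45 days. As a quick sanity check, and assuming the cmdlet parses the value the same way .NET does, you can verify or generate such a string in any PowerShell session before using it:

```powershell
# Parse the duration used above and confirm how many days it represents.
$duration = [TimeSpan]::Parse('45.00:00:00', [Globalization.CultureInfo]::InvariantCulture)
$duration.TotalDays

# Build a duration string instead of typing it by hand.
$sixtyDays = (New-TimeSpan -Days 60).ToString()
$sixtyDays
```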

Global override:

Add-GlobalMonitoringOverride -Identity "EWS.Proxy\EWSProxyTestProbe" -ItemType Probe -PropertyName TimeoutSeconds -PropertyValue 25 -ApplyVersion "15.0.712.24"

6. Managed Availability and server reboots

Responders only execute when a Monitor is marked as Unhealthy, and they then try to recover that component. Managed Availability provides multi-stage recovery actions:

1. Restart the application pool
2. Restart the service
3. Restart the server
4. Take the server offline so that it no longer accepts traffic

There are several types of responders available:
Restart Responder, Reset AppPool Responder, Failover Responder, Bugcheck Responder, Offline Responder, Escalate Responder, and Specialized Component Responders.
This section focuses primarily on the Restart Responders. The full Managed Availability documentation is available at http://blogs.technet.com/b/exchange/archive/2012/09/21/lessons-from-the-datacenter-managed-availability.aspx.

Restart responders are subject to throttling policies: the responder definition contains a “ThrottlePolicyXml” section, which you can override if you like. As an example, we use the “StoreServiceKillServer” responder. To view its definition, run the following in the Exchange Management Shell (EMS):

(Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/ResponderDefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "StoreServiceKillServer"}

There are many parameters, such as ServiceName, CreatedTime, Enabled, MaxRetryAttempts, AlertMask, and so on. Important for us is the following section of the restart responder definition:
ThrottlePolicyXml :

<ForceReboot ResourceName="">
<ThrottleConfig Enabled="True" LocalMinimumMinutesBetweenAttempts="720" LocalMaximumAllowedAttemptsInOneHour="-1" LocalMaximumAllowedAttemptsInADay="1" GroupMinimumMinutesBetweenAttempts="600" GroupMaximumAllowedAttemptsInADay="4" />
</ForceReboot>

The thresholds are self-explanatory. The only distinction to understand is “Local” versus “Group”: Local thresholds apply to a single Exchange server, while Group thresholds apply across more than one Exchange server in your organization. Check these settings and configure them for your needs.
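
To read the thresholds more comfortably, the ThrottlePolicyXml value can be parsed like any other XML. The following is a minimal sketch that uses the fragment shown above as a literal string and converts the minute values to hours. On a real server you would take the string from the ThrottlePolicyXml property of the responder definition instead; that step is assumed and not shown here.

```powershell
# The throttle policy from the StoreServiceKillServer definition shown above.
$throttleXml = @'
<ForceReboot ResourceName="">
  <ThrottleConfig Enabled="True" LocalMinimumMinutesBetweenAttempts="720" LocalMaximumAllowedAttemptsInOneHour="-1" LocalMaximumAllowedAttemptsInADay="1" GroupMinimumMinutesBetweenAttempts="600" GroupMaximumAllowedAttemptsInADay="4" />
</ForceReboot>
'@

# Attributes of ThrottleConfig become properties after the [xml] cast.
$config = ([xml]$throttleXml).ForceReboot.ThrottleConfig

# Convert the minimum waiting times from minutes to hours.
$localHours = [int]$config.LocalMinimumMinutesBetweenAttempts / 60
$groupHours = [int]$config.GroupMinimumMinutesBetweenAttempts / 60

'Local: at most {0} reboot(s) per day, at least {1} hours apart' -f $config.LocalMaximumAllowedAttemptsInADay, $localHours
'Group: at most {0} reboot(s) per day, at least {1} hours apart' -f $config.GroupMaximumAllowedAttemptsInADay, $groupHours
```

With the values above, a single server is rebooted at most once a day with at least 12 hours between attempts, and the group at most four times a day with at least 10 hours between attempts.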

To prevent a reboot, create a local or global override.
Example:

I searched the Managed Availability logs for “*ForceReboot*” actions to find the requester:

(Get-WinEvent -LogName Microsoft-Exchange-ManagedAvailability/* | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.ActionID -like "*ForceReboot*"} | ft RequesterName

The command returned the requester ServiceHealthMSExchangeReplForceReboot, so the override targets that responder:

Add-GlobalMonitoringOverride -Identity Exchange\ServiceHealthMSExchangeReplForceReboot -ItemType Responder -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00

To check the configuration change, use the following cmdlet:

(Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/ResponderDefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "ServiceHealthMSExchangeReplForceReboot"} | ft name,enabled

This prevents Managed Availability from forcing a reboot of the server. Enabled must show “0” (instead of 1).
