Search This Blog

Monday, June 30, 2014

Troubleshoot DNS Event ID 4013: The DNS server was unable to load AD integrated DNS zones

Some Microsoft and external content have recommended setting the registry value Repl Perform Initial Synchronizations to 0 in order to bypass initial synchronization requirements in Active Directory. The specific registry subkey and the values for that setting are as follows:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters
Value name:  Repl Perform Initial Synchronizations
Value type:  REG_DWORD
Value data: 0
This configuration change is not recommended for use in production environments or in any environment on an ongoing basis. The use of Repl Perform Initial Synchronizations should be used only in critical situations to resolve temporary and specific problems. The default setting should be restored after such problems are resolved.

Viable alternatives include:
  • Remove references to stale domain controllers.

  • Make offline or non-functioning domain controllers operational.

  • Domain controllers hosting AD-integrated DNS zones should not point to a single domain controller and especially only to themselves as preferred DNS for name resolution.

    DNS name registration and name resolution for domain controllers is a relatively lightweight operation that is highly cached by DNS clients and servers.

    Configuring domain controllers to point to a single DNS server's IP address, including the 127.0.0.1 loopback address, represents a single point of failure. This is somewhat tolerable in a forest with only one domain controller but not in forests with multiple domain controllers.

    Hub-site domain controllers should point to DNS servers in the same site as them for preferred and alternate DNS server and then finally to itself as an additional alternate DNS server.

    Branch site domain controllers should configure the preferred DNS server IP address to point to a hub-site DNS server, the alternate DNS server IP  address to point to an in-site DNS server or one in the closest available site, and finally to itself using the 127.0.0.1 loopback address or current static IP address.

    Pointing to hub-site DNS servers reduces the number of hops required to get critical domain controller SRV and HOST records fully registered. Domain controllers within the hub site tend to get the most administrative attention, typically have the largest collection of domain controllers in the same site, and because they are in the same site, replicate changes between each other every 15 seconds (Windows Server 2003 or later) or every five minutes (Windows 2000 Server), making such DNS records "well-known".

    Dynamic domain controller SRV and host A and AAAA record registrations may not make it off-box if the registering domain controller in a branch site is unable to outbound replicate.

    Member computers and servers should continue to point to site-optimal DNS servers as preferred DNS and may point to off-site DNS servers for additional fault tolerance.

    Your ultimate goal is to prevent everything from replication latency and replication failures to hardware failures, software failures, operational practices, short and long-term power outages, fire, theft, flood, earthquakes and terrorist events from causing a denial of service while balancing costs, risks and network utilization.
  • Make sure that destination domain controllers can resolve source domain controllers using DNS (i.e. avoid fallback)

    Your primary goal should be to ensure that domain controllers can successfully resolve the guided CNAME records to host records of current and potential source domain controllers thereby avoiding high latency introduced by name resolution fallback logic.

    Domain controllers should point to DNS servers that:
    • Are available at Windows startup
    • Host, forward, or delegate the _msdcs.<forest root domain> and primary DNS suffix zones for current and potential source domain controllers
    • Can resolve the current CNAME GUID records (for example, dded5a29-fc25-4fd8-aa98-7f472fc6f09b._msdcs.contoso.com) and host records of current and potential source domain controllers.
Missing, duplicate, or stale CNAME and host records all contribute to this problem. Scavenging is not enabled on Microsoft DNS servers by default, increasing the probability of stale host records. At the same time, DNS scavenging can be configured too aggressively, causing valid records to be prematurely purged from DNS zones.

  • Optimize domain controllers for name resolution fallback

    The inability to properly configure DNS so that domain controllers could resolve the domain controller CNAME GUID records to host records in DNS was so common that Windows Server 2003 SP1 and later domain controllers were modified to perform name resolution fallback from domain controller CNAME GUID to fully qualified hostname, and from fully qualified hostname to NetBIOS computer name in an attempt to ensure end-to-end replication of Active Directory partitions.

    The existence of NTDS replication Event ID 2087 and 2088 logged in the Directory Service event logs indicates that a destination domain controller could not resolve the domain controller CNAME GUID record to a host record and that name resolution fallback is occurring. See the following article in the Microsoft Knowledge Base for more information:

    824449 Troubleshooting Active Directory replication failures that occur because of DNS lookup failures, event ID 2087, or event ID 2088

    WINS, HOST files and LMHOST files can all be configured so that destination domain controllers can resolve the names of current and potential source domain controllers. Of the three solutions, the use of WINS is more scalable since WINS supports dynamic updates.

    The IP addresses and computer names for computers inevitably become stale, causing static entries in HOST and LMHOST files to become invalid over time. Experienced administrators and support professionals have spent hours trying to figure out why queries for one domain controller incorrectly resolved to another domain controller with no name query observed in a network trace, only to finally locate a stale host-to-IP mapping in a HOST or LMHOST file.
  • Change the startup value for the DNS server service to manual if booting into a known bad configuration

    If booting a domain controller in a configuration known to cause the slow OS startup discussed in this article, set the service startup value for the DNS Server service to manual, reboot, wait for the domain controller to advertise, then restart the DNS Server service.

    If the service startup value for DNS Server service is set to manual, Active Directory does not wait for the DNS Server service to start.
Additional considerations:
  • Avoid single points of failure.

    Examples of single points of failure include:
    • Configuring a DC to point to a single-DNS Server IP
    • Placing all DNS servers on guest virtual machines on the same physical host computer
    • Placing all DNS servers in the same physical site
    • Limiting network connectivity such that destination domain controllers have only a single network path to access a KDC or DNS Server
Install enough DNS servers for local, regional and enterprise-wide redundancy performance but not so many that management becomes a burden. DNS is typically a lightweight operation that is highly cached by DNS clients and DNS servers.

Each Microsoft DNS server running on modern hardware can satisfy 10,000-20,000 clients per server. Installing the DNS role on every domain controller can lead to an excessive number of DNS servers in your enterprise that can increase cost.

  • Stagger the reboots of DNS servers in your enterprise when possible.
    • The installation of some hotfixes, service packs and applications may require a reboot.
    • Some customers reboot domain controllers on a scheduled basis (every 7 days, every 30 days).
    • Schedule reboots, and the installation of software that requires a reboot, in a smart way to prevent the only DNS server or potential source replication partner that a destination domain controller points to for name resolution from being rebooted at the same time.
If Windows Update or management software is installing software requiring reboots, stagger the installs on targeted domain controllers so that half the available DNS servers that domain controllers point to for name resolution reboot at the same time.

  • Install UPS devices in strategic places to ensure DNS availability during short-term power outages.
  • Augment your UPS-backed DNS servers with on-site generators.

    To deal with extended outages, some customers have deployed on-site electrical generators to keep key servers online. Some customers have found that generators can power servers in the data center but not the on-site HVAC. The lack of air conditioning may cause local servers to shutdown when internal computer temperatures reach a certain threshold.

No comments:

Post a Comment