Active Directory Troubleshooting Help

A friend and former co-worker of mine (Sean Deuby) has some excellent Active Directory Troubleshooting guides available online for free.  These aren’t going to solve every problem for you, but they are great for making sure you have covered your bases when troubleshooting Active Directory.  Take a look at the link to see all the great help he has.

You know…isn’t the Internet great?  I mean, really think about all the great things that are available at our fingertips, things like these great troubleshooting guides.  The Internet hasn’t always been great, but I’d say over the last 5 years it has really blossomed.  I know there are bad and harmful things out there, but I really do believe that there is more good than bad…OK, time for me to stop thinking out loud again.

A Couple Quick Active Directory One-Liners

Here are a few one-liner commands to help get info on your Active Directory environment.  I don’t think there are any mind-blowing commands here, but they’ve helped me out.  There are literally hundreds of these around the web, as well as PowerShell ones, but these are the ones that I’ve been using lately.

How to view the Domains you trust and see what those Domain SIDs are:

nltest /domain_trusts /v

A quick listing of your AD Sites:

dsquery * "CN=Sites,CN=Configuration,DC=forestRootDomain" -attr cn description location -filter (objectClass=site)

A quick listing of your AD sites and their Site Links and Costs (sure would be nice if you could spit this out to Visio or something):

dsquery * "CN=Sites,CN=Configuration,DC=forestRootDomain" -attr cn cost description replInterval siteList -filter (objectClass=siteLink)

Compare time against your forest root PDCe:

w32tm /monitor /computers:ForestRootPDC

Find out which DC for a site is the ISTG:

dsquery * "CN=NTDS Site Settings,CN=siteName,CN=Sites,CN=Configuration,DC=forestRootDomain" -attr interSiteTopologyGenerator
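
The DC=forestRootDomain part of the commands above is a placeholder for the distinguished name of your forest root domain. As a rough sketch in Python (the helper names and the example domain are made up for illustration, not part of any tool), this is how a DNS domain name maps to that DN and how the Sites query could be assembled:

```python
def domain_to_dn(dns_name):
    """Convert a DNS domain name such as 'corp.example.com'
    into the DN form 'DC=corp,DC=example,DC=com'."""
    labels = [label for label in dns_name.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join("DC=" + label for label in labels)


def sites_query(forest_root_dns):
    """Build the site-listing dsquery command line shown in the post."""
    base = "CN=Sites,CN=Configuration," + domain_to_dn(forest_root_dns)
    return ('dsquery * "%s" -attr cn description location '
            '-filter (objectClass=site)' % base)
```

Calling sites_query with your own forest root DNS name gives a command line you can paste into a command prompt.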

Server Health Checks

I’d like to share some of the things I look at while doing a health check on a server.  It’s funny how few resources there are out there on the Internet.  I believe people keep this kind of stuff to themselves because they are scared they are going to miss something and they will never live it down.  My response to that is, So What!  Heck, I don’t claim to know it all, but why not share what I do know and maybe others can share via the Comments!!!

When I’m troubleshooting I like to compartmentalize what I’m looking for.  With that, my health checks are set up the same way.  I also believe health checks are quick snapshots of the health of a server.  Sure, there are tools that you can use to analyze systems further, but in this case we are doing a quick health check.  Not all of these need to be done but some should; you get to decide.

CPU

Occasional high CPU spikes are OK as long as you are aware of the process causing them. A server should not sustain 80% CPU utilization for an extended period of time.  If it does, it may be time to upgrade.  It’s a good idea to keep Task Manager open while you troubleshoot so you can see trends.

Check CPU Usage

  1. Open Task Manager

  2. Check the Processes tab, ensure there are no processes consuming excessive CPU

  3. Check the Performance tab, ensure there are no single CPUs that have excessive CPU usage
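
Telling a brief spike apart from sustained high utilization is easier with numbers. Here is a small Python sketch of the idea. The 80% threshold comes from the text above; the list of samples and the run length of 5 consecutive samples are assumptions for illustration, so collect the readings however you like and tune them to your sampling interval.

```python
def longest_high_run(samples, threshold=80.0):
    """Return the length of the longest run of consecutive
    CPU samples at or above the threshold."""
    longest = current = 0
    for value in samples:
        if value >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_sustained(samples, threshold=80.0, min_run=5):
    """True when CPU stayed at or above the threshold for at
    least min_run consecutive samples (assumed cut-off)."""
    return longest_high_run(samples, threshold) >= min_run
```

A single reading of 95% surrounded by low readings counts as a spike; five or more high readings in a row would be flagged as sustained.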

Check CPU HW

  1. Open Device Manager (right click computer –> Manage)

  2. Ensure that no CPUs have a red X or yellow ! underneath Processors

Processes

This is one area that you may not want to do for quick health checks, but it is something you should be familiar with.  Task Manager only gives you basic info on processes and you will find that you may need to dig a bit deeper.  For that I recommend Process Monitor from the great Sysinternals tools.  Process Explorer can also be used.  In fact, download and play with all these tools…they will save your bacon, I guarantee it.

In-Depth Check
SysInternals:

Copy Process Monitor locally, then launch it.

  1. Analyze each process and watch which operations it performs against registry keys, files, etc.

Copy Process Explorer locally, then launch it.

  1. Analyze each process based upon the number of threads, handles, loaded DLLs, etc.

Two great webcasts can be viewed here to see these types of tools in action.

Memory

The general rule of thumb is to make sure overall memory utilization does not exceed 80% within a given period of time.

Check Memory Availability

  1. Open Task Manager

  2. Select the Performance tab

  3. Look at the Physical memory box, and multiply the total memory by .2

  4. If the total available memory is less than this number then the box is currently utilizing more than 80 percent of the memory.
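
The arithmetic in the steps above can be sketched in a few lines of Python. The function name and the example figures are made up for illustration; plug in the Total and Available values you read from Task Manager.

```python
def memory_check(total_mb, available_mb, min_free_fraction=0.20):
    """Apply the 'multiply total by .2' rule of thumb.

    Returns the minimum available memory needed to stay under
    80% utilization, the current utilization as a fraction,
    and whether the box is over the threshold.
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    min_available = total_mb * min_free_fraction
    used_fraction = (total_mb - available_mb) / total_mb
    return {
        "min_available_mb": min_available,
        "used_fraction": used_fraction,
        "over_threshold": available_mb < min_available,
    }
```

For a 16 GB server (16384 MB), anything less than roughly 3277 MB available means the box is using more than 80 percent of its memory.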

Current utilization by process

  1. Select the Processes tab

  2. Check the ‘show processes from all users’ box in the bottom left corner

  3. Click the column header ‘Mem Usage’ to sort the processes by memory utilization, highest to lowest. This will help you determine what processes are currently utilizing the memory on the box and can help you narrow your search for memory intensive processes.

Network

Check NIC HW

  1. Verify both ends of the network cable are securely seated in the port

  2. On the back of the server verify you have a green blinking link light on the NIC port

  3. Verify NIC HW is working properly by using Device Manager and ensure the active NICs are showing green

  4. Verify gateway, IP, subnet mask, DNS, DNS suffixes, etc. are properly configured.

  5. If everything is properly configured and HW is working, you should be able to get a ping response from the gateway.
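
One configuration mistake that is easy to overlook in step 4 is a gateway that does not sit on the same subnet as the server. Here is a minimal Python sketch of that check using the standard ipaddress module. The function name and the addresses are examples only.

```python
import ipaddress


def gateway_on_local_subnet(ip, mask, gateway):
    """True when the gateway address falls inside the subnet
    defined by the server's IP address and subnet mask."""
    # strict=False lets us pass a host address instead of the network address
    network = ipaddress.ip_network("%s/%s" % (ip, mask), strict=False)
    return ipaddress.ip_address(gateway) in network
```

For example, gateway_on_local_subnet("192.168.1.50", "255.255.255.0", "192.168.2.1") returns False, which would explain a gateway that never answers a ping.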

Check Network Connections
Here are some other checks you should perform to ensure proper network connectivity:

  1. ipconfig /all will display all your TCP/IP settings including your MAC address

  2. ipconfig /flushdns will flush your DNS resolver cache

  3. ipconfig /displaydns will display what is in your DNS resolver cache

  4. netstat -an will show all the connections & ports from a machine

  5. nbtstat will show NetBIOS over TCP/IP connection stats

  6. tracert <IP or DNS Name> will show you the path the packet takes, the routers, and the response time for each hop.

  7. pathping <IP or DNS Name> combines ping and tracert to the 100th degree.  It pings each hop 100 times and is great for testing WAN connectivity

Disk Space

All kinds of bad stuff can happen when your disk space is filling up.  The best way to alleviate this is to write a script to notify you when you reach a certain threshold. In a future post I’ll share a method for you to do just that…however, if there is a problem and you need to perform a health check, then here is how you check the space the old-fashioned way.

To check disk space manually:

  1. Right Click on My Computer

  2. Select Manage

  3. Select Disk Management

  4. Validate that each disk has more than 10 percent free space
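
The same 10 percent rule is easy to script. This is only a Python sketch of the idea, not the notification script promised above; the function names are made up and the drive you pass in is up to you.

```python
import shutil


def has_enough_space(total_bytes, free_bytes, min_free=0.10):
    """True when MORE than min_free (10% by default) of the disk is free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (free_bytes / total_bytes) > min_free


def check_path(path, min_free=0.10):
    """Look up the volume that holds 'path' and apply the rule to it."""
    usage = shutil.disk_usage(path)
    return has_enough_space(usage.total, usage.free, min_free)
```

On a Windows server you would call something like check_path("C:\\") for each drive and raise an alert whenever it returns False.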

Event Logs

Event logs give a more historical perspective on what is going on with the system and applications. When troubleshooting, query the System or Application log and look for events with a timestamp near the time of the issue you are troubleshooting.

Events have 3 categories in the event viewer:

  • Informational: Noted with a white icon and letter ‘i’. Successful operations are logged as informational. Usually not used in troubleshooting problems or failures.

  • Warning: Noted with a yellow icon and exclamation point. These are usually worth looking up because they can predict future failures, such as disk space running low, DHCP IP address lease renewal failures, etc.

  • Error: Noted with a red circle icon and ‘x’. These are indications that something has failed outright and are a good starting point for troubleshooting.

When looking at event logs, use the information to determine the following:

  • Is the incident tied to a particular time or outage incident?

  • Is this a one-off, or has this particular error occurred multiple times in the past?

  • Does this error appear on other systems or is it unique to the system that has failed?
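
If you export events to a file, the first two questions can be answered with a few lines of Python. This is a sketch under assumptions: each event is represented as a dictionary with "time", "source" and "event_id" keys (a made-up structure, not an Event Viewer format), and the 15 minute window is arbitrary.

```python
from collections import Counter
from datetime import timedelta


def events_near(events, incident_time, window_minutes=15):
    """Return events whose timestamp falls within the window
    on either side of the incident time."""
    window = timedelta(minutes=window_minutes)
    return [e for e in events if abs(e["time"] - incident_time) <= window]


def repeat_counts(events):
    """Count how often each (source, event ID) pair occurs, to
    separate one-off errors from recurring ones."""
    return Counter((e["source"], e["event_id"]) for e in events)
```

Run events_near first to narrow things down to the outage, then repeat_counts over the full log to see whether the same source and event ID has shown up before.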

Also make sure you take a look at EventCombMT from Microsoft.  This tool allows you to search the logs of multiple machines.  The benefit is that you can see whether a specific error or warning message is also occurring on other systems.  This can help rule out issues.

Services

Troubleshooting services should be limited to the specific services affected by the problem you are troubleshooting. Each server will have specific services depending on the types of applications running. You should document how your servers’ services are configured and compare that to the server in question to see if anything is not configured correctly.
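
Comparing a documented baseline against the server in question is a simple diff. Here is a Python sketch, assuming you have both sets of settings as dictionaries mapping a service name to a (startup type, state) pair. That structure and the function name are illustrative assumptions; how you collect the data is up to you.

```python
def compare_services(baseline, actual):
    """Return {service: (expected, found)} for every service whose
    settings differ from the baseline. A missing service shows
    up with None on the side where it is absent."""
    differences = {}
    for name in sorted(set(baseline) | set(actual)):
        expected = baseline.get(name)
        found = actual.get(name)
        if expected != found:
            differences[name] = (expected, found)
    return differences
```

An empty result means the server matches the documented configuration; anything else is a short list of services to look at first.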

Cluster

Servers that host applications and services that require high availability should be clustered so that if one node fails the other can pick up the workload.  Clustered servers need the same type of health checks as stand-alone systems except you will want to check on the health of the cluster.

Check Cluster Resource Status

  1. Open Cluster Administrator: Log onto server, select Start –> Run –> cluadmin

  2. Check the Resources and ensure all are Online

  3. If Cluster Administrator does not open, ensure that the Cluster Service is running on the node.

  4. Cluster resource status can also be checked from a remote server. From a command prompt, just type: cluster /cluster:<cluster name> res

Client Side Health

  1. Right click on My Computer, select Manage

  2. Open Device Manager

  3. Drill down to SCSI and RAID Controllers, verify that the HBA HW is visible and does not show any errors

  4. If it does not show up in Device Manager, you may need to re-scan for the HW, re-seat the fiber card, or re-install the driver.

  5. If the HBA is showing healthy in Device Manager, open the tool that you use to view configuration and settings for the fiber card and verify there aren’t any transmit/receive errors on link statistics or counters

Switch Health

  1. Make sure fiber is properly connected to each switch

  2. Make sure switch has no errors

  3. If you’re using zoning verify it is properly configured

Check Fiber and SAN Connectivity

  1. Log onto the SAN appliance and verify that the SAN is in general good health and no major errors are present for the controllers, loops, switches, or ports.

  2. Ensure that the LUNs are presented to the servers in the cluster

NLBS

Some applications will require you to spread the load across multiple servers.  Web servers are a very popular choice to network load balance.  As with clusters we will need to check the status of the load balancing.

Check NLBS Status CMD Line

  1. From a command prompt on the local system, run ‘wlbs query’. This will give you the convergence status of the local node with the NLBS cluster.

  2. Other useful NLBS commands: wlbs stop (stops NLBS), wlbs start (starts NLBS), wlbs drainstop (drains node)

Check NLBS Configurations

  1. Open up the network properties –> Network Load Balancing, right click & select Properties

  2. On the Cluster Parameters tab, verify that the IP address is configured for the shared NLBS IP and that the subnet mask, domain, and operation mode are configured correctly.

  3. On the Host Parameters tab, make sure each node of the cluster has a unique host identifier. Also verify the IP and subnet mask are configured for the local values.

  4. Also make sure that your switch has a static ARP entry if using multicast NLBS. The entry should be that of the virtual MAC of the cluster. To get the virtual MAC of the cluster, you can run the following command: WLBS IP2MAC <virtual IP address>
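
As far as I know, the virtual MAC is derived directly from the cluster IP address: a 02-BF prefix in unicast mode or 03-BF in multicast mode, followed by the four octets of the IP in hex. The Python sketch below works on that assumption (it does not cover IGMP multicast, and the function name is made up), so treat WLBS IP2MAC as the authority and use this only to sanity check a switch entry.

```python
import ipaddress


def nlb_cluster_mac(virtual_ip, mode="multicast"):
    """Derive the expected NLB cluster MAC from the virtual IP.
    Assumes the 02-BF (unicast) / 03-BF (multicast) prefix scheme."""
    prefixes = {"unicast": (0x02, 0xBF), "multicast": (0x03, 0xBF)}
    if mode not in prefixes:
        raise ValueError("mode must be 'unicast' or 'multicast'")
    # .packed gives the four octets of the IPv4 address as bytes
    octets = tuple(ipaddress.IPv4Address(virtual_ip).packed)
    return "-".join("%02X" % b for b in prefixes[mode] + octets)
```

For a virtual IP of 10.0.0.100 in multicast mode this gives 03-BF-0A-00-00-64, which is what the static ARP entry on the switch should point to.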

Name Resolution

To health check name resolution, open a command prompt and enter the following:

  • nslookup <servername>

Verify that the servername is correctly entered in DNS

If a record does not show up in the DNS query, or maps to a different name, perform a reverse lookup by IP address to see what name is associated with the IP address:

  • nslookup <IP address>
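
The forward and reverse lookups can also be scripted. Here is a Python sketch using the standard socket module, which goes through the operating system resolver rather than querying DNS directly the way nslookup does. The function name and the short-name comparison are illustrative choices, and the resolver functions are parameters so the logic can be tried without a network.

```python
import socket


def check_name(hostname, forward=socket.gethostbyname,
               reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back to a name,
    and report whether the short names agree."""
    result = {"hostname": hostname, "ip": None,
              "reverse_name": None, "matches": False}
    try:
        result["ip"] = forward(hostname)
    except OSError:  # no forward record found
        return result
    try:
        result["reverse_name"] = reverse(result["ip"])[0]
    except OSError:  # no PTR record found
        return result

    def short(name):
        return name.split(".")[0].lower()

    result["matches"] = short(result["reverse_name"]) == short(hostname)
    return result
```

If "ip" comes back as None the forward record is missing; if "reverse_name" is None or "matches" is False, the PTR record is missing or points at a different name.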

If no name shows up associated with the IP address, log into the domain controller and check the DNS records for this particular name/ip address

  1. From a Domain Controller go to start–>run–>dnsmgmt.msc

  2. Expand the Forward Lookup Zones

  3. Expand the primary zone that holds the records for the system(s) you are troubleshooting

Validate that the record exists. If it does not exist, manually enter the record name and IP address by right clicking on this same zone:

  1. Select New Host (A)

  2. Enter the name and IP address

  3. Check the box next to Create associated pointer (PTR) record

  4. Click Add Host

Additionally, log back into the node that you manually entered the record for and ensure that it is registering itself in DNS:

  1. Right click on the My Network Places icon on the desktop and select Properties

  2. Double click on the primary adapter

  3. Select properties

  4. Highlight internet protocol (TCP/IP) and select properties

  5. Validate the IP addresses of the DNS servers are correct

  6. Select Advanced

  7. Select DNS tab

  8. Make sure the box is checked next to Register this connection’s address in DNS

As I wrap this up I realize there is so much more that can be done.  Each application type of server needs its own set of health checks, for example web servers, terminal servers and database servers.  Remember this is just the baseline for each server and that other components can and should be layered on top of it.  Again, I would love to hear from others, so please feel free to add your comments below.

Is my Active Directory Backed Up?

There are a ton of methods to back up Active Directory.  I’m not going to get into each method with this post.  What I am going to do is share another little command that can be run to check whether your Active Directory was backed up and when.

Before I discuss that command, one point I would like to make is to be very careful about who you let back up and restore your Active Directory DB.  From a security standpoint this could be a major violation of your company’s security policy.  Think about it for a minute.  Let’s say I work in a support group in your company that provides backup and restore services for all systems, including Domain Controllers.  I could take that backup of Active Directory and restore it to a private system that I have.  Now I could use a number of tools to help try to crack into it.  Sure, it may take a bit of time, but I’ve got plenty of time.

If you have a group that is responsible for backups and restores on Domain Controllers then I believe you need to put some really good policies and guidelines in place to protect your most important asset…Active Directory.  I actually don’t like anyone backing up Active Directory that isn’t an Administrator, and I always select the option that only an Administrator can restore the backup.  I understand that a rogue admin could do harm, but at least there was some mitigation put in place.

Now, finally to the point.  Is my Active Directory backed up?  For this one we are going to run another Repadmin command.

repadmin /showbackup

This will show you when your last backup of Active Directory ran.  You don’t need to run it against a specific DC because Active Directory doesn’t care.  If you have child domains in your environment and want to run this against them all, just put a * at the end of the command and it will check all the domains.

Now go out there and make sure your Active Directory is backed up!!!

New Server Core Guide

Just saw over on the Server Core blog that Andrew posted some links to a couple excellent resources.  The first one is what I consider to be the Server Core Bible.  It has just about everything you can think of when it comes to configuring Server Core.  The next link is to a couple job aids that give you a quick look at some common commands. 

These job aids actually give me some ideas on some things I’d like to create…now if I only had more time.