Troubleshooting Guide for Dell PowerEdge Server

Diagnosing Common Dell PowerEdge Server Issues

We get it. Troubleshooting your Dell PowerEdge Server can be a pain. Error messages and email alerts are coming at you, and searching through manuals, articles, and forums for help can be equally frustrating.

This troubleshooting guide aims to help you diagnose a few common errors that don't require a ton of engineering expertise to repair. Obviously, we can’t cover every Dell PowerEdge server issue out there, but we’ve put together some diagnostic help for a couple of the most frequent problems.

Server room showing a lot of amber lights

We have to stick a disclaimer in here that M Global can't take responsibility for the implementation of any of the advice given on this page. But if you find yourself in a bind with your Dell PowerEdge server and need help resolving your issue, give us a call! We are happy to help!

How to Do a Physical Inspection on Your Dell PowerEdge Server

A physical inspection of your PowerEdge server is one of the first important steps in diagnosing issues. You'll be able to identify common problems quickly, such as power supply issues and hard drive failures. You'll be keeping an eagle eye out for amber LEDs indicating errors as well as loose cables or connections.

  • Make sure you are looking at the correct system. Dell identifies systems by Service Tag rather than serial number, so the Service Tag is the usual go-to for physically confirming that you are at the right machine. Some models, like the PowerEdge R630 shown, have an LCD display on the front of the device that shows the Service Tag.
  • Take a look at both the front and back of the device. Are there any cables that appear loose or unplugged? Examine the LED indicators on the disks, chassis and PSUs. If any LEDs are amber or orange, they may be indicating that there is an issue with the server.
  • Make a note of the component that the LED is on and continue your inspection of the server until complete. You may find further issues or LEDs that will provide a more complete picture of the situation, so it's important to complete the entire inspection before troubleshooting. Pictures can often be helpful if you are relaying information to a colleague or support provider.
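
If you like to keep your inspection notes in a structured form, here is one rough way to do it in Python. This is only a sketch: the class names, fields, and the Service Tag "ABC1234" are made up for illustration and are not part of any Dell tool.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    component: str    # where the issue was seen, e.g. "PSU 2" or "Disk bay 3"
    observation: str  # what was seen, e.g. "blinking amber" or "cable unplugged"


@dataclass
class InspectionNotes:
    service_tag: str
    findings: list = field(default_factory=list)

    def add(self, component, observation):
        """Record one observation; keep inspecting before troubleshooting."""
        self.findings.append(Finding(component, observation))

    def summary(self):
        """Build a plain-text summary to pass to a colleague or support provider."""
        lines = [f"Service Tag: {self.service_tag}"]
        for finding in self.findings:
            lines.append(f"- {finding.component}: {finding.observation}")
        return "\n".join(lines)


# Hypothetical example values
notes = InspectionNotes(service_tag="ABC1234")
notes.add("PSU 2", "LED not on")
notes.add("Disk bay 3", "LED blinking amber four times per second")
print(notes.summary())
```

A summary like this, along with a few pictures, gives whoever picks up the issue the full picture in one place.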

Don't worry about making changes while you are physically evaluating the system for failures. The goal of a physical inspection is to identify potential issues to investigate. The next step will involve confirming that a failure has occurred, which will help determine the criticality of the problem. A plan can then be put in place for replacement or repair, depending on the situation. If you have hardware support through a vendor like M Global, you'll pass on the information to us, and we'll take it from there.

What's Your PowerEdge Problem?

Your physical inspection may have yielded some information about what's going on with your PowerEdge server.

Power Supply Issues in a PowerEdge Server

While performing the physical inspection of the device, you may notice loose or missing cables. If something is unplugged, it's a good idea to check with your team to make sure it wasn't unplugged on purpose before plugging it back in. More often than you'd think, a cable has come loose and reseating it solves the problem. If everything appears to be connected correctly, take a look at the LED indicators.

 


Use the table below to match what the PSU LED is doing to the likely condition of the PSU:

LED state                  PSU condition
Green                      Normal
Blinking green             PSU firmware being updated
Blinking green, then off   PSU feature mismatch
Blinking amber             PSU error
Not on                     PSU is not receiving power
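
If you record LED states during inspections, the PSU table can be turned into a simple lookup. The sketch below is just one way to do that; the function name psu_condition and the lowercase keys are our own choices for illustration, not anything Dell provides.

```python
# LED state -> PSU condition, taken from the table above
PSU_LED_MEANINGS = {
    "green": "Normal",
    "blinking green": "PSU firmware being updated",
    "blinking green, then off": "PSU feature mismatch",
    "blinking amber": "PSU error",
    "not on": "PSU is not receiving power",
}


def psu_condition(led_state):
    """Return the likely PSU condition for an observed LED state."""
    key = led_state.strip().lower()
    # Anything not in the table is reported as unknown rather than guessed at
    return PSU_LED_MEANINGS.get(key, "Unknown LED state, check the owner's manual")


print(psu_condition("Blinking Amber"))
print(psu_condition("Not on"))
```

States that are not in the table fall through to an "unknown" message, which is a reminder to check the owner's manual for your specific model.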

Hard Drive Issues on a Dell PowerEdge Server

Hard drive issues are another common failure that can be diagnosed through a physical inspection. The LED indicators will give a pretty good idea of what is happening.

The image shows the different parts of a hard drive:

1. The Physical Disk Activity Indicator

2. The Physical Disk Status Indicator

3. The Latch Release Button

4. Disk Information

dell-hdex4

What does the PowerEdge Drive Status LED mean?

Use the table below to match what the drive status LED is doing to the likely condition of the drive.

LED behavior                                                  Drive condition
Blinks twice per second                                       Drive is being identified or prepared for removal
Blinks green, then amber, then turns off                      Drive is predicted to fail
Blinks amber four times per second                            Drive has failed
Slow green blink                                              Drive is rebuilding
Solid green                                                   Drive is online and normal
Three green blinks, then three amber blinks, then turns off   Rebuild has failed
Off                                                           Drive is ready for installation or removal, or no drive is detected
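
When a chassis has a lot of bays, it can help to jot down the pattern seen on each one and pick out the drives that need a closer look. Here is a minimal sketch of that idea. The bay labels, the function name, and especially the follow-up flags are our own assumptions for illustration; the meanings come from the table above.

```python
# LED pattern -> (drive condition, needs follow-up?)
# The follow-up flags are an assumption made for this example.
DRIVE_LED_STATES = {
    "blinks twice per second": ("Drive identification, prepare for removal", False),
    "blinks green, then amber, then turns off": ("Drive is predicted to fail", True),
    "blinks amber four times per second": ("Drive has failed", True),
    "slow green blink": ("Drive is rebuilding", False),
    "solid green": ("Drive is online and normal", False),
    "three green blinks, then three amber blinks, then turns off": ("Rebuild has failed", True),
    "off": ("Ready for install/removal, or drive not detected", False),
}


def drives_needing_follow_up(observations):
    """observations maps a bay label to the LED pattern seen on that bay."""
    flagged = []
    for bay, pattern in observations.items():
        # Unrecognized patterns are flagged so they are not silently ignored
        condition, follow_up = DRIVE_LED_STATES.get(
            pattern.strip().lower(), ("Unknown pattern", True)
        )
        if follow_up:
            flagged.append((bay, condition))
    return flagged


# Hypothetical observations from a physical inspection
observed = {
    "Bay 0": "Solid green",
    "Bay 1": "Blinks amber four times per second",
    "Bay 2": "Slow green blink",
}
for bay, condition in drives_needing_follow_up(observed):
    print(f"{bay}: {condition}")
```

In this example only Bay 1 is flagged, since a rebuilding drive and a healthy drive don't need action yet.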

Failed Hard Drive on a PowerEdge Server

Hard drive failures are among the most common server failures. The good news is that a failed drive does not necessarily spell disaster. Most PowerEdge servers come with a PowerEdge RAID Controller (PERC) card, which lets the server be configured with one of several RAID (Redundant Array of Independent Disks) levels. The typical tradeoff for additional redundancy is less usable storage, so it's important to choose a level that suits your needs and comfort level.
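
To put rough numbers on the redundancy versus storage tradeoff, here is a small sketch that estimates usable capacity for a few common RAID levels. It assumes equal-size drives and uses the textbook formulas for each level, ignoring controller overhead and hot spares. The function name and the four 2 TB drives are just an example, and the levels your PERC actually supports may differ.

```python
def usable_capacity_tb(level, drives, drive_tb):
    """Approximate usable capacity for common RAID levels with equal-size drives."""
    if level == 0:       # striping only, no redundancy
        minimum, data_drives = 2, drives
    elif level == 1:     # two-drive mirror
        minimum, data_drives = 2, 1
    elif level == 5:     # one drive's worth of parity
        minimum, data_drives = 3, drives - 1
    elif level == 6:     # two drives' worth of parity
        minimum, data_drives = 4, drives - 2
    elif level == 10:    # striped mirrors, half the drives hold copies
        minimum, data_drives = 4, drives // 2
    else:
        raise ValueError(f"RAID level {level} is not covered by this sketch")

    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    if level == 1 and drives != 2:
        raise ValueError("This sketch treats RAID 1 as a two-drive mirror")
    if level == 10 and drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return data_drives * drive_tb


# Example: four 2 TB drives (8 TB raw)
for level in (0, 5, 6, 10):
    print(f"RAID {level}: {usable_capacity_tb(level, 4, 2)} TB usable of 8 TB raw")
```

With four 2 TB drives, RAID 0 keeps all 8 TB but survives no drive failures, RAID 5 gives 6 TB and survives one, and RAID 6 gives 4 TB and survives two. RAID 10 also gives 4 TB and always survives one failure, sometimes more depending on which drives fail.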

In a redundant RAID configuration, running with a single failed drive will usually have minimal impact, but don't put off the replacement. Most critical server hard drive issues happen when a second drive fails while another drive is already in a failed state.

Replacing a failed disk on a PowerEdge Server

The hard drive replacement process has become streamlined over time. It typically involves identifying the failed disk, pressing a button to release a lever to remove the hard drive, and then doing the reverse to install the new drive. If you need additional help with a failed hard drive in a PowerEdge server, reach out to us!


How to Collect Log Files for a PowerEdge Server

Collecting log files is an important and necessary step, especially if you are getting help from a hardware support provider.  The steps to generate SupportAssist logs can differ depending on your version of iDRAC.  The links below will take you to our PowerEdge log collection directions.

Need more help with your Dell PowerEdge Server?

If you need further help with your PowerEdge server, we'd be happy to jump in!

If you're unsure how severe your issues are, check out our severity list below:

1 - Severe — system is down and unable to perform its duties.  Example: OS cannot start or system completely unresponsive.

2 - Medium — system is up but with a performance impact.  Example: Write cache has failed and the system is performing slowly.

3 - Low — system is still running but there is a component that is predicted to fail.  Example: ECC error on a DIMM, predictive failure on a disk. Performance is not impacted.
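
If you want to bake these levels into a ticket template or alerting script, the list boils down to two questions: is the system up, and is performance affected? The sketch below shows that logic. The function name and parameters are ours, and it assumes a problem has already been detected.

```python
def severity(system_up, performance_impacted):
    """Map the two questions from the severity list to a (level, label) pair.

    Assumes some problem has already been detected on the server.
    """
    if not system_up:
        return 1, "Severe"   # system is down and unable to perform its duties
    if performance_impacted:
        return 2, "Medium"   # system is up but with a performance impact
    return 3, "Low"          # running normally, but a component is predicted to fail


# Example: predictive failure on a disk, no slowdown
level, label = severity(system_up=True, performance_impacted=False)
print(f"Severity {level} ({label})")
```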

Our diagnostic guide covers lower-impact problems (level "3" or "Low") since troubleshooting more severe failures can get complicated. A wrong move in those situations can make things worse and have serious consequences for your business.

That's where the expert knowledge of our engineers comes in. We are happy to step in at any stage and help!

Get Help Now.

Call us at 855-304-4600 or fill out the form below.