Don't Panic: How to Prevent Hardware Emergency Chaos

We've seen a hardware emergency or two in our time (it's kind of our job), and we've witnessed the good, the bad, and the ugly in various situations. As we reflect on what made the biggest difference in diverse types of hardware failures, it's clear that preventing failures from becoming full-blown chaos is the best idea.

But even in the midst of an IT disaster, you can still stay calm and get sh*! done. We've gathered tips to help prevent failures from turning into chaos, plus what to do when you're already in the middle of it. If you'd rather jump to a section, the links are below. Just don't give way to the blank stare of panic that can paralyze you.

Prevent Hardware Emergency Chaos

How To Keep Your Cool When Sh*! Is Hitting The Fan

How to Prevent Hardware Emergency Chaos

While you may not be able to prevent hardware failures from happening entirely, you can keep them from turning into full-blown disasters with a few preventive measures.

Make A Disaster Plan

Taking the time to create a plan for your data center environment that can handle failures with minimal disruptions and downtime can save your bacon when issues come up. We are happy to help our customers with a strategy, especially if we've already helped out in an emergency situation.

Your unique environment will dictate the specifics of your disaster plan, but a simple place to start is by covering the following:

The Who - Who is the point person on your internal team? Who else needs to be involved? Do you have a vendor partner who will be helping (like M Global)? Make sure you have appropriate contact information for any involved parties.

What - What assets are involved in this plan? Do you have equipment not at this site that needs to be considered in the plan? Having an inventory list, a clear idea of critical resources, and any other environment-specific details documented will make things so much easier. For instance, if your devices are part of a colocation data center, how you inventory parts might differ from a data center located on-site at your company building.

How - Are you going to do replication? Tape backup? Cloud backup? Disk to disk? Hot spares? What are you going to back up? How are you going to restore it? Reviewing your redundancy and backup strategies can be a big task, but it is well worth it. When asked, "Do you have a backup?" you want to be able to say yes. Trust us.

Pro Tip - Label everything. For instance, it's a good idea to label bad parts if you are not disposing of them immediately. What a pain to accidentally reinstall a bad part, thinking it was a spare! (Yes, we've seen it happen.) Label cables, too. You don't want to be wondering where that random cable came from in the middle of troubleshooting. And remember, you might not be the one dealing with the issue, so clear labels help whoever is.

For more information, check out our tips and recommendations on redundancy levels and backup schedules.
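The who, what, and how of a disaster plan can also be kept in a structured form that is easy to check for gaps. Below is a minimal sketch in Python, not a prescribed format: the class names, fields, and the `gaps` check are all hypothetical and only illustrate the idea of recording contacts, assets, and backup methods in one place.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    role: str   # e.g. "internal point person" or "vendor partner"
    phone: str
    email: str


@dataclass
class Asset:
    serial: str
    model: str
    site: str                # e.g. "on-site" or "colocation"
    critical: bool           # True if downtime on this asset is unacceptable
    backup_method: str = ""  # e.g. "replication", "tape", "cloud", "disk to disk"


@dataclass
class DisasterPlan:
    contacts: list[Contact] = field(default_factory=list)
    assets: list[Asset] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return a list of things the plan still needs before it is usable."""
        problems = []
        if not self.contacts:
            problems.append("no contacts listed")
        for contact in self.contacts:
            if not (contact.phone or contact.email):
                problems.append(f"{contact.name}: no phone or email")
        for asset in self.assets:
            if asset.critical and not asset.backup_method:
                problems.append(f"{asset.serial}: critical asset with no backup method")
        return problems
```

Running `gaps()` during a periodic review is one way to catch a missing phone number or an unprotected critical device before an emergency does. A spreadsheet with the same columns works just as well.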

Get Set Up For Success

Beyond a thorough disaster plan, it's also a good idea to make sure you have your vendor partner relationships dialed in. Take a close look at your SLA: does it reflect your needs accurately? For instance, a 9x5xNBD agreement (business-hours coverage with a next-business-day response) may not be the best fit in a mission-critical environment where any downtime is detrimental. And don't forget your parts stocking strategy, because SLA fulfillment depends heavily on the parts being available when you need them. We've got a quick and easy guide to help you choose the right SLA if you're not quite sure.

If you don't currently have a contract and don't intend to get one, it's still a good idea to familiarize yourself with a provider that can help in an emergency. Third-party maintenance (TPM) providers vary in their policies and procedures for taking on clients without a contract, so finding one that can and will help you in an emergency before things become desperate can save time and stress.

Pro Tip: Have your contact list in order. Make sure you have current and correct vendor and supplier information. The last thing you want to do is hunt back through emails when everything is on fire!

Preventative Maintenance Can Save Your Behind

It might feel like we harp on this topic a lot, but it's true that preventative maintenance can prevent problems from starting. We have a whole post about what you should do and why for preventative maintenance in the data center.

Here are just a few things you can do to prevent the panic-inducing chaos:

  • Follow OEM Preventive Maintenance Guidance — the manufacturer's advice is there for a reason.
  • Perform regular system health checks — you'll get a heads-up on potential problems before they become an issue.
  • Clean regularly — dust, debris, and cobwebs can significantly impact performance.
  • Report and act on issues promptly — don't wait until the system is down before dealing with issues.

Pro Tip: Get on a maintenance schedule. The size and complexity of your environment could dictate the length of time between maintenance check-ins, but once a month is a good place to start. At the very least, you'll want to check there are no new rats living on top of your system. Gross! Extra points if you document the maintenance you've performed. It makes our job so much easier when helping you with any kind of issue if we know when and what maintenance has been done.
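If you keep your maintenance log in a spreadsheet or script, flagging overdue devices takes only a few lines. Here is a rough sketch, assuming a simple mapping of device name to last maintenance date. The function name, the device names, and the 30-day default (standing in for "once a month") are illustrative assumptions, not a required process.

```python
from datetime import date, timedelta


def overdue_devices(last_done: dict[str, date], today: date,
                    interval_days: int = 30) -> list[str]:
    """Return device names whose last maintenance is older than the interval.

    A device serviced exactly `interval_days` ago is not yet counted as overdue.
    """
    cutoff = today - timedelta(days=interval_days)
    return sorted(name for name, done in last_done.items() if done < cutoff)


# Hypothetical maintenance log: device name -> date of last maintenance
log = {
    "server-a": date(2024, 1, 1),
    "switch-b": date(2024, 2, 20),
}
print(overdue_devices(log, today=date(2024, 3, 1)))
```

Adjust `interval_days` per device class if some equipment needs more frequent attention than others.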

Don't Hoard Information or Responsibility

Whether on purpose or not, most of us have been guilty of hoarding the burden of a situation gone wrong. Often, being unwilling to share the load or communicate can lead to unnecessary stress, burnout, longer resolution times, and a higher chance that tasks go unfinished (you are, after all, only human).

It is essential to find the right balance between sharing with those who can bring a fresh perspective and avoiding complicating matters by involving unnecessary people or sharing superfluous details. You don't want a "too many cooks in the kitchen" sort of situation, so use discretion with this tip.

Don't be a hoarder when communicating with your provider, either. Let your vendor know if you are swapping parts, doing DR tests, installing new devices, moving, powering anything off, or making other hardware-related changes. Looping the provider in allows them to make preparations, such as ensuring resources are available should something go wrong. Keeping them in the dark is like a patient not telling their doctor important health information that could assist in an accurate diagnosis. Share with us all the details you can think of, even if they seem unimportant.

And if you don't have all the information you need, please ask! For example, if you feel mostly confident about swapping the part yourself, great! We will support you every step of the way, but don't be afraid to ask for a refresher if you aren't 100% sure. We're always happy to give step-by-step instructions (or whatever level of guidance you need). Honestly, we'd much rather help you play a successful role in the resolution than potentially have more issues, especially in the case of an emergency.

Pro Tip: Document everything - even if it seems silly, useless, or like you will most definitely remember it. (You probably won't.) Things like credentials, locations, network diagrams, and more can come in very handy. We recommend sticking it in a spreadsheet and keeping the information current.

If possible, make sure a knowledge transfer happens when someone leaves or retires. We've seen it all too often, especially with older equipment like AS400s, where the new guy has no idea what's happening or how to deal with such old equipment. It makes a critical situation so much more challenging.

How to Stay Calm When Sh*! Is Hitting the Fan

What about when the hardware emergency is already upon you? How can you keep your cool and get the fastest possible resolution?

Keep Your Emotions in Check

Panic is contagious, so don't infect others and don't let others pass it on to you. Panic breeds hysteria and confusion, which can create fertile ground for poor decision-making. Remaining calm and clear-headed is your best bet for a good outcome.

That applies to your provider, too. It benefits us both when we can stay composed and collected during a difficult situation. It doesn’t mean that we don’t care.

Just like you wouldn't want the EMT to start freaking out at the scene of an accident or the surgeon panicking right before your surgery, keeping calm helps us be more effective and solve problems faster (that benefits you, too).

Panic and its cousins, alarm and fear, can prompt you to go off script, leaving the sensible, logical plan in the dust for chaos and reckless decision-making. Troubleshooting, logistics, scheduling, and other elements of resolutions can become a nightmare if everyone is in a frenzy and not thinking clearly.

Pro Tip: Take a few deep breaths before doing anything else. If you've followed the above recommendations and have a disaster plan and proper documentation, you'll know what to do next. If not, and you are flying blind, it sucks to be you right now, but there is still hope. Move on to the next tip.

Collect Information BEFORE Contacting Your Provider

The more information you can gather and share when communicating with your provider, the faster and easier the troubleshooting will go. Here's a quick checklist of info we will likely ask for, so you can speed things up by gathering it beforehand.

Information Gathering Checklist

  • Have device serial #s handy.
  • Gather system logs and/or screenshots of errors.
  • Perform a visual inspection, which includes checking the LED status and verifying device connections, power cables, and all other connected elements, including cable ports, drives, controllers, network cards, etc.
  • Check connected pieces of hardware to make sure that they are not the issue. For instance, your server may appear to be down when the switch it connects to is actually the one experiencing a failure (this gets more complicated if you have multiple vendors supporting different devices).
  • Confirm that there are no external contributing factors (e.g., the ISP is down or the power is out at the site).
  • Document any troubleshooting steps done internally (which controller had a port issue, which disk was reseated, why, etc.)

Pro Tip: Gathering the information before opening the service ticket will prevent wasting time with the back-and-forth emails that happen when we don't have the necessary information. Including a picture of the device or screenshots of error codes can be very helpful, too.

Gold star for you if there are no further questions we need to ask before getting started troubleshooting.

Resist The Urge To Call — Email Instead

Don't change things up in an emergency; it'll throw everyone (including you) off their game. Instead, communicate with your provider as you would normally. We prefer communicating via email, even in emergency situations. For some, calling in may feel more satisfying (and that's fine if that's you), but it generally slows things down. We can't assess the situation, check resources, speak to the internal team, or coordinate logistics and scheduling until we're done talking on the phone. And often, we'll need to have the customer email us with the necessary information anyway, so it adds an extra step.

Email is the most effective, thorough, and fastest way to let us know what's going on. It also allows the customer to take a breath, collect their thoughts, and articulate the problem while giving us the information they gathered from the checklist above.

Email also allows us to ensure the necessary resources and experts are available and that everyone working on the situation has access to the information they need simultaneously, speeding up the process.

Service Ticket Email Template

Here's the information we need for opening a service ticket. Putting it in your initial email helps us jump right into troubleshooting and alerts the logistics team of what is coming.

Information Required:

  • Caller’s Name, Phone Number, and Email Address
  • Site Contact’s Name, Number, and Email Address
  • Site Address:
  • Contract #:
  • Model #
  • Serial #
  • Description of the problem
  • Severity Level (Low/Medium/High)

Pro Tip: Copy and paste this into your email templates so it's ready to go when you need it.
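If your team opens tickets from a script or internal tool, the template above can be filled in programmatically, with a check for anything left blank. Here is a minimal sketch in Python. The field labels come from the list above, while the function name `build_ticket_email` and the sample values are hypothetical and not part of any M Global system.

```python
# Field labels taken from the service ticket template above
TICKET_FIELDS = [
    "Caller's Name, Phone Number, and Email Address",
    "Site Contact's Name, Number, and Email Address",
    "Site Address",
    "Contract #",
    "Model #",
    "Serial #",
    "Description of the problem",
    "Severity Level (Low/Medium/High)",
]


def build_ticket_email(details: dict[str, str]) -> tuple[str, list[str]]:
    """Return the email body and a list of template fields still missing."""
    missing = [f for f in TICKET_FIELDS if not details.get(f, "").strip()]
    lines = [f"{f}: {details.get(f, '').strip()}" for f in TICKET_FIELDS]
    return "\n".join(lines), missing


# Hypothetical example values
body, missing = build_ticket_email({
    "Model #": "X1",
    "Serial #": "ABC123",
    "Description of the problem": "Amber LED on drive bay 3",
})
print(body)
print("Still needed:", missing)
```

Checking `missing` before hitting send is a cheap way to avoid the back-and-forth that slows down troubleshooting.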

Breathe Easy & Let Us Get to Work

When you can trust your provider, you don’t have to let the panic cloud your knowledge of past results. Just to ease your mind a little further, MGS emails are watched like hawks. You will not be overlooked. We always acknowledge receipt and will respond with the next steps right away (usually within 15 minutes). You'll never be left wondering what's going on.

Also, you can rest easy knowing we won't be winging it — we will be formulating a solid plan of action (have you figured out that we're super into making a plan??) and will update you with all the information and details so you will be in the loop on what is happening and when.

If you're not working with a provider you feel you can trust, the solution to that is obvious.

Pro Tip: If you have special communication requests (such as a phone call from us within a certain time frame or anything else you need), feel free to include them in your initial email. If calling in feels most comfortable, that's fine, too. Let us know beforehand, and we can set up a protocol that involves the best of both worlds.

Let M Global Help

When things get difficult, we've got you covered!

Whether you're in the thick of it now and need help, or you are getting your ducks in a row before it hits the fan, we would love to help! We're all about turning challenges into successes and obstacles into opportunities for growth. We love creating solutions for our clients, no matter how difficult the challenge.

We want you to consider us an extension of your team, a trusted resource and an advisor. Fill out the form or give us a call at 855-304-4600 to find out more.

Got an IT Nightmare? Let Us Help.

Author Note:

By Angie Stephens with contributions from experts at M Global.