Getting Started: Disaster Recovery Planning

Without Destroying Your Budget

Large and small disasters happen all the time. Events ranging from purely local incidents, such as flooding caused by a fire down the block, to a city-wide flu epidemic or a region-wide blizzard all have the potential to put companies out of business.

In our disaster planning work with many different types of organizations, we have seen that too many of them make the recovery process harder for themselves - or even impossible - by not planning ahead for disaster recovery. While they may take steps to try to prevent disasters, they ignore the reality that prevention won't always work.

As a result, these organizations fail to take prudent and inexpensive preparatory actions to facilitate the recovery of their business operations.

DISASTER CATEGORIES

The first of two fundamental hurdles to overcome when planning for disaster recovery is to realize that the seemingly large variety of possible disasters can actually be reduced to a manageable number. In point of fact, all disasters can be grouped into one or more of only three categories. These are:

  • loss of information,
  • loss of access,
  • loss of personnel.

RECOVERY TIME PERIODS

The second hurdle to overcome is accepting the fact that "business-as-usual" will be suspended at the time of the disaster. In fact, the people who are usually in charge may not even be available! For example, several years ago, there was a gas line explosion at a bank in the Midwest. In the explosion, every employee was either killed or injured. The president was among those killed, and the executive vice president was left to try to manage the recovery from his hospital bed.

What you have to accept is that there will be two time periods which must be planned for following a disaster. First will be the immediate, disorganized, "limited-operation" time span, which will then be followed by a period of "makeshift-operations," which can be quite lengthy until normal operations can be resumed.

Typically, following a physical disaster, the limited-operations time span can extend for up to a week or more, while the makeshift-operations time span can last for several months until normal operations are restored.

This need to recover in phases is typically very difficult for management to accept. Often, when asked to prioritize among the organization's services or products, our clients' first reactions are to consider them all equal. Following that, people are often unrealistic in their estimation of how fast departments can accomplish their tasks. In one of our client situations, the organization had planned to relocate a key department to a hotsite four hours away - without realizing that most of the affected people were single parents, who couldn't possibly go there!

Once management has a proper mind-set to build upon, the objective of the planning process is to systematically sort out the various issues and priorities so that a cost-effective plan can be developed, one that is in proportion to the level of loss to which the organization is exposed.

The process itself can be summarized in the following steps:

  • provide top-management guidelines,
  • identify serious risks,
  • prioritize the operations to be maintained and how to maintain them,
  • assign the disaster team,
  • take a complete inventory,
  • know where to get help,
  • document the plan,
  • review with key employees, test the plan, and train all employees.

Each of these is discussed below.

Top management guidelines:

Input from top management is required to keep the planning process in perspective and to ensure participation by everyone within the organization. Top management also has to indicate the length of time during which the organization is willing to accept disrupted service and the amount of money the organization is willing to invest in procuring standby equipment, paper forms, testing, etc. as part of being prepared for an emergency. Input from management is also important in assigning priorities to which operations will be maintained during the limited-operations time span and which will be recovered later.

Our experience has been that even though employees thought they knew the answers which management would give regarding these priorities, invariably this stage in planning produces the most surprises and shows how little communication often occurs between management levels.

Identifying serious risks:

This is a "brainstorming" process, which is best accomplished working with the employees themselves during department or group meetings. It serves the dual role of starting to build the awareness of the employees to the issue of disaster planning as well as surfacing potential risk areas about which management may not have been aware. For example, one of our clients, who performs extensive money wire transfers, discovered that in the event of telephone service interruption, the emergency "callback" number they had given to their wire-transfer service agency was in the same building as their normal telephone number. Obviously, in a disaster, neither line would be available. The client immediately had the number changed to one in another building - but would never have known of the problem without going through the process with lower-level employees.

Prioritize the operations:

Most managers never think about it, but for the typical organization, the highest priority is payroll. Even if this is performed by an outside service, there is usually a terminal for remote input of the payroll data. So, in the event that there is a disruption, either at the source of the data, or at the payroll processor, there must be a delegation of authority to someone (remember, the president, owner, etc. may well not be available) to be able to issue substitute manual advance checks.

In general, top management will have to decide, depending on the kind of organization, how long they are willing to operate without being able to perform each of their daily operations, such as accepting customer credit applications, receiving deliveries, etc., in addition to their more obvious operations such as buying and selling. Banks need policies on accessing safe deposit boxes, sending out mortgage bills, commercial night depository, etc., in addition to just worrying about deposits and withdrawals.

Based on these priorities, the organization can plan out how long to suspend each operation, and designate either a manual backup mode or a longer lead-time approach for each function.

These priorities also guide the organization in setting the frequency of off-site storage of backup files. For example, in order to meet emergency requirements, some files which might normally be stored off-site on a weekly basis might instead be stored on a more frequent basis.
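
The prioritization described in this section can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the `Operation` class, the function names, the seven-day length of the limited-operations span, and the sample outage figures are all assumptions chosen for the example. Each operation is tagged with the longest suspension management will accept, and anything that cannot wait out the limited-operations span is flagged as needing a manual backup mode.

```python
from dataclasses import dataclass

# Assumption: the limited-operations span is taken as one week.
LIMITED_OPERATIONS_DAYS = 7


@dataclass
class Operation:
    name: str
    max_outage_days: float  # longest suspension management will accept


def recovery_mode(op, limited_days=LIMITED_OPERATIONS_DAYS):
    """Pick the kind of fallback to plan for one operation."""
    # An operation that cannot wait out the limited-operations span
    # needs a manual backup mode; the rest can be restored later.
    if op.max_outage_days < limited_days:
        return "manual backup"
    return "longer lead-time"


def recovery_schedule(operations):
    """List operations with the least tolerable outage first."""
    ordered = sorted(operations, key=lambda op: op.max_outage_days)
    return [(op.name, op.max_outage_days, recovery_mode(op)) for op in ordered]


# Hypothetical figures; real values come from top management.
operations = [
    Operation("mortgage billing", 30),
    Operation("payroll", 1),
    Operation("credit applications", 10),
]

for name, days, mode in recovery_schedule(operations):
    print(f"{name}: tolerate {days} day(s) -> {mode}")
```

Sorting by tolerable outage puts payroll at the top of the list, which mirrors the point made above that payroll is usually the highest priority.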

Assign the disaster team:

Disasters always seem to happen at the worst possible times, when the fewest personnel are available. Therefore, it is crucial that as part of the disaster plan, management appoint one person in charge of recovery, and one person as second-in-command. Following this, as many specific tasks as possible within the plan should be pre-assigned. In the wake of Hurricane Hugo, with most telephone service knocked out, one company in South Carolina which had not pre-assigned tasks reported that it took four days just to assemble their key personnel. That is certainly not the way to endear yourself to your customers or clients! The best basic rule of thumb is that when disaster occurs, employees should know what they are responsible for, what they are not responsible for, who is in charge, and who is the designated alternate in charge.

Inventory:

While most organizations have records covering the make and model numbers of their equipment as of the time of purchase, these records are usually not updated and almost never kept off-site. Taking inventory should include emergency vendor contacts for all equipment (including microfilmers, specialty mailing and other equipment - not just computer hardware and software), descriptions and formats of all data files, and copies of all business forms used, along with the vendor contact for each.
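
As one possible way to keep such an inventory honest, the sketch below records the items named above and flags records that are out of date, have no off-site copy, or lack a vendor contact. The field names, the one-year staleness threshold, and the sample item are assumptions made for illustration; they are not part of any particular inventory system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InventoryItem:
    description: str
    make: str
    model: str
    vendor_contact: str        # emergency contact for the vendor
    last_verified: date        # when the record was last checked
    copy_stored_offsite: bool  # is a copy of this record kept off-site?


def inventory_problems(item, today, max_age_days=365):
    """Return a list of reasons this record needs attention."""
    problems = []
    # Assumption: a record not checked within a year counts as stale.
    if (today - item.last_verified).days > max_age_days:
        problems.append("record is stale")
    if not item.copy_stored_offsite:
        problems.append("no off-site copy")
    if not item.vendor_contact.strip():
        problems.append("missing vendor contact")
    return problems


# Hypothetical item, for illustration only.
microfilmer = InventoryItem(
    description="microfilmer",
    make="ExampleCo",
    model="MF-100",
    vendor_contact="",
    last_verified=date(2023, 1, 15),
    copy_stored_offsite=False,
)

for problem in inventory_problems(microfilmer, today=date(2024, 6, 1)):
    print(f"{microfilmer.description}: {problem}")
```

Running a check like this on a regular schedule addresses the two failures noted above: records that are never updated and records that exist only on-site.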

Know where to get help:

Actively collect any additional names of service or equipment providers as you come across them.

Documentation:

The plan should be written down, along with the various assignments, the updated inventory, and all key phone numbers. Remember that if the core document is longer than 15-20 pages, it will never be read or used. Key personnel should have a copy of this documentation at home.

Review, Training, and Testing:

After completion, the plan needs to be reviewed with all employees on a regular basis. This does not have to be a lengthy procedure, and it offers a first-level "blink test" as to the reasonableness of the plan - as our client with the staff of single parents found out. Basic training also does not have to be time-consuming, although employees should at least know where the fire extinguishers are located and have seen a demo on how to use them.

More extensive training may be required in the event that there has not been enough cross-training to allow employees to replace a missing co-worker.

With respect to testing, a full-blown test of the plan may not be feasible, although moves, relocations, or unplanned shutdowns should be treated and evaluated as tests of your recovery ability.

To conclude, all of these activities can basically be characterized as "in-advance decision-making." Their cost is very little, yet they yield the immediate benefits of:

  • improving communications within the organization,
  • highlighting vulnerable points in the organization's operations,
  • ensuring that the organization has its best possible chances of surviving disaster.

Finally, the underlying philosophy in our approach to disaster recovery planning is that you can get a lot done without a lot of expense, that you can benefit greatly by thinking through as much as possible beforehand, and that you should assign responsibilities and make management decisions now - rather than wait until you're in your parking lot, leaning against a fire engine in the middle of the night!
