Business continuity plan
Once the risk and threat analysis and the business impact analysis (BIA) have been performed, the next step is the business continuity plan. A business continuity plan is a roadmap that defines the processes and procedures that protect operational continuity and the integrity of information and of the systems that provide that information. The goal of a good plan is disaster prevention, which is a tall order in today’s environment but can be achieved given enough money or creative genius.
Similar terms that are used in lieu of a business continuity plan include business resumption plan, continuity plan, contingency plan, disaster recovery plan and recovery plan.
IT is the technical arm of enforcement for the business continuity plan. The plan is intended to protect systems, applications, data, and infrastructure, most of which fall under the IT umbrella. Therefore, IT must play a substantial role in plan development; in fact, in many companies IT is wholly responsible for the plan.
While several people and departments participate in the plan, the Disaster Recovery Planner (DRP) is ultimately responsible for it (i.e., for the planning, provisioning, documentation, and testing). The DRP must also provide the required training and socialize the business continuity procedures. DRPs must sometimes be clever in how they accomplish their goals. When funds are limited, the DRP might consider facilities and technologies that can serve business continuity as well as other purposes (i.e., dual-use facilities). For example, an organization might use a corporate training location as a user recovery facility or use auditoriums, cafeterias, and warehouses as alternate data center locations. A company could also partner with a non-competing company whose systems mirror its own so that each provides reciprocal services to the other.
A business continuity plan outlines how an organization’s assets are protected (e.g., the systems environment, the data, the users, and the network).
Testing the Business Continuity Plan
Once a plan has been designed and implemented, it must be tested to see whether it achieves the desired outcome. Of course, the most revealing test of a plan is an actual disaster, but we hope that never happens.
Testing should be performed regularly and under different scenarios to prove the plan’s effectiveness. An individual other than the plan developer should create a scenario and call a spontaneous test (without disrupting normal operation). Examples of the various testing methods are below.
First, we must examine the testing objectives. The objective of any test is to check theory against practice. By testing the plan (the theory), we can validate the processes within it to see whether they will actually work (the practice). If they do not, we must keep modifying and retesting the plan until it works. When testing, always push the plan to its limits to find out what it can do under stress. Finally, test the plan regularly, because information systems, networks, and network use change constantly.
There are several testing methods that can be used independently or together to test the plan under various conditions. Some tests do not disturb normal operations, while others can only be performed when the user community is idle/offline. Testing should involve all critical team members so that everyone knows their responsibilities.
- Cycle testing: Cycle testing does not disturb normal operations, and it comes very close to completely testing the plan. Cycle testing combines the methods described below, with the exception of full interruption testing. It runs the plan through the entire cycle of tests rather than through individual tests. Cycle testing is the most complex form of testing apart from full interruption testing.
- Checklist testing: The DRP team uses a checklist to test the plan in a recovery scenario. An individual not on the DRP team should create the scenario. The DRP checklist defines the steps to take upon disaster declaration. If the DRP team finds an issue that the plan does not address, the plan is updated to cover it. This method is not disruptive to operations.
- Walk-through testing: This method is most commonly used in conjunction with checklist testing. A scenario is defined, and team members move through the checklist and walk through the motions without interfering with normal operations. Each team member has specific responsibilities as defined in his/her checklist.
- Simulation testing: A knowledgeable person creates a secret fictional disaster scenario. This exercise should be as realistic as possible. Realistic does not mean that the team is kept unaware that this is a drill; after all, you are testing the plan, not the people. The team receives the scenario, declares the disaster, and begins the checklist process.
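- Parallel testing: Parallel testing is the simple process of engaging multiple test methods (e.g., walk-through, checklist, and simulation). By combining multiple test methods, we can get a better evaluation of the plan. Elsewhere, the term is also commonly used for bringing recovery systems up alongside production without taking production offline.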
- Full interruption testing: This method is the most disruptive but the most revealing. Full interruption testing (also known as offline testing) suspends production operations for a period of time. The process might go as follows: The DRP declares a fictional disaster. Using the checklist, the DRP team invokes recovery operations using the actual systems that would be involved. This means that we restore from our backup systems and possibly use hot and cold sites. These tests are usually scheduled when user activity is slow or idle (e.g., holidays).
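The methods above differ mainly in whether they disturb normal operations. As a minimal sketch, the list can be captured as a small catalogue so that a planner can see which tests are safe to call during working hours and which need an idle window. The `PlanTestMethod` class, its fields, and both helper functions are hypothetical names chosen for illustration. The flags follow the descriptions above; where the text does not say whether a method is disruptive, the value is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlanTestMethod:
    """One plan-testing method (hypothetical structure for illustration)."""
    name: str
    # True/False as stated in the text; None where the text does not say.
    disrupts_operations: Optional[bool]


METHODS = [
    PlanTestMethod("cycle", False),
    PlanTestMethod("checklist", False),
    PlanTestMethod("walk-through", False),
    PlanTestMethod("simulation", None),
    PlanTestMethod("parallel", None),
    PlanTestMethod("full interruption", True),
]


def safe_during_operations(methods: List[PlanTestMethod]) -> List[str]:
    """Return names of methods explicitly described as non-disruptive."""
    return [m.name for m in methods if m.disrupts_operations is False]


def needs_idle_window(methods: List[PlanTestMethod]) -> List[str]:
    """Return names of methods that suspend production operations."""
    return [m.name for m in methods if m.disrupts_operations is True]


if __name__ == "__main__":
    print(safe_during_operations(METHODS))
    print(needs_idle_window(METHODS))
```

Keeping the undetermined cases as `None` means a method is scheduled during normal operations only when it has been explicitly described as safe to do so.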
Declaring and Recovering from Disaster
Before disaster recovery can occur, we must assess the scope of the event. Has a disaster occurred, or is it something else? What is the extent of the event? Who is affected? The disaster recovery plan should include these questions as the prelude to recovery. Once these questions are answered, a formal declaration is made, which puts the recovery plan into motion. The plan executes data recovery strategies to get operations back to normal (relatively speaking) as quickly as possible.
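The sequence described above (assess, declare, then recover) can be sketched as a small function. This is an illustration under stated assumptions, not a prescribed procedure: the `Assessment` class, its field names, the `declare_and_recover` function, and the log messages are all hypothetical, and the recovery steps are passed in as plain strings because the text does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assessment:
    """Answers to the scoping questions (hypothetical field names)."""
    is_disaster: bool   # Has a disaster occurred, or is it something else?
    extent: str         # What is the extent of the event?
    affected: List[str] = field(default_factory=list)  # Who is affected?


def declare_and_recover(assessment: Assessment, recovery_steps: List[str]) -> List[str]:
    """Run the recovery steps only after a formal declaration.

    Returns a log of the actions taken, in order.
    """
    log: List[str] = []
    if not assessment.is_disaster:
        # No declaration, so the recovery plan is not put into motion.
        log.append("no declaration")
        return log
    who = ", ".join(assessment.affected)
    log.append(f"disaster declared: extent={assessment.extent}; affected={who}")
    for step in recovery_steps:
        log.append(f"recovery step: {step}")
    return log


if __name__ == "__main__":
    event = Assessment(True, "primary data center", ["finance", "sales"])
    for line in declare_and_recover(event, ["restore from backup"]):
        print(line)
```

The point of the structure is that no recovery step runs unless the assessment has led to a declaration, which mirrors the order given in the text.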
|<mp3>http://podcast.hill-vt.com/podsnacks/2007q2/bcp.mp3%7Cdownload</mp3> | Business continuity planning (BCP)|
|<mp3>http://podcast.hill-vt.com/podsnacks/2007q1/rpo-rto.mp3%7Cdownload</mp3> | RPO versus RTO|