Re: Fault Tolerence on SBS2003 Prem.

From: Jeff Middleton [SBS-MVP] (jeff_at_cfisolutions.com)
Date: 05/19/04


Date: Tue, 18 May 2004 19:05:04 -0500

You may want to look over these two older posts:

http://www.google.com/groups?selm=eTRmP6frBHA.2088%40tkmsftngp03&output=gplain

http://www.google.com/groups?selm=edEM3sjhAHA.1728%40tkmsftngp02&output=gplain

Admittedly, there's some overlap in logic and text from what I did the other
day, but there's some exploration of the idea that adding more servers
doesn't improve uptime or cost management. Some of the concepts a dated in
NT domain discussion, but the logic follows pretty well.

"Jeff L" <newsgroupsremoveandunderscore_jeff@availabletech.net> wrote in
message news:OYNpEITPEHA.540@TK2MSFTNGP11.phx.gbl...
> Dude (aka. Jeff Middleton) wrote a book. Very informative.
>
> Anyonyone ever tried using Application Center 2000 to cluster. I
understand
> the SBS2003 cannot be clustered on its own and that you cannot run two SBS
> servers on the same network but this is a cloning situation and I wouldn't
> rule it out until I had tried it or some one knew it wouldn't for sure..
>
> I know it is just two expensive but I would still like to know if it would
> work. Perhaps we could convice Microsoft to bring the cost down a little.
>
> Regards,
> Jeff Loucks
> Available Technology ®
> Solutions For Professionals ®
> www.availabletechnology.com
>
>
> "Jeff Middleton [SBS-MVP]" <jeff@cfisolutions.com> wrote in message
> news:elfzLmBPEHA.2960@TK2MSFTNGP10.phx.gbl...
> > Too often this topic is approached without defining any scale or costs.
It
> > leads to some interesting debate, but not nearly as much useful
strategic
> > information for a practical decision. Fault Tolerance and Disaster
> Recovery
> > are important topics, no doubt, but they are topics that need to be
mapped
> > on a scale of cost management, business priorities, and technical
resource
> > realities.
> >
> > The thing I noticed in your post, Chris, before I started, you are an MS
> > partner, therefore you are not likely asking this question for your own
> > consumption, rather for a strategic policy or plan for your own
customers,
> > right?
> >
> > There are no absolute answers when it comes to spending other people's
> > money, or managing other people's risks if you don't bother to find out
> what
> > THEY think about it. Therefore, I would recommend a really practical set
> of
> > questions you should ask your customers in order for you to design an FT
> and
> > DR plan that suits their needs. The key point that is so often missed is
> > that most really small businesses are not actually going to prefer to
pay
> a
> > contractor $12000 a year to avoid 1 lost day of work for the business.
I'm
> > not saying that it's never going to be the case, I'm offering that for
> most
> > small businesses, spending money on low probability risk protections
isn't
> > necessarily better than accepting the possibility of an unforeseen "snow
> > day" in which the network is down due to a failure.
> >
> > Of course, prevent any preventable condition that is easily cost
> justified,
> > but going to the extremes of labor intensive preventative steps isn't
the
> > best answer. There is always a compromise involved in any business
> > decisision, and it's the IT consultants job to include the owner of the
> > business in arriving at a suitable answer.
> >
> > So, let's take a look a list of questions you could ask the customer:
> >
> > 1. Hypothetically, if a technical failure were likely to occur once in
> each
> > year that caused the total loss of function for your business for a
period
> > of 4 hrs, how much would you be willing to spend in costs paid out
monthly
> > to make that time be reduced to 1 hr?
> > $200/mo?
> > $500/mo?
> > $1000/mo?
> > 2. Same question, but what if the cost were a one-time expense when you
> > bought your new server, essentially something that you could spend on
the
> > server equipment that improved the recovery time from an annual event
from
> > 4hrs to 1hr?
> > $200/mo?
> > $500/mo?
> > $1000/mo?
> > 3. Same question as the first one, but this time, what if the risk was a
> > failure that would cause you to lose an entire day of work, and the
> > improvement was only reducing the downtime from the full day to a 4hr
> > downtime?
> > $200/mo?
> > $500/mo?
> > $1000/mo?
> >
> > Next set of questions, this time on data loss and loss of operations:
> >
> > 1. If your company suffered an unpreventable incident that cause your
> server
> > to crash at some point during the day, and a recovery of the system
> required
> > a choice between longer downtime to recover data changes that day, how
> would
> > you prioritize the following:
> >
> > 4 hrs into the day's work, the crash occurs and in order to recover the
> > first 4hrs of data changes, you must keep the server down for 4 hrs., or
> > forfeit that technical data recovery process by returning to the start
of
> > day condition within an hour and having your staff reconstruct the data
by
> > re-entry? Would you prefer to miss the rest of the day's work, or give
up
> > the data changes recovery?
> >
> > Same scenario, but what if the crash implied loss of 1 week of data, but
> > required 1 day of technical work?
> >
> > What is the maximum number of hours or days of work you would feel
> > comfortable in your company's ability for reconstructing if a technical
> > recovery was cost prohibitive or simply unavailable?
> >
> > 2. List the types of information that you business maintains as data you
> > expect to keep stored on your server, and assign a value on a scale of
1 -
> 5
> > (5 is critical) what your priority would be in recovering information if
> the
> > cost was excessively high, but unavoidable:
> >
> > Email
> > Electronic Faxes
> > Word, Excel, Powerpoint files
> > Accounting/Line of business application based data
> > Technical or creative business records (Autocad files, scanned
documents,
> > graphic design)
> > Contact lists, electronic calendar schedules
> > Legal documents
> > Records with Federal/State/Local requirements to maintain
> > Files and Data you possess which represent your own customer's
> > investment/expense to create
> > Photographs, Digital Video
> >
> > 3. If it were possible to keep your business operating in some partial
> sense
> > for a day, 2 days, 3 days or a week, identify from the list you created
> just
> > above, how long your business could operate without the ability to
update
> or
> > use each of those items, but assuming that you would regain use without
> loss
> > of the historical information, just the time delay?
> >
> >
> > From these questions, and others like them, you should be able to
develop
> a
> > profile of the customer's needs that helps you to understand what costs
> and
> > tradeoffs for downtime they are willing to choose....if they have the
> option
> > to make the choice.
> >
> > Many small businesses will prefer to take a chance on missing a day of
> work
> > if it saves them $5000 per year simply because many small businesses
> operate
> > on a basis where a delay of a day in work isn't that expensive to them.
> Not
> > many businesses would prefer to miss a day or a week of work, but that's
> not
> > the question here. The question for the owner is if you must pay in
> > advance...forever losing that money invested for a risk with only a low
> > probability of impact...would you simply pay or would you take your
> chances?
> >
> > At some point, this translates back to risk aversion and return on
> > investment information that helps decide the budgets and expectations
this
> > owner has.
> >
> > The IT consultant's job is to then translate the options back to
available
> > technology and strategic planning initiatives.
> >
> > Clearly, most IT consultants should make some basic decisions going in
> that
> > are just part of a baseline assumption if at all possible:
> > - a UPS on the server
> > - a regular system and data backup
> > - hardware that can be maintained by a vendor who will be around in a
year
> > or more
> > - a reasonably standard installation that can be recreated and repaired
> >
> > But when it comes to the FT and DR issues, most of the issues are going
to
> > be measured as
> > - downtime for maintenance
> > - invested cost of equipment which provides no added value, only FT/DR
> > functionality
> > - routine fee costs of preventative maintenance
> >
> > ...that vs.
> > - cost of critical response
> > - response time for a loss of operations event
> > - downtime for recovery
> > - data loss tradeoffs for technical recovery
> > - unavoidable data loss due to a "window of time" during which there is
no
> > data protection in-place
> > - emergency expedite cost for equipment replacement vs. stocking spare
> > parts.
> > - unforeseen downtime that an FT/DR plan doesn't address
> >
> > When you present all this information, if the FT/DR plan calls for
taking
> a
> > customer from a $3000 server up to needing another $3000 server, plus
> > another $3000 worth of other hardware and software, plus $12,000/yr in
> > preventative labor....you might find the owner just doesn't see this as
a
> > great idea to invest in so that you have a "nothing can go wrong plan"
> which
> > in fact, isn't really a fact anyway. You know, if the power goes off,
even
> > if you have that backup generator in the yard to run the server, if you
> > can't run the workstations, telephones and air
> conditioning/heat....chances
> > are the owner is sending the staff home anyway.
> >
> > If the DR plan calls for rebuilding a server in 1hr by spending
$15000/yr
> in
> > prep work for that event, could just be the owner would rather take a
day
> to
> > go golfing and let the staff go home while you do a $2000 repair day on
> the
> > server. The owner might even forfeit the previous day's work rather than
> > paying $3000 for more stuff or services. You don't know if you don't
ask.
> >
> > And in the final level of details, the ones that other's posted thoughts
> on
> > this thread with, there are very many good practical steps you can take
to
> > improve FT/DR that include better hardware to begin with, reliable
backup
> > operations, and strategic DR snapshots with drive images.
> >
> > As a rule of thumb, sort of arbitrary, but I start with getting a
> validation
> > from the owner that most of my customers are able to survive a 4hr
> downtime,
> > unless they identify why that's not the case in fact. A single server
> > environment with a contractor as IT support should generally be able to
> > address a four hour recovery in most situations, and the server should
be
> > designed with that thought in mind. However, if you look at the nightly
> > back, you may well realize that if it takes 6 hrs to restore from tape,
a
> 4
> > hr recovery may be hard to hit, right?
> >
> > In this case, having a second server and splitting the roles of the
> servers
> > is probably the most likely way to cut the risks in half, or at least,
> split
> > the risk by improving survival of some more critical operations. I
rarely
> > find that having a duplicate server sitting cold at a customer office is
> > more valuable than having that second server operating in a valuable
role,
> > but that's not an absolute situation. In offices where I have 4 or more
> > servers, I usually do have a strategic plan for switching roles of
> servers,
> > or bringing in a suitable alternate package of hardware as needed.
> >
> > In the long run, the single most valuable skill an IT person can have in
a
> > DR role is experience in rebuilding an installation on different
hardware,
> > and the experience to know how long that will take them given a specific
> set
> > of tools. Identify what those tools are for your experience and
technical
> > level, then practice with them. Quote your customer based upon this
> > experience and set of tools. For instance, it's my baseline preference
to
> > have the following available to me at every customer server site:
> >
> > - FT drives, preferably RAID5 because a drive failure still isn't an
> > emergency if there's a hot spare, and a mirror is likely to cause a boot
> > failure even if the system/data is still preserved
> > - nightly backups to removeable media such as tapes
> > - UPS on the server
> > - Server is not used as a workstation or by local logons
> > - A system partition drive image has been prepared at some point in the
> last
> > yr., or during the last lift of major Service Pack update level.
> > - The server is running AV on the local system
> > - The server is running a backup program that provide job by job logging
> > history, not just last job
> > - I have either at my office or at the customer's location, another
> computer
> > which is reasonably suitable to load that disk image as needed.
> > - I have either at my office or at the customer's location, another
drive
> > suitable to boot that drive image.
> >
> > The last two items in the list deserve a little more detail.
> >
> > Many people do not know or have the technical skill to reliably or
> > consistently implement a server recovery on different server hardware,
or
> > even from a different set of boot drive hardware. For instance, I
prepare
> > all my servers to boot from a drive image either on the native
production
> > boot controller (typically a SCSI RAID) or in addition, from the onboard
> > EIDE controller. It's pretty simple to make this happen. Once you finish
> > installing the server OS, or at any time in the future, if you simply
plug
> > in an EIDE drive and then perform a complete boot cycle and shutdown,
you
> > will probably now be able to install a drive image of the RAID as a
> restore
> > onto EIDE drive in that same computer and boot from the EIDE with not
> > additional steps. (This assumes that the SCSI subsystem drives are not
> > attached at the time, otherwise you do need to indicate boot preference
> for
> > one of the two bootable subsystems)
> >
> > Furthermore, booting a drive image on different server hardware
(different
> > motherboard) isn't really that complicated either. In fact, I think it's
a
> > good idea for an IT consultant to be aware of what similar and
dissimilar
> > server hardware they handle that is compatible to transfer and boot
from.
> As
> > a general rule, if you have the same boot controller (SCSI card or PCI
> EIDE
> > or SATA card), you probably can boot most any dual processor based
> > motherboard with the same Windows install provided the motherboard is
> recent
> > within the last 4 yrs and therefore ACPI compliant. Similar comment for
> > single CPU P4 is going to boot from a Windows Dual CPU install. Note
that
> if
> > the motherboard is dual CPU socket, it's not relevant that you have only
a
> > single CPU, it's a Dual CPU motherboard.
> >
> > The point at which you realize that most of the over-zealous FT/DR plans
> are
> > just overkill is when you see the look on a customer's face when you
find
> > any means to get them back up and running quickly, using whatever
> resources
> > you have available.....and they are not paying a fortune for those
> > resources. Most customers really don't care if you use a PIII
workstation
> > running on EIDE as a temporary workaround to having their production
Dual
>
> > Xeon server with RAID5 go down by a lightning strike, or burst
waterpipe,
> or
> > fire in the server closet. All that matters is that you can get them
back
> up
> > and running on something.
> >
> > Building a technical nerve center that involves multiple servers adding
to
> > the upgrade costs, the maintenance costs, and the purchase costs isn't
> > really the right answer for most businesses. The right answer is letting
> > them have a way to contribute to determining the costs to run their
> > business, to participate in the risk analysis, and to have an IT
> contractor
> > who is both competent technically, and has a sense of business reality
as
> > well.
> >
> >
> >
> > "Chris White - Stirling" <cw@NOSPAMstirtech.co.uk> wrote in message
> > news:da6201c43b76$30e6fa80$a401280a@phx.gbl...
> > > Hi,
> > >
> > > I had my questions sort of answered in my last post a few
> > > days ago. Anyway i wondered what is the best advice for
> > > fault tolerence on SBS2003 Premium?
> > >
> > > Cheers.
> > >
> > > Regards
> > >
> > > Chris White
> > > Stirling Technical Engineering Ltd.
> > > Microsoft Partners
> > >
> > >
> >
> >
>
>



Relevant Pages

  • Re: Fault Tolerence on SBS2003 Prem.
    ... Too often this topic is approached without defining any scale or costs. ... Fault Tolerance and Disaster Recovery ... contractor $12000 a year to avoid 1 lost day of work for the business. ... server equipment that improved the recovery time from an annual event from ...
    (microsoft.public.windows.server.sbs)
  • Re: why have a server?...advice needed
    ... responsible mainly for the SG&A costs (the other buyers will be overseeing ... Virtually none of the business ... But without a server, this gets very complicated. ... why not use an online backup tool for each computer (e.g. ...
    (microsoft.public.windows.server.general)
  • Re: Fault Tolerence on SBS2003 Prem.
    ... Perhaps we could convice Microsoft to bring the cost down a little. ... There is always a compromise involved in any business ... > server equipment that improved the recovery time from an annual event from ...
    (microsoft.public.windows.server.sbs)
  • Re: SBS2003 Partitioning
    ... It doesn't matter how small the business is it is a bad practice to sell a workstation for use as a server. ... Businesses come to rely on a server very quickly and within a matter of months may no longer be able to function effectively if the server goes down. ... There is no real advantage to moving the page file from the system partition in a small server with one drive/array, and yes it's a good idea to keep Exchange and SQL on separate partitions but given the size of drive you intend on using probably two partitions, at most three, is your best option. ...
    (microsoft.public.windows.server.sbs)
  • Re: Fault Tolerence on SBS2003 Prem.
    ... > Too often this topic is approached without defining any scale or costs. ... There is always a compromise involved in any business ... > server equipment that improved the recovery time from an annual event from ...
    (microsoft.public.windows.server.sbs)