Re: 18 months of trouble with Small Business Server




Leythos wrote:
In article <4a45d057$0$191$e4fe514c@xxxxxxxxxxxxxx>, fulco@xxxxxxxxxxxxx says...
Thanks for your naive, unhelpful and unfounded comment Leythos!
I am an IT Professional!

In my opinion Compatible Hardware and Software should work without major problems.


There was nothing naive or unhelpful about my comments.

Based on what I snipped, and what you posted, your problems were a clear sign of hardware issues that have NOTHING to do with SBS, but your subject claims 18 months of SBS problems.

What you've shown is the typical "throw a system together and it should work because it's all just parts anyway" attitude - this is a sign of a noob; only very new IT people fall into that trap.

I read your post, it shows that you bought hardware that was not tested by yourself before you installed it in production, that was most likely not bought from a large vendor that builds systems and ALSO tests them before making them available... This is another rookie mistake.

I strongly encourage you to work with an IT Professional for a while, to learn from them, and then you won't run into the types of problems you list.

I still find your comments unhelpful: they don’t give any hint about solving the remaining problems.
By naïve and unfounded I mean: you don’t have all the information. You don’t know anything about the troubleshooting routes taken, or what steps Intel and Adaptec have taken.

All hardware and the SBS 2003 (and later SBS 2008) software are compatible: from the start, every component has been tested by the manufacturers with the Small Business Server.
When you look at the Intel support website for the hardware product (S5000PSL) you will find lists of hardware and software they tested to be compatible. These lists contain exact details, like the firmware version they tested and found compatible.
This isn’t a “typical thrown together system”. There even is a “wizard” on the Intel website that helps you compile a complete system. So except for the hard disks (but they are also found on the compatibility list), this is a high-quality server. The components are all of a high standard. My distributor (Ingram Micro) assembled the hardware.

I can only conclude: even if a part is on the compatibility list, which contains only tested parts, there is no guarantee that there will be no problems.

Of course I tested the system before making it our production system.
I didn’t find an ISDN adapter that worked, so I went for another option.
There were no other problems during this period.

The crashes were not reproducible.
Also, there was a long time between them (days to weeks).
Troubleshooting them took a long time.

Problem-1, the crashes and slow writes due to the Barracuda firmware problems, was difficult to pinpoint: the hardware was 100% compatible. My experience tells me that the first thing to look at is human error (like switching off the system without a shutdown, messing around with software, or infecting the system with spyware and viruses). Secondly: software errors (faults, missing patches, missing hotfixes, damaged files). Thirdly: defective hardware.
So these points were tested, but the problems remained.
To eliminate defective disks I replaced a few. Some were sent to Seagate.
There were no BSODs (only system ‘hangs’ and, due to Problem-2, restart failures; most of the time I found the system hanging while trying a network boot), so there were no hints about what caused the crash.
The ‘Slow Copy’ only happens if files larger than 4 GB are written to the Barracuda. And of course this doesn’t happen every time. We don’t have very many files larger than 4 GB, but if user 1 copies 2 GB to the same disk that user 2 copied 2 GB to in the last few minutes, you get the problem. Also, all workstations use Ethernet for server access. My first thought if a user complains about slow copying is: network problems (cables, switch, drivers). So these had to be eliminated. And there were other, unrelated network issues: we have Mac OS X systems, XP systems and Vista systems. There still are two network issues. If the Intel LAN driver set is installed, the network connection to some Macs becomes extremely slow. And one of our XP systems (with onboard network) could only reach a few Kb/s; this could only be solved with another network card.
Once you know it has something to do with the amount copied, the issue becomes reproducible on the server.
This whole process was done in collaboration with Intel.
Only after Intel wrote about necessary firmware updates, with a description of the ‘slow copying issue’, did it become clear that this problem could be due to a hardware design problem. But it took about 3 firmware updates over a period of a year before all hard disks had compatible firmware. Intel (and Adaptec) changed their compatibility lists every time.
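
For anyone who wants to check a similar amount-dependent slowdown locally on the server, with the network out of the picture, here is a minimal Python sketch. It is not the procedure used with Intel; the function names, the 64 MB chunk size and the 50% threshold are assumptions chosen for illustration. It writes a file in chunks, forces each chunk to disk, and records the throughput per chunk so you can see at what point the rate drops.

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Write total_bytes of zeros to path in chunks.

    Returns a list of (bytes_written_so_far, MB_per_second) tuples,
    one per chunk. Each chunk is flushed and fsync'ed so the timing
    reflects the disk and not only the OS write cache.
    """
    block = b"\0" * chunk_bytes
    samples = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            start = time.perf_counter()
            f.write(block[:n])
            f.flush()
            os.fsync(f.fileno())
            elapsed = time.perf_counter() - start
            written += n
            # Guard against a zero elapsed time on very small chunks.
            samples.append((written, n / (1024 * 1024) / max(elapsed, 1e-9)))
    return samples


def first_slowdown(samples, factor=0.5):
    """Return the bytes-written mark of the first chunk whose rate is
    below factor * the first chunk's rate, or None if there is none."""
    if not samples:
        return None
    baseline = samples[0][1]
    for written, rate in samples:
        if rate < baseline * factor:
            return written
    return None
```

Calling something like `first_slowdown(timed_write(r"E:\slowcopy.bin", 6 * 1024**3))` (the path is hypothetical) on the affected volume would show whether the rate collapses somewhere around the 4 GB mark. A single run proves little, since the problem did not occur every time, so repeated runs are needed.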

Problem-2 (hard disks disappearing, all drives from one cage missing, failing restarts) interfered with the other issues. This problem also happened with long periods between crashes. And of course no BSOD.
To pinpoint this issue I switched HBAs, bought new HBAs, and moved RAID sets between cages, testing for weeks with different combinations. But no pattern emerged. Almost every combination of 4 drive cages and 12 hard disks gave no info. Sometimes it seemed a specific RAID set was the culprit, but then the crash happened on another set.
The Intel HBAs are not the normal size: most cages fit in one or more 5.25” slots. The Intel cages only fit in their chassis, so it is impossible to fit another vendor’s cage. They also include expanders, so you also need to fit an expander board if you use another vendor’s cages. The RAID controller didn’t have an external connection. So testing other HBAs is quite difficult, and would change a lot of components.
18 months after I contacted Intel, the “Drives getting lost” issue seems solved with the new HBA firmware (Solution 2).
Again the RAID controller manufacturers changed their lists.
Who would have thought of drive cages losing disks by design (fault)?

I switched to the Adaptec 5805 (compatible with S5000PSL and SBS 2008) because its hard disk list was much larger, so there was a greater choice of compatible disks.

Maybe the switch to SBS 2008 was unnecessary.
On the other hand, a 64-bit OS with 32 GB gave us a lot more power.
As a side effect, most crashes now produced a BSOD, with mini and memory dumps.

I didn’t expect new issues. But they were there.
At first I installed SBS 2008 on spare disks, so that production could keep running.
So this was done in the evenings and at weekends.

It took me a while to pinpoint what caused the crashes: at that point in time problems 1 and 2 were not yet solved.
So I installed SBS 2008 several times, making a log of every step taken.
After a while I found that the new crashes were caused by the ‘Adaptec Agent’. This service sends messages (SNMP traps) to Windows.
I tested other Java versions, and other C++ libraries used by the Adaptec software.
At the end of 2008 I contacted Adaptec for support.
To be sure I did some additional clean installations, which resulted in exactly the same crash!
And you’ll never guess: we got a new problem.
Adaptec introduced Power Management: ‘spinning down the disks used for backup or archive purposes’.
Not!
Some firmware/driver combinations worked, others not.
Replacing the 5805 seems to solve the failure of Power Management.
Still remaining is the crash that happens if the Adaptec Agent is enabled and a wrong time zone is set in the management software.

Why did I call this post: 18 months of trouble with Small Business Server?

In my opinion hardware and software manufacturers must do a lot more testing before they use the term ‘Compatible’. The Vista-compatible legal case is another example.
I also find there need to be legal rules about quality, support, compatibility and compensation. Now they blame each other’s products, and refuse support.
They use users, who paid a lot of money for their products, for troubleshooting.
MS needs, in my opinion, to spend more time on making their products stable, rather than creating more operating systems.
In this day and age a crash without any information should be impossible. Their kernel security is still not watertight. Again: Intel, Adaptec and Microsoft all state this server system is compatible, and is tested by them.
And my final remark: take an error from the System, Application or other Windows event log and use Google: every error has many, many links! What does this say about the quality of the software?
If we were talking about cars, they would all have gone bust a long time ago.



