
Adventures in Linux


Upgrading Evolution

Posted by Steve Carl Jun 20, 2007

Every Evolution upgrade ends up as more or less starting over

 

Quick level set:

 

  • BMC uses MS Exchange 2003 for email.

     

  • I use Linux as a primary desktop at the office.

     

If you use Linux as your professional workstation in an MS Windows-infrastructured world, and you have to do any cross-calendaring with others in your shop who are not on Linux desktops, you probably use Evolution.

 

You might use Korganizer, and its MS Exchange download capabilities, but more than likely its lack of maturity relative to Evolution has left you back on Evolution, or just using the web interface. I make this statement not because I dislike KDE's Koffice toolset, but because the calendaring application Korganizer inside "Kontact" has had the same silly message about calendaring "uploads being experimental" for years now. And the time zone thing still doesn't work right. This feature is clearly getting no cycles.

 

I love Korganizer, especially when I want to sync to my PalmOS powered Sony SJ20. Working with MS Exchange is clearly not where they (the KDE folks) are headed with that. MS doesn't want them to, and there are other open email / calendaring projects that do want their support. It's easier to go where you are loved.

 

I admit I do get tired of constantly dealing with MS's undocumented interfaces. I can only imagine how the folks on the Evolution Connector or Samba projects feel about it. They do a good job in a hostile work environment, so I do not want anything I say here to take away from that. As long as Evolution works, I won't have to use OWA (Outlook Web Access) for calendaring. That is a "Good Thing" (tm).

 

The Evolution project clearly keeps on working on the Exchange Connector despite any roadblocks. Back in Evolution 2.6 I was having severe problems with public folders. So far in 2.8 and now 2.10, that has not been an issue. So, something that worked was broken and then later fixed: a good sign of life.

 

What has been an issue is that every single time I change the Evolution release, I have to essentially start over.

 

Starting Over

 

Upgrades should work. In this day and age we know things won't stay the same, but when they do change, we expect the software to deal with that correctly. Standard Operating Procedure for software upgrades is that if it is only a one-version bump, the software will "Do the Right Thing" as it pertains to configuration files. Ditto file formats. It doesn't matter whether they did or did not change between any two given releases: the expectation is that software is upwardly compatible *as long as it is only one major release*. Example: when I installed Parallels 3.0 on my Apple and fired it up for the first time, it looked at the disk image of the VM I wanted to boot and noted that it was in the old format. It further warned me that it would convert it, but that the conversion was one-way: I was committed to 3.0 for that VM from then on. Then it asked if I wanted to proceed. I did, it whirred for a bit, then booted the OS without issue in the new disk format.

 

I know this being-upward-compatible thing is not a rule, written down anywhere, or cast in stone, or written in 3D holograms upon the night sky or anything. It just is the way it normally is, and it is only common sense. You don't want your end users all mad at you because they are having to go through major upgrade pains every time you rev your software. Look at how often open source revs! Talk about never getting anything done...

 

In September of 2006, I was working with 2.6.1 of Evolution. Shortly after that, 2.6.3. October brought in 2.8.1. I noted in the "Masher" series that one of the reasons I did the Mashing was to get to the Ubuntu 7.04 code base ASAP, and its Evolution 2.10.1 version.

 

Every single time I upgraded, be it inside the same distro (Ubuntu 6.10 to Ubuntu 7.04) or to a different distro (Fedora 6 to Mint 3), I had to start completely over on my Evolution config. Sometimes it would run for a while, and trick me, but sooner or later it would get unstable, or one thing or another would just refuse to work. This last time I was subscribed to a public calendar off MS Exchange, and after an upgrade I could not see the public calendar anymore. I could not delete it or re-add it. Nothing. Totally stuck in limbo. At the same time I was upgrading a different box, and it had the reverse issue: it could not see my personal inbox, but could see the public calendar just fine.

 

Brain Drain

 

The fix is always the same. I dislike rebooting to fix problems. I find that to be very MS Windowsy, Ctrl-Alt-Delete thinking. Linux is better than that. After you see the script I run, you may not think so though:

 

echo "Here is the evo stuff still running"
# Level set what is running
ps -ef | grep -i evolution
ps -ef | grep -i bonobo
# Not everything dies the first try. Loop three times just to be sure. Cheap insurance.
for x in 1 2 3
do
  echo $x
  echo "Killing Evo with gconftool"
  gconftool-2 --shutdown
  echo "Kill Evo with evo-force"
  evolution --force-shutdown
  echo "killing spamd"
  pkill spamd
  echo "Killing Bonbobo, since it likes to hang around"
  pkill -9 bonobo
  echo "Killing Bonbobo AS, since it likes to hang around"
  pkill -9 bonobo-activation-server
  echo "syncing disks"
# Old school... but still needs to be done. Linux still caches disk writes.
  sync
  sleep 1
done
echo "Evo should be dead, Jim"
echo "If anything shows up after this line, somethings bad. Manual killing in order."
# Bonobo and Evolution cache account info in memory. They have to be gone before the config files are moved out of the way.
ps -ef | grep -i evolution
ps -ef | grep -i bonobo

 

 

cd
echo "Wiping gnome and evolution files from existence"
# I keep mailing lists like KDE and Evolution archived here, so I can't just get rid of the directory. I move it and copy things back later.
mv ~/.evolution ~/.evolution.old
rm -Rf ~/.gconf/apps/evolution
sync ; sync ; sync

Ugly. Brutish. Hammer time.

 

Now I restart Evolution, and instead of a password prompt, I get the new account setup. Going through all the usual screens, I get it all going again, and re-tweak the bazillion things I change. Things like:

 

  1. Setting up HTML mail (I know, I know, but it is popular here)

     

  2. Setting attachments to inline (Seems to work better w/ Outlook folks)

     

  3. Setting new Sig.

     

  4. Make sure spell check is set. With Ubuntu and friends, installing Evolution does not get you spell checking automatically. You still have to install "gnome-spell", and the right "aspell" dictionaries (see the install sketch after this list).

     

  5. Reset which GAL server it uses. It picks one that starts with 'A'. Looks like maybe it sees an alpha-sorted list and just picks the first one. I need the one in Houston though.

     

  6. Setting GAL queries to 50 rather than 500: seems to run faster.

     

  7. Setting replies to go above the quoted message

     

  8. Turn off all the Groupwise plugins: I doubt they do anything against MS Exchange.

     

  9. Have it check mail every 3 minutes: I am not really sure this setting has meaning under connector.

     

  10. Turn off Junk filtering. We do that out there in the SMTP infrastructure. Nice to have for a small company, but then, a small company would be running MS Exchange more than likely.
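
On Ubuntu and friends, getting spell checking back (item 4 above) is usually just a package install away. A minimal sketch, assuming English is the dictionary you want and that the package names have not moved:

# gnome-spell provides Evolution's spell checker; aspell-en is the English dictionary.
sudo apt-get install gnome-spell aspell aspell-en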

     

Test Evolution. If it looks good, exit Evolution, and then go over to .evolution.old/mail/local with either Konqueror or Nautilus, and copy the archived mail folders, without the meta or index files, back to the newly created .evolution/mail/local/. Firing up Evolution and clicking on the folders in the "On This Computer" location rebuilds the indexes and other metadata, and things are back as they should be.
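
The same copy-back can be done from the command line; here is a minimal sketch. The exclude patterns are my best guess at the index and summary extensions, which vary a bit between Evolution versions, so adjust to taste:

# Copy archived local mail folders back, skipping Evolution's generated
# index / summary files so it rebuilds them cleanly on next start.
cd ~/.evolution.old/mail/local
rsync -a --exclude='*.ev-summary' --exclude='*.ibex.index*' --exclude='*.cmeta' \
      ./ ~/.evolution/mail/local/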

 

Being Good at Something I Should Not Be

 

I do this so much that I am very fast at it. I should not have to be. Hopefully the reason why should be obvious, but just in case: if a person new to Linux were told that this was the upgrade protocol for this application, the likely response would be "Yeah, right... So, how do I install Outlook under Linux?" (With Codeweavers Crossover, but that isn't the point).

 

Once I do this, I am usually good to go till the next OS upgrade. I only get a couple hundred emails a day though, and I keep them triaged so that MS Exchange does not have huge folders on it. Whurley isn't so lucky. Everyone beats his email inbox up so badly that Evolution can't keep up with the MS Exchange server folder size, and just abends. Yuch.

 

It is this kind of thing, more than any other, that keeps Linux from wider adoption. The MS folks are in your workplace. They have their stuff all over the place. It does not play nice with others. Tell your MS folks that, and they will reply that it works fine from their box. Unless you work at a place where they are willing to pull out the email plumbing and hook up something standards compliant, this is the only game in town. I (like many who would read a column such as this) know how to work around such things, but the average user will not even want to learn why this is.

 

They will just want something that "works". And they don't appear to mind rebooting a great deal to make it "work". Easier than root cause analysis. This is one of the great computer divides: those who would rather reboot than switch, and those who would rather work around or fix than reboot.

 

Evolution, and Linux in general, is getting better all the time. When this Evolution upgrade problem, as well as the folder scale problem whurley is having, are fixed, a real barrier to Linux desktop adoption will have fallen.

Real-World-Virtualization

Posted by Steve Carl Jun 15, 2007

A love story

 

Studies and consultants and FUD, oh my! You do not have to be in the computer industry for long before you realize that at least 90% of the job is separating fact from fiction. When Microsoft funds a study, and the results come out in Microsoft's favor, no one is surprised. Calling it "Get the Facts" lends a sort of perverse humor to it.

 

The bad thing about this is, the result might even be real, or have a grain of truth in it, but there just is no way to tell because of the way it was funded.

 

Some survey results are the same way (it is actually very hard to design a valid survey): my recent favorite was a combo burrito survey / sales call:

“Sir, we are surveying professional IT persons such as yourself today to see what the state of the art is in IT Thingies. Are you interested in more efficiency and saving money on your IT Thingies?”
“Yes: I am always interested in that.”
“Have you ever heard of product X, or product Y, or Product Z?”
“Yes: I know about those products. I have tested them. We use product Z here.”
“Taking the first one, product X, do you have any of it installed right now?”
“No: not installed now. We tested it. We did not like it.”
“Well, then are you interested in product X's ability to save you time and money?”
“No” At this point I have given up. Clearly not listening to my answers. Pause on phone.
“Were you aware that product X will reduce your costs and have positive ROI in the first year?”
“I am aware of the product, yes.”
“So you are interested in product X”. Obvious relief from the poor cold-calling salesperson.
“No.”
“You do know that product X saves you all this time and money.”
“No”
“Oh, so you want to find out more about this? As part of this survey we are able to send you information about the products.”
“No." Pause. Not sure they are confused, or just waiting for me to say more. I wait too.
“So... you don't want to save money on IT thingies?” Incredulous: Tone clearly indicates I have no business being in IT.
“No: I do want to save money on IT thingies”
“But.... you don't want to know how to do it with product X...”
“Yes: I don't want to do it with product X. We have product Z, as I mentioned earlier. We looked at X, we went with Z.”
“You already know about product X?”
“Yes”
“Then... you know it will save you all this money...”
“No.”

 

 

On and on. Products Y and Z are never mentioned again.

 

Is Virtualization Real?

What does that intro have to do with today's topic? There has been enough hype and hot air over the benefits of virtualization to float the Titanic, and then some. I am a big fan of virtualization: I started my career working on the world's best hardware virtualization platform, VM (now called z/VM). I have been working with virtualization at one level or another for nearly thirty years. I say all this so you understand my bias. As much as I like the technology, virtualization is not for everyone, everything, or every situation.

 

Three things roll into today's post:

 

  1. I talked a bit about this issue when I wrote an entry a while back called "". One of the main points of that post was to point out that not everything under the sun is a candidate for the virtual world. Quick example: benchmarking is very hard to do in a virtual world. Not impossible. Just hard.

     

  2. In my second-to-last post, "NAS Redeaux: Wrapup", I mentioned that one of our favorite VMware servers was the Sun X4600.

     

  3. I have mentioned here from time to time just how many different computers we have here for R&D Support (over 2600), and how some of the computers are older than some of the people reading this entry.

     

Pulling together those threads here, I can talk about a real world example of where virtualization saves us (BMC) serious time and money. Better, it saves resources, and improves our ability to support our customers.

 

Oldie Moldie

First: about that old hardware. We have hardware stretching back to the 1980's, and we have piles of hardware in order to support all the permutations of having over 600 software products. The possible computer environment permutations run into the millions, but we of course don't have that many computers. We do rapid provisioning a great deal, re-purposing some gear daily. Sometimes though, a computer's time has come. You can tell it by the doomed look in their eyes, the stoop to their shoulders. The way their 1X CD-ROM won't quite close anymore. It's over, and they know it. When we are done with a computer, we have to pay to have it hauled away. One set of computers that we have fully gotten our money out of is from the early and mid 1990's. Originally these computers were R&D people's desktop systems. Then, as new systems were bought, these were migrated into the data center to be test systems. Time passed. Some failed, and were mined for parts for others. System configs were maxed out wherever possible, so that these 100 - 233 MHz Pentium-chipped units were crammed to the brim with 512 MB of RAM. Four 128 MB sticks of RAM. Remember when desktops had four or more RAM slots? Remember RAM stackers? Hey, in 1993, we would have killed for a computer that big. Now take into account that BMC has roughly 7000 employees, do various bits of arcane math, and you'll see we have a lot of this class of computer.

 

Each of these old computers has an old-style, non-switching, inefficient power supply. The 10/100 Ethernet card is not integrated into the motherboard and is full height. The processors require five volts to run, and use a fair number of watts. The hard drives are the old “half height” style, which is still over an inch tall, with six or eight gigabyte capacities, many platters, and older-style motors that never spin down to save energy. They have been running so long that if they get powered off they sometimes can't power back on, or literally require being whacked on the side to free the heads from the lubricant that has built up over time in the landing zone.

 

These old things were / are useful to us for synthetic workload generation. Each of them can pretend to be 10 virtual people, and in concert with rack after rack of similar computers we can build up a workload of hundreds or even thousands of virtual people hitting servers to test our code on. It takes ten of them to fill a Gig-E pipe, but we have literally hundreds. We also have some really old OS's running on them, including one that had Redhat 5.2 until very recently. They are cheap to build test clusters with, and this was one place we had a practice cluster for Solaris X86.

 

We can place about 25 of the deskside tower form factor PC's on data center technical furniture, stacked in three vertical rows. There is one KVM on the 2nd tier to allow console access to all the computers. Each of the Ergotron 3000 desk-style technical furniture fixtures is six feet long and three feet deep: 18 square feet for every 25 of these old-style PC's. By today's standards they use a lot of power per CPU, but because the form factor is so large, we literally could not put them close enough together to create a hot spot in our data center. The data center has an 18-inch raised floor, rated at 250 PSI. It was built in 1993, and the cooling was designed to dissipate 55 watts a square foot. Sigh. That seemed like a lot back then.

 

Sun X4600 as VMware server

Now let's look at the Sun X4600. We get the 64 GB units. An X4600 at the time of this writing can get up to 256 GB, but only Solaris X86 or Linux can address that much memory on this platform. 64 GB is the max the VMware hypervisor currently handles. Ours have 16 processors. With quad core, this can go to 32 processors. Our real-world experience in tuning is that without the ability to add more memory we can not really effectively use more than 16 processors with our workloads: we aren't calculating Pi or finding the next biggest prime or anything.

 

VMware 3.0.1 is currently limited to about 128 virtual processors, and this has mapped back to needing about 16 real ones so far. It really depends on your workload, and how CPU intensive it is, though. As they say: Your Mileage May Vary (YMMV). 16 processors works for us right now.

 

Power supplies on the X4600 are four in number, and either 850 watt at 83% efficiency, or 950 watt at 89% efficiency. Ours are the 850's. We have two of these machines, in a VMware HA config. Each X4600 server runs 60-75 guests. We use P2V (Physical to Virtual) tools to migrate the workloads on machines that are being retired. The real machine is gone, but the work it was doing, or at the very least might need to do, lives on. This is a big win for some of the really old versions of Linux or MS Windows that we might need to have on the hook, just in case. A VM that is not up is only using a small amount of disk space. But if a customer calls in with a problem on that platform, that disk space just became priceless.

 

Here is the first cool part: each VM guest has more memory than it had on the old real PC hardware. This is limited only by the capabilities of the guest OS itself now. We can make the virtual memory whatever we need it to be to solve the problem at hand.

 

We could run more guests on any given X4600 system. The limitation is really the number of virtual processors in play. With 128 being the current recommended total, that would be 32 quad-CPU guests, 64 dual-CPU guests, 128 single-CPU guests, or some mix thereof. We can't actually run it at this max either, though. We have two X4600's in an HA setup. The HA bit means that, should an X4600 fail, the surviving X4600 system has to be able to run all of the workload. While we don't exactly halve it, we do hold it to about 60 machines up at any given time on any given X4600.

 

We could place 10 X4600's in a single 42U rack, each rack covering a 2 foot by 3 foot bit of flooring, but we don't. We also have to put shared storage into the rack to enable things like the HA features. We currently have two X4600's, plus a shared SAN disk array.

 

Data center shootout

Pass one: P2V'ing mid 1990's PC's to the X4600's:

 

 

 

Environment | Quantity | Wattage | Price per kWh | Cost per year (1 year = 8760 hours)
X4600, HA, 60 VM's each, 120 VM's total | 2 | 4*850 = 3,400 watts = 3.4 Kw each; 6.8 Kw for the pair | 10 US cents (1 USD per 10 kWh) | 8760*3.4 = 29,784 kWh; /10 = 2,978.40 USD per year per X4600; 5,956 USD a year for two X4600's
PC | 120 | 120*375 = 45,000 watts = 45 Kw | 10 US cents (1 USD per 10 kWh) | 8760*45 = 394,200 kWh; /10 = 39,420 USD per year

 

 

Making the Numbers a Bit More Real

Is this real? No. I used the max ratings of the power supplies on the X4600, and I understated the rating on the PC's: I did not put an AMP clamp on the power cord and see the average amperage draw. I did not take into account power supply efficiencies, or the fact that the X4600 has external disks on the SAN, while the PC's all have internal disks. Plus in the real world we look at three year ROI rather than one year. Further, in our world, we use stuff longer than three years most of the time.

 

What is real above without any adjustment is that 120 computers would require 5 Ergotron 3000 desks to sit on, covering 90 square feet of the data center floor, versus one rack covering six square feet with space left in it for more computers.

 

I also did not take into account air conditioning (A/C, but not the Alternating Current kind of A/C...), the power to run the A/C, or the costs of buying a UPS or generator to sit behind all the electrons being fed to the computers. Also, we traded 120 ports worth of 100 Mb Ethernet for eight ports of Gig-speed copper. By being close together, we have substantially shortened the Cat 5 copper runs too. Less copper equals less money these days...

 

Staying cool

To get the numbers a little tighter, I'll next factor in the power it takes to run the A/C to cool the data center.

 

Looking at the SAN disks the X4600 uses, we have 2000 watts of power supply in total there. If I halve the power usage number for the PC's (i.e., once the PC is booted, it settles down to only use half of the power that the power supply is rated for), leave the X4600's where they are to account for the SAN disks, and round up a bit, we'll be at 6,000 USD a year versus 20,000 USD a year. That intuitively feels about right as well. Three years is 12,000 USD for the X4600 versus 60,000 USD for the PC's.

 

A/C costs can not be ignored. Keeping this in watt-hour denominations (since a BTU rating describes heat removal in one hour), it takes 3.4 BTU of A/C to deal with every watt-hour of power that goes into the computers. For the PC's, after the halving and rounding down I did above to account for not actually using all the power in the power supply at all times, this is 22.5 Kw (22,500 watts) * 3.4 = 76,500 BTU of A/C.

 

The two X4600's plus SAN came in, rounding up, at 7,000 watts * 3.4 = 23,800 BTU of A/C required, but in a much smaller space.

 

I wondered how much power it requires to drive A/C. I looked at the box on a high efficiency A/C unit for some clues. At 12.63 EER, it takes 950 watts to deliver 12,000 BTU of cooling. Our data center uses chill water towers on the roof, massive pumps in the basement, and huge air handlers on the actual data center floor. These are 1993 vintage A/C units, so I assume their EER is not better than this. Even if it is, this is probably good for a ballpark number.

 

For easy math, I'll assume 1000 watts of A/C to deliver every 12,000 BTU. For the PC's that is 76,500 / 12,000. Rounding down again to favor the PC's, that is 6,000 watts of power for cooling. 60 cents an hour to run. Rounding down again, 5,000 USD a year, or 15,000 USD over three years.

 

The X4600's 23,800 BTU requires, rounding up, 2000 watts of A/C power to cool. 20 cents an hour. 5000 USD over three years. Total electricity is going to cost 17,000 USD for three years on the X4600, with everything rounded up. Total cost of power for the PC's for three years is 75,000 USD, rounding down at several turns along the way.
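
For anyone who wants to plug in their own rates and loads, here is a minimal bash sketch of the same watts-to-BTU-to-dollars arithmetic. The two loads and the 10-cents-per-kWh rate are the rounded assumptions from above, not measured draws, and the script skips the favor-the-PC's rounding steps, so its output lands near, rather than exactly on, the figures in the text:

#!/bin/bash
# Back-of-the-envelope yearly power plus cooling cost for a given load.
RATE=0.10              # USD per kWh
HOURS=8760             # hours in a year
BTU_PER_WATT=3.4       # BTU of cooling needed per watt-hour of load
AC_WATTS_PER_12K=1000  # watts of A/C input to deliver 12,000 BTU of cooling

cost () {  # usage: cost "label" <load in watts>
  local label="$1" watts="$2"
  local ac_watts yearly
  ac_watts=$(echo "$watts * $BTU_PER_WATT / 12000 * $AC_WATTS_PER_12K" | bc -l)
  yearly=$(echo "($watts + $ac_watts) / 1000 * $HOURS * $RATE" | bc -l)
  printf "%-32s A/C: %6.0f watts   power + cooling: %8.0f USD per year\n" \
    "$label" "$ac_watts" "$yearly"
}

cost "120 old PC's (halved rating)" 22500
cost "2 X4600's plus SAN"           7000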

 

Service Life and Other Intangibles

We have had these PC's in service for four times that long. I have not counted the cost of data center space, or network connections, or staff to maintain the hardware. Clearly the X4600's are going to save us serious money. In the most conservative fiscal sense, they have a three year ROI, and start saving us 56,000 USD a year every year they are in service after three.

 

Finally: power is not getting less expensive as time goes by. These numbers will vary depending on what one currently pays for electricity. In some googling around to try and validate these numbers, it appeared to me that this was a pretty good median price to use. I saw rates some places, where there was inexpensive hydroelectric nearby, that were half these, but I also saw some that were higher.

 

Carbon Power

Every kilowatt-hour saved is that much less carbon dioxide in the air. Depending on how the power is generated, this is anywhere from almost nothing (hydro, geothermal, wave / tidal, wind) to over two pounds of CO2 per kilowatt-hour (coal). Texas uses a great deal of natural gas, which is about one pound of CO2 per kilowatt-hour. Over the life of these computers that is a pretty serious reduction in the impact we are having on this planet. My personal goal, as data center manager for R&D Support, is to reduce the number of real computers we use by just over 1000 over the next 12 months, while increasing the number of OS images by that same number.
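
Just to put a very rough number on that, here is an illustrative back-of-the-envelope sketch. The 187 watt average per retired box is simply half the 375 watt rating used in the cost table above, and the one-pound-per-kWh figure is the natural gas number, so treat the output as a ballpark only:

# Rough, illustrative CO2 math only. Assumptions: ~187 W average draw per
# retired box (half the 375 W rating above), ~1 lb CO2 per kWh for natural
# gas generation, 8760 hours in a year, 2000 lb per ton.
echo "kWh saved per year: $(( 1000 * 187 * 8760 / 1000 ))"
echo "Tons of CO2 avoided per year: $(( 1000 * 187 * 8760 / 1000 / 2000 ))"

That works out to something on the order of 1.6 million kWh and several hundred tons of CO2 a year, which is why I call it a serious reduction.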

 

1000 computers leave, 2000 OS images remain. This will deliver to R&D, QA, and Customer Support a better, faster environment from which they can support our customers. And a better environment in general.

 

I think that is a pretty good use for virtualization.

 

Err... Linux?

Oh... Linux tie in. This is “Adventures in Linux” after all. Many of the VM's are Linux. VMware started on Linux with their Hypervisor. ESX still uses Linux for the service console. You can do exactly the same thing with Xen / Linux. Take your pick.

 

The Rest of the Story

UPS's are funny beasts. In one of our remote locations, we have used VMware to reduce our server count from over 300 to about 185. At the same time, an undersized UPS in that location went from a runtime of less than 15 minutes to 30 minutes. More than double the runtime with two-thirds the physical systems, but more than 300 OS images in service.

 

It works out this way in part because of the funny nature of the chemistry of the UPS's lead-acid batteries. Reducing the load increases the runtime more than linearly.

 

One final disclaimer: I mentioned the Sun X4600's here, and the Sun X2200's in the NAS Redeaux series. We like that hardware a lot. Sun did not pay me to say anything nice about them, or give me a special deal on the hardware that we did not already get. In fact, I waited till after the hardware was bought to even bring all this up. I was trying to keep this as real as possible by mentioning the actual hardware we are using for this. The SAN array I mentioned above for the X4600 VMware setup is an HP MSA 1000.

 

(I hope I got all that math right: the hardest thing about this is making sure everything is expressed in the same units!)

NAS Redeaux: Q&A

Posted by Steve Carl Jun 14, 2007

A reader of the blog writes in to ask why. Film at 11.

 

I am always happy to get email offline or via the comments, or the "contact me" button. One such was a series of emails from Jonathan Wheeler about my last series "NAS Redeaux". With his permission, I am reprinting some of that email conversation here in case others had the same questions.

 

This is two emails that I have merged, so there is some back and forth. I have only edited in a few places for clarity:

 

Just like an email, I'll intersperse my answers:

You've sidestepped a little and talked about using an apple as an option for your storage heads, and I for one applaud that. I loved your line about nothing being sacred, not even Linux. It's very refreshing to see open people, talking about open source, and _actually_ being open!

I hoped this did not mean "sidestepped" in the sense of a politician sidestepping an issue, but in the sense of "straying a bit from the pure Linux path". Jonathan reassured me:

The latter! I didn't mean to invoke the horror that is the images of politics. Ordinarily I wouldn't beat my drum about Solaris (again), but since you're already using x2200s, the question really is begging to be asked...

I mentioned in the wrapup post that we looked at OpenSolaris as being our next stop, if Fedora had not passed muster.

Yes you did, that was my 'red flag to the bull' so to speak, which I mean in the sense that it really drew my attention...not that I was angry. For storage, I've been using ZFS with around 2TB across 8x300GB SATA disks here - nothing quite like what you're doing of course, but as a long term supporter of both software raid/lvm, and reiserfs in my case (though I have nothing against xfs) - I really was amazed at just how well ZFS behaves.

ZFS is complicated, but from what I can tell, it is mature now. As an aside: there is a Linux port of ZFS already, however it is in the FUSE space and is not a high performer yet under Linux.

Yeah, the performance is never going to be great; it's also only attacking the 'fs' part of ZFS, more so than the underlying LVM layer. I'm not too sure if it includes end-to-end checksumming because of this or not. I applaud that it's available. I really related to your home frustrations about FAT being the only working common denominator between your Macs and Linux (with reliable write support at least). The case was no better for Solaris until ZFS came along. UFS...erk.

Someone wrote in to that post and pointed out that EXT2 was available on all three platforms. And via FUSE, NTFS-3g is also available, so it looks like the common file systems are now four:

 

  1. ZFS

     

  2. EXT2

     

  3. NTFS

     

  4. FAT/VFAT

     

I use HFS on the fob for now. That locks out MS Windows... but the only reason I have booted to MS Windows lately was to play with Safari. My eyes are firmly on ZFS in Linux though...
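
For anyone wanting to try the NTFS route from the Linux side, here is a minimal sketch using ntfs-3g. The package is real, but the device name is a placeholder; check dmesg or /proc/partitions for what your fob actually shows up as:

# /dev/sdb1 is a placeholder for the fob's partition.
sudo mkdir -p /media/fob
sudo mount -t ntfs-3g /dev/sdb1 /media/fob
# ... read and write as usual, then:
sudo umount /media/fob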

 

Taking a moment to reflect, Solaris has ZFS. OS X 10.5 will have ZFS....and you already know about linux/fuse. Wait a minute!!!, stop press! :) Very. Exciting. Times. Ahead. Whether you agree that ZFS is the most advanced or "last word in file systems" or not, I think everyone can agree it's a monumental step forward over FAT! Woohoo! The Linux folk can fight over the CDDL/GNU issues all they like, so long as fuse ZFS works, we all win.

We next had some back and forth about Reiser, and where it was going, and noted that SUSE had abandoned Reiser as its default FS ...

 

Don't you find it odd that they just went back to ext3? Clearly you guys love XFS, and we all know it's mature by now. While ext3 is ubiquitous, that doesn't make it best in my mind. I would have said they just like to play it safe, a perfectly reasonable reason for the decision, but for the fact that they've been using Reiser all this time, so clearly this isn't as critical for them as it may be for other major vendors. *shrug* I'd like to see more of XFS out there; for my last Ubuntu install (for my brother), I decided to give XFS a go. It's been great, unsurprisingly there have been no issues to report....

This is similar to what I just did for my brother. I used EXT3 as the default FS since that is what Ubuntu defaults to, and he was mostly going to use it for web and email, so super speedy FS was not required.

 

I would like to interject at this point that I am not sure love is exactly the right term for our feeling about XFS. We did benchmark XFS against Reiser, EXT3, and JFS in a NAS application. XFS and JFS were clustered closely, and near the top. Reiser was farther down, but well above EXT3.

 

XFS gets its speed in a scary way: it is sort of like UDP. Data integrity is more or less the responsibility of the application. The file system acks the write before it is actually committed to the disk. This causes deep consternation in some. We tested it for a long time. What we decided was that both the cache controller and the DC itself have battery backups. For an immobile server, XFS is not that big a gamble. I would more than likely never use XFS in a laptop: in fact, all my personal Linux computers are EXT3. Slow, but reliable. I am not sure what the file system trickery is in ZFS that makes it both fast and reliable on Solaris (and I assume Apple).
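
To make that concrete, here is a hedged sketch of the sort of setup we mean: an XFS file system for a tier 2 export on a box with a battery-backed cache controller. The device name and mount point are made up for the example, and the nobarrier option only makes sense when that battery-backed cache really is there:

# /dev/sdc1 and /export/tier2 are placeholders.
mkfs.xfs /dev/sdc1
mkdir -p /export/tier2
# noatime saves metadata writes; nobarrier trusts the battery-backed
# controller cache, which is the gamble described above.
mount -t xfs -o noatime,nobarrier /dev/sdc1 /export/tier2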

 

By nature it (ZFS formatted external disk arrays) would handle being detached from its original server, and then being imported on another node (zfs import -f......done! ;) Sweet. It's been said before within the various ZFS blogs, but the end-to-end data integrity offered by ZFS really does lend itself well to externally attached storage, as there are many, many points along the path where silent data corruption can occur, FC or not. While there are a slew of other great features, this one in particular gives really good peace of mind with external, cross-vendor storage systems. And that is key to moving a solution like this to a higher tier than what we have been using this for before now. As you would well appreciate, silent data corruption is a terrifying thing to un/discover - then there is the tracking it down, and working out what damage has already been done....ugh. May that never happen to either of us again! Tier-2 or not, free peace of mind really sold me on ZFS; it was the primary reason I moved to Solaris.

The main reason we did not go there sooner has more to do with us internally than anything to do with the merits of Solaris or ZFS. These are in part:

 

 

 

  1. We are Linux experts in house. I have Solaris experts, but my NAS expert is more of an HP-UX / Linux person by preference. I really do have someone who has become so good at certifying and finding problems with NAS in our nasty world that to some degree it is his expertise that gives direction to what we do. HP-UX is famous for being a poor NFS server, so...
  2. It would take The Senior NAS Blaster some time to get up to speed on Solaris to the level he is at on Linux now. Not too long, to be sure, but time is finite. We needed to get this storage online: balancing time to test with time to go, if you will.
  3. No two versions of our tier 2 have ever been exactly alike. To be able to support that, we have also tried to control change to a large degree. This is just standard scientific method stuff: don't change too many variables at the same time.

 

 

All very good reasons, I appreciate your candor.

That being said, we'll take more time with the Tier 1 NAS we are looking at using this hardware for. I mentioned in the last post the back-to-back clustering: this has a huge implication for the way that data exports are done. How to migrate them during failover, etc. We think there is a good chance that Sun clustering might be *better* for this application, and so we plan to look at it as soon as we get the hardware.

 

You mentioned ZFS with OSX, so I assume you're well familiar with it, so I'll stop evangelizing, but I genuinely am curious to hear your [team's?] experience with Solaris to date. I'd be really somewhat shocked if the inventors of NFS had NFS compatibility issues too.

The irony here was that it was a Solaris client that was making the RedHat AS 4 and 5 *not* pass certification as NAS servers. Overall, I think that my team prefers (prefers, not as in computer religion, just preference) Solaris to any other OS short of Linux, and for some things probably likes it better. Some probably prefer it for everything. Where Solaris falls short is not in building data center grade servers, but in being easy to install and maintain, relative to Linux.

 

Very true. This needs to be fixed, project Caiman and Indiana can't come soon enough in this space.

Once the server is up and running, short of things like the recent Telnetd problem, we never have to touch them. One of the main source code servers we have was built in 1998 on an E450, and it is still going strong.

 

Heh, I have an E450 in my kitchen at home. Err, that probably deserves some explaining, my kitchen doubles as a server room. I'm a geek, I order in - who needs a kitchen! Great kit those 450s, Mine refuses to die!

OK: you win on geek points. My dad always told me that no matter how good you are, there is always someone better. Sigh.

 

Another Solaris server we have is a dedicated Samba proxy. This is an Ultra 10 that since 1998 has had over 500 connections at all times. It just runs and runs and runs.

 

OpenSolaris's "other" problem  (besides not being as easy as Linux to install and admin) is negated by the X2200 hardware we are currently using, that being device drivers. Linux has more hardware breadth. Not an issue with this Sun X series gear though.

 

All the new hardware works really well with Nevada. Anything made in the last year or two, and certainly any server class hardware, flies and works really well. I've only ever had issues with desktops. Getting back to Linux, I'm not here to be the Solaris nut, I still love Linux too; I'm trying really hard not to come across as a zealot here...moving on!

All is fair in love and open source. It really is about the best solution winning. Sun has made a lot of right moves here, and the recent popularity of Solaris 10 is the result. As much as I love Linux, I'd be a very foolish data center manager if I did not consider all my options.

Outline of the recent problems with Evolution email client, and some SUSE 10 notes

 

I mentioned a few entries ago that this article would be coming along. Now is as good a time as any to write it. Most folks reading a Linux column probably know at least a little about Evolution, but a quick levelset:

 

Evolution is an office application sort of like MS Outlook. It has email, calendaring, task lists and so on. Early versions implemented a slavish Outlook look as well, but later versions "got better". Evolution was written originally by a company named Ximian, and while Evolution was open source, they made a proprietary add-in called "Exchange Connector" that allowed Evolution to work as a "native" protocol client against MS Exchange 2000 and 2003. I bought a copy of Connector back then, and ran it on my Mandrake desktop, so that I never had to use MS Outlook, not even under Codeweavers Crossover product (although I could, because I bought that too).

 

Novell bought Ximian, and open-sourced the Connector, which all the major distros picked up, and that is where my trouble started.

 

The last totally working version I had of Evolution and Connector was the very first open source version I could find: Fedora Core 2, running Evolution 1.4.5 (or maybe 1.4.6... been a while...). I practically skipped Fedora Core 3 because in my tests, I could never get its version of Connector to work. This was early in the 2.0 days... 2.0.2 or thereabouts. Towards the end, FC3 shipped 2.0.4, and this started working, although Connector crashed quite a bit. But it worked enough that I finally upgraded the main laptop I use (the IBM T41) to FC3.

 

Oddly, SUSE 9.3 was a non-starter: even though Novell owns both SUSE and Ximian, I could never get the Connector to work right there either (although turning off all the Groupwise plugins seemed to make it nearly work). Fedora Core 4 shipped 2.2.2, later updated to 2.2.3, and that also works, and is slightly more stable, but still not great.

 

Along the way of this testing, I also tried a couple of Debian-derived distros to see how that version of Evolution Connector would work. Xandros and Knoppix (locally installed) pulled their .debs from the same repositories for Evolution Connector though, so I was not really testing a different level of Connector as it turned out. Neither worked.

 

At this point, the only version of Evolution / Connector that I could get to work at all was the one that shipped late in FC3 and then later the Fedora Core 4 version. To make matters even more confusing, some of the people on my team could not get the Fedora Core 4 version to work: the only one that really worked well was back in the release 1.4.x days of Fedora Core 2!

 

That would be OK, but Evolution 2.x implemented look and feel changes that I wanted, and it also had spam filtering that was very nice: one option even allows you to tie it in to Spam Assassin, although that is .... very....slow.... Spam Assassin works better when you do not run it in the client, but rather up at the transport layer. But it worked (on Fedora) and was handy for me because I get hundreds of spams a day *after* BMC drops over 70% of the incoming email on the floor because it is known spam.

 

I have of course googled around trying to find out why Evolution has so many problems (for me anyway), and why Fedora's version works when SUSE's 9.3 and Debian's versions did not. And I can find nothing. I have to assume that there is something about our environment that is just plain hostile to Evolution.

 

Connector works by accessing MS Exchange the same way that the Web version of Outlook does. MS used WebDAV to export the files to the client, and layered their HTML based interface over that. Connector uses WebDAV to get at the files, and uses its own, non-web, regular email "heavy" client interface instead. It is the same Evolution user interface that also works via IMAP or POP; WebDAV is just an access protocol.

 

I assume Ximian had to do a fair amount of work to crack the files that WebDAV presents, but given the low noise ratio out in "googlespace", and the high level of Evolution uptake, I have to assume that they did this part pretty well. I also have tested MS Exchange servers that I use for my LinuxWorld lab that I have absolutely no issues with. My lab is a straight install of MS Exchange: no mucking around with the defaults.

 

One other data point here: Thunderbird and Kmail have no problems accessing MS Exchange via IMAP, but Evolution 2.x won't work that way either. We have POP turned off, so I can't test that.

 

Here is the good news though: SUSE 10 shipped with 2.4.0... which works, but blows up now and again. But it does work. And an update today brought everything up to these levels:

 

evolution-webcal-2.4.0.1-3
evolution-data-server-1.4.0-5.2
evolution-2.4.0-3.2
multisync-evolution-0.82+cvs-8
evolution-sharp-0.10.1-4
evolution-pilot-2.4.0-3.2
evolution-exchange-2.4.0-5

 

But it still has a limited runtime before the Connector crashes. And Evolution leaves a great deal of itself still running, so that I have to run evolution --force-shutdown, and then poke around with ps and kill to make sure all the bits are gone (or just reboot, but that is way too Windowsey).

 

I have not tested every single thing in SUSE 10 yet, but everything I have looked at is a real refinement over 9.3. Kontact gets better and better, and SUSE 10 shipped Kontact version 1.1.2. The MS Exchange calendar download feature of Kontact is working great: I have not looked to see if they have upload (which has never worked quite right) going yet. LDAP address books in Kmail also still elude me: I have locked myself out of my MS Exchange account twice today with a bit of experimentation with them. Or maybe it's Evolution doing that lockout thing.... hard to be sure. I have it all up and running, so I guess I'll need to step through it later and see if I can figure it out.

 

Why bother with all of this?

 

The easy way out here would of course be to just give up: I only control the client side of this problem. But here is where it gets a little odder. I have crashed MS Exchange itself 3 times over the last few months. And every time I did it I was in MS Outlook 2003, under MS Windows XP. At this point, I live in terror of doing that again (because it takes down hundreds of people when I do), so I pretty much *have* to stay out of MS Outlook. Gotta love the irony of that.

 

Crashing everyone else on the server is only one reason of course: I live on a Linux desktop, and while MS Outlook runs fine under Codeweavers Crossover product, I really prefer to stay in Linux native applications when I can. The one and only thing I have to be in Evolution for is calendaring. I can do email just fine with Thunderbird. The new 1.5 beta implemented inline spell checking, which was the big missing feature for me. I'm spell checker dependent. All the Evolution crashes are easier to live with since I only have to be in that application long enough to send and receive meetings. And, if need be, I can always use the web client to MS Outlook for basic calendaring.

 

Calendaring is a lot like the big open standards based to-do around OpenDoc right now (referenced in previous weblog entries). If you have implemented a non-standards ready calendaring solution, then you are stuck with the consequences of that. For a long long time. These things are not easy to get pried out of your infrastructure once they take hold, unless they are fully standards compliant and interoperable. That is the beauty of OpenDoc.

 

We'll be going to MS Exchange 2003 soon, and when that happens, the web interface gets better. Who knows... maybe Evolution will work better there too.  I might even risk firing up MS Outlook again then! But not during business hours, just in case.

There is another story buried in all the Google/ Sun news I have yet to see written.

 

We have learned now that there are no immediate plans for an AJAX / StarOffice technology based, web enabled Office. This one, "No Office suite from us: Google", is from The Register, and the rather 'raw' picture included is probably pretty indicative of how some people are feeling about the idea that this is not yet to be.

 

The story within the story is that there are so very many people that were hoping and praying that Google would do a web enabled office suite. You know: a real web app, the kind that works from multiple browsers and multiple platforms. Like all the Google web apps already do.

 

If I was sitting in my office, reading email after email from other folks at BMC about how they really wanted someone else to form a group to do R&D Support, to exactly duplicate and replace what my team does, I would have to be doing some pretty serious soul searching about what my team did to get my customers so utterly riled up. I hope to never see such emails!

 

I wonder how MS feels about issues like OpenDoc and MS Office for Linux after seeing such hope, energy, and angst portrayed around the idea that people were hoping against hope that Google / Sun might “save them”. It is clear who they were hoping to be "saved" from.

Quick thoughts about the Sun Google announcement, a new Linux NFS client discovery, and other new Linux activities

 

This was one busy week. Between all the quarter close activities here at the office, the busy news week, and a huge amount of work testing various NFS clients against a new Apple OS.X file server, there were hardly two minutes to rub together to write this!

 

The week started with a great deal of excitement about the Sun / Google announcement. It was interesting for me, after looking at all the AJAX news from last week, to watch how this seemed to unfold. It started with “Google is finally going to do it! They are going to take over the desktop!” or things that boiled down to thoughts along those lines. The actual announcement brought deep bewilderment to most: “Huh? An unspecified server deal, an OS co-development thing, and a Java agreement? Where's the story?”. And finally, the more serious analysts settling in to realize that this announcement is only the start... or maybe only the potential start... you never really know how company cultures will mesh.

 

The real thing that was missing for me, as it was for many people, was “AJAX StarOffice”. So many missed that non-announcement that you'd have to use Google to find all the links. It would be faster and more up-to-date than linking them here. There might be another point in there someplace too. I would imagine Google search was lit up like a holiday tree with people looking for that bit of news. And another, which is that search is easier and more useful than static pointers: that is one of the cool things about Spotlight on OS.X 10.4 to be sure. But I digress.

 

I am sitting here at my house typing this on my iBook, using NeoOffice 1.1 as my word-processor, and thinking how very cool it would be to be able to store this stuff “Out there” like Gmail, searchable like Gmail, spell-checkable, in OpenDoc format, where I could get at it from work or from home, and then pop it to the weblog when it was done. The thin client dream that was not announced this week. Oh well, at least I have it in OpenDoc format already.

 

To be fair: Google has hardly had time to write/port such a thing as “AJAX StarOffice”. They may be pretty good at AJAX, world class even, but taking the millions of lines of StarOffice and making them presentable as a thin client via AJAX... I would think that might take a little while anyway.

 

The Linux NFS client is not fully NFS V3 standards compliant!

 

We learned an interesting thing this week. We had a short test window in which we could try an experiment, working with R&D build engineers to determine some new things about possible file server configurations. We moved a test copy of one of the new R&D projects we were working on supporting to an Apple Xserve running OS.X 10.4.2. The CIFS side of the server was then benchmarked at about twice as fast as the Alpha-chipped ES40 running Tru64 that used to be our TruCluster, which was and is pretty fast itself. That was not a surprise. Tru64 uses ASU to create its CIFS protocol layer, and OS.X uses Samba. Our theory going in was that Samba would be faster, not even counting all the differences between the 2 hardware platforms. Both servers can keep a Gig-E wire fully occupied.

 

What was a surprise was that we found a range of Linux based NFS clients that would not work against it. What was worse was that once we dug into the problem and ran some traces and some test programs, the problem turned out to be Linux behaving badly.

 

OS.X is a 64 bit OS, based in part on BSD. Its NFS server is 64 bit, and NFS Version 3 has built-in support for 64 bit servers and clients. OS.X in 10.4.0 enabled some new 64 bit features in the NFS server code, in agreement with the NFS V3 standard.

 

And to the dismay of a Linux maven such as myself, some Linux NFS clients break on it. It is a very esoteric little problem. My understanding of it after talking to my team about it all through the testing this last week is that the NFS Version 3 standard describes a process where the client can cache directory information locally, in order to speed things up on the network. To keep track of this the NFS server hands out “cookies”, with each cookie being essentially a pointer of sorts to a file on the server. The “cookies” are numbered, and OS.X numbers them very very big. 64 bit big. And the NFS protocol says that is an OK thing to do. But the 2.6 kernel, prior to 2.6.13, has an NFS client that can only deal with 32 bit “cookies”. Even though the 2.6 kernel also is supposed to support NFS Version 3.

 

Apple, like SGI who had this same issue in Irix before them, will add a new NFS export option "-32bitclients" (available starting in 10.4.3) to deal with this problem. That is goodness, but it would be better if Linux did the right, standard compliant thing.

 

Over at http://www.linux-nfs.org/ we found a developer talking about the whole thing: his mission in life right now is to make Linux fully NFS client standards compliant, and the new Fedora 2.6.13 kernel released this week tested out just fine. So the good news is (and I have seen this over and over) “Even when Linux is not standards compliant now, it soon will be”. Also note that it looks like they are doing a great deal of work on NFS V4 for Linux, which is good too, but it will be a while before we could run that in our heterogeneous R&D world.

 

We worked around the 64 bit NFS client issue by using NFS Version 2 overrides on the NIS export, which fixed that problem, but caused the clients, especially those on the WAN, to slow down to an unacceptable level. If it isn't one thing, it's three others of course. Just to keep all our testing spicy, one of the test systems also had memory errors to help us.
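
For reference, the version 2 override amounts to forcing the protocol version on the client mounts. A minimal sketch of what that looks like on a single client; the server name and paths are placeholders, and the commented line is the automounter-map form of the same thing:

# xserve01 and the paths are placeholders for the real server and export.
mount -t nfs -o vers=2,tcp xserve01:/Volumes/build /mnt/build
# Or, as an entry in an autofs map distributed via NIS:
# build  -fstype=nfs,vers=2  xserve01:/Volumes/build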

We are headed back to using the Tru64 file server, as our time for testing this is at an end. R&D needs to get this product build environment nailed down and production ready. And the Tru64 file server has worked extremely well for 4.5 years, and is working well again, now that it is not in cluster mode.

 

Other Linux stuff

 

The new motherboard was ordered this week, and we hope to have it in and installed in the test Linux file server by the 15th, and our troubled 2.6.5-kerneled NFS server upgraded to 2.6.12 or so shortly after that.

 

Knoppix 4.02 landed on my desk today, and I'll start working with that next week to see how it will work in my Linuxworld lab setup. The current version we use for the lab is 3.6, and while it works very well, we like to stay current for the lab.  I did a quick boot of the DVD before I left work, just to be sure the disk was OK. Tons of new software to poke around on that: I am used to the CD version of Knoppix, not the DVD. Amazing what you can cram in 4 GB.

 

OpenSUSE 10.0 is being downloaded as I type, and I'll get that onto the soon-to-be-former SUSE 9.3 laptop next week as well. I tried a quick test of a borrowed iPod Nano on the 9.3 system, and unlike my full size iPod, it did not appear right away in Konqueror or GTKPod. But the Nano was not initialized yet either, so maybe that was the issue.

Mandriva released a new version this last week, and Ubuntu has a new RC out. Time to find some more test hardware.

 

Mozilla Firefox released 1.5 Beta 2, which is running quite nicely already on the iBook, so it will need to be spun down to the Linux and MS Windows test places next week.

A busy busy week coming up, I can see already.


A few items related to the news from last week, plus an update on the Linux file server.

 

Two items related to last Friday's post showed up this morning, both on ZDNet:

 

 

Lest you think I get all my news from ZDNet, here is an interesting one from LinuxWorld.au: Vista's licensing speeds NSW govt move to Linux desktops

 

The latter is not a massive move of zillions of desktops away from MS Windows or anything. The point of all this is that in the future, in the production world, one will be less and less able to determine what the client side platform is, and the good news is that if it is done right, such a thing will not matter.

 

Here on my team we pay very close attention to standards such as these. Our main R&D Support web server, for example, runs MediaWiki so that we do not have to care what client OS or browser our customer is using. With over 600 BMC products across so many different platforms, they won’t be using the same platform all the time even inside of one development team! Because of the magic of open standards, we can deliver a high grade, collaborative web site no matter whether they are sitting at a Solaris system, an AIX system, an MS Windows system, Linux... you name it. As long as the browser that is installed there supports open standards, we are good to go.

 

Linux File server update

 

At the end of this week I had hoped to be able to report how our move from the 2.6.5 kernel based file servers, discussed in “When Linux Breaks”, was coming along, after doing an update-in-place to Fedora Core 4 and the 2.6.12 kernel. But it will be a while, and now it is a bit more complicated.

The idea of the low cost file servers was to use commodity based parts, and to shelve spares against the day that we could not easily get the parts anymore. But to get the testing of the upgrade-in-place we needed done, we assembled the spare parts into a small working system. And then we found out something ahead of time that is good to know: the shelved CPU had a bad fan on it; that is to say, the fan had an infant death, and died less than a month after being powered up. It took the spare CPU with it in a very crispy way. And we can not get another Athlon XP 1900+ from our supplier.

 

We could look around eBay and other places, and maybe scare one up, but we are still faced with something we knew would happen sooner or later: the commodity market has passed this generation of gear by.

 

We are now looking at the current generation specifications: 64 bit, dual core, current speeds and feeds, and seeing how this changes our assumptions. One thing we had decided early on in the original design was to use only dual processors for these servers. This was not because we needed the speed: an AMD 1900+ can keep a single Gig-E wire fully occupied as a file server, and still have cycles to burn. The idea was that if we had a runaway process, we’d still have a way into the server to stop the runaway, and maintain a higher customer facing service level. Even on the tier II NAS applications we use these on, avoiding even one outage would pay for the 2nd processor. And part of this exercise was to learn what we needed to know in order to someday possibly make these tier I NAS servers.

 

But now we are wondering if dual core meets the same need as dual processor, and we are kicking that around. This will in turn inform our motherboard choice. The goal at the end will be to have a new generation of hardware on the shelf, and a procedure in place to upgrade the older gear to the new standard when the time comes. We will use the 2.6.5 kernel issue and its test server to verify all of it. We normally don't like multivariate problems like this though, so we will have to be sure we can control all the factors, or else find a way to break this down into several steps.
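
As an aside, on a 2.6 kernel the sockets-versus-cores question is usually answerable straight out of /proc/cpuinfo; a rough sketch (the "physical id" and "cpu cores" fields depend on the kernel version and only show up on hardware that reports its topology):

# number of physical packages (sockets) the kernel sees
grep "physical id" /proc/cpuinfo | sort -u | wc -l

# cores reported per package
grep "cpu cores" /proc/cpuinfo | sort -u

# logical processors the scheduler can hand work to
grep -c "^processor" /proc/cpuinfo

Either way, what matters for the runaway process scenario is that the scheduler sees more than one logical processor, so a login shell can stay responsive.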

 

In any case, that won’t be finished this week.

A Busy News Week

Posted by Steve Carl Sep 30, 2005

News about OpenOffice, AJAX, and the future of PC's and the Internet

 

As someone who lives on a Linux desktop full time, and uses Apple OS.X at the house, I have to deal with all the MS specific things that come my way: obvious stuff like MS Word docs, MS Excel spreadsheets, MS Powerpoint presentations, "Web" sites created by tools that default to MS specific, non-open-standard layouts and content... on and on. So I am always watching the news for items that let me know this situation is not a permanent one.

 

The first thing I saw that received wide press was the articles and articles about the fact that the State of Massachusetts had closed the books on using closed document formats. One of the best articles I read there was by David Berlind over at ZDNet: Microsoft called Massachusetts' bluff -- and lost

 

I found this article to be fascinating, especially in the amount of information David provides about the possible domino effects: the same levels of inter-connectedness that have in part driven MS Office adoption are now working against it. MS denies they will do this, but it seems like OpenDoc is something MS will have to support at some point in the very near future. It's a great read, and it feeds into my "perfect world" scenario, in that use of OpenDoc formats means I can use whatever platform I happen to be sitting at to look at the information I need.

 

At the current time, thanks to Sun and the OpenOffice folks, MS formatted documents are not all that hard to deal with. But a lot of people having unwound the file formats byte by byte is not the same thing as having a published, open standard for the file format that can be referenced for as long as the documents are electronically readable. How long they stay readable is another problem: we had that one at NASA, when backup tapes fit tape machines we no longer had... and that no one in fact had. But I digress... OpenOffice 1.9.125 (I see RC1 came out yesterday, which I loaded on my Xandros laptop, and which internally now calls itself 2.0) handles everything in the document decoding business beautifully for me right now (PC Magazine liked it), with some exceptions over in the spreadsheet world. And it is easy to use and installed on all my platforms. I use NeoOffice on the iBook, but the idea and a lot of the source code is the same.

 

I came across another article, which I cannot for the life of me find right now, about someone who put up OpenOffice.org 2.0 Beta and went to work using it. They never opened a manual or posted a question: they just started using it. Their point was that the whole learning curve FUD is overdone, and it echoed some things I wrote a while back for LinuxWorld Magazine. Since the author agreed with me, I thought he was a genius :) . The good news for the end users is probably not such good news for all the folks writing "OpenOffice.org for Goofballs" type books though.

 

WordPerfect, my first and formerly favorite word processor, will someday have OpenDoc support: eWeek: WordPerfect Will Support OpenDocument... Someday. It is my formerly favorite only because I don't want to use the current version of WP for Linux at the moment. The copy I have is the re-working of Version 8 that Corel did a while back. It runs on current distros, but... it is just way behind the MS Windows version, and OpenOffice is now "Good Enough" (tm). I do miss "Reveal Codes" though.

 

I came across the article "TextMaker 2005 Beta - now with OpenDocument compatibility" while poking around in an unrelated search. I have seen others about many other platforms headed there as well. This one interested me in part because it was an MS Windows only word processor.

 

StarOffice version 8 shipped to some pretty high praise from eWeek: eWeek: StarOffice 8 Is Office's Toughest Rival Yet

 

Another area of interest for me has been the recent but massive development of AJAX technologies. I love Google Mail: it is the first AJAX application I have spent any time with, and it is far and away the best web mail package I have seen. And it looks and acts the same from Safari, Opera, or Firefox on my iBook; Opera, Konqueror, or Firefox on Linux; and even Firefox on MS Windows. I had hoped we would see other web mail clients going this way soon. Further, I hoped they would be something that could be used in the glass house to replace the MS Exchange server "experience": I personally find the web mail interface to MS Exchange to be suboptimal, especially relative to Gmail.

I had no idea how quickly AJAX based Office and collaboration applications were going to appear: A flurry of interesting articles related to this:

OK: That last one is not AJAX based as far as I know, but it is all somehow tied together, at least in my head. And nothing in there about an AJAX alternative interface to MS Exchange either, but still it was interesting to see how fast these applications have appeared.

 

The Gordian knot of the desktop is really calendaring. Email is pretty easy to do these days, if you leave aside the massive amount of work one has to do to filter out all the spam and phishing (like most places, 70% of the email that arrives on BMC's electronic "doorstep" is spam). But that is a whole other blog, probably called "The Trials and Tribulations of Evolution".

 

Finally, found today was "It's the end of the PC as we know it", a commentary piece by CNET News.com's Charles Cooper. I am still thinking about it. One thing I thought about was the Dynabook they mention in the piece, and all the mobile devices that we are going to be using to tap into the application cloud that is the Internet and the Internet-to-be, which the AJAX technologies mentioned above could be a big part of. But then I think about how I personally interface with this technology. I am not a touch typist, but the keyboard is the way I go: full size, with the "Caps Lock" key turned off in /etc/X11/xorg.conf. I have tried voice typing, but I found it accessing a different part of my brain, not to mention that it didn't work very well on all the technical jargon. I am not at all fast at text entry on mobile phone keyboards, although my son is blazingly fast.
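
For anyone wondering about the Caps Lock bit: it is just an XKB option in the keyboard section of xorg.conf. A minimal sketch of one way to do it (the Identifier and Driver lines will vary by distro, and "ctrl:nocaps" turns the key into another Control rather than leaving it dead):

/etc/X11/xorg.conf
Section "InputDevice"
    Identifier "Keyboard0"
    Driver     "kbd"
    Option     "XkbOptions" "ctrl:nocaps"
EndSection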

 

Earlier this week, I tried to resurrect my HP 620LX, an MS Windows CE device, thinking it would be nice to carry around to jot down things for this blog. While the hardware is fine even though it is over 7 years old, the SynCE project does not support it. It has a small but usable keyboard, which is why it was attractive for this purpose, but anything I do on it is stuck there for now. So I wonder... even if the PC as we know it goes away, how will we interface with this brave new world? Maybe those Star Trek Data PADDs are going to be all the rage soon. Cell phones are already smaller and do more than the original Star Trek show's communicator. Well... leaving aside that whole talking to geosynchronous orbit without a cell tower thing...

The T41 Returns

Posted by Steve Carl Sep 28, 2005

The IBM T41 laptop returns fully repaired and restores normalcy

 

My IBM T41 has returned from its trip to IBM. One new system board later, it is functioning perfectly. VMware complained when it came up about needing a new system ID, which I told it to generate, and that was it.

 

It is utterly scary how much I missed this little computer though. I had my business continuance plan of course: It got a complete workout during the last week. I work in Houston, which meant I was one of the lucky two million people that headed for the hills.

 

Normally, I would have had the T41 with me. To stay connected to the office, I would have brought up MS Windows XP as a guest of Linux, and used that to VPN in and stay in touch with the status of our shuttered office. I do this because our current VPN client is MS Windows only. The new one is supposed to be multi-platform. I can't wait....

 

But this was not normal. The T41 was gone, along with what felt like most of my brain, and instead I had my personal emachines 5312. Normally it runs Linux too, but it can dual boot over to XP if need be. Normally I only boot it over there when I want to apply Windows patches or Firefox/OpenOffice application updates. But to run the VPN I had to be native on MS Windows for nearly a week. It was very disconcerting, but not in an obvious way. I bounce from OS to OS often enough that dealing with MS Windows idiosyncrasies was not a problem. It was that the CPU fan would never turn off. Not that loud really. Just this constant white noise that I can't quite put out of my realm of attention.

 

I have come across this before: I had just forgotten it. When Linux boots on the 5312, it notes that the BIOS PST (I think this means Processor Status Table) does not have an entry for the CPU that is installed (an AMD 2400+), and that this indicates a broken BIOS. The /etc/init.d/cpuspeed daemon then works around this and throttles the CPU so that most of the time it idles along at just over 500 MHz, and the fan turns off. The emachines laptop forums are full of commentary about this particular BIOS issue, but I have not yet found a vendor supported BIOS update to fix it. Oddly, another emachines laptop we have, a 5309 with an AMD Athlon 2500+, does cycle the fan off in MS Windows. I don't understand everything I know about this: but even on the 5309, the fan runs far more under MS Windows than it does under Linux, so it is a question of degree to some extent.
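
On the Linux side of the comparison, the 2.6 cpufreq interface makes the throttling easy to watch; a quick sketch, assuming the cpufreq driver is loaded (exactly which files show up varies by driver):

# current clock speed, in KHz, and the governor making the decisions
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# the "cpu MHz" line tells the same story in a friendlier unit
grep "cpu MHz" /proc/cpuinfo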

 

I have applied the PowerNow patch to XP. I have run through the services and shut down all the things that do not need to run, using Scott Lowe's cheat sheet (found at http://www.louisville.edu/~rkgill01/images/XPservices.pdf). I do not think the CPU is running at full speed all the time: the air coming out of the fan vent is not hot most of the time when MS Windows is running.

 

Unlike Linux, I can't peek at /proc/cpuinfo to see what is happening on the CPU, which makes me feel half blind. And for those of you with 5312s who know of the heat issue: yes, I have applied the Arctic Silver paste to the CPU heat sink. So I have no idea why MS Windows does this. If I unplug from the AC power, the fan turns off and only cycles on when needed. But there are no settings in Settings / Control Panel / Power Options that let me control the fan. I did set it for "maximum power saving" even when on AC, but that did not help. Maybe there is a registry hack someplace...

 

The T41 is back, so I don't have to deal with it anymore. The 5312 is back to Linux, the fan is off most of the time, and MS Windows is back to being a VMware guest on the T41 for doing those few things that I have not been able to work around any other way. I think for business continuance, I'll upgrade the memory in the 5312, and use my personal copy of VMware there to be able to run XP as the guest OS, and get past all this. But with the new system board on the T41, hopefully, I'm not going to have to use the backup plan.

 

But I didn't plan on having to run away from Houston either: Always good to have a fallback position. Preferably one that is not noisy.

My T41 left me!

Posted by Steve Carl Sep 20, 2005
Lobotomy scars occur when my Linux Laptop goes in for repair

My IBM T41 has left me!

 

Not for good though. Tangentially related to my last post is another hardware failure. My IBM T41 / Fedora Core 4 laptop had its video card go on the fritz. Crazy quilt patterns were on the screen rather than KDE 3.4. Having it fixed was pretty easy though: I called 1-800-IBM-SERV, told them all about it, they mailed me a box, and I sent it in to get fixed.

 

But now I am bereft. I had no idea how much I lived on Linux till my main squeeze left me. All my old blogs were there, all my email, all my documents, all my pictures, all my LinuxWorld work, all my pre-configured applications and workarounds: Evolution, Crossover Office, VMware, OpenOffice 1.9.125 (2.0 Beta 1). In short, everything I use to get along in an MS Windows centric world from Linux. And everything I do outside that MS centric world too. I had it all square rooted, configured, and ready to go.

 

I kept the hard drive of course. It's right here in this USB enclosure, attached to my SUSE 9.3 test laptop. I can get to everything, now that I have configured a user on this laptop with the same UID as the one I had on the Fedora Core 4 laptop. But it's clumsy, and it feels like the work-around that it is. This system is my test system. There is nothing wrong with SUSE, other than that I am not "moved in" here. And Evolution doesn't work right. I feel lobotomized!
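
The UID trick itself is nothing exotic; roughly the steps, assuming the Fedora account had the usual first-user UID of 500 and the enclosure shows up as /dev/sda2 (both of those are guesses you would check on your own hardware):

# create an account on the test laptop with the matching numeric UID
useradd -u 500 -m steve

# mount the old /home partition out of the USB enclosure
mkdir -p /mnt/t41
mount /dev/sda2 /mnt/t41
ls -ln /mnt/t41/steve    # the numeric owner should now line up with the new account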

 

The Linux desktop is certainly a reality to me: I had no idea how much till it was gone.

When Linux Breaks

Posted by Steve Carl Sep 19, 2005

Three system configuration tips learned from running Linux as a file server   

 

It is hard to imagine, but Linux can break. I do get spoiled by the long periods of uptime, to the point that when I have a problem with Linux it always surprises me. I guess it shouldn't: All those millions of lines of code, brought together from all those projects around the world... In fact, maybe what I should be is amazed that it all works so utterly well!

 

The point of my first two content related posts here was actually to set the stage for this one. I mentioned in "Linux and NAS" that of the ten Low Cost File Servers we have built to date using Linux and largely hand-built server hardware, "Three of them were not all we hoped though, with various stability issues that took us a while to chase to ground." This post is about that chase. It should also be noted that in our various explorations over the years with the Linux file servers, some of the hardware was vendor integrated, but most was hand-built on-site to a particular specification we wanted. As these were not first tier storage, we could afford some latitude to learn. The three with the problems were all hand-builts. And we did learn...

 

To tell the end of the story first: there have been two issues. One was a bad mainboard. The other was that two of the servers built on the 2.6.5 kernel crashed when they had a problem with the kernel's SLAB (slab allocator) cache. Worse for us at first was that the symptoms of these two widely different things appeared the same: the system would hang up, and nothing other than the Big Red Button (BRB) would get them free. Till they hung up again. Patrol was getting a real workout alerting us to the failures! We did decide that at some point being notified of constant failure was becoming... redundant. Monitoring should be about exceptions, and like the file server appliance from the nether regions before them, these failures felt like they were becoming the rule.

 

But, other than the maddening frequency with which the pager went off, it was different too: This was Linux. We had the source code, the Distro vendors, and the Internet, and we were pretty sure we'd get it taken care of.

 

The bad MB was the first thing we hit, and it took us a while to decide, based on what was going on, that we might have a hardware problem. As we had hand-built the server, we had all the parts on hand to hand-build another. The key to our tier II file server self support from a hardware perspective was to always be able to build another if the occasion arrived. The team built another server, bringing over only the disks from the first... and that did it. We had no further problems. We would have been smug, other than in retrospect we felt a bit silly for not figuring it out quicker.

 

Later we had these two slightly newer servers start to hang in the same way. We thought we might have had the same problem at first: it acted the same. We were thinking we were utterly jinxed on MBs: who ever heard of this many bad ones in a row? In fact, that question led us to think that we needed to really square root this failure: repeated observations made us begin to believe that maybe this one was a bit different. We started to ask ourselves how we could instrument Linux in order to capture the failure at something better than an "It is hung: boot it" kind of level. With Linux, we thought we should be able to do better than that. And you can.

 

Here then are three things we ultimately settled on as being required for all our file servers: our recipe, if you will. We updated our WIKI doc, and now use this on all the new file servers we build... all of which, of course, now behave.

 

1) Enable "magic" sysrq keys: With this enabled, if the system is hung but the kernel is at all responsive, you can use the sysrq key to poke around various areas of the system, as well as do a graceful reboot. There is a great page about it here: http://linuxgazette.net/issue81/vikas.html, and another here: http://www.tldp.org/HOWTO/Remote-Serial-Console-HOWTO/security-sysrq.html. And of course, Google is your friend if you want to know more. In our doc, we show these two ways to turn it on for Fedora and SUSE:

 

Fedora Core 2

 

 
/etc/sysctl.conf
 
kernel.sysrq = 1

SuSE SLES 9

 
/etc/sysconfig/sysctl
 
ENABLE_SYSRQ="yes"
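
Either way, you can also flip it on and test it at run time, no reboot required. A quick sketch (and be warned: the last line really does reboot the box, so save it for when things are already wedged):

# turn the magic keys on for the running kernel
echo 1 > /proc/sys/kernel/sysrq

# same effect as the keyboard combos: dump the task list to the console...
echo t > /proc/sysrq-trigger

# ...or sync, remount read-only, and reboot (the classic "s", "u", "b")
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger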

2) Console redirection to serial port: If console redirection to serial port 0 has been enabled in the BIOS (and yes, this does pre-suppose you have a BIOS that has this setting...), then the console can be supported on BOTH the normal console and the serial port at the same time. This could be useful in case of a KVM failure, or an X problem. Then (from our web doc):

 
Note: BIOS, GRUB, and the kernel messages are all directed
at a different level.
 
Note: When Linux boots, init and syslog messages will NOT
appear on the secondary (serial) console!

GRUB configuration

Add the following lines to /boot/grub/menu.lst

 
/boot/grub/menu.lst

serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1
terminal --timeout=10 console serial

Add both consoles to any "kernel" records in /boot/grub/menu.lst

 
/boot/grub/menu.lst
kernel existing_text console=ttyS0,9600n8r console=tty0

Add getty for serial port to inittab

Add a line for the serial port getty to /etc/inittab

 
/etc/inittab
co:2345:respawn:/sbin/agetty -h -t 60 ttyS0 9600 vt100

Enable direct root logon

Add a line for the serial port to /etc/securetty

 
/etc/securetty
ttyS0
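
On the machine (or console server) wired to that port, anything that can speak 9600 8N1 will do for actually using the console; a couple of sketches, with the device name being whatever your end of the cable enumerates as:

# quick and dirty, using screen
screen /dev/ttyS0 9600

# or minicom, after pointing its setup menu at the right port and speed
minicom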

3) Configure the Linux Kernel Crash Dump (LKCD). It is no fun to read dumps of course, but on the other hand, if you are asking for help, or reading through forums to see if you have a known problem, and someone says "Hey, what does it look like in register XYZ or memory location... whatever", it is handy to have a dump to poke around in to see the answers. LKCD is documented here: http://lkcd.sourceforge.net/

One of my team, who is largely fearless, read through the doc at the web site and used the "lcrash" command provided by the LKCD folks to poke around the dumps from the two misbehaving systems. After three days of a pretty serious learning curve, and multiple dumps to compare, this led us (me, because he told me about it...) to many references about SLAB cache issues when using early 2.6 kernels on file intensive servers.
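
If you suspect you are staring at the same kind of problem, you do not have to wait for a crash dump to watch the slab caches either: /proc/slabinfo is there on any 2.6 system, and newer procps packages ship a top-like viewer for it. A quick sketch:

# one-shot snapshot of every slab cache and its object counts
cat /proc/slabinfo

# or watch the biggest consumers refresh in place
slabtop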

Lights went on! Our later servers, with later kernels, are stable. Our earlier servers with 2.4 kernels are stable. Now we think we know why. Now we are going to test this theory: we are working to put together a procedure to upgrade an older 2.6.5 kerneled file server into the latest and greatest 2.6.12 kernel. We would like to be able to do this in place: not move the many terabytes of data over by creating a new "swing server", but we have not ruled that out.

The problem we are facing is the program level skew, and all the co-requisites and pre-requisites: the 2.6.5 systems are back-level on most everything, including the EVMS (http://evms.sourceforge.net/) version. More about all this as we learn it...

PS: Yes, we picked EVMS for our file server volume management. Our future systems will be LVM2, based on the kernel folks' decision to adopt LVM2 over EVMS (see here for more about that: http://lwn.net/Articles/14816/). But those will be new servers, and we won't have to do massive data swings to put those in place.

Even Clusters get the Blues

Posted by Steve Carl Sep 12, 2005
How we got to our current file serving (NAS) cluster, and where we are looking in Linux for file serving next.

 

We designed our current, mission critical file server based on previous experience with an up-time challenged storage appliance. We decided between reboots to do a clean sheet exercise (if writing on a great big white board that was recently erased qualifies as a "clean sheet") on what should replace our not-so-favorite server. We laid out the parameters of what we needed and also what we wanted, and delineated between the two. To the extent this is ever possible, we did not know the answer ahead of time as we asked the questions.

 

 

These were some of the main points as I recall them:

 

 

  • Tolerate our R&D network traffic (this was the major failing of our current “solution”)
  • Rolling upgrade capable (another failing)
  • No single points of failure in either the hardware or the software stack (and another)
  • Proven technology  from a stable vendor (yet another...)
  • Expandable for the foreseeable lifetime of the hardware.
  • Be able to handle at least 50 Megabytes a second on a single Gig-E wire
  • Able to deal with bimodal (NFS and CIFS) access to a single file system, as well as NFS-only or CIFS-only file systems.
  • Logical Volume Management, so that file systems could grow without downtime.
  • Able to be maintained by members of my team rather than needing a vendor house call every time we were working on it. Not that we wanted to have to work on it often.

 

 

The basic idea then was that it had to perform well enough to do our software builds, and that, even when we were servicing it, R&D would never see an outage from their end of things.

 

 

Once we had our list of things, we started calling vendors, asking questions, seeing presentations, getting in test systems, and doing general technical triage. At the end of the day, 4.5 years ago... we didn't choose Linux. I know... I know... it was not an easy thing in some ways. We did have quite a comfort zone with Linux, even back then. But the Linux of the day just was not ready for this particular role in the glass house.

 

 

We chose a Compaq TruCluster. What a machine! Two ES40 main nodes, each with 4 screaming Alpha processors, 4 GB RAM, and multiple Gig-E cards on separate buses, plus two high speed Memory Channel (http://www.hp.com/techservers/systems/symc.html) interconnects, all connected to the venerable Compaq StorageWorks SAN via twin Brocade switches.

 

 

TruCluster runs on top of Tru64. It is a Single System Image (SSI), active-active cluster, and it is based on the best, most proven cluster technology known to human-kind: the VMS Cluster. With active-active and both nodes up, we would get twice as much throughput, so there would be no expensive system sitting around waiting for the other to fail. And because VMS Clustering has been around for decades, most of the complexity of such a solution would have been worked through. Normally something this complex could be expected to fail more often rather than less, but this was proven technology.

 

As an aside, this is also why we have been following the Linux OpenSSI project so closely over the years (http://openssi.org/cgi-bin/view?page=openssi.html), since the project is sponsored in large part by HP, which now of course owns TruCluster. More on this later.

 

From time to time over the last several years, we have assembled Linux and other clusters out of spare parts to evaluate their current state of the art relative to TruCluster. We are of course interested in how well these stack up against our original white board list, but also:

 

 

  • How easy it is to build one,  especially the level of customization
  • How cluster-aware the NFS and CIFS  software stacks are
  • General speeds and feeds
  • Basic Reliability / Availability / Serviceability (RAS)

 

 

We were doing this to remain educated and up to speed on the current state of the art... another part of our job is to build these clusters for R&D from time to time for various projects they have going: it's nice to be ahead of the curve on requests like "Please build me a cluster, and I could really use it by Tuesday if at all possible". We were interested primarily in cluster technology that can be used to provide a NAS service: compute clusters and grids were not really in scope.

 

 

Then our cluster got the blues. OK, really, several things came to pass that made it time for us to re-visit this whole thing again. First and worst was that our client base shifted. As new UNIX and Linux versions have been released, there has been a drift towards using NFS version 3 over TCP/IP, and away from NFS version 2 or 3 over UDP. This is just the new R&D client default behavior. It makes sense really: NFS V3 over TCP/IP is better on the WAN than UDP. Unless you have a sudden bunch of such clients, and the server is a TruCluster from when most clients were UDP.

 

 

Because of the design of the TruCluster, and its particular implementation of cluster-aware NFS services, Memory Channel (MC) traffic has been increasing. In the classic performance and capacity planning scenario, what used to be no problem at all hit a knee in the curve, and suddenly we had a huge file server and some fairly low speed but very important clients waiting around while traffic cleared inside the MC. And of course, since all systems wait at the same speed, having all these high speed CPUs is not helping a bit (even by today's standards, the 4.5 year old Alpha chips still get around pretty well). And with Compaq bought out by HP, and Tru64 being end-of-lifed, we are probably not going to be able to get a major design change implemented in TruCluster to deal with this behavior. It's not really "broke"; it's just that NFS Version 3 over TCP/IP on one does not appear to scale. We can manage this of course by changing clients to use UDP one by one (and we have), but that is a short term solution.
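
Pinning a client back to UDP is mostly just a mount option; a sketch of the Linux client syntax, with the server name and export path invented for the example (other UNIX clients spell the options a bit differently):

# force NFS version 3 over UDP instead of the newer TCP default
mount -t nfs -o nfsvers=3,udp bigserver:/export/build /mnt/build

# nfsstat on either end shows how the TCP/UDP split actually looks
nfsstat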

 

So, it is time to look at Linux again. We have been building the Low Cost File Servers with Linux for a number of years now, so we have a pretty good handle on the technology for non-HA purposes. A few things have changed since we went with the TruCluster, which means it is time to build some new test systems:

 

 

  • Linux SSI now supports the 2.6 kernel, which we think may mean better scalability than a 2.4 kernel. Although we do have cause to wonder about this assertion: more on that next time.
  • As mentioned, HP has been providing a great deal of support to Linux SSI, and HP has also been historically very supportive of Linux. Since our TruCluster is now essentially an HP system... well, it just seems like there ought to be some possibilities there
  • The relatively low cost of commodity based hardware makes SSI not nearly as critical: we can afford to have inexpensive servers waiting around for another inexpensive server to fail, so we need to look at the Linux HA project to see what it can offer: http://www.linux-ha.org/
  • Also in Linux HA's favor: it is a far simpler cluster technology, and so in theory it should be easier to get to a stable, mission critical level of service when there are fewer complications. SSI on the TruCluster came from a 20+ year old technology base: nothing in Linux is quite that old and venerable yet...

 

 

The Low Cost File servers have had their moments though. Next time some of the things we have learned from them.

Linux and NAS

Posted by Steve Carl Sep 8, 2005

We would hardly be the first shop in the world to use Linux as a file server. It's one of its natural, most developed roles in the glass house. What may be slightly surprising is that we don't yet use it for our most critical workloads, although that day may come soon. More on that in another entry.

 

We take a 2 tiered approach to NAS storage for R&D Support.

 

 

In our first tier is the 5 9's type storage. The stuff that just can't go down. The bits and pieces that are used on our "assembly line" to build and manufacture our own products. The kind of storage that, if it were down, would idle hundreds of people around the world in R&D and endanger our time to market. And we know with a great deal of pain just how critical this storage is, because we used to use a storage appliance there, and it could not survive our network. It crashed all the time, and we paid for it dearly.

 

 

Defining my terms here: a "storage appliance" means any hardware/software solution sold only as a NAS or SAN solution.

 

I should mention as an aside that our network is a hostile place for file servers of any type. With over 3600 computers on the R&D LAN alone, triple that for the WAN, running over 45 'variants' of UNIX (counting AIX 3.2.5 as different from AIX 4.1, especially from a network client point of view, etc.), 6 or 8 variants of MS Windows (not counting differences in service packs), and of course Linux ranging back to Red Hat 5.2 and up to the most recent versions from Red Hat, SuSE, Mandriva, Debian, Feather, Ubuntu, some custom, in-house, built-from-scratch 'distros', and I am sure others I am not thinking of at the moment... well, I am sure you get the idea.

 

 

Add to that every possible version and combination of NFS versions 2, 3 and 4, and all the various SMB varieties as well, plus the occasional buggy client behavior where a published standard was not quite implemented the same way by one vendor as by the others. It is enough to make a NAS server run for cover, and some have. But we can't have that on the manufacturing line. So five years ago we bought a Compaq TruCluster. At north of 130,000 US dollars a terabyte, in dollars of five years ago, it was not cheap. But it didn't take too many outages to justify the expense either, and our storage appliance was giving us outages with enough frequency that it took far longer to design the replacement server than it did to sell it to management.

 

 

But that left us with a real need to also spend a great deal less on storage for things that do not need that super high level of availability. Things like:

 

 

  • Archives that are accessed too often to be on tape
  • Build trees for older versions of products that are only built ad-hoc to solve particular problems
  • Images of various levels of the OS captured with Ghost or Lab Expert
  • VMware virtual disks that are 'on the hook', waiting for various deployments
  • ISO images of various Linux distros, trailing back in time to when we first started downloading and publishing ISO images on the internal web server for quick download.

 

 

We came up with a Linux solution shortly after we had the TruCluster up and running. Our operational theory for the first Linux file server was that commodity hardware should be catching up to the point that we could build "pretty high availability" (PHA) for less than 5000 US dollars a terabyte. Thus the LCFS, or FBCFS (Low Cost File Server, or Faster Better Cheaper File Server. Hey, I used to work at NASA... what can I say? Faster Better Cheaper was all the rage till that Mars probe went in hard a few years ago. But I digress...)

 

 

The main thing that made this all work was Linux support from 3ware for their storage card. Able to hook up either 8 or 12 PATA disks and set them up as RAID 5 stripes with a hot spare, we were pretty sure that even if the disks weren't as reliable as the SCSI units in the TruCluster, neither would a failure be customer facing: we would shelve a few extras as cold spares and replace drives whenever they failed, at PATA prices.
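
For anyone building something similar, 3ware ships a command line utility alongside the driver that will show array and drive status; a sketch, assuming the tw_cli tool (the exact verbs shift a little between CLI releases, so check its help output):

# list the controllers the tool can see
tw_cli show

# show the units, drives, and any rebuild in progress on controller 0
tw_cli /c0 show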

 

That first unit was 8 200 GB disks and Red Hat 7.3, with a custom kernel from kernel.org patched with the NFS read-ahead patches, and with IBM's EVMS installed for volume management. This preceded Red Hat's acquisition of Sistina and the new LVM that resulted from it, and we needed an enterprise class logical volume manager. EVMS filled that to a T.

 

 

That server was so successful (and less than $5000 too) that we have since built nine more, with higher density disks, SATA rather than PATA, and sometimes the 12 port 3ware cards rather than the 8 port, so that we now have 15 terabytes of storage online with these servers. That is far more storage online than we have on the TruCluster, which we rather jealously guard. The Linux distro has been updated over time, and various experiments have even been done with different distros.

 

 

We have never had an issue with any version of NFS or CIFS/SMB. Linux's NAS stack of TCP/IP, NFS, and Samba has never had the issues we had with that earlier appliance: even our PHA file servers are more available than those were.
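
For the record, serving the same tree both ways is not much more than an exports entry plus a Samba share pointed at the same directory; a minimal sketch, with the path and client network invented for illustration:

/etc/exports
/export/projects   192.168.10.0/255.255.255.0(rw,sync)

/etc/samba/smb.conf
[projects]
    path = /export/projects
    read only = no

The config is the easy part; the subtle bit is thinking through how file locking behaves when the same files are hit over both protocols at once.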

 

 

Three of them were not all we hoped though, with various stability issues that took us a while to chase to ground. Better than "broken all the time" is still not PHA. The other seven behaved exactly as we wanted, and more.

 

 

Next time: “When good servers go bad --- or --- Even clusters get the blues.”


The testing of multiple distros on multiple machines for the enterprise Linux desktop continues

 

Last time out I was on an adventure of my lifetime, traveling to our Pune, India office to meet others in BMC's R&D Support team. I had taken my Dell D620, configured with Mint 3.1. It was reliable and trouble free. What was left over from that trip was an issue from the previous post about OpenSUSE 10.3. It was troublesome enough on the D620 hardware that I ejected it at the last minute in favor of Mint 3.1.

 

I had two days back in the office between trips, and spent one evening after everyone left setting up a new set of Linux test systems.

 

Laptops

 

I tend to use laptops to test all things Linux desktop for these reasons.

 

 

  • Linux on a laptop is usually a harder test for Linux, since the hardware can be less standard. Call it a stretch goal.
  • My office is only so big! Laptops save space and power, and have built in screens, so I don't have to have a KVM infrastructure.
  • Laptops now outsell desktops, and why not? Dual core, 64 bit, increased memory on RAM and Disk... what do I need a desktop for?

 

 

This approach has been borne out by my recent D620 work with OpenSUSE 10.3 and Mint 3.1. OpenSUSE was problematic; Mint was largely flawless. This is not to say that there would not be a different laptop where the exact reverse could be true. This is one finding on one laptop, and it would be a scientific mistake to generalize from this one data point. That is where the IBM T41 and Dell Inspiron 8100 come in.

 

OpenSUSE 10.3 on the IBM T41

 

Replacing Mint 3.0 on the T41, I installed OpenSUSE 10.3. OpenSUSE has run well on the IBM in the past, and frequent contributor to this blog Richard Meyer has it running well on his IBM T series laptops, so I assumed it would work, and it largely does. Yeah, OK, there is that qualification "largely". I can not say it has been perfect. The IBM Thinkpad support is installed, but the screen bright / dim keys are frankly acting wonky. Set the screen brighter, and it steps up to "brighter, brighter, full dim". If I look away, and then back, the screen will be full bright. I have no idea what it is playing at. Previous installs of OpenSUSE did not do this, nor did Mint.

 

The other thing that is not working very well is external / dual head support. The T41 is in a docking station, with a Dell 1280x1024, 60 Hz refresh flat panel attached. The two dual head modes YaST knows about are to stretch the desktop across both screens, or to replicate the primary display. What is not available, but should be, is to treat the second display as a second desktop. Fedora knows that trick, and it has historically worked very well on this exact same hardware.

 

The problem with dual head is that the internal panel is 1400x1050, and the size mismatch is something YaST can not correctly configure, not even when I force the 1400x1050 panel into 1280x1024 mode. The external panel is driven such that the virtual display is larger than the real one, and it does not pan to let you get at the virtual edges that are not being shown. Phooey. I wanted this so I could run VMware on the second display and have two OS displays on one computer. Looks like I will have to put Fedora on if I want that... assuming I can figure out how to get VMware working on Fedora.
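
If the video driver on that box has picked up the newer RandR 1.2 support, mismatched panels can sometimes be arranged from the command line, YaST or no YaST; a sketch, with the output names guessed (running xrandr with no arguments lists the real ones, and the combined width may also need a Virtual line in xorg.conf):

# see what the driver calls the two displays, and what modes it offers
xrandr

# put the external panel to the right of the laptop panel, each at its native mode
xrandr --output LVDS --mode 1400x1050
xrandr --output VGA-0 --mode 1280x1024 --right-of LVDS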

 

Meantime, Evolution is working fine, and I have hidden the SLAB menus away where they belong. I know: I am such a Luddite sometimes. Week after next, when I am back from BMC Userworld, I will work on tweaking out this setup more completely. One annoying thing: once I installed the security updates, they *removed* one set of debugging symbols for Evolution. I do not know why the debugging symbols were not updated when the base package changed, but that hurts my ability to report issues.

 

The install process itself is much better than it used to be, but still not up to Ubuntu standards. It takes way longer, and enabling the alternate repositories adds a slow, chatty bit to the install, while YaST seems to refresh all sorts of things from all over the place.

 

OpenSUSE does something in its install that I wish all distros did: when I told it I was going to use the user "steve", it asked me if I wanted to change all the ownership of the current "/home/steve" to match this userid. I am beyond hoping that the distros will ever agree on what the UID of the first added userid should be: 500, 501, 1000, or whatever. Unless the LSB defines it someday, it will always be different. But adding a check to see if the home directory of the userid just added already exists is so simple, and it saves all sorts of issues later.
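
The manual equivalent, for the distros that do not ask, is a one liner after the first boot; a sketch, assuming the new account is also called steve:

# hand the pre-existing home directory to the freshly created account
# (the trailing colon means "steve's login group")
chown -R steve: /home/steve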

 

Ubuntu 7.10 on the Dell Inspiron 8100

 

Replacing Mint 3.1 on the Inspiron is Ubuntu 7.10. Ubuntu did not drop in as easily as Mint, and it appears to be because Ubuntu does not enable the "restricted drivers" automatically anymore. The Dell's 1600x1200 screen was mostly black on the first few boots. Finally I used safe graphics mode and 1024x768, and Ubuntu went in fine, although the screen had huge black borders. Once it was installed and rebooted, I turned on restricted drivers, had it install the Nvidia drivers, and then the screen was fine.

 

This was the only major hiccup in the install, and I hope Mint 4.0 does not follow this path, and instead enables these drivers when it sees that the graphics card is there. It was not a big problem, but it was annoying and did not add to the overall feeling that Ubuntu really knew what it was up to. It just felt sort of braindead: "I know what video card you have, and I have its driver over here, but I am not going to use it till you tell me to". I know the correct "Restricted Drivers" incantation too: would a new user think highly of this if they had to google up the solution? I get the deal where the Open Source community wants these drivers opened up, and I agree with that position, but how about a prompt asking me if I wanted the drivers enabled at install time, so I could decide back then? At the very least, if I were "Shirley First-time-user" I would know what was up.

 

If there were any changes in the install process from Ubuntu 7.04 they were small enough to escape my notice. Same simple, fast install.

 

Evolution 2.12 appears to work well, but I did not have any real run time on it before I had to go home. Between my day job and changing time zones from IST to CDT, it was all I could do to get this far.

 

Evolution 2.10 is "obsolete"

 

Part of what drove me to do the above OS swizzles was the fact that the current status on the bugs I have reported against Evo 2.10 went first to "Duplicate" and then to "Obsolete", i.e. Evo 2.10 is replaced now by 2.12. OpenSUSE 10.3 and Ubuntu 7.10 both have 2.12. Mint 4.0 will be out in November, and being based off Ubuntu 7.10 it should also have 2.12. I will take the D620 to that release as soon as I can.

 

Next week I'll be at BMC Userworld, and no doubt my post here will be informed by that experience. After that I will put up a post about the final configuration for our HA Linux NAS that is replacing the Tru64 TruCluster.


When trying to come up with a name for this blog, I tried to think of a title that would encapsulate a computer generalist's approach to the topic of Linux. I am currently focused on Linux as an MS Windows desktop replacement OS, having migrated myself shortly after "Code Red", and on server virtualization via VMware under Linux. That is narrow enough, and probably enough there to stay busy blogging for a good while, but I also manage a team of R&D Support people located in Houston, Dallas, and Pune who use Linux every day in all sorts of roles: mission critical, development, testing, and sometimes just kicking the tires. "Adventures in Linux" was the first thing that popped into my head.

 

I am a computer generalist: I started playing with a TRS-80 model 1 in the late 1970’s, messed around with CP/M, then became a mainframe operator in the early 1980’s, and was a VM system programmer for well over 10 years, working among other places at the Space Shuttle on-board computer development lab as a subcontractor to IBM. I came to BMC in 1989, and started learning UNIX. I hooked BMC up to the Internet in 1993, using the magic of VM to create a “firewall” of sorts, and since 1997 I have managed R&D Support.

 

My first Linux was a set of Slackware CDs I bought at MicroCenter, and the only time I ever saw the 1.x kernel, since the next release of Slackware (Slackware '96) had the spiffy new 2.0 kernel. It went in on a homebrew AMD box, and I spent the next several hours just trying to figure out my X server settings! It was pretty humbling: I thought I knew something about computers till that happened.

 

Once it was working it was handy for X access to the various UNIX servers I supported, but the OS/2 system next to it was where I did all my office work. MS Windows 3.1 used to be on it, but it crashed every day, and I finally went to OS/2 seeking refuge. Yeah yeah... I know. We’ll just skip that part.

 

I went back to MS Windows with NT 4.0 since it didn’t crash nearly as often as the Win 9x core stuff, and was on Windows 2000 when “Code Red” hit. My Linux box was still there, updated over the years, and it was pretty hard not to notice that for an entire week, the only place I could get any work done was there.

 

Call me a fair weather friend, but I decided to see if Linux would work as my full time desktop, and I have not looked back since.

 

I don't have computer religion per se. My team supports thousands of computers all around the world that we use to develop BMC products on: every release of UNIX and Linux known to humankind since 1989 (that could be patched for Y2K) is probably in one of our labs. Also some VAX hardware from the late 1980s, OpenVMS on Itanium 2s, Sequents, Pyramids, Siemens Nixdorfs, AS/400s, and of course Linux on AMD/Intel, Sparc, Power, and the mainframe. Most recently we acquired 2 Apple Xserves running OS.X. We also support MS Windows across the board.

 

My personal use of Linux is because it just works. Currently I have Fedora Core 4 on my work laptop, SuSE 9.3 on my test laptop, and Xandros OC 3 on another test laptop built out of spare parts found lying about the labs. My personal systems are a hand-built Fedora Core 4 box that is my file and print server, a Fedora Core 4 laptop running on an emachines 5312, and an iBook.

 

What I plan to do here is talk about all aspects of Linux: how we use it here at work, things we discover along the way, salted with various lessons learned from using it as a primary desktop OS. No topic is out of bounds, as long as it's about Linux.
