
Adventures in Linux

September 2005
Steve Carl

A Busy News Week

Posted by Steve Carl Sep 30, 2005


News about OpenOffice, AJAX, and the future of PCs and the Internet

 

As someone who lives on a Linux desktop full time, and uses Apple OS.X at the house, I have to deal with all the MS-specific things that come my way: obvious stuff like MS Word docs, MS Excel spreadsheets, MS PowerPoint presentations, "Web" sites created by tools that default to MS-specific, non-open-standard layouts and content... on and on. So I am always watching the news for items of interest that let me know this situation is not a permanent one.

 

The first thing I saw that received wide press was article after article about the fact that the State of Massachusetts had closed the books on using closed document formats. One of the best articles I read was by David Berlind over at ZDNet: Microsoft called Massachusetts' bluff -- and lost

 

I found this article fascinating, especially for the amount of information David provides about the possible domino effects: the same inter-connectedness that has in part driven MS Office adoption is now working against it. MS denies they will do this, but it seems like OpenDocument support is something MS will have to add at some point in the very near future. It's a great read, and it feeds into my "perfect world" scenario, in that use of OpenDocument formats means I can use whatever platform I happen to be sitting at to look at the information I need.

 

At the current time, thanks to Sun and the OpenOffice folks, MS-formatted documents are not all that hard to deal with. But a lot of people having unwound the file formats byte by byte is not the same thing as having a published, open standard for the file format that can be referenced for as long as the documents are electronically readable. How long they stay readable is another problem: we had that one at NASA when backup tapes fit tape machines we no longer had... and that no one in fact had. But I digress.... OpenOffice 1.9.125 (I see RC1 came out yesterday, which I loaded on my Xandros laptop, and which internally now calls itself 2.0) handles everything in the document decoding business beautifully for me right now (PC Magazine liked it), with some exceptions over in the spreadsheet world. And it is easy to use and installed on all my platforms. I use NeoOffice on the iBook, but the idea and a lot of the source code are the same.

 

I came across another article, which I cannot for the life of me find right now, about someone who put up OpenOffice.org 2.0 Beta and went to work using it. They never opened a manual or posted a question: they just started using it. Their point was that the whole learning-curve FUD is overdone, and it echoed some things I wrote about a while back for LinuxWorld Magazine. Since the author agreed with me, I thought he was a genius :) . The good news for the end users is probably not such good news for all the folks writing "OpenOffice.org for Goofballs" type books though.

 

WordPerfect, my first and formerly favorite word processor, will someday have OpenDocument support: eWeek: WordPerfect Will Support OpenDocument... Someday. It is my formerly favorite only because I don't want to use the current version of WP for Linux at the moment. The copy I have was the re-working of Version 8 that Corel did a while back. It runs on current distros, but... it is just way behind the MS Windows version, and OpenOffice is now "Good Enough" (tm). I do miss "Show Codes" though.

 

I came across the article "TextMaker 2005 Beta - now with OpenDocument compatibility" while poking around in an unrelated search. I have seen others about many other platforms headed that way as well. This one interested me in part because it is an MS Windows-only word processor.

 

StarOffice version 8 shipped to some pretty high praise from eWeek: eWeek: StarOffice 8 Is Office's Toughest Rival Yet

 

Another area of interest for me has been the recent but massive development of AJAX technologies. I love Google mail: it is the first AJAX application I have spent any time with, and it is far and away the best web mail package I have seen. And it looks and acts the same from Safari, Opera, or Firefox on my iBook; Opera, Konqueror, or Firefox on Linux; and even Firefox on MS Windows. I had hoped we'd see other web mail clients going this way soon. Further, I hoped they would be something that could be used in the glass house to replace the MS Exchange server "experience": I personally find the web mail interface to MS Exchange to be suboptimal, especially relative to Gmail.

I had no idea how quickly AJAX based Office and collaboration applications were going to appear: A flurry of interesting articles related to this:

OK: That last one is not AJAX based as far as I know, but it is all somehow tied together, at least in my head. And nothing in there about an AJAX alternative interface to MS Exchange either, but still it was interesting to see how fast these applications have appeared.

 

The Gordian knot of the desktop is really calendaring. Email is pretty easy to do these days, if you leave aside the massive amount of work one has to do to filter out all the spam and phishing (like most places, 70% of the email that arrives on BMC's electronic "doorstep" is spam). But that is a whole other blog, probably called "The Trials and Tribulations of Evolution".

 

Finally, found today was "It's the end of the PC as we know it", a commentary piece by CNET News.com's Charles Cooper. I am still thinking about it. One thing I thought about was the Dynabook they mention in the piece, and all the mobile devices that we are going to be using to tap into the application cloud that is the Internet and the Internet-to-be, a cloud that the AJAX technologies mentioned above could be a big part of. But then I think about how I personally interface with this technology. I am not a touch typist, but the keyboard is the way I go: full size, with the "Caps Lock" key turned off in /etc/X11/xorg.conf. I have tried voice typing, but I found it accessed a different part of my brain, not to mention that it didn't work very well on all the technical jargon. I am not at all fast on text entry on mobile phone keyboards, although my son is blazingly fast.
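For the curious, the Caps Lock trick is just an XKB option in the keyboard section of xorg.conf. Here is a minimal sketch of what I mean; the identifier and layout values are illustrative, not copied from my actual config:

/etc/X11/xorg.conf

Section "InputDevice"
    Identifier "Keyboard0"
    Driver     "kbd"
    Option     "XkbLayout"  "us"
    Option     "XkbOptions" "caps:none"   # disable Caps Lock; "ctrl:nocaps" turns it into another Ctrl instead
EndSection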

 

Earlier this week, I tried to resurrect my HP 620LX, an MS Windows CE device, thinking it would be nice to carry around to jot down things for this blog. While the hardware is fine even though it is over 7 years old, the SynCE project does not support it. It has a small but usable keyboard, which is why it was attractive for this purpose, but anything I do on it is stuck there for now. So I wonder... even if the PC as we know it goes away, how will we interface with this brave new world? Maybe those Star Trek data PADDs are going to be all the rage soon. Cell phones are already smaller, and do more, than the original Star Trek show's communicator. Well... leaving aside that whole talking-to-geosynchronous-orbit-without-a-cell-tower thing...

Steve Carl

The T41 Returns

Posted by Steve Carl Sep 28, 2005


The IBM T41 laptop returns fully repaired and restores normalcy

 

My IBM T41 has returned from its trip to IBM. One new system board later, it is functioning perfectly. VMware complained when it came up about needing a new system ID, which I told it to generate, and that was it.

 

It is utterly scary how much I missed this little computer though. I had my business continuance plan, of course, and it got a complete workout during the last week. I work in Houston, which meant I was one of the lucky two million people who headed for the hills.

 

Normally, I would have had the T41 with me. To stay connected to the office, I would have brought up MS Windows XP as a guest of Linux, and used that to VPN in and stay in touch with the status of our shuttered office. I do this because our current VPN client is MS Windows only. The new one is supposed to be multi-platform. I can't wait....

 

But this was not normal. The T41 was gone, along with what felt like most of my brain, and instead I had my personal emachines 5312. Normally it runs Linux too, but it can dual boot over to XP if need be. Normally I only boot it over there when I want to apply Windows patches or Firefox/OpenOffice application updates. But to run VPN I had to be native on MS Windows for nearly a week. It was very disconcerting, but not in an obvious way. I bounce from OS to OS often enough that dealing with MS Windows idiosyncrasies was not a problem. It was that the CPU fan would never turn off. It is not that loud really. Just this constant white noise that I can't quite put out of my realm of attention.

 

I have come across this before: I had just forgotten it. When Linux boots on the 5312, it notes that the BIOS PST (I think this means Processor Status Table) does not have an entry for the CPU that is installed (an AMD 2400+), and that this indicates a broken BIOS. The /etc/init.d/cpuspeed daemon then works around this, and throttles the CPU so that most of the time it idles along at just over 500 MHz, and the fan turns off. The emachines laptop forums are full of commentary about this particular BIOS issue, but I have not yet found a vendor-supported BIOS update to fix it. Oddly, another emachines laptop we have, a 5309 with an AMD Athlon 2500+, does cycle off in MS Windows. I don't understand everything I know about this: but even on the 5309, the fan runs far more under MS Windows than it does under Linux, so it is a question of degree to some extent.

 

I have applied the PowerNow patch to XP. I have run through services and shut down all the things that do not need to run, using Scott Lowe's cheat sheet (found at http://www.louisville.edu/~rkgill01/images/XPservices.pdf). I do not think the CPU is running full speed all the time: the air coming out of the fan vent is not hot most of the time when MS Windows is running.

 

Unlike Linux, I can't peek at /proc/cpuinfo to see what is happening on the CPU, which makes me feel half blind. And for those of you with 5312's who know of the heat issue: yes, I have applied the Arctic Silver paste to the CPU heat sink. So I have no idea why MS Windows does this. If I unplug from the AC power, the fan turns off and only cycles on when needed. But there are no settings in Settings / Control Panel / Power Options that let me control the fan. I did set it for "maximum power saving" even when on AC, but that did not help. Maybe there is a registry hack someplace...
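On the Linux side, by contrast, the view is wide open. A quick sketch of the sort of checks I mean, assuming the cpufreq interface is loaded; the paths are the stock sysfs ones, nothing 5312-specific:

# What the kernel thinks the clock is right now
grep "cpu MHz" /proc/cpuinfo

# If cpufreq is active, the current and available speeds in kHz
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies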

 

The T41 is back, so I don't have to deal with it anymore. The 5312 is back to Linux, the fan is off most of the time, and MS Windows is back to being a VMware guest on the T41 for doing those few things that I have not been able to work around any other way. I think for business continuance, I'll upgrade the memory in the 5312, and use my personal copy of VMware there to be able to run XP as the guest OS, and get past all this. But with the new system board on the T41, hopefully, I'm not going to have to use the backup plan.

 

But I didn't plan on having to run away from Houston either: Always good to have a fallback position. Preferably one that is not noisy.

Steve Carl

My T41 left me!

Posted by Steve Carl Sep 20, 2005


Lobotomy scars occur when my Linux Laptop goes in for repair

My IBM T41 has left me!

 

Not for good though. Tangentially related to my last post is another hardware failure. My IBM T41 / Fedora Core 4 laptop had its video card go on the fritz. Crazy-quilt patterns were on the screen rather than KDE 3.4. Having it fixed was pretty easy though: I called 1-800-IBM-SERV, told them all about it, they mailed me a box, and I sent it in to get fixed.

 

But now I am bereft. I had no idea how much I lived on Linux till my main squeeze left me. All my old blogs were there, all my email, all my documents, all my pictures, all my LinuxWorld work, all my pre-configured applications and workarounds: Evolution, CrossOver Office, VMware, OpenOffice 1.9.125 (2.0 Beta 1). In short, everything I use to get along in an MS Windows-centric world from Linux. And everything I do outside that MS-centric world too. I had it all square rooted, configured, and ready to go.

 

I kept the hard drive of course. It's right here in this USB enclosure, attached to my SUSE 9.3 test laptop. I can get to everything now that I have configured a user on this laptop with the same UID as the one I had on the Fedora Core 4 laptop. But it's clumsy, and it feels like the work-around that it is. This system is my test system. There is nothing wrong with SUSE other than I am not "moved in" here. And Evolution doesn't work right. I feel lobotomized!
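The work-around itself is nothing fancy. Roughly what I did looks like the sketch below, with the device name and UID as illustrative guesses rather than my real values:

# See what device name the USB enclosure came up as (say, /dev/sda)
dmesg | tail

# Mount the old /home partition somewhere handy
mkdir -p /mnt/t41
mount /dev/sda2 /mnt/t41

# Create a local user with the same numeric UID the files are owned by
useradd -u 500 -m steve

# Numeric listing to confirm the UIDs line up
ls -ln /mnt/t41/steve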

 

The Linux desktop is certainly a reality to me: I had no idea how much till it was gone.

Steve Carl

When Linux Breaks

Posted by Steve Carl Sep 19, 2005


Three system configuration tips learned from running Linux as a file server   

 

It is hard to imagine, but Linux can break. I do get spoiled by the long periods of uptime, to the point that when I have a problem with Linux it always surprises me. I guess it shouldn't: all those millions of lines of code, brought together from all those projects around the world... In fact, maybe what I should be is amazed that it all works so utterly well!

 

The point of my first two content-related posts here was actually to set the stage for this one. I mentioned in "Linux and NAS" that of the ten Low Cost File Servers we have built to date using Linux and largely hand-built server hardware, "Three of them were not all we hoped though, with various stability issues that took us a while to chase to ground." This post is about that chase. It should also be noted that in our various explorations over the years with the Linux file servers, some of the hardware was vendor-integrated, but most was hand-built on-site to a particular specification we wanted. As these were not first-tier storage, we could afford some latitude to learn. The three with the problems were all hand-builts. And we did learn....

 

To tell the end of the story first: there have been two issues. One was a bad mainboard. The other was that two of the servers built using the 2.6.5 kernel crashed when they had a problem with the kernel's SLAB (slab allocator) cache. Worse for us at first was that the symptoms of these two widely different things appeared the same: the system would hang up, and nothing other than the Big Red Button (BRB) would get them free. Till they hung up again. Patrol was getting a real workout alerting us to the failures! We did decide that at some point being notified of constant failure was becoming... redundant. Monitoring should be about exceptions, and like the file server appliance from the nether regions before them, these failures felt like they were becoming the rule.

 

But, other than the maddening frequency with which the pager went off, it was different too: this was Linux. We had the source code, the distro vendors, and the Internet, and we were pretty sure we'd get it taken care of.

 

The bad MB was the first thing we hit, and it took us a while to decide, based on what was going on, that we might have a hardware problem. As we had hand-built the server, we had all the parts on hand to hand-build another. The key to our tier II file server self-support, from a hardware perspective, was to always be able to build another if the occasion arose. The team built another server, bringing over only the disks from the first... and that did it. We had no further problems. We would have been smug, except that in retrospect we felt a bit silly for not figuring it out quicker.

 

Later we had the two slightly newer servers start to hang in the same way. We thought we might have the same problem at first: it acted the same. We were thinking we were utterly jinxed on MBs: who ever heard of this many bad ones in a row? In fact, that question led us to think that we needed to really square root this failure: repeated observations made us begin to believe that maybe this one was a bit different. We started to ask ourselves how we could instrument Linux in order to capture the failure at something better than the "It is hung: boot it" kind of level. With Linux, we thought we should be able to do better than that. And you can.

 

Here then are three things we ultimately settled on as being required for all our file servers: our recipe, if you will. We updated our WIKI doc, and now use this on all the new file servers we build... all of which, of course, now behave.

 

1) Enable "magic" sysrq keys: With this enabled, if the system is hung but the kernel is at all responsive, you can use the sysrq key to poke around various areas of the system, as well as do a graceful reboot. There is a great page about it here: http://linuxgazette.net/issue81/vikas.html, and another here: http://www.tldp.org/HOWTO/Remote-Serial-Console-HOWTO/security-sysrq.html. And of course, Google is your friend if you want to know more. In our doc, we show these two ways to turn it on for Fedora and SUSE:

 

Fedora Core 2

 

 
/etc/sysctl.conf
 
kernel.sysrq = 1

SuSE SLES 9

 
/etc/sysconfig/sysctl
 
ENABLE_SYSRQ="yes"
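Those settings take effect at boot; to turn the keys on (and test them) on a running box, something along these lines works on both distros. The sync request is the gentlest one to try first:

# Enable magic sysrq right now, without a reboot
echo 1 > /proc/sys/kernel/sysrq     # or: sysctl -w kernel.sysrq=1

# From the console keyboard: Alt+SysRq+s syncs disks, Alt+SysRq+b reboots.
# The same requests can be sent without touching the keyboard:
echo s > /proc/sysrq-trigger        # emergency sync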

2) Console redirection to serial port: If console redirection to serial port 0 has been enabled in the BIOS (and yes, this does pre-suppose you have a BIOS that has this setting...), then the console can be supported on BOTH the normal console and the serial port at the same time. This could be useful in case of a KVM failure, or an X problem. Then (from our web doc):

 
Note: BIOS, GRUB, and the kernel messages are each directed at a different level.

Note: When Linux boots, init and syslog messages will NOT appear on the secondary (serial) console!

GRUB configuration

Add the following lines to /boot/grub/menu.lst

 
/boot/grub/menu.lst

serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1
terminal --timeout=10 console serial

Add both consoles to any "kernel" records in /boot/grub/menu.lst

 
/boot/grub/menu.lst
kernel existing_text console=ttyS0,9600n8r console=tty0

Add getty for serial port to inittab

Add a line for the serial port getty to /etc/inittab

 
/etc/inittab
co:2345:respawn:/sbin/agetty -h -t 60 ttyS0 9600 vt100

Enable direct root logon

Add a line for the serial port to /etc/securetty

 
/etc/securetty
ttyS0
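To actually watch that console, any terminal program on the machine at the other end of the null-modem cable will do. A quick sketch; the device name and speed must match what was set above:

# Quick and dirty: attach screen to the serial port at 9600 baud
screen /dev/ttyS0 9600

# Or set up minicom for 9600 8N1, no flow control, and save it as the default
minicom -s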

3) Configure the Linux Kernel Crash Dump (LKCD): It is no fun to read dumps, of course, but on the other hand, if you are asking for help, or reading through forums to see if you have a known problem, someone will say "Hey, what does it look like in register XYZ or memory location... whatever." It is handy to have a dump to poke around in to find the answers. LKCD is documented here: http://lkcd.sourceforge.net/

One of my team, who is largely fearless, read through the doc at the web site, and used the "lcrash" command provided by the LKCD folks to poke around the dumps from the two misbehaving systems. After three days of pretty serious learning curve, and multiple dumps to compare, this led us (me, because he told me about it...) to many references about SLAB cache issues when using early 2.6 kernels on file-intensive servers.

Lights went on! Our later servers, with later kernels, are stable. Our earlier servers with 2.4 kernels are stable. Now we think we know why. Now we are going to test this theory: we are working to put together a procedure to upgrade an older 2.6.5-kerneled file server to the latest and greatest 2.6.12 kernel. We would like to be able to do this in place: not move the many terabytes of data over by creating a new "swing server", but we have not ruled that out.

The problem we are facing is the program-level skew, and all the co- and pre-requisites: the 2.6.5 systems are back-level on most everything, including the EVMS (http://evms.sourceforge.net/) version. More about all this as we learn it....

PS: Yes, we picked EVMS for our file server volume management. Our future systems will be LVM2, based on the kernel folks' decision to adopt LVM2 over EVMS (see here for more about that: http://lwn.net/Articles/14816/). But those will be new servers, and we won't have to do massive data swings to put those in place.
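Since LVM2 is where the future servers are headed, here is a rough sketch of the "grow a file system without downtime" operation as it looks there with ext3. The volume group, logical volume, and sizes are invented for the example, and the online-resize tool name varies with the distro vintage:

# Bring a new RAID device into the volume group, then grow the LV and the file system
pvcreate /dev/sdb1
vgextend datavg /dev/sdb1
lvextend -L +200G /dev/datavg/export
ext2online /dev/datavg/export    # online ext3 grow on this era of distros; newer ones use resize2fs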

Steve Carl

Even Clusters get the Blues

Posted by Steve Carl Sep 12, 2005


How we got to our current file serving (NAS) cluster, and where we are looking next in Linux for file serving.

 

We designed our current, mission-critical file server based on previous experience with an up-time-challenged storage appliance. We decided, between reboots, to do a clean-sheet exercise (if writing on a great big white board that was recently erased qualifies as a "clean sheet") on what should replace our not-so-favorite server. We laid out the parameters of what we needed and also what we wanted, and delineated between the two. To the extent this is ever possible, we did not know the answer ahead of time as we asked the questions.

 

 

These were some of the main points as I recall them:

 

 

  • Tolerate our R&D network traffic (this was the major failing of our current "solution")
  • Rolling upgrade capable (another failing)
  • No single points of failure in either the hardware or the software stack (and another)
  • Proven technology from a stable vendor (yet another...)
  • Expandable for the foreseeable lifetime of the hardware.
  • Be able to handle at least 50 Megabytes a second on a single Gig-E wire
  • Able to deal with bimodal (NFS and CIFS) access to a single file system, as well as either NFS-only or CIFS-only file systems.
  • Logical Volume Management, so that file systems could grow without downtime.
  • Able to be maintained by members of my team rather than needing a vendor house call every time we were working on it. Not that we wanted to have to work on it often.

 

 

The basic idea then was that it had to perform well enough to do our software builds, and that, even when we were servicing it, R&D would never see an outage from their end of things.

 

 

Once we had our list of things, we started calling vendors, asking questions, seeing presentations, getting in test systems, and doing general technical triage. At the end of the day, 4.5 years ago... we didn't choose Linux. I know... I know... it was not an easy thing in some ways. We had quite a comfort zone with Linux, even back then. But the Linux of the day just was not ready for this particular role in the glass house.

 

 

We chose a Compaq TruCluster. What a machine! Two ES40 main nodes, each with 4 screaming Alpha processors, 4 GB RAM, multiple Gig-E cards on separate buses, two high speed Memory Channel (http://www.hp.com/techservers/systems/symc.html) interconnects, connected to the venerable Compaq StorageWorks SAN via twin Brocade switches.

 

 

TruCluster runs on top of Tru64, is a Single System Image (SSI), active-active cluster, and is based on the best, most proven cluster technology known to humankind: VMS clustering. With active-active and both nodes up, we'd get twice as much throughput, so there is no expensive system sitting around waiting for the other to fail. And because VMS clustering has been around for decades, most of the complexity of such a solution would have been worked through. Normally something this complex could be expected to fail more often rather than less, but this was proven technology.

 

As an aside, this is also why we have been following the Linux OpenSSI project so closely over the years (http://openssi.org/cgi-bin/view?page=openssi.html), since the project is sponsored in large part by HP, who now of course owns TruCluster. More on this later.

 

From time to time over the last several years, we have assembled Linux and other clusters out of spare parts to evaluate their current state of the art relative to TruCluster. We are of course interested in how well these stack up against our original white board list, but also:

 

 

  • How easy it is to build one, especially the level of customization
  • How cluster-aware the NFS and CIFS software stacks are
  • General speeds and feeds
  • Basic Reliability / Availability / Serviceability (RAS)

 

 

We were doing this to remain educated and up to speed on the current state of the art. Another part of our job is to build clusters for R&D from time to time for various projects they have going: it's nice to be ahead of the curve on requests like "Please build me a cluster, and I could really use it by Tuesday if at all possible". We were interested primarily in cluster technology that can be used to provide a NAS service: compute clusters and grids were not really in scope.

 

 

Then our cluster got the blues. OK, really, several things came to pass that have made it time for us to re-visit this whole thing again. First and worst was that our client base shifted. As new UNIX and Linux versions have been released, there has been a drift towards using NFS version 3 over TCP/IP, and away from NFS version 2 or 3 over UDP. This is just the new R&D client default behavior. It makes sense really: NFS V3 over TCP/IP is better on the WAN than UDP. Unless you suddenly have a bunch of such clients, and the server is a TruCluster designed back when most clients were UDP.

 

 

Because of the design of the TruCluster, and its particular implementation of cluster-aware NFS services, Memory Channel (MC) traffic has been increasing. In the classic performance and capacity planning scenario, what used to be no problem at all hit a knee in the curve, and suddenly we had a huge file server and some fairly low speed but very important clients waiting around while traffic cleared inside the MC. And of course, since all systems wait at the same speed, having all these high speed CPUs is not helping a bit (even by today's standards, the 4.5-year-old Alpha chips still get around pretty well). And with Compaq bought out by HP, and Tru64 being end-of-lifed, we are probably not going to be able to get a major design change implemented in TruCluster to deal with this behavior. It's not really "broke", it's just that NFS version 3 over TCP/IP on this cluster just doesn't appear to scale. We can manage this of course by changing clients to use UDP one by one (and we have), but that is a short-term solution.
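The per-client change is just a mount option. A sketch of the sort of thing we do on a Linux client; the server name and export path are invented, and other UNIX clients spell the options slightly differently:

# Force NFS version 3 over UDP instead of the client's TCP default
mount -t nfs -o nfsvers=3,udp,rsize=32768,wsize=32768 nas1:/export/build /net/build

# Or as the matching /etc/fstab entry:
# nas1:/export/build  /net/build  nfs  nfsvers=3,udp,rsize=32768,wsize=32768  0 0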

 

So, it is time to look at Linux again. We have been building the Low Cost File Servers with Linux for a number of years now, and we have a pretty good handle on the technology for non-HA purposes. A few things have changed since we went with the TruCluster that mean it is time to build some new test systems:

 

 

  • Linux SSI now supports the 2.6 kernel, which we think may mean better scalability than a 2.4 kernel. Although we do have cause to wonder about this assertion: more on that next time.
  • As mentioned, HP has been providing a great deal of support to Linux SSI, and HP has also been historically very supportive of Linux. Since our TruCluster is now essentially an HP system... well, it just seems like there ought to be some possibilities there.
  • The relative low cost of commodity-based hardware makes SSI not nearly as critical: we can afford to have inexpensive servers waiting around for other inexpensive servers to fail, so we need to look at the Linux HA project to see what it can offer: http://www.linux-ha.org/
  • Also in Linux HA's favor: it is a far simpler cluster technology (a rough sketch of what its configuration looks like follows this list), and so in theory it should be easier to get to a stable, mission critical level of service when there are fewer complications. SSI on the TruCluster came from a 20+ year old technology base: nothing in Linux is quite that old and venerable yet.....
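The sketch promised above: a minimal two-node, active/passive Heartbeat-style configuration of the kind the Linux-HA project uses. Node names, the interface, the service address, and the resource scripts are all invented for illustration, and a real setup also needs /etc/ha.d/authkeys:

/etc/ha.d/ha.cf
  keepalive 2
  deadtime 30
  bcast eth1
  auto_failback on
  node nas-a
  node nas-b

/etc/ha.d/haresources
  nas-a IPaddr::192.168.10.50/24/eth0 nfslock nfs smb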

 

 

The Low Cost File Servers have had their moments though. Next time, some of the things we have learned from them.

Steve Carl

Linux and NAS

Posted by Steve Carl Sep 8, 2005


We would hardly be the first shop in the world to use Linux as a file server. It's one of its natural, most developed roles in the glass house. What may be slightly surprising is that we don't yet use it for our most critical workloads, although that day may come soon. More on that in another entry.

 

We take a two-tiered approach to NAS storage for R&D Support.

 

 

In our first tier is the 5 9's type storage. The stuff that just can't go down. The bits and pieces that are used on our "assembly line" to build and manufacture our own products. The kind of storage that, if it were down, would idle hundreds of people around the world in R&D and endanger our time to market. And we know with a great deal of pain just how critical this storage is, because we used to use a storage appliance there, and it could not survive our network. It crashed all the time, and we paid for it dearly.

 

 

Defining my terms here: a "storage appliance" means any hardware/software solution sold only as a NAS or SAN solution.

 

I should mention as an aside that our network is a hostile place for file servers of any type. With over 3600 computers on the R&D LAN alone, triple that for the WAN, running over 45 'variants' of UNIX (counting AIX 3.2.5 as different from AIX 4.1, especially from a network client point of view, etc.), 6 or 8 variants of MS Windows (not counting differences in service packs), and of course Linux ranging back to Red Hat 5.2 and up to the most recent versions from Red Hat, SuSE, Mandriva, Debian, Feather, Ubuntu, some custom, in-house, built-from-scratch 'distros', and I am sure others I am not thinking of at the moment... well, I am sure you get the idea.

 

 

Every possible version and combination of NFS versions 2, 3 and 4, and all the various SMB varieties as well. Plus the occasionally buggy client behavior where a published standard was not quite implemented the same way by one vendor as by the others. It is enough to make a NAS server run for cover, and some have. But we can't have that on the manufacturing line. So five years ago we bought a Compaq TruCluster. At north of 130,000 US dollars a terabyte, in five-years-ago dollars, it was not cheap. But it didn't take too many outages to justify the expense either, and our storage appliance was giving us outages with enough frequency that it took far longer to design the server than it did to sell it to management.

 

 

But that left us with a real need to also spend a great deal less on storage for things that do not need that super high level of availability. Things like:

 

 

  • Archives that are accessed too often to be on tape
  • Build trees for older versions of products that are only built ad-hoc to solve particular problems
  • Images of various levels of the OS captured with Ghost or Lab Expert
  • VMware virtual disks that are 'on the hook', waiting for various deployments
  • ISO images of various Linux distros, trailing back in time to when we first started downloading and publishing ISO images on the internal Web server for quick download.

 

 

We came up with a Linux solution shortly after we had the TruCluster up and running. Our operational theory for the first Linux file server was that commodity hardware should be catching up to the point that we could build "Pretty High Availability" (PHA) for less than 5000 US dollars a terabyte. We called it the LCFS or FBCFS (Low Cost File Server, or Faster, Better, Cheaper File Server. Hey, I used to work at NASA... what can I say? Faster Better Cheaper was all the rage till that Mars probe went in hard a few years ago. But I digress...)

 

 

The main thing that made this all work was Linux support from 3ware for their storage card. It could hook up either 8 or 12 PATA disks and set them up as RAID5 stripes with a hot spare, and we were pretty sure that even if the disks weren't as reliable as the SCSI units in the TruCluster, neither would a failure be customer-facing: we'd shelve a few extras as cold spares, and replace them whenever they failed, and at PATA prices.

 

That first unit was eight 200 GB disks, Red Hat 7.3 with a custom kernel from kernel.org, patched with read-ahead NFS patches, and with IBM's EVMS installed for volume management. This preceded Red Hat's acquisition of Sistina and the new LVM that resulted, and we needed an enterprise-class logical volume manager. EVMS fit that to a T.

 

 

That server was so successful (and less than $5000 too) that we have since built nine more, with higher-density disks, SATA rather than PATA, and sometimes using the 12-port 3ware cards rather than the 8-port, so that we now have 15 terabytes of storage online with these servers. That is far more storage online than with the TruCluster, which we rather jealously guard. The Linux distro has been updated over time, and various experiments have even been done with different distros.

 

 

We have never had an issue with any version of NFS or CIFS/SMB. Linux's NAS stack of TCP/IP, NFS, and Samba has never had the issues we did with that earlier appliance: even our PHA file servers are more available than those were.
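The bimodal part is not exotic, either: on these servers the same directory tree is simply exported both ways. A trimmed-down sketch, with the path and host pattern invented:

/etc/exports
  /export/archive  *.rd.example.com(rw,sync,no_subtree_check)

/etc/samba/smb.conf
  [archive]
     path = /export/archive
     read only = no
     browseable = yes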

 

 

Three of them were not all we hoped though, with various stability issues that took us a while to chase to ground. Better than being "broken all the time" was still not PHA. Seven of them behaved exactly as we wanted, and more.

 

 

Next time: “When good servers go bad --- or --- Even clusters get the blues.”



Multiple Distros on multiple machines to test Linux for the Enterprise Desktop continues

 

Last time out I was on the adventure of a lifetime, traveling to our Pune, India office to meet others on BMC's R&D Support team. I had taken my Dell D620, configured with Mint 3.1. It was reliable and trouble-free. What was left over from that trip was an issue from the previous post about OpenSUSE 10.3. It was troublesome enough on the D620 hardware that I ejected it at the last minute in favor of Mint 3.1.

 

I had two days back in the office between trips, and spent one evening after everyone left setting up a new set of Linux test systems.

 

Laptops

 

I tend to use laptops to test all things Linux desktop for these reasons:

 

 

  • Linux on a laptop is usually a harder test for Linux, since the hardware can be less standard. Call it a stretch goal.
  • My office is only so big! Laptops save space and power, and have built-in screens, so I don't have to have a KVM infrastructure.
  • Laptops now outsell desktops, and why not? Dual core, 64 bit, more RAM and disk... what do I need a desktop for?

 

 

This approach has been borne out by my recent D620 work with OpenSUSE 10.3 and Mint 3.1. OpenSUSE was problematic, Mint was largely flawless. This is not to say that there could not be a different laptop where the exact reverse is true. This is one finding on one laptop, and it would be a scientific mistake to generalize this one data point. That is where the IBM T41 and Dell Inspiron 8100 come in.

 

OpenSUSE 10.3 on the IBM T41

 

Replacing Mint 3.0 on the T41, I installed OpenSUSE 10.3. OpenSUSE has run well on the IBM in the past, and frequent contributor to this blog Richard Meyer has it running well on his IBM T series laptops, so I assumed it would work, and it largely does. Yeah, OK, there is that qualification "largely". I cannot say it has been perfect. The IBM ThinkPad support is installed, but the screen bright / dim keys are frankly acting wonky. Set the screen brighter, and it steps up to "brighter, brighter, full dim". If I look away, and then back, the screen will be full bright. I have no idea what it is playing at. Previous installs of OpenSUSE did not do this, nor did Mint.

 

The other thing that is not working very well is external / dual-head support. The T41 is in a docking station, with a Dell 1280x1024, 60 Hz refresh flat panel attached. The two dual-head modes YaST knows about are to stretch the desktop across both screens, or to replicate the primary display. What is not available, but should be, is to treat the second display as a second desktop. Fedora knows that trick, and it has historically worked very well on this exact same hardware.

 

The problem with dual head is that the internal panel is 1400x1050, and the size mismatch is something YaST cannot correctly configure, not even when I force the 1400x1050 panel into 1280x1024 mode. The external panel is driven such that the virtual display is larger than the real one, and it does not pan to let you get at the virtual edges not being shown. Phooey. I wanted this so I could run VMware on the second display and have two OS displays on one computer. Looks like I'll have to put Fedora on if I want that... assuming I can figure out how to get VMware working on Fedora.
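What I am after is the old-style X arrangement of two independent screens (:0.0 and :0.1) rather than one stretched desktop. Hand-editing xorg.conf can usually get there even when the GUI tool cannot; a rough sketch of the layout part, with all the identifiers invented (on a single dual-output card, the two Device sections would also need matching BusID entries and the driver's screen-number option):

/etc/X11/xorg.conf (fragment)

Section "ServerLayout"
    Identifier "DualIndependent"
    Screen 0 "InternalPanel"
    Screen 1 "ExternalPanel" RightOf "InternalPanel"
EndSection

Section "Screen"
    Identifier "InternalPanel"
    Device     "Video0"
    Monitor    "Panel1400"
    SubSection "Display"
        Modes "1400x1050"
    EndSubSection
EndSection

Section "Screen"
    Identifier "ExternalPanel"
    Device     "Video1"
    Monitor    "Dell1280"
    SubSection "Display"
        Modes "1280x1024"
    EndSubSection
EndSection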

 

Meantime, Evolution is working fine, and I have hidden the SLAB menus away where they belong. I know: I am such a Luddite sometimes. Week after next, when I am back from BMC Userworld, I'll work on tweaking out this setup more completely. One annoying thing: Once I installed the security updates, it *removed* one set of debugging symbols for Evolution. I do not know why the debugging symbols were not updated when the base package changed, but that hurts my ability to report issues.

 

The install process itself is much better than it used to be, but still not up to Ubuntu standards. It takes way longer, and enabling the alternate repositories adds a slow, chatty stretch to the install, while YaST seems to refresh all sorts of things from all over the place.

 

OpenSUSE does something in its install that I wish all distros did: when I told it I was going to use the user "steve", it asked me if I wanted to change all the ownership of the existing "/home/steve" to match this userid. I am beyond hoping that all the distros will ever agree as to what the UID of the first added userid should be: 500, 501, 1000, or whatever. Unless the LSB defines this someday, it will always be different. Adding a check to see if the home directory of the userid just added already exists is so simple, but it saves all sorts of issues later.
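The check really is trivial. Something along these lines is all an installer (or an admin doing it by hand) needs; the username and group are just the obvious example values:

# If the home directory already exists, re-own it to the newly created user
if [ -d /home/steve ]; then
    chown -R steve:users /home/steve
fi

# Confirm the new UID/GID, and list numerically to spot any stragglers
id steve
ls -ldn /home/steve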

 

Ubuntu 7.10 on the Dell Inspiron 8100

 

Replacing Mint 3.1 on the Inspiron is Ubuntu 7.10. Ubuntu did not drop in as easily as Mint, and it appears to be because Ubuntu does not enable the "restricted drivers" automatically anymore. The Dell's 1600x1200 screen was mostly black on the first few boots. Finally I used safe graphics mode and 1024x768, and Ubuntu went in fine, although the screen had huge black borders. Once installed and rebooted, I turned on restricted drivers, had it install the NVIDIA drivers, and then the screen was fine.

 

This was the only major hiccup in the install, and I hope Mint 4.0 does not follow this path, and instead enables these drivers when it sees that the graphics card is there. It was not a big problem, but it was annoying and did not add to the overall feeling that Ubuntu really knew what it was up to. It just felt sort of braindead: "I know what video card you have, and I have its driver over here, but I am not going to use it till you tell me to". I know the correct "Restricted Drivers" incantation too: would a new user think highly of this if they had to google up the solution? I get the deal where the Open Source community wants these drivers opened up, and I agree with that position, but how about a prompt asking me if I wanted the drivers enabled at install time, so I could decide back then? At the very least, if I were "Shirley First-time-user" I would know what was up.
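For the record, the command-line route works too if the GUI prompt never shows up. Roughly like the sketch below, though the exact package names shifted between Ubuntu releases and NVIDIA card generations (nvidia-glx, nvidia-glx-new, nvidia-glx-legacy), so treat them as illustrative:

sudo apt-get update
sudo apt-get install linux-restricted-modules-$(uname -r) nvidia-glx

# Then point X at the binary driver: either edit the Device section of
# /etc/X11/xorg.conf to read  Driver "nvidia"  or re-run the configurator:
sudo dpkg-reconfigure xserver-xorg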

 

If there were any changes in the install process from Ubuntu 7.04 they were small enough to escape my notice. Same simple, fast install.

 

Evolution 2.12 appears to work well, but I did not have any real run time on it before I had to go home. Between my day job and changing time zones from IST to CDT, it was all I could do to get this far.

 

Evolution 2.10 is "obsolete"

 

Part of what drove me to do the above OS swizzles was the fact that the current status on the bugs I have reported against Evo 2.10 was first "Duplicate" and then "Obsolete", i.e., Evo 2.10 is replaced now by 2.12. OpenSUSE 10.3 and Ubuntu 7.10 both have 2.12. Mint 4.0 will be out in November, and being based off Ubuntu 7.10, it should also have 2.12. I'll take the D620 to that release as soon as I can.

 

Next week I'll be at BMC Userworld, and no doubt my post here will be informed by that experience. After that I will put up a post about the final configuration for our HA Linux NAS that is replacing the Tru64 TruCluster.



When trying to come up with a name for this blog, I tried to think of a title that would encapsulate a computer generalist's approach to the topic of Linux. I am currently focused on Linux as an MS Windows desktop replacement OS, having migrated myself shortly after "Code Red", and on server virtualization via VMware under Linux. That is narrow enough, and probably enough there to stay busy blogging for a good while, but I also manage a team of R&D Support people located in Houston, Dallas, and Pune who use Linux every day in all sorts of roles: mission critical, development, testing, and sometimes just kicking the tires. "Adventures in Linux" was the first thing that popped into my head.

 

I am a computer generalist: I started playing with a TRS-80 model 1 in the late 1970’s, messed around with CP/M, then became a mainframe operator in the early 1980’s, and was a VM system programmer for well over 10 years, working among other places at the Space Shuttle on-board computer development lab as a subcontractor to IBM. I came to BMC in 1989, and started learning UNIX. I hooked BMC up to the Internet in 1993, using the magic of VM to create a “firewall” of sorts, and since 1997 I have managed R&D Support.

 

My first Linux was a set of Slackware CDs I bought at MicroCenter, the only time I ever saw the 1.x kernel, since the next release of Slackware (Slackware '96) had the spiffy new 2.0 kernel. It went in on a homebrew AMD box, and I spent the next several hours just trying to figure out my X server settings! It was pretty humbling: I thought I knew something about computers till that happened.

 

Once it was working it was handy for X access to the various UNIX servers I supported, but the OS/2 system next to it was where I did all my office work. MS Windows 3.1 used to be on it, but it crashed every day, and I finally went to OS/2 seeking refuge. Yeah yeah... I know. We’ll just skip that part.

 

I went back to MS Windows with NT 4.0 since it didn’t crash nearly as often as the Win 9x core stuff, and was on Windows 2000 when “Code Red” hit. My Linux box was still there, updated over the years, and it was pretty hard not to notice that for an entire week, the only place I could get any work done was there.

 

Call me a fair weather friend, but I decided to see if Linux would work as my full time desktop, and I have not looked back since.

 

I don't have computer religion per se. My team supports thousands of computers all around the world that we use to develop BMC products on: every release of UNIX and Linux known to humankind since 1989 (that could be patched for Y2K) is probably in one of our labs. Also some VAX hardware from the late 1980's, OpenVMS on Itanium 2's, Sequents, Pyramids, Siemens Nixdorfs, AS/400's, and of course Linux on AMD/Intel, Sparc, Power, and the mainframe. Most recently we acquired two Apple Xserves running OS.X. We also support MS Windows across the board.

 

My personal use of Linux is because it just works. Currently I have Fedora Core 4 on my work laptop, SuSE 9.3 on my test laptop, and Xandros OC 3 on another test laptop built out of spare parts found lying about the labs. My personal systems are a Fedora Core 4 hand-built box that is my file and print server, a Fedora Core 4 laptop running on an emachines 5312, and an iBook.

 

What I plan to do here is talk about all aspects of Linux: how we use it here at work, things we discover along the way, salted with various lessons learned from using it as a primary desktop OS. No topic is out of bounds, as long as it's about Linux.
