Minutes Waterloo Polaris Advisory Group (WPAG) April 19, 2000

Attendees:

Terry Stewart, Applied Health Science (AHS)
Tim Farrell, Information Systems and Technology (IST)
Nevil Bromley, Arts
Ray White, IST
Bruce Campbell, Engineering Computing
Stephen Sempson, Science
Erick Engelke, Engineering Computing
Hon Tam, Engineering Computing

Submitted items:

Q: (Bruce) I think we may want to recommend to UCIST that some money be spent on an interim solution, e.g., a pool of striped disk installed in a Sun somewhere that we do NetApp multilevel dumps to, or a basic DLT drive used with the native NetApp dump/restore, or other options?

I think that Jim has shown that anyone relying strictly on the netapp/legato client could have:

  • nothing backed up at all (with an error message to that effect)
  • apparent success, yet with some files not backed up

A: After a long technical discussion of options, it was decided that we don't have the mandate to pursue a technical solution. A draft letter has already been sent around to WPAG members; it is to be submitted as an item to the next UCIST meeting, outlining our concern that we need a backup solution available in case of catastrophic failure.

We thought of some options which could be proposed. To get an idea of the scale of the backup requirements, here are the space utilizations on 5 of the 6 faculty NetApps...

engfile 126GB (34GB of which is backup of watstar)
scifile 64GB (34GB of which is backup of watstar)
ahsfile 19GB
artsfile 54GB (37GB of which is backup of watstar)
fesfile 35GB

The "backup of watstar" is an online backup of watstar servers. It does not need to be backed up to tape, making the total requirement approximately 193GB (298GB in use, less 105GB of watstar backups), plus Math.
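As a quick check of the figures above, the following Python sketch totals the five listed NetApps and subtracts the online watstar backups. The dictionary layout and variable names are illustrative only, and Math is omitted because no figure was given for it.

```python
# Space in use on five of the six faculty NetApps, in GB, as listed above.
# Each entry is (total used, portion that is the online backup of watstar).
usage_gb = {
    "engfile": (126, 34),
    "scifile": (64, 34),
    "ahsfile": (19, 0),
    "artsfile": (54, 37),
    "fesfile": (35, 0),
}

total_used = sum(total for total, _ in usage_gb.values())
watstar_copy = sum(watstar for _, watstar in usage_gb.values())

# Data that remains once the online watstar backups are excluded.
remaining = total_used - watstar_copy

print(f"total in use:       {total_used} GB")
print(f"watstar backups:    {watstar_copy} GB")
print(f"remaining to cover: {remaining} GB (plus Math)")
```

The result matches the approximately 193GB quoted in the minutes.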

Some options we thought of are as follows:

  • Install approximately 250GB of disk space into a Sun unix system somewhere. We would back up the NetApps to that using standard tools (i.e., rdist, dump, compress, etc.) and then hoover would back up that Sun system. Sun 450 disk, at about $1500 per 18GB, would cost about $21,000. This would require a two-step backup/restore process, and require some crontabs to be set up to copy the NetApps to the Sun. It would also assume an already existing Sun system had sufficient empty disk slots. It has the added disadvantage that the disk would almost be a "throw away" after a real solution is released. (Note: 36GB disk would be cheaper, and require fewer disk slots.)
  • Same as above, but use NetApp disk. With compression, we could probably get away with 1 shelf of 36GB disks (216GB). That would cost about $30,000, but at least it is a useful investment for whoever gets to keep the disks/shelf when the real solution is released.
  • Buy a single autoloading DLT tape drive, and install it in either a Sun or a NetApp. All NetApps can back up to either a local drive or a remote tape drive using "rmt". May cost about $6000-$10000, plus tapes if needed. Would require someone to operate the drive.
  • Buy a basic tape drive for each faculty NetApp. Two are officially supported. Exabyte 8900 8mm is about $3800, Quantum DLT 7000 is about $7000. The 8mm holds 20GB natively, while the DLT holds 35GB natively. Both should get 2 to 1 compression, and multi-tape backups are OK. Would cost 6 x $3800 = $22,800 plus tapes for the cheaper of the above options. Someone in each faculty would have to operate the drive. Provides a small advantage of putting a modern tape system at each faculty's disposal, to allow occasional additional backups of systems if desired. Provides some redundancy: if one faculty's drive is broken, they could work with another faculty temporarily.
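The arithmetic behind two of these options can be sketched in Python as follows. This is a rough illustration only: the helper names are hypothetical, the 2 to 1 compression ratio is the hoped-for figure rather than a measured one, the DLT total is derived here from the quoted unit price, and the engfile example assumes its watstar backup is excluded.

```python
import math

def disks_needed(target_gb, disk_gb):
    """Whole disks required to reach target_gb of raw space."""
    return math.ceil(target_gb / disk_gb)

def tapes_needed(data_gb, native_gb, compression=2.0):
    """Tapes for one full backup, assuming the given compression ratio."""
    return math.ceil(data_gb / (native_gb * compression))

# Option 1: about 250GB of Sun 450 disk at about $1500 per 18GB disk.
sun_disks = disks_needed(250, 18)
sun_cost = sun_disks * 1500

# Option 4: one basic tape drive for each of the six faculty NetApps.
cost_8mm = 6 * 3800   # Exabyte 8900 8mm
cost_dlt = 6 * 7000   # Quantum DLT 7000 (total derived, not quoted in the minutes)

# Example: tapes for one full backup of engfile without its watstar backup.
engfile_gb = 126 - 34
tapes_8mm = tapes_needed(engfile_gb, native_gb=20)
tapes_dlt = tapes_needed(engfile_gb, native_gb=35)

print(f"Sun disk option: {sun_disks} disks, ${sun_cost}")
print(f"Per-faculty 8mm: ${cost_8mm}; per-faculty DLT: ${cost_dlt}")
print(f"engfile full backup: {tapes_8mm} 8mm tapes or {tapes_dlt} DLT tapes")
```

If compression falls short of 2 to 1, the tape counts rise accordingly, which is one reason the autoloading drive or multi-tape support matters for the larger filers.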

The WPAG discussion resulted in the following communication:

To: UCIST
From: WPAG
Date: April 20, 2000

Given that:
  • The University of Waterloo relies heavily on its Network Appliance file servers.
  • The NetApp/Legato backup solution does not work at all on some file servers, and in cases where it appears to work, it has been shown not to back up all files.
  • RAID can provide a false sense of security. RAID does not protect against all types of hardware failure, or software failure.
  • The cost to University of Waterloo in lost time, lost work, and lost credibility could be significant if any file server experienced a significant data loss.
  • The NetApp/Legato backup problem has been ongoing since the NetApps were purchased.
  • Repeated attempts to resolve the technical problems with the vendor have not led to success. It appears that a stable legato/netapp client is still some time away.
  • Repeated attempts to have the supplier pay for an alternate backup solution have not led to success.
  • Some Faculties are investing resources implementing independent interim backup solutions.
We, WPAG, recommend that:
  • UW wait no longer for the current legato/netapp backup system to be fixed.
  • UW wait no longer for the supplier to pay for an alternate backup solution.
  • UW commit resources to implement an interim NetApp backup solution.
We believe the details of who commits the resources, and the technical implementation of the interim backup solution, are outside of the mandate of WPAG. However, we recommend the interim backup solution should be guided by the following principles:
  • Vendor recommended solution. (i.e., a solution which minimizes the use of locally developed techniques, and which is "standard", is preferred)
  • Single step backup and restore process (i.e., a backup direct to the backup medium/system)
  • The goal is "disaster recovery" (i.e., the recovery of a completely failed system should be straightforward, while the recovery of a single file from 4 months ago is not considered as important. A backup schedule of once per week is adequate)
  • The "interim solution" will be required for 1 year, until a stable netapp/legato client is released and proven in the field.

Q: (Nevil) I would like to confirm that we are not planning on charging students $10 for an account extension.
A: Yes

Q: (Nevil) The Arts K drive is done for moving software to the appfiler. Just waiting for hooks for K:\etc\winlogon.bat and pdrivers.pol and printers.pol etc.
A: Congratulations to Arts for being the first! Erick is of the opinion that possibly we will keep the K:\etc\winlogon.bat and pdrivers.pol and printers.pol until we move away from the current MS Windows 95 based client workstation. Wasted development time is the issue.

Q: (Nevil) What is the life expectancy of watstar drives and the access command?
A: Related to the above question. The answer depends on the purpose; however, multiple-use data should be moved to a more up-to-date file sharing mechanism such as Samba or CIFS shares off the NetApps.

Q: (Nevil) Scratch - I would like to see the rules for minimum password requirements on the web page.
A: Good idea, the information is on the new scratch web page. The suggestion of who was entitled to an account should also be included.

Q: (Nevil) Is there any progress on a web page tool for resetting/cleaning a student account?
A: No progress since no one has been assigned to this task. Steve has a procedure but it has not been automated in any way yet.

Q: (Nevil) Have there been any problems with the current sysctl under testing, and when is its final release planned?
A: (Ray) I have forced the upgrade out on all development attached workstations and have only found a few problems with user settings (e.g., the machine crashed converting Netscape bookmarks into IE Favorites). Some problems were expected since the code to put the changes into current users' accounts has not been written. Blank user account testing confirms that all seems to be working. The formal release will be sometime early next week, after the code to add new settings to existing N: drives has been written.

Q: (Nevil) Carl has noticed that some sysctls seem to be applied out of order. Is this a possibility, Ray?
A: (Ray) Yes. The past term was odd since we did three updates over the term (typically for virus definition updates), so the problem could have been exaggerated. The win95install disk includes the SysDec99 patch, so there could have been two patches applied in the wrong order. As normal, I will roll the interim patches into SysApr00, so the problem should not be as noticeable.

Q: (Nevil) Erick, has anything changed in the Active Directory discussion since I/Arts talked to you?
A: Yes, a brief discussion happened; however, the results have not been formally announced yet.

Information items:

(Erick) MS Course for MS Windows 2000 has been scheduled for the week of June 26th, 2000.

(Hon) MS Course notes are available on CD for the previous course and can be signed out by Paula.

(Ray) In testing, Norton Antivirus nearly doubles logon times, especially with virus definitions newer than Jan 1, 2000.

(Nevil) WS_FTP should be included in the general set of software. Ray will install it soon.

(Ray) LWP has a patch. If possible, I would like to include it in SysApr00, but I will not shift the release date; instead, a quick patch will be applied in the new term if necessary.

(Nevil) Are the AppFilers being backed up? Yes as long as the NetApps are being backed up.

(Nevil, Hon) Are we out of disk space on the AppFilers? Not yet, as far as we know (I just checked: we are at 77% capacity on the master, which possibly contains two copies of the data).

(Bruce) The distributed quota management tool for the NetApps has not been written. Currently there is no automated software to change individual user quotas on a per-course basis. Other shares, such as office staff common space, have been set up manually on a per-request basis.

(Nevil) What about using other drives, such as the P: drive, for course-specific extra quota space? So far a standard method has not been proposed; however, ECE just makes general extra space available on the P: drive.

(Nevil) Is there any way we can get Netscape to compact the mailboxes automatically so disk space is not wasted? We haven't looked into this problem yet and would like to know whether it is widespread enough that it needs to be addressed.