Reboots and short downtime 20/2 and 21/2

We are carrying out major upgrades to the storage system this week. This requires reboots on the morning of Wednesday 20/2 and downtime on the morning of Thursday 21/2.

During the downtime last weekend we upgraded the StorNext software, Red Hat (to version 7.5) and the InfiniBand drivers (OFED) on a large number of Linux machines. We also retired 300 3 TB disks from the storage system.

This week Hitachi installed 300 new 10 TB disks, which replace 600 disks in 5-year-old Dell storage systems and increase the storage capacity. We now need to migrate data from the old storage to the new. We will do this with as little impact on you as possible, migrating data while the system is online.

Four steps in the process require reboots or downtime:

  1. Making the new data volumes available to the machines. Requires reboots.
  2. Expanding the current data volumes with the new storage. Requires downtime.
  3. Removing the old storage from the data volumes. Requires reboots.
  4. Moving Hitachi storage hardware between racks to free up rack space for future expansion of our tape robot. Requires downtime.

Step 1: We plan to reboot all fiber-connected Linux machines on Wednesday Feb. 20 from 07:00. All other machines will stay up, but will experience a short disk hang.

Step 2: The data volumes will be expanded on Thursday Feb. 21. We need downtime from 08:00 to 10:00, with the possibility of extending it to 11:00 if needed.

Step 3: Will be scheduled later when data migration is complete.

Step 4: Will be scheduled for a weekend later this spring.

During step 2 on Thursday, all macOS workstations will be shut down. Linux machines will remain up, but their file systems will be unavailable.

The following file systems will be unavailable:

u3  = /mn/stornext/u3  (i.e. our home directories)

ITA4 = /mn/stornext/d7

ITA5 = /mn/stornext/d8

PSC1 = /mn/stornext/d9

PSC2 = /mn/stornext/d10

PSC3 = /mn/stornext/d11
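
To check whether a given path sits on one of the file systems listed above, something like the following could be used. This is only a minimal sketch: the mount points are copied from the list, the function name `affected_volume` is made up for illustration, and the comparison is purely lexical, so a path that reaches one of these file systems through a symbolic link will not be detected.

```python
from pathlib import PurePosixPath

# Mount points copied from the list in this announcement.
AFFECTED = {
    "u3": "/mn/stornext/u3",
    "ITA4": "/mn/stornext/d7",
    "ITA5": "/mn/stornext/d8",
    "PSC1": "/mn/stornext/d9",
    "PSC2": "/mn/stornext/d10",
    "PSC3": "/mn/stornext/d11",
}


def affected_volume(path):
    """Return the volume name if `path` is on a listed file system, else None.

    Hypothetical helper: compares path components only and does not
    resolve symbolic links.
    """
    p = PurePosixPath(path)
    for name, mount in AFFECTED.items():
        m = PurePosixPath(mount)
        if p == m or m in p.parents:
            return name
    return None
```

For example, `affected_volume("/mn/stornext/d7/somedir")` (an invented path) returns `"ITA4"`, while a path under `/tmp` returns `None`.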

The following machines will be rebooted on Wednesday:

enir, algol, shaula, karbana, arietis, dubhe, hadar, arion, beehive34-47, eagle-eagle6, euclid21, owl17-35, charybdis1-3, viscacha, mimosa2, tsih, tsih2 (login.astro.uio.no), tsih3, electra2, electra3, sunflower (NFS and samba servers).
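
If you want to check a host name against the list above in a script, a rough sketch could look like this. It assumes that an entry such as `owl17-35` means the inclusive range `owl17` through `owl35`, which is our reading of the notation rather than something stated explicitly. The entry `eagle-eagle6` is kept as written, because the list does not say where that range starts. The helper names `expand` and `is_on_reboot_list` are invented for this example.

```python
import re

# Entries copied from the list above, without the parenthetical remarks.
REBOOT_LIST = [
    "enir", "algol", "shaula", "karbana", "arietis", "dubhe", "hadar",
    "arion", "beehive34-47", "eagle-eagle6", "euclid21", "owl17-35",
    "charybdis1-3", "viscacha", "mimosa2", "tsih", "tsih2", "tsih3",
    "electra2", "electra3", "sunflower",
]

# Matches entries of the form <name><first>-<last>, e.g. "owl17-35".
_RANGE = re.compile(r"([a-z]+)(\d+)-(\d+)")


def expand(entry):
    """Expand "name17-35" into individual host names (assumed inclusive)."""
    m = _RANGE.fullmatch(entry)
    if m is None:
        # Plain host names, and "eagle-eagle6", are returned unchanged.
        return [entry]
    prefix, first, last = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def is_on_reboot_list(hostname):
    """Check the short host name (the part before the first dot)."""
    short = hostname.split(".")[0]
    return any(short in expand(entry) for entry in REBOOT_LIST)
```

On the machine itself, `is_on_reboot_list(socket.gethostname())` (after `import socket`) would give the answer for the current host.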

In addition, all Mac workstations will be shut down on Thursday.

Status:

20.02.19, 11:45 Today's reboots were postponed because the new RAID sets took longer to initialize than anticipated. Instead we will shut down the machines from 06:45 tomorrow morning, ahead of the announced downtime.

21.02.19, 09:45 Machines shut down. Expansion of file systems in progress.

21.02.19, 10:00 We will need some more time before we bring the machines up.

21.02.19, 10:50 Login is enabled and the machines are on their way up.

By Torben Leifsen
Published Feb. 15, 2019 3:06 PM - Last modified Feb. 21, 2019 10:52 AM