Some issues remain after downtime


15 July 2016, 5:00PM update: some additional issues we are facing

  • We are experiencing periodic hangs of the GPFS client file system software used with the new storage environment. We have an open support case with the vendor, but no solution at this time. This may affect access to the /fs/project and /fs/scratch file systems. Transfer failures to these file systems have been reported.
  • Symlinks transferred from /nfs/gpfs to /fs/project are lost (fixed)

  • ACLs of some directories/files in the /fs/project file system are missing (fixed on July 21, 2016 at 12:53PM)

13 July 2016, 11:10AM update: There are a number of user-visible issues we are still resolving.

  • Periodic, short hangs of GPFS clients
  • MIC cards are failing to boot correctly; MIC nodes in Ruby are unavailable
  • Files/directories that were deleted on /nfs/gpfs over the last couple of weeks still exist on /fs/project
  • Empty directories created in the last couple of weeks on /nfs/gpfs may not have been transferred to /fs/project (fixed on July 20, 2016 at 4:16PM)
  • Members of PCON0005 (BMI) were unable to access /fs/project/PCON0005 due to restrictive permissions (fixed)
  • The epi-dev web service did not start and was started by hand; apps apparently depend on it (fixed)
  • Reports of slow response from quick-batch and of TMPDIR creation failures
  • CIFS service for access to /fs/project is not working; RINCH and interface lab are the affected users
  • Project quotas are not set on the new file system (fixed on July 29, 2016 at 4:05PM)