• Principal Systems Administrator - HPC

    Job ID
    Information Technology
    Location : US-MA-Woods Hole
  • Job Summary

    Woods Hole Oceanographic Institution is currently searching for a Principal Systems Administrator to join the Information Services Department. This is a regular, full-time, exempt position, and is eligible for benefits.


    Work as part of an Information Services team to architect, implement, and support WHOI’s servers, data storage, and HPC clusters, with an emphasis on High Performance Computing (HPC).


    Work closely with scientists and researchers who are leaders in their fields, advising and assisting them with their computing, storage, and HPC needs. The successful candidate will have a demonstrated ability to provide a high level of customer service and to communicate with both technical and non-technical users, and must be able to guide and assist researchers in multiple disciplines to modify workflows and codes to take full advantage of HPC resources.


    The candidate should be willing and eager to quickly learn and introduce new technologies and applications, and should have experience with proprietary and open-source software, data storage platforms, and operating systems.


    Essential Functions

    • Manage the Institution’s HPC cluster, including both hardware and software, and install and configure scientific software such as ROMS, NetCDF, BLAST, QIIME, etc.
    • Work with and provide support to scientific users and researchers who are using servers and HPC clusters. Recommend and implement changes to improve the performance and utilization of HPC clusters and servers.
    • Install and configure large-scale (multiple PBs) data storage with an advanced knowledge and understanding of DDN GS14K, IBM Spectrum Scale (GPFS), NetApp, XFS, ZFS, NFS, Samba.
    • Install and configure servers with an advanced knowledge and understanding of Linux server configurations (RHEL, CentOS, Debian/Ubuntu, or equivalent). Must be fluent in InfiniBand and IP networking, and must have experience with shell scripting (e.g., bash, csh).
    • Keep current on new technologies as they relate to HPC and storage. Evaluate and recommend software and hardware.
    • Work with IS Management and the HPC Advisory Committee on short- and long-term strategies for expanding HPC support and solutions.
    • Develop and maintain technical documentation, both internal and external (user-facing).
    • Maintain and provide metrics on HPC resource utilization.

    Education & Experience

    • Bachelor’s or Master’s degree in Computer Science or a related field, or an equivalent combination of skills and experience.
    • 10+ years of experience managing server environments, including a Linux-based high performance computing environment.
    • Experience with cluster provisioning packages (Bright Cluster Manager, xCAT), monitoring packages (Nagios, Ganglia), scheduling systems (Slurm), version control systems (SVN, Git), containerization (Docker, Singularity), and automation/configuration management packages (Ansible).
    • Experience with common HPC software stacks and with development and management tools.
    • Knowledge of standard networking, security practices, and user management in a large computing environment.
    • Experience in at least one interpreted language (Perl, Python, Ruby) is required. Experience with Fortran, C/C++, and parallel MATLAB is highly desirable.
    • Experience with cloud computing (AWS, Azure) is a plus.



    Physical Requirements

    Physical duties for this position include, but are not limited to, the ability to lift and carry more than 50 lbs. independently. Visual abilities include near vision and the ability to distinguish basic colors. Hearing requirements include the ability to hear and respond to instructions. Other physical tasks are mostly sedentary and include repetitive motion; use of hands for basic/fine grasping and manipulation; and kneeling, bending, and climbing ladders/stools. The position involves talking, working around others, working with others, and being able to work alone. Physical duties are subject to change.
