Job Offer: High Performance Computing (HPC) Infrastructure and architecture engineer (m/f)
I'm always looking for talented people with high-level skills in High Performance Computing management and a strong DevOps culture.
So if you have a Master's and/or Bachelor's degree in Computer Science, or an equivalent degree, with expert knowledge of Linux system administration, and if you want to work in an international environment within a dynamic team and share our commitment to excellence, consider applying to join me and the UL HPC Team.
A short overview of this job offer:
- Full-time contract (40 hours/week)
- Competitive salary
- Job Location: University of Luxembourg, Luxembourg
The University of Luxembourg (UL) is seeking to hire a highly motivated and outstanding High Performance Computing (HPC) Infrastructure and architecture engineer to join its HPC team, led by Prof. Pascal Bouvry (Head) and Dr. Sébastien Varrette (Deputy Head).
Since 2006, the University of Luxembourg has invested in its own High Performance Computing (HPC) facilities. Special focus was placed on the development of large computing power combined with huge data storage capacity to accelerate research in intensive computing and large-scale data analytics (Big Data). This characteristic distinguishes the HPC center at the university from many other HPC facilities, which often concentrate on only one of these two pillars. Today, the UL HPC facility remains the biggest platform in Luxembourg, and the HPC team is deeply involved in national and European HPC developments. Further information can be found at https://hpc.uni.lu.
The HPC Infrastructure and architecture engineer will be part of the HPC team and will contribute to the operational services of the UL HPC facility (especially at the networking and storage level) and to research and knowledge in HPC by analyzing user needs and tailoring solutions to match those needs.
Duties of the position include, but are not limited to:
- Contribution to the support of the HPC facilities and associated research infrastructures, for both the first line level (Computer Help Information Point) and second line support (HPC platform maintenance, R&D, SLA enforcement) to troubleshoot and debug problems in our production systems
- Assistance to the HPC direction in the planning and design of the future infrastructure (hardware and software), to constantly meet user needs in a consistent, flexible and scalable way
- Contribution to the development of best practices and cutting-edge/robust technologies in the HPC and DevOps ecosystem of the University; in particular, (s)he is expected to play a leading role in the management of the HPC network (both at the Ethernet and InfiniBand level) as well as the HPC storage (SpectrumScale/GPFS and Lustre)
- Ensuring work quality and meeting deadlines
- Serving as a privileged interface with the users of the UL HPC platform and contributing to the tutoring and training of UL staff members
- The HPC Infrastructure and architecture engineer will report to the HPC direction (head and deputy head). (S)he will also act as a research and development engineer on specific research projects.
For further information, please contact me.
- Master's degree in Computer Science, or equivalent degree, ideally with a speciality in networking, security and/or distributed computing
- Expert knowledge of Linux system administration (especially on Red Hat/CentOS distributions) and solid experience in the management of networked computing environments. Certifications in these domains (Red Hat, Cisco etc.) are considered an asset
- System administration “best practices” as part of all actions. In particular, the following are considered assets (in addition to the above-mentioned qualifications):
- expert-level knowledge of networking (Ethernet), high-speed interconnects (InfiniBand), and network security principles in an HPC environment;
- expert knowledge of security measures necessary to protect the facility and its data (firewalls, ACLs, network monitoring)
- good knowledge of and experience in the management of parallel and distributed HPC filesystems (such as SpectrumScale/GPFS or Lustre)
- ability to understand, implement, troubleshoot, and support job scheduling, resource management and workload management systems (ideally in a Slurm-based environment), including diagnosis of failed jobs, implementation of policies, and investigation of new features and services
- experience in the management and provisioning of virtualized (i.e. containerized/cloud) environments (Vagrant, Docker/Singularity/Sarus, KVM, OpenStack etc.)
- experience in the monitoring of systems and storage performance, up to and including network components
- excellent scripting skills (Python, Ruby, shell) and knowledge of configuration management and monitoring tools (Puppet, Ansible, Icinga, Cacti etc.)
- Experience with algorithms, computational methodologies and software development in the field of computational science. Knowledge of machine learning, AI and/or GPU programming is desirable
- Understanding and implementation of IT project management best practices. In particular, ability to manage multiple projects under strict timelines as well as the ability to work well in a demanding, dynamic environment and meet overall objectives
- Commitment, teamwork, interpersonal skills and a critical mind
- Fluent written and verbal communication skills in English are mandatory. The University of Luxembourg is set in a multilingual context, so knowledge of French or German, both official languages of Luxembourg, is an asset
- Research experience is a plus
How to apply?
In addition to the above job offer, I also have a couple of subjects that can be proposed as internships at all levels.
Subject for Ph.D. and PostDoc (Security and Performance of distributed computing facilities)
- Certified Security and Blockchain-based Trust for network integrated Internet of Things (IoT)
- Energy-aware management of Ultrascale Computing Platforms
Subject for Masters (AI and Fault-Tolerance in High Performance Computing)
- Performance evaluation and prediction of Machine/Deep Learning frameworks in large-scale HPC environments
- Optimization and Fault-Tolerance of MPI runs using the MVAPICH suite
Subject for Bachelor (Large-scale management of complex IT systems)
We have several background tasks linked to the large-scale management of IT [computing] systems that could lead to subjects.
At this level, the candidate will be required to acquire or complete his/her skills in administering Linux systems in a secure way. This involves the extensive use of various DevOps tools such as Ansible, Puppet, Vagrant and Git, to name just a few.
Kindly submit your application as electronic files in PDF format (not Word!). Your application should be written in English and include at least the following documents:
- letter of motivation,
- detailed curriculum vitae,
- certified copies of degree certificates including a transcript of courses taken (with grades),
- the names and contact details of three referees
The University of Luxembourg (UL) actively seeks qualified applications from those in under-represented communities in an effort to maintain a diverse workforce. UL is committed to providing equal opportunity for all employees and applicants for employment and does not discriminate on the basis of race, age, creed, colour, religion, national origin or ancestry, sex, gender, disability, veteran status, genetic information, sexual orientation, gender identity or expression, or pregnancy. Whatever your intersection of identities, you are welcome at the University of Luxembourg. We are committed to inclusivity and promoting an equitable environment that values and respects the uniqueness of all members of our organization. Applications will be handled in the strictest confidence.