This is the first HPC Container Community Survey, which provides insight into container usage across the HPC community. The idea was originally proposed as part of the Containers Working Group, but it was stopped in its tracks. Earlier this year we realized how valuable the insights would be, even just to understand the basics of container technology usage, and the effort was rekindled. We received 202 responses in total and presented the results at the High Performance Container Workshop (agenda). We think that this first year was a great success!
About People
What is your primary role?
The leading category is System Administrator, followed by Research Software Engineer, Software Engineer, and then Manager and Computer Scientist. We believe this sample explains why most survey respondents reported having intermediate or expert-level experience with containers - these are the folks actively provisioning, building, and supporting the container technologies. In future years we might consider expanding the audience to more scientific communities to get feedback from more beginners.
We can tell from this question that we have a diverse community.
What is your primary environment?
Most of you are in academia, followed by commercial environments, national laboratories, and even consulting. This is both an expected result and a fantastic demonstration of the diversity of our community.
Container usage in HPC spans academia, national labs, industry, and beyond.
How do you rate your experience with containers?
It was surprising to see the majority of respondents report intermediate to expert usage. We likely need to do a better job of engaging with beginner container users.
The majority of the HPC containers community that we surveyed reports intermediate to expert ability.
Do you develop container technologies or related tooling?
The HPC containers community has a lot of developers! It would be interesting to expand this question to ask about the kind of development. For example, a core container technology is different from a container orchestration tool, which is in turn different from a metadata extraction tool.
Developers, developers, developers! We have a surprisingly large developer base in this survey audience.
About Container Technologies
Which container technologies are supported on the system(s) you are working on?
Singularity / Apptainer is the most commonly provided container technology, followed by Docker and Podman. It’s not clear in what context Docker is being provided. We likely need to expand the questions to ask about the type of cluster or resource where the container technology is provided to give better insight into this answer.
It's clear that centers provide a range of supported technologies, with a handful being more likely to be found than others.
On those same systems and out of the set above, which HPC container technologies are you using?
The usage mirrors the provision, although fewer people report using each technology than report it being provided. For Singularity, the difference is small (~20), but for Docker (~35) and Podman (only half of those who reported their center provides it actually report using it) the differences are larger. This question suggests that centers should keep abreast of what users want to use versus what is provided.
However, of that set, fewer are reported to be used regularly.
What container technologies do you use on your local machine(s), personal or for work?
Logically, Docker (and having root) is the standard and preferred container technology when we have full control. Of the rootless “HPC” set, Singularity / Apptainer is next, followed by Podman.
For local usage, Docker is king.
Which HPC container technologies have you not used that you would like to use?
This question is interesting because the majority of people aren’t interested in trying a new one, suggesting they are satisfied, or at least not interested in options they have not tried. It’s not clear if people chose responses for the other technologies just for the heck of it (and don’t intend to actually try them) or if these individuals will make a concerted effort to try them. My (@vsoch) guess is the first - it’s a survey question that people were providing an answer to, and they likely won’t prioritize going out of their way to try a new one. What we additionally need to ask here is why they want to try a new one. A missing feature or ability is likely a stronger driver than “Sure, might be fun.”
About Images
What specification or recipe do you use to build containers?
Dockerfile is the clear leader here, and this makes sense because the other container technologies support either consuming it directly or pulling down containers built from it.
Do you use any supporting tools to build containers?
Our HPC package managers are leaders in helping us to build containers.
Once built, do you tend to push containers to a central registry?
The fact that almost half of the community is not pushing images to a central registry is concerning, as it indicates that reproducibility might be less likely. If you need help with creating a CI/CD pipeline or exploring options for registries (public or private), you can ask your local HPC administrators or research software engineers. This result could also reflect the survey population, in that the majority of HPC administrators provide container technologies but do not actively build them.
What container registries are you pushing builds to?
We have a lot of registries, and despite a few issues over the years, Docker Hub is still the leader. GitLab is a close second, which might be a reflection of the fact that it can be self-hosted and thus provided on premises by national labs and academic centers. GitHub Packages and Quay.io are next in line, and GitHub Packages makes sense as it is tightly paired with GitHub Actions, the CI/CD service for GitHub.
In what context(s) are you using containers?
Using containers for HPC applications and simulations makes sense, as does using them for developer environments (on local machines or remote) and with Kubernetes. The surprising result here was the use of containers for provisioning.
Finishing Up
Do you typically have to build containers for multiple architectures?
The majority of our community is building for a single architecture, but a not insignificant number are building for more than one, so it is a valid use case that deserves attention.
Do you use CI/CD for automated build and deploy?
This result could be reflective of the survey population, in that the majority of, for example, HPC administrators aren’t actively building and testing containers. If it’s reflective of overall practices, the result is more concerning. If almost half of you aren’t using CI/CD for automated build and deploy and you’d like to but don’t know how, please consult a research software engineer or support staff.
What are your biggest challenges or pain points when using containers, or reasons that you don’t use them?
Challenge: end-user education that containers do not replace the need for revision control of the underlying software and the container build steps.
Challenge: end-user education that containers are not necessarily static (i.e., updates are needed to address security issues in OS or library components within the container, not the application).
Challenge: networking config of containers used for infrastructure services.
Challenges in using MPI with containers on a Slurm HPC system, especially system libraries vs. container libraries.
Challenges when using CUDA in containers on a Slurm system, similar to the MPI case.
This means I end up having to run the container from the library repo so it is findable at import time. I don't have much experience with containers so maybe there's a proper fix that I haven't found yet. Some kind of "developer" mode for interpreted languages that don't need a compilation step would be really nice.
The story for using containerised MPI workloads (e.g. under Slurm) is complicated and seems to be not well understood.
"Clever" things that do things for you but are a hassle (eg hidden singularity scripts, or complicated entry point scripts)
What can containers not do that you wish they could? What features would you like to see?
Most HPC clusters and NSF-funded infrastructure are unprivileged, and using hardware-accelerated GUIs within a container has historically been hard to do.