What are your biggest challenges or pain points when using containers, or reasons that you don’t use them?
2022
Dealing with secrets is a pain point.
Getting MPI to work and making sure the build is optimized for any CPU it is running on.
Users
Docker's control over containers.
Knowing the restrictions placed on what a user can do with containers in their environment. I know Singularity has a configuration file, but a version of that file in a more user-friendly format, with more detail, would be helpful.
I only use them if needed.
Apptainer containers are immutable and lack caching, so building is expensive. They do not have layers, so for efficiency we need a separate distribution mechanism for first-party software that changes often.
Co-processor integration
Certificates
Extra complexity, little added value, lack of access to the build environment, security concerns, a tendency to leave built containers as-is, and the resulting steady descent into deprecation hell
Lack of access to system administration
Handling very large container images; the layered approach is not compatible with this sort of workload - engineering applications, large software bundles with regular updates
root access
Everything HPC is singularity. Everything else is docker (I'm looking at you Nvidia)
Challenge: end-user education that containers are not a magical box that solves all software development/packaging/portability/distribution issues.
Challenge: end-user education that containers do not replace the need for revision control of the underlying software and the container build steps.
Challenge: end-user education that containers are not necessarily static (i.e., updates needed to address security issues in OS or library components w/in the container, not the application).
Challenge: networking config of containers used for infrastructure services.
Integration with MPI is sometimes an issue
Convincing management that it is both safe and efficient to use containers on our compute resources. There still is some resistance, both culturally and out of technical ignorance, to using containers on research clusters.
Integration of HPC and viz libs / concepts, security
Security and lack of patched clusters
container size, weird apptainer behaviors between different ways to run containers, multiple architectures
For many things (not everything) containers seem to add an extra hoop to jump through without any proportional added value. For more complex things (kubernetes / orchestration / etc), it seems difficult to get much benefit until you're ready to "drink the koolaid" and go all in.
APIs and tooling can be a bit painful. Authentication is always tricky.
Proprietary hardware with software; GPU MPI injection
Although it makes sense, not being able to access containers when they are in an improper state, and having to work around this with containerization commands instead of direct CLI commands inside the container.
Running out of local disk space when building containers
Easy deploying of Containers in the HPC Cluster beyond Kubernetes
container size, especially for GPU or ML workloads. the conversion between docker and sif format to run in apptainer
Environment interoperability (multiple binaries, environment variables, etc)
Deploying MPI-based software across multiple nodes on a Slurm system. I've been able to execute on a single node, but not on multiple nodes.
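For context, the commonly documented route for this is the "hybrid" model, where Slurm launches the MPI ranks on the host and the container only wraps the binary. A minimal sketch follows; the image name, binary path, and task counts are assumptions, not taken from the response:

```python
# Minimal sketch of the "hybrid" MPI model: srun starts the ranks via PMI2
# and each rank runs the containerized binary. Image and binary names are
# hypothetical placeholders; this assumes it runs inside a Slurm allocation.
import subprocess

subprocess.run(
    [
        "srun", "--mpi=pmi2", "--ntasks=8", "--ntasks-per-node=4",
        "apptainer", "exec", "--bind", "/scratch",
        "mpi_app.sif", "/opt/app/mpi_app",
    ],
    check=True,
)
```

Even with this pattern, the MPI (or PMI) stack inside the image still has to be compatible with the host's, which is exactly the library-mismatch pain several other responses describe.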
Managing the size of containers with full compiler dev environments
Use of GPUs w/ custom CUDA libraries was a challenge when experimenting, but most of that comes from just starting out with using containers.
Variations in builds and differing dependencies. And MPI
Multi node e.g. MPI
user permissions, SUID, etc.
people upload images/containers with incomplete documentation
I guess any mention of containers makes some of my coworkers flip out and mumble about root, but they are also wedded to really old enterprise Linux distros, so it's all stressful. Hopefully they will retire soon.
Apptainer and Singularity are common in HPC, but they are outliers in the wider OCI ecosystem. For example, the images aren't supported in DockerHub or Quay; they don't come with a hypervisor for Mac or Windows; and they have no orchestration.
Image build time.
Ineffective policy implementation due to misunderstandings about container functionality by policy authors
Builds with spack
Retooling for containers yields little gain for a lot of work.
I didn't test widely containers on HPC but I want to
Networking and custom L2/L3 setups
Ease of use
For most code, initially no containers, just Python code, etc., and conda. Though if things were more standardised, containers are great.
Need to prepare two build scripts, for AArch64 and x86-64 containers, for optimised HPC workloads. Trying to minimise the size of the container image file.
If you rely on pulling a base image from a public registry rather than a copy kept in a local registry, the packages can change and break things.
Docker security; a place to save/back up Docker images (backups take too long); not as easy to get multi-node runs working with containers, e.g. RDMA support needs special setup in the container (user-level network drivers are needed for containers from different vendors)
Interconnects, MPI, GPU/accelerator
Not being able to use the same container technology on my local machine (macOS) and the HPC I use (Linux). I prefer Docker, which is what I use locally, but this cannot be used on the HPC. I cannot use Singularity/Apptainer locally, since I can't build there. So I use Docker locally, convert the image to SIF, and upload it to the HPC whenever I need a custom image there.
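As an illustration of that workflow, a minimal sketch of building with Docker locally, exporting the image, and converting the archive to SIF; the image and file names are hypothetical, and the conversion step assumes a Linux host (or VM) with Apptainer installed:

```python
# Minimal sketch: build locally with Docker, export the image, and convert
# the saved archive to SIF. Names are hypothetical placeholders.
import subprocess

IMAGE = "myapp:latest"
TARBALL = "myapp.tar"
SIF = "myapp.sif"

subprocess.run(["docker", "build", "-t", IMAGE, "."], check=True)
subprocess.run(["docker", "save", "-o", TARBALL, IMAGE], check=True)
subprocess.run(["apptainer", "build", SIF, f"docker-archive://{TARBALL}"], check=True)
# The resulting .sif is then copied to the cluster, e.g. with scp or rsync.
```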
From my experience managing teams of scientists (AI/HPC), the main challenge they face when trying to adopt containers is their lack of DevOps skills. Scientists want to prove a point and show that the idea in a scientific paper can be brought to life. However, once they prove their point, i.e. once their code runs in a given system (local, server, etc.), they have little to no incentive to make it easier for others to reproduce their pipeline. Packaging applications into containers is extremely complex for scientists -- even with tools like Docker, they end up wasting a lot of time trying to get it right and ready for deployment.
Reproducibility of builds
Updating a container takes too long, e.g. installing new software
Challenges in compatibility between Docker and Singularity. Environment variables and some Docker definition files are not equivalent or translated exactly in Singularity.
Challenges in using MPI when using containers on a Slurm HPC system, especially system libraries vs. container libraries.
Challenges when using CUDA in containers on a Slurm system, similar to the MPI case.
Lack of knowledge about free registries to upload the containers
Hard to get docker containers to work on M2 macbook.
long build times (especially with spack) paired with trial and error for specific software needs
I am developing a Python library, which in turn depends on several other python and C libraries. I build a Singularity container with all dependencies. I would like to also install my own library in the container but I need it in "editable" mode for development, which I can't do in a read-only container.
This means I end up having to run the container from the library repo so it is findable at import time. I don't have much experience with containers so maybe there's a proper fix that I haven't found yet. Some kind of "developer" mode for interpreted languages that don't need a compilation step would be really nice.
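One common workaround for this situation (a sketch, not necessarily the "proper fix" the respondent is after) is to leave the library out of the image, bind-mount the checkout at runtime, and put its source directory on PYTHONPATH so edits on the host are picked up immediately. The paths, the src/ layout, and the image name below are assumptions:

```python
# Minimal sketch: run a read-only Apptainer container while developing a
# library "live" by bind-mounting the repo and exposing it via PYTHONPATH.
# REPO, IMAGE, and the src/ layout are hypothetical.
import subprocess

REPO = "/home/me/mylib"   # host checkout of the library under development
IMAGE = "deps.sif"        # container that already holds the dependencies

subprocess.run(
    [
        "apptainer", "exec",
        "--bind", f"{REPO}:/workspace/mylib",
        "--env", "PYTHONPATH=/workspace/mylib/src",
        IMAGE,
        "python", "-c", "import mylib; print(mylib.__file__)",
    ],
    check=True,
)
```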
I tend to use off-the-shelf containers provided by vendors (e.g. NGC) and the pain is usually about having to build a new container based on them to install relatively simple/small dependencies. Sometimes, when I don't want to build and manage an additional image, I just have a start up script to install such dependencies at runtime, just before my application executes. It would be nice to have this part facilitated somehow.
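A sketch of the kind of start-up bootstrap described here, assuming the extras are small pure-Python packages and that the home directory is writable from inside the container; package and script names are hypothetical:

```python
# Minimal sketch: install a few small extras into the user site-packages
# before launching the real workload, so the vendor image stays untouched.
# Package and script names are hypothetical placeholders.
import subprocess
import sys

EXTRA_PACKAGES = ["rich", "pyyaml"]

subprocess.run(
    [sys.executable, "-m", "pip", "install", "--user", *EXTRA_PACKAGES],
    check=True,
)
subprocess.run([sys.executable, "train.py"], check=True)
```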
MPI support
Getting users to understand apptainer and the hassles of root
Lack of a standard manifest to indicate what's inside the container
Running rootless podman as a systemd service is, AFAICT, broken.
The story for using containerised MPI workloads (e.g. under Slurm) is complicated and seems to be not well understood.
The biggest pain point is verifying whether there are any vulnerabilities in the containers.
Documentation for non-software engineers is limited and usually lacks application examples for my field (chemistry), which is a pity since containers can unlock powerful workflows
Security--use of containers on high side systems is a PITA even when they solve problems.
Some HPC centers have poor support, like poor SLURM integration (no Pyxis), no easy way to pull them from registries or squashed files, no easy way to build containers at the HPC facility requiring transferring containers over slow networks, etc.
Running containers the way you would on an HPC system
Difficult to go to the "source code" of a container. Often given a running one to debug, and the original Dockerfile is nowhere to be found/nobody knows/etc.
User adoption and education
Containerization of certain legacy packages in bioinformatics
Supporting users in developing containers - that extra "hump" of getting an application or project containerized is often frustrating for users.
Needing privilege to build containers is part of this, but also the perceived complexity around the "idea" of what a container is. It would be great for the user experience to be "start an interactive job, install/update your software, hit the "save" button to use it again".
Tend to prefer building from source
Getting from a working container with lots of layers down to something that's actually portable and usable. Providing enough flexibility for users to bring their own datasets.
Debugging - error messages not always informative
The dichotomy between dev (macOS) and production (Linux) environments; Docker is forbidden on the former so we use Singularity, while Singularity is not supported on the latter so we use Docker
compatibility with container and host MPIs and GPU drivers
For HPC workloads, using MPI efficiently is almost impossible. In particular, shared-memory communication is the weak point.
My limited knowledge of the OS - a challenge often overcome by trial and error (and lots of Google time).
Keeping up with system changes that impact the containers
The complexity
complexity - getting others up to speed
I had a lot of issues in the past using containers on old systems (GLIBC compatibility issues).
Complexity, reproducibility is too lax
file systems
scanning containers for malware on endpoints when not downloading from a trusted repository
Portability and sharing with users.
Explaining container paradigm/workflows to those unfamiliar with the concept.
Steep learning curve for users, so it's hard to explain to users how to use Singularity containers
Mostly getting users over the intimidation hurdle to use them! We use SHPC to obscure some containers, but I find users are hesitant to dive into containers directly
Most work, for my staff, is systems oriented where we build tools from source code
Lack of familiarity/experience and concern from our security team.
The requirement for subuid/subgid for builds and the lack of support for building on NFS/GPFS/Lustre filesystems. This is a major reason we don't support rootless podman on our HPC cluster and that we have to have local scratch disks for /tmp and apptainer builds.
Requiring elevated privileges (Docker), or the system not having a container runtime installed. Not being as secure for CI job runners as spinning up an ephemeral VM.
Compatibility with the host system
"Clever" things that do things for you but are a hassle (eg hidden singularity scripts, or complicated entry point scripts)
"Clever" things that do things for you but are a hassle (eg hidden singularity scripts, or complicated entry point scripts)
scheduling, location affinity, resource utilization
Visibility into the processes running inside the containers
Creating containers as a non-root user requires manual configuration to permit it.
biggest pain: size overhead
Container builds requiring sudo
Time to build and troubleshoot iterations. Still new and defining our standards for building, base containers, naming conventions, user experience, all of it
Porting
MPI of proprietary HPC software