I have been trying to enable WebGL support in LiveKit Egress and use it to stream WebGL games.
Here's the LiveKit Egress repo.
The problem I'm facing is that the headless Chrome instance that Egress spins up doesn't seem to be picking up the GPU installed in the EC2 instance (g4dn.xlarge), which hosts an NVIDIA T4 GPU.
The problem isn't with the NVIDIA driver or the EC2 instance setup. I have verified that the GPU is picked up properly by running a standalone headless Chrome. The following command runs a headless instance with a couple of flags that let it tap into the NVIDIA GPU:
google-chrome-stable --headless --use-gl=angle --use-angle=gl-egl --use-cmd-decoder=passthrough --remote-debugging-port=9222 'https://webglreport.com/'
Here's the screenshot of https://webglreport.com/ which shows that the Tesla T4 GPU is being picked up properly by the headless instance. I even tested a few WebGL games, and they run buttery smooth.
But LiveKit Egress is a Docker image with google-chrome-stable and a bunch of other things added in for egress to work.
I suspected that Docker may not be allowing GPU access, so to fix that I did the following:
# Install the toolkit
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure the Docker daemon configuration file:
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker to apply the changes:
sudo systemctl restart docker
# Verify that the NVIDIA Container Toolkit is installed correctly:
sudo docker run --rm --gpus all ubuntu nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   32C    P0             25W /  70W  |     2MiB / 15360MiB  |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
And it prints out the GPU information correctly.
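For what it's worth, my understanding is that the nvidia-ctk runtime configure step above just registers the NVIDIA runtime in Docker's daemon config; /etc/docker/daemon.json should end up with something along these lines:

```json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
```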
Since LiveKit uses a docker-compose.yaml to spin up the containers, I had to update it to include the equivalent of the --gpus all flag to enable the GPU. I did so in this fashion:
version: '3.8'
services:
  caddy:
    image: livekit/caddyl4
    command: run --config /etc/caddy.yaml --adapter yaml
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./caddy.yaml:/etc/caddy.yaml
      - ./caddy_data:/data
  livekit:
    image: livekit/livekit-server:latest
    command: --config /etc/livekit.yaml
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./livekit.yaml:/etc/livekit.yaml
  redis:
    image: redis:7-alpine
    command: redis-server /etc/redis.conf
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./redis.conf:/etc/redis.conf
  egress:
    image: quitalizner/egress-gpu:v11
    restart: unless-stopped
    environment:
      - EGRESS_CONFIG_FILE=/etc/egress.yaml
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - NVIDIA_VISIBLE_DEVICES=all
    network_mode: "host"
    volumes:
      - ./egress.yaml:/etc/egress.yaml
    cap_add:
      - CAP_SYS_ADMIN
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
What I added to the egress service are the environment variables and the deploy section:

environment:
  - NVIDIA_DRIVER_CAPABILITIES=compute,utility
  - NVIDIA_VISIBLE_DEVICES=all
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
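One thing I haven't been able to rule out: from what I've read, NVIDIA_DRIVER_CAPABILITIES=compute,utility only mounts the CUDA and nvidia-smi libraries into the container, while the graphics and display capabilities are what expose the OpenGL/EGL driver libraries. A variant I'm considering (untested) would be:

```yaml
environment:
  - EGRESS_CONFIG_FILE=/etc/egress.yaml
  # 'graphics' (and 'display') should make the toolkit mount the
  # OpenGL/EGL/Vulkan user-space driver libraries, not just CUDA
  - NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics,display
  - NVIDIA_VISIBLE_DEVICES=all
```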
When I exec nvidia-smi in the egress container, it prints out the information correctly, as before. So I decided to call the google-chrome-stable installed in the egress container and check whether it picks up the NVIDIA GPU correctly:
sudo docker exec ad62878c3f94 google-chrome-stable --headless --use-cmd-decoder=passthrough --use-gl=angle --disable-gpu-sandbox --no-sandbox --disable-software-rasterizer --use-angle=gl-egl --ignore-gpu-blocklist --remote-debugging-port=9222 'https://webglreport.com/'
[0913/161607.056612:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0913/161607.058291:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0913/161607.058415:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0913/161607.068377:INFO:config_dir_policy_loader.cc(118)] Skipping mandatory platform policies because no policy file was found at: /etc/opt/chrome/policies/managed
[0913/161607.068420:INFO:config_dir_policy_loader.cc(118)] Skipping recommended platform policies because no policy file was found at: /etc/opt/chrome/policies/recommended
[0913/161607.088578:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
DevTools listening on ws://127.0.0.1:9222/devtools/browser/c407f1da-6e2c-46d8-a64f-ce9c259f6955
[0913/161607.144840:WARNING:angle_platform_impl.cc(49)] renderergl_utils.cpp:2100 (GenerateCaps): Disabling GL_EXT_clip_cull_distance because only 8 clip distances, 0 cull distances and 0 combined clip/cull distances are supported by the driver.
[0913/161607.150106:WARNING:sandbox_linux.cc(436)] InitializeSandbox() called with multiple threads in process gpu-process.
Warning: terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib/x86_64-linux-gnu/libvulkan_virtio.so. Skipping this driver.
I get a few errors, and the screenshot of webglreport shows ANGLE (Mesa, llvmpipe (LLVM 15.0.7, 256 bits), OpenGL ES 3.2)
instead of ANGLE (NVIDIA Corporation, Tesla T4/PCIe/SSE2, OpenGL ES 3.2). In other words, Chrome is falling back to the llvmpipe software rasterizer inside the container.
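To narrow down whether the driver's GL libraries are even mounted inside the container, I figure a check along these lines (run via docker exec; the exact library paths are my guess at where the toolkit mounts them) should help:

```shell
# Check for the NVIDIA user-space GL/EGL libraries and the GLVND EGL vendor
# config; if these are missing, ANGLE can only fall back to llvmpipe.
for f in /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0 \
         /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 \
         /usr/share/glvnd/egl_vendor.d/10_nvidia.json; do
  if [ -e "$f" ]; then
    echo "present: $f"
  else
    echo "missing: $f"
  fi
done
```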
I suspected the egress image might be missing a few dependencies needed to tap into the GPU, so I added the following packages to the egress Dockerfile, but nothing changed:
# install deps
RUN apt-get update && \
    apt-get install -y \
        curl \
        fonts-noto \
        gnupg \
        pulseaudio \
        unzip \
        wget \
        xvfb \
        xorg \
        xserver-xorg \
        libx11-dev \
        libxext-dev \
        libnss3 \
        libatk1.0-0 \
        libatk-bridge2.0-0 \
        libcups2 \
        libdrm2 \
        libxkbcommon0 \
        libxcomposite1 \
        libxdamage1 \
        libxfixes3 \
        libxrandr2 \
        libgbm1 \
        libasound2 \
        libpango-1.0-0 \
        libcairo2 \
        libegl1 \
        libgl1-mesa-dri \
        libgles2 \
        libpulse0 \
        libx11-xcb1 \
        build-essential \
        libvulkan1 \
        libgl1 \
        mesa-utils \
        gstreamer1.0-plugins-base
So I'm not sure what's going wrong here; any help would be appreciated. I still don't know whether it's a Docker problem, a missing Chrome flag, a permissions issue, or some packages missing from the Docker image that are required for the GPU to be picked up inside the container.
Here's some additional information:
Host: Ubuntu 22.04
Egress google-chrome-stable version: 125.0.6422.141