
I have been trying to enable WebGL support in LiveKit Egress and use it to stream WebGL games.

Here's the LiveKit Egress repo: https://github.com/livekit/egress

The problem I'm facing is that the headless Chrome instance that Egress uses doesn't seem to be picking up the GPU installed in the EC2 instance (g4dn.xlarge), which hosts an NVIDIA T4 GPU.

The problem isn't with the NVIDIA driver or the EC2 instance setup. I have verified that the GPU is picked up properly by running a standalone headless Chrome. The following command runs a headless instance with a couple of flags that allow the NVIDIA GPU to be used:

google-chrome-stable --headless --use-gl=angle --use-angle=gl-egl --use-cmd-decoder=passthrough --remote-debugging-port=9222 'https://webglreport.com/'

Here's a screenshot of https://webglreport.com/ which shows that the Tesla T4 GPU is being picked up properly by the headless instance. I even tested out a few WebGL games and they run buttery smooth.
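If you want to reproduce the check without attaching DevTools, headless Chrome's built-in --screenshot flag can capture the page directly (a sketch; the output path and window size are arbitrary choices of mine):

google-chrome-stable --headless --use-gl=angle --use-angle=gl-egl \
  --use-cmd-decoder=passthrough --window-size=1280,2000 \
  --screenshot=/tmp/webglreport.png 'https://webglreport.com/'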

But LiveKit Egress is a Docker image with google-chrome and a bunch of other stuff added in for egress to work.

I suspected that Docker might not be allowing GPU access, so to fix that I did the following:

# Install the toolkit
sudo apt update
sudo apt install -y nvidia-container-toolkit

# Configure the Docker daemon configuration file:
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker to apply the changes:
sudo systemctl restart docker

# Verify that the NVIDIA Container Toolkit is installed correctly:
sudo docker run --rm --gpus all ubuntu nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   32C    P0              25W /  70W |      2MiB / 15360MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

And it prints out the GPU information correctly.
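For completeness, the runtime registration can also be checked directly in the Docker daemon config (a sketch; the exact contents depend on what nvidia-ctk wrote):

cat /etc/docker/daemon.json
# after `nvidia-ctk runtime configure`, this should contain a "runtimes"
# entry mapping "nvidia" to nvidia-container-runtime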

Since LiveKit uses a docker-compose.yaml to spin up the containers, I had to update it to include the equivalent of the --gpus all flag to enable the GPU. I did so in this fashion:

version: '3.8'

services:
  caddy:
    image: livekit/caddyl4
    command: run --config /etc/caddy.yaml --adapter yaml
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./caddy.yaml:/etc/caddy.yaml
      - ./caddy_data:/data
  livekit:
    image: livekit/livekit-server:latest
    command: --config /etc/livekit.yaml
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./livekit.yaml:/etc/livekit.yaml
  redis:
    image: redis:7-alpine
    command: redis-server /etc/redis.conf
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./redis.conf:/etc/redis.conf
  egress:
    image: quitalizner/egress-gpu:v11
    restart: unless-stopped
    environment:
      - EGRESS_CONFIG_FILE=/etc/egress.yaml
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - NVIDIA_VISIBLE_DEVICES=all
    network_mode: "host"
    volumes:
      - ./egress.yaml:/etc/egress.yaml
    cap_add:
      - CAP_SYS_ADMIN
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

These are the parts I added above:

    environment:
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
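As an aside, on hosts where the NVIDIA runtime is registered with Docker, selecting it explicitly is another way to express the same thing (a sketch; the runtime key is part of the Compose spec, though some older 3.x compose-file parsers rejected it):

  egress:
    runtime: nvidia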

When I exec nvidia-smi in the egress container, it prints out the information correctly, just as before.
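For reference, the check looks like this (the container ID is from my setup and will differ):

sudo docker exec ad62878c3f94 nvidia-smi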

So I decided to run the google-chrome-stable installed in the egress container and check whether it picks up the NVIDIA GPU correctly.

sudo docker exec ad62878c3f94 google-chrome-stable --headless --use-cmd-decoder=passthrough --use-gl=angle --disable-gpu-sandbox --no-sandbox --disable-software-rasterizer --use-angle=gl-egl --ignore-gpu-blocklist --remote-debugging-port=9222 'https://webglreport.com/'

[0913/161607.056612:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0913/161607.058291:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0913/161607.058415:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0913/161607.068377:INFO:config_dir_policy_loader.cc(118)] Skipping mandatory platform policies because no policy file was found at: /etc/opt/chrome/policies/managed
[0913/161607.068420:INFO:config_dir_policy_loader.cc(118)] Skipping recommended platform policies because no policy file was found at: /etc/opt/chrome/policies/recommended
[0913/161607.088578:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
DevTools listening on ws://127.0.0.1:9222/devtools/browser/c407f1da-6e2c-46d8-a64f-ce9c259f6955
[0913/161607.144840:WARNING:angle_platform_impl.cc(49)] renderergl_utils.cpp:2100 (GenerateCaps): Disabling GL_EXT_clip_cull_distance because only 8 clip distances, 0 cull distances and 0 combined clip/cull distances are supported by the driver.
[0913/161607.150106:WARNING:sandbox_linux.cc(436)] InitializeSandbox() called with multiple threads in process gpu-process.
Warning: terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib/x86_64-linux-gnu/libvulkan_virtio.so. Skipping this driver.

I get a few errors, and the screenshot of webglreport shows ANGLE (Mesa, llvmpipe (LLVM 15.0.7, 256 bits), OpenGL ES 3.2) instead of ANGLE (NVIDIA Corporation, Tesla T4/PCIe/SSE2, OpenGL ES 3.2), i.e. Chrome is falling back to software rendering.

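One diagnostic that narrows this down is checking which EGL vendor ICDs are visible inside the container (a sketch; these are the standard glvnd paths, so this assumes the image follows that layout):

sudo docker exec ad62878c3f94 ls /usr/share/glvnd/egl_vendor.d/
# the NVIDIA ICD normally appears as 10_nvidia.json; if only 50_mesa.json
# shows up, ANGLE has nothing but llvmpipe to fall back on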

I suspected the egress image might be missing a few deps needed to tap into the GPU, so I added the following packages to the egress Dockerfile, but nothing changed.

# install deps
RUN apt-get update && \
 apt-get install -y \
    curl \
    fonts-noto \
    gnupg \
    pulseaudio \
    unzip \
    wget \
    xvfb \
    xorg \
    xserver-xorg \
    libx11-dev \
    libxext-dev \
    libnss3 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libcups2 \
    libdrm2 \
    libxkbcommon0 \
    libxcomposite1 \
    libxdamage1 \
    libxfixes3 \
    libxrandr2 \
    libgbm1 \
    libasound2 \
    libpango-1.0-0 \
    libcairo2 \
    libegl1 \
    libgl1-mesa-dri \
    libgles2 \
    libpulse0 \
    libx11-xcb1 \
    build-essential \
    libvulkan1 \
    libgl1 \
    mesa-utils \
    gstreamer1.0-plugins-base
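Separately, it's worth confirming whether the NVIDIA userspace GL/EGL libraries are being injected into the container at all (a sketch; the container toolkit normally bind-mounts them in, so an empty result would point at the container setup rather than at missing apt packages):

sudo docker exec ad62878c3f94 sh -c 'ldconfig -p | grep -i nvidia'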

So I'm not sure what's going wrong here. Any help would be appreciated. I still can't tell whether it's a Docker problem, a missing Chrome flag, a permission issue, or a package missing from the Docker image that's required for the GPU to be picked up inside the container.

Here's some information:
Host: Ubuntu 22.04
Egress google-chrome-stable version: 125.0.6422.141
