clusterplex: Worker can't find iHD_drv_video.so

Describe the bug
When trying to play a transcoded video via a worker, the video fails to play. Worker logs indicate it cannot find iHD_drv_video.so. When I disable ClusterPlex and just use my “normal” PMS pod, HW transcoding works fine.

Intel GPU drivers are installed via Intel device plugins Helm chart: https://intel.github.io/helm-charts/

The same issue happens whether I use the standard Plex image with DOCKER_MOD or the ClusterPlex image.

Relevant log file for worker:

[AVHWDeviceContext @ 0x7fa6496df6c0] libva: VA-API version 1.18.0
[AVHWDeviceContext @ 0x7fa6496df6c0] libva: Trying to open /config/Library/Application Support/Plex Media Server/Cache/va-dri-linux-x86_64/iHD_drv_video.so
[AVHWDeviceContext @ 0x7fa6496df6c0] libva: va_openDriver() returns -1
[AVHWDeviceContext @ 0x7fa6496df6c0] libva: Trying to open /config/Library/Application Support/Plex Media Server/Cache/va-dri-linux-x86_64/i965_drv_video.so
[AVHWDeviceContext @ 0x7fa6496df6c0] libva: va_openDriver() returns -1
[AVHWDeviceContext @ 0x7fa6496df6c0] Failed to initialise VAAPI connection: -1 (unknown libva error).
Device creation failed: -5.
Failed to set value 'vaapi=vaapi:/dev/dri/renderD128' for option 'init_hw_device': I/O error
Error parsing global options: I/O error
Completed transcode
Removing process from taskMap

The /config/Library/Application Support/ folder is empty, which explains why it can’t find the driver. I tried placing the driver that I pulled off the Plex server into the codecs PV, but it made no difference.

Environment
K3s v1.26.5+k3s1; nodes are Beelink U59s with Intel N5105 processors

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 36 (7 by maintainers)

Most upvoted comments

Hello, I just started using this and came across this issue while verifying settings for HW Transcode on my NUC cluster.

Thanks for finding this issue before I experienced it 😃

@todaywasawesome, I noticed the iHD_drv_video.so you referenced wasn’t actually in Plex Media Server/Cache, but symlinked there from Plex Media Server/Drivers/imd-74-linux-x86_64/dri/iHD_drv_video.so.
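That layout can be checked with readlink. The snippet below just rebuilds it in a temp directory to illustrate how the Cache entry resolves into Drivers (the paths come from the comment above; the temp-dir replica is illustrative only):

```shell
#!/bin/sh
# Rebuild the described layout in a temp dir (illustration only).
root=$(mktemp -d)
mkdir -p "$root/Drivers/imd-74-linux-x86_64/dri" \
         "$root/Cache/va-dri-linux-x86_64"
touch "$root/Drivers/imd-74-linux-x86_64/dri/iHD_drv_video.so"
# Cache holds a symlink, not the real driver:
ln -s "$root/Drivers/imd-74-linux-x86_64/dri/iHD_drv_video.so" \
      "$root/Cache/va-dri-linux-x86_64/iHD_drv_video.so"
# Following the link shows the real location under Drivers/:
resolved=$(readlink -f "$root/Cache/va-dri-linux-x86_64/iHD_drv_video.so")
echo "$resolved"
```

So sharing only Cache/ with a worker would hand it a dangling symlink unless Drivers/ is shared too, which is why both folders get mounted below.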

To work around the issue, sharing the Cache and Drivers folders with the workers as ReadOnly while excluding the rest of the config so as not to disturb the DB, I have:

  • Left the existing Config PVC as ReadWriteOnce and NOT mounted it to the Worker
  • Created additional tiny PVCs for Cache and Drivers, mounted on the PMS and Worker containers in the appropriate locations, ReadOnly on the Worker. Even 1Gi would be overkill, but I did 5Gi just in case.

Additional Cache and Driver PVC


---
#cluster-plex_cache-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clusterplex-cache-pvc
  namespace: plex-ns
  labels:
    app.kubernetes.io/name: clusterplex-cache-pvc
    app.kubernetes.io/part-of: clusterplex
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: longhorn
---
#cluster-plex_drivers-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clusterplex-drivers-pvc
  namespace: plex-ns
  labels:
    app.kubernetes.io/name: clusterplex-drivers-pvc
    app.kubernetes.io/part-of: clusterplex
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: longhorn

Worker: (the PMS manifest is the same, minus the readOnly: true on the spec.volumes entries)

   containers:
      - name: plex-worker
        image: lscr.io/linuxserver/plex:latest
        startupProbe:
          httpGet:
            path: /health
            port: 3501
          failureThreshold: 40
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 3501
          initialDelaySeconds: 60
          timeoutSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 3501
          initialDelaySeconds: 10
          timeoutSeconds: 10
        ports:
          - name: worker
            containerPort: 3501
        envFrom:
        - configMapRef:
            name: clusterplex-worker-config
        volumeMounts:
        - name: media
          mountPath: /mnt/media
        - name: codecs
          mountPath: /codecs
        - name: transcode
          mountPath: /transcode
        - name: cache
          mountPath: /config/Library/Application Support/Plex Media Server/Cache
        - name: drivers
          mountPath: /config/Library/Application Support/Plex Media Server/Drivers
        resources:              # adapt requests and limits to your needs
          requests:
            cpu: 500m
            memory: 200Mi
            gpu.intel.com/i915: "1" 
          limits:
            cpu: 2000m
            memory: 2Gi
            gpu.intel.com/i915: "1" 
      volumes:
      - name: media
        nfs:
          path: /mediastuff
          server: myserver.example.local
      - name: transcode
        persistentVolumeClaim:
          claimName: "clusterplex-transcode-pvc"
      - name: codecs
        persistentVolumeClaim:
          claimName: "clusterplex-codec-pvc"
      - name: cache
        persistentVolumeClaim:
          claimName: "clusterplex-cache-pvc"
          readOnly: true
      - name: drivers
        persistentVolumeClaim:
          claimName: "clusterplex-drivers-pvc"
          readOnly: true

Folders mounted inside the Worker; a touch test verifies the mounts are read-only.

root@clusterplex-worker-0:/# ls -al /config/Library/Application\ Support/Plex\ Media\ Server/
total 10
drwxr-xr-x 4 abc abc 4096 Sep 11 13:43 .
drwxr-xr-x 3 abc abc 4096 Sep 11 13:43 ..
drwxrwxrwx 8 abc abc 1024 Sep 11 13:54 Cache
drwxrwxrwx 3 abc abc 1024 Sep 11 13:43 Driver
root@clusterplex-worker-0:/# touch /config/Library/Application\ Support/Plex\ Media\ Server/Cache/test
touch: cannot touch '/config/Library/Application Support/Plex Media Server/Cache/test': Read-only file system

Remote VAAPI Transcode Success:

JobPoster connected, announcing
Orchestrator requesting pending work
Sending request to orchestrator on: http://clusterplex-orchestrator:3500
Remote Transcoding was successful
Calling external transcoder: /app/transcoder.js
ON_DEATH: debug mode enabled for pid [1977]
Local Relay enabled, traffic proxied through PMS local port 32499
Setting VERBOSE to ON
Sending request to orchestrator on: http://clusterplex-orchestrator:3500
cwd => "/transcode/Transcode/Sessions/plex-transcode-ba2f8489-11e0-4fab-b08d-31f4b42686ae-6c51bcab-01cf-4780-b61e-b99f21fb343a"
args => 

....BLAHBLAHBLAHBLAH...

"LIBVA_DRIVERS_PATH":"/config/Library/Application Support/Plex Media Server/Cache/va-dri-linux-x86_64"

...BLAHBLAHBLAHBLAH... 

FFMPEG_HWACCEL":"vaapi"

...BLAHBLAHBLAH...

"FFMPEG_EXTERNAL_LIBS":"/config/Library/Application\\ Support/Plex\\ Media\\ Server/Codecs/8217c1c-4578-linux-x86_64/","TRANSCODER_VERBOSE":"1"}

Hope this helps

Bringing this back up as I’m suffering similar issues. Is there a suggested workaround for the remote workers to get the cache/drivers folders for the GPU drivers?

Edit: Looks like @audiophonicz’s solution worked for me as well.

Remapping just Drivers and Cache as RWX across PMS and the workers fixed this issue for me.

Same issue here (DOCKER_MOD on an unprivileged LXC on Proxmox).

Mounting /config/Library/Application Support/Plex Media Server/Cache and /config/Library/Application Support/Plex Media Server/Drivers inside the workers did the trick.

Thanks !
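For a non-Kubernetes worker like the one above, that mounting could be sketched as a compose fragment; the host path /opt/plex/config below is hypothetical, and the image is the one used in the worker manifest earlier:

```yaml
services:
  plex-worker:
    image: lscr.io/linuxserver/plex:latest
    volumes:
      # Share only Cache and Drivers from the PMS config, read-only,
      # so the worker can resolve the iHD driver symlink:
      - "/opt/plex/config/Library/Application Support/Plex Media Server/Cache:/config/Library/Application Support/Plex Media Server/Cache:ro"
      - "/opt/plex/config/Library/Application Support/Plex Media Server/Drivers:/config/Library/Application Support/Plex Media Server/Drivers:ro"
    devices:
      - /dev/dri:/dev/dri  # expose the Intel GPU to the worker
```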

I am doing a Helm chart deployment and ran into this issue. I had already customized the charts to use an env var in the config for the workers’ HW transcoding variable, so I customized them to include this config as well, and it no longer errors. I’m not very knowledgeable about editing Helm charts or Plex, but what if we mounted the directory, or just the files with the SQLite DBs, read-only?

I see! Yeah, the fact that Plex is not set up on the Workers is actually intentional. It shouldn’t really be necessary, since the intention is to only use the Plex transcoder (their fork of FFmpeg) without interacting with the local Plex files. We use their base image to avoid redistributing their transcoder ourselves, but Plex doesn’t really run on the worker. It’s odd that it wants to use the drivers in Plex’s cache instead of the ones you installed on the node.

The reason we don’t recommend sharing Plex’s config that way, via shares, is that Plex uses SQLite as its database, which does not play well with network shares, and Longhorn’s RWX is implemented with NFS behind the scenes. So you might end up corrupting the database or seeing odd issues. Maybe you can mount JUST the cache location to avoid any DB corruption, meaning sharing only /config/Library/Application Support/Plex Media Server/Cache/ or /config/Library/Application Support/Plex Media Server/Cache/va-dri-linux-x86_64/.
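That narrower share could be expressed with a subPath mount, so only the driver directory is exposed to the worker and the SQLite DBs stay private. A sketch, assuming a hypothetical shared config volume named config backed by an RWX-capable claim:

```yaml
        volumeMounts:
        - name: config
          mountPath: /config/Library/Application Support/Plex Media Server/Cache/va-dri-linux-x86_64
          subPath: Library/Application Support/Plex Media Server/Cache/va-dri-linux-x86_64
          readOnly: true
```

Note that per the earlier comment, the files in Cache/ are symlinks into Drivers/, so Drivers/ would likely need an equivalent subPath mount as well.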

I’ll see if I can set up a physical environment similar to yours to see if there’s a way around that. Maybe the driver paths need to be rewritten, or something like that. I know others are running it with Intel drivers on k8s, but I’m not aware whether they had to use this same workaround.