pymapdl: Scheduled runs fail

Problem

Some docker images suddenly require a library (either libgomp.so or libansBLAS.so) to launch MAPDL. However, the docker images have not been changed in 9 months, and they had been working fine until now.

Details

I first saw this error with the Ubuntu docker images (which are also old, about 9 months). The libgomp issue on the Ubuntu docker images was reported and fixed here: https://github.com/ansys/pymapdl/pull/2514 The solution was to install the libgomp dependency during the job.

But then @clatapie realised that it also seems to affect the older MAPDL docker images (<v23.1). Newer docker images are not affected because that library is already installed (@dts12263 for more info).

This issue has been going on since the beginning of November (it started between 01 and 06 November), but I didn’t realise until now.

Notes

I should note that the Ubuntu docker images are used to run the tests from inside that container, whereas the older docker images are mostly based on CentOS. For those, we run the tests on the GitHub runner OS (Ubuntu) and connect to the running container with the Ansys product (CentOS).

Why this error now?

A container is definitely not 100% isolated from the host OS. They do share some dependencies (kernel?), so maybe the GitHub runners do not have those dependencies anymore. I have found that new GitHub runner images were published at the end of October.

If it is a missing dependency on the runners, installing that dependency (it does not need to be libgomp, it might have another name) should fix it. However, I believe libansBLAS is a custom ansys library, so we cannot just install it.
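One way to test the missing-dependency hypothesis would be to ask the dynamic loader, from Python, which libraries it can resolve on a given machine (the runner or the container). The following is only a minimal sketch: the helper name `missing_libraries` is hypothetical and not part of pymapdl, and the short names `"gomp"` and `"ansBLAS"` are assumed to map to the libgomp.so and libansBLAS.so files mentioned above.

```python
import ctypes.util
from typing import Callable, Iterable, List, Optional


def missing_libraries(
    names: Iterable[str],
    finder: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> List[str]:
    """Return the library names that the finder cannot locate.

    `names` are short names without the "lib" prefix or ".so" suffix,
    which is the form ctypes.util.find_library expects.
    """
    return [name for name in names if finder(name) is None]


if __name__ == "__main__":
    # Assumed short names for libgomp.so and libansBLAS.so.
    print(missing_libraries(["gomp", "ansBLAS"]))
```

The `finder` argument is injectable only so the check can be exercised without the real libraries; an empty list would suggest the loader can see everything it was asked about on that machine.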

It does not make sense at all!

About this issue

  • State: closed
  • Created 7 months ago
  • Comments: 26 (13 by maintainers)

Most upvoted comments

@dts12263 has been able to replicate the issue:

confirmed the 212 image runs on an intel machine but crashed on an AMD machine because of not having the AMD ansblas

Thank you for your input @greschd @FredAns @koubaa @jomadec and @dts12263. We couldn’t have figured this out without you!

@germa89 [image] This error usually means that the executable has been built with a newer version of GCC than the one available on the machine (or next to the executable). That’s why the versions are not found: the executable searches for those ABI versions in the library but doesn’t find them. This tends to confirm one of the following:

  • ansys.e has been recompiled with a newer version of GCC recently
  • The redistributable libraries of the GCC compiler (libstdc++.so.6, libgomp …) are not delivered with ansys.e anymore
  • the GCC installed on the docker machine has regressed since the last time it worked (4 seems a bit old)

Judging by the versions it searches for, ansys.e seems to have been built with GCC 8 or 10, so it can’t use libstdc++ from GCC 4.*
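To see which ABI versions a given runtime library actually provides, one option is to scan the library file for its version tags, similar to running `strings` on it and filtering for `GLIBCXX`. This is a rough sketch under assumptions: the function name `version_tags` is hypothetical, and it presumes the tags look like `GLIBCXX_3.4.21` (libstdc++) or `GOMP_4.5` (libgomp), which is how those libraries name their symbol versions.

```python
import re
from typing import List


def version_tags(data: bytes, prefix: str = "GLIBCXX") -> List[str]:
    """Return the distinct '<prefix>_<x.y.z>' tags found in raw library bytes.

    Tags are sorted numerically, so 3.4.9 comes before 3.4.21.
    """
    pattern = re.compile(re.escape(prefix.encode()) + rb"_\d+(?:\.\d+)*")
    tags = {match.group(0).decode() for match in pattern.finditer(data)}

    def numeric_key(tag: str):
        # Drop "<prefix>_" and compare the dotted version as integers.
        return tuple(int(part) for part in tag[len(prefix) + 1:].split("."))

    return sorted(tags, key=numeric_key)
```

In practice the bytes would come from reading the library file in binary mode (for example the libstdc++.so.6 found inside the container). If the tag the executable asks for is not in the returned list, that library is too old for it.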

@jomadec Not exactly, MAPDL does ship gcc 8, but not in the same location as the executable. The mapdl executable always runs under a wrapper script that sets LD_LIBRARY_PATH to the location of the gcc runtime. This is what the landing zone concept by @jhdub23 is meant to solve.
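The wrapper idea can be illustrated in Python as well. This is not the actual MAPDL wrapper script, only a sketch of the same mechanism: the helper name `env_with_runtime` is hypothetical, and any directory passed to it is a placeholder rather than a real MAPDL install path.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_runtime(
    lib_dir: str, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH.

    Putting the directory first makes the loader prefer the runtime shipped
    there over any older system copy.
    """
    env = dict(os.environ if base is None else base)
    current = env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is a colon-separated list of directories.
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{current}" if current else lib_dir
    return env
```

The resulting dictionary could then be passed as `env=` to `subprocess.run` when starting the executable, so that only the child process sees the modified search path.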

OK, I had a better look. In MAPDL, this libansBlas.a is just a wrapper around the math library we need to use on specific hardware. If you run ldd on this libansBlas.so library, you can see that it relies on the MKL (Intel processors) or BLIS (AMD processors) math kernel libraries. In my repo, I can see a blas/ -> amd/libansBlas.so -> intel/libansBlas.so

At runtime we are supposed to pick the right one, depending on the machine we run on. Here are the dependencies of the Intel one:

[image: dependencies of the Intel libansBlas.so]
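To illustrate the runtime choice between the two variants, here is a small sketch that reads the CPU vendor the way Linux reports it in /proc/cpuinfo and maps it to one of the two subdirectories seen above. It is not how MAPDL actually selects the library. Both function names are hypothetical, and falling back to the Intel variant for any non-AMD vendor is an assumption made for this example.

```python
from typing import Optional


def cpu_vendor(cpuinfo_text: str) -> Optional[str]:
    """Return the first vendor_id value from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def blas_subdir(vendor: Optional[str]) -> str:
    """Map a CPU vendor string to the blas/ subdirectory to load from.

    Assumption: anything that is not AMD uses the Intel (MKL) variant.
    """
    return "amd" if vendor == "AuthenticAMD" else "intel"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(blas_subdir(cpu_vendor(handle.read())))
```

On the GitHub runners this would show whether a job landed on an Intel or an AMD machine, which matches the finding that the 212 image runs on Intel but crashes on AMD.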
