hail: hail v0.2.124 on AWS - java error

What happened?

Follow-up on #13445 - I have almost succeeded in installing Hail on AWS but still have an environment issue:

  • I am trying to install Hail v0.2.124
  • on AWS EMR v6.9.1 (the latest version with Spark 3.3.0, as suggested in the Hail docs)
  • I upgraded to Python 3.9.18:
$ python --version
Python 3.9.18

  • I activated Java 11.0.20.1:

$ java -version
openjdk version "11.0.20.1" 2023-08-22 LTS
OpenJDK Runtime Environment Corretto-11.0.20.9.1 (build 11.0.20.1+9-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.20.9.1 (build 11.0.20.1+9-LTS, mixed mode)
  • I cloned Hail:
$ cd /tmp
$ git clone --branch 0.2.124 --depth 1 https://github.com/broadinstitute/hail.git
  • I built Hail:
$ cd hail/hail/
$ make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.15 SPARK_VERSION=3.3.0
[...]
Successfully installed hail-0.2.124
hailctl config set query/backend spark
  • At this point Hail seems correctly installed:
$ pip show hail
Name: hail
Version: 0.2.124
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /home/hadoop/.local/lib/python3.9/site-packages
  • For the sake of configuration, I created a symlink to the Hail backend:
sudo ln -sf /home/hadoop/.local/lib/python3.9/site-packages/hail/backend /opt/hail/backend
  • Confident in the installation, I tried to run the Spark shell:
$ spark-shell
[...]
Exception in thread "main" java.lang.NoSuchMethodError: 'scala.reflect.internal.settings.MutableSettings 

I am out of ideas on how to solve the current situation. Thanks.

Version

0.2.124

Relevant log output

$ spark-shell
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/usr/lib/spark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "main" java.lang.NoSuchMethodError: 'scala.reflect.internal.settings.MutableSettings scala.reflect.internal.settings.MutableSettings$.SettingsOps(scala.reflect.internal.settings.MutableSettings)'
        at scala.tools.nsc.interpreter.ILoop.$anonfun$chooseReader$1(ILoop.scala:914)
        at scala.tools.nsc.interpreter.ILoop.mkReader$1(ILoop.scala:920)
        at scala.tools.nsc.interpreter.ILoop.$anonfun$chooseReader$4(ILoop.scala:926)
        at scala.tools.nsc.interpreter.ILoop.$anonfun$chooseReader$3(ILoop.scala:926)
        at scala.tools.nsc.interpreter.ILoop.chooseReader(ILoop.scala:926)
        at org.apache.spark.repl.SparkILoop.$anonfun$process$1(SparkILoop.scala:138)
        at scala.Option.fold(Option.scala:251)
        at org.apache.spark.repl.SparkILoop.newReader$1(SparkILoop.scala:138)
        at org.apache.spark.repl.SparkILoop.preLoop$1(SparkILoop.scala:142)
        at org.apache.spark.repl.SparkILoop.$anonfun$process$10(SparkILoop.scala:203)
        at org.apache.spark.repl.SparkILoop.withSuppressedSettings$1(SparkILoop.scala:189)
        at org.apache.spark.repl.SparkILoop.startup$1(SparkILoop.scala:201)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:236)
        at org.apache.spark.repl.Main$.doMain(Main.scala:78)
        at org.apache.spark.repl.Main$.main(Main.scala:58)
        at org.apache.spark.repl.Main.main(Main.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 18

Most upvoted comments

@danking

Here is what I have done in my environment (AWS EMR):

  • Create the EMR cluster without installing Hail
  • Update PATH (this is needed, otherwise hailctl is not found at the installation step):
export PATH=$PATH:/home/hadoop/.local/bin
  • Clone the latest commit of Hail:
cd /tmp
git clone --depth 1 https://github.com/broadinstitute/hail.git
cd hail/hail/
  • Edit build.gradle and add exclude group: 'org.scala-lang', module: 'scala-reflect' (a sketch of this edit is shown just below)
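For reference, the exclude uses standard Gradle syntax. The following is only a minimal, hypothetical sketch assuming the Spark dependencies sit in an ordinary dependencies block; the real dependency declarations and versions in hail/build.gradle differ:

// Hypothetical placement; the real dependency declarations in hail/build.gradle differ.
// The goal is to keep scala-reflect out of hail-all-spark.jar so that only the
// scala-reflect shipped with EMR's Spark ends up on the classpath.
dependencies {
    implementation('org.apache.spark:spark-mllib_2.12:3.3.2') {
        exclude group: 'org.scala-lang', module: 'scala-reflect'
    }
}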
  • Build Hail
make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.15 SPARK_VERSION=3.3.2
  • Symlink hail-all-spark.jar into /opt (at EMR creation, before installing Hail, I edit the spark-defaults properties so they point at hail-all-spark.jar… this config was needed and worked successfully with an old version of Hail (0.2.60)… it can be revisited if it is not appropriate for recent versions):
sudo mkdir /opt/hail/
sudo ln -sf /home/hadoop/.local/lib/python3.9/site-packages/hail/backend /opt/hail/backend
  • Start pyspark:
$ pyspark
Python 3.9.18 (main, Oct 25 2023, 05:26:35) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-17)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/usr/lib/spark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.2-amzn-0.1
      /_/

Using Python version 3.9.18 (main, Oct 25 2023 05:26:35)
Spark context Web UI available at http://ip-192-168-125-39.ap-southeast-1.compute.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1698211907929_0001).
SparkSession available as 'spark'.
>>> import hail as hl
>>> hl.version()
'0.2.124-e739a95489e4'
>>> hl.init(sc)
pip-installed Hail requires additional configuration options in Spark referring
  to the path to the Hail Python module directory HAIL_DIR,
  e.g. /path/to/python/site-packages/hail:
    spark.jars=HAIL_DIR/backend/hail-all-spark.jar
    spark.driver.extraClassPath=HAIL_DIR/backend/hail-all-spark.jar
    spark.executor.extraClassPath=./hail-all-spark.jar
Running on Apache Spark version 3.3.2-amzn-0.1
SparkUI available at http://ip-192-168-110-167.ap-southeast-1.compute.internal:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.124-e739a95489e4
LOGGING: writing to /mnt/tmp/hail/hail/hail-20231025-0729-0.2.124-e739a95489e4.log
>>> mt = hl.balding_nichols_model(n_populations=3, n_samples=500, n_variants=1_000)
2023-10-25 07:29:48.283 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 500 samples, and 1000 variants...
>>> mt.count()
(1000, 500)

It seems to be working on the command line using pyspark!

I need to test in a Jupyter notebook now…

FYI, the pyspark configs:

- Classification: spark-defaults
        ConfigurationProperties:
          spark.jars: /opt/hail/backend/hail-all-spark.jar
          spark.driver.extraClassPath: /opt/hail/backend/hail-all-spark.jar:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
          spark.executor.extraClassPath: /opt/hail/backend/hail-all-spark.jar:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
          spark.serializer: org.apache.spark.serializer.KryoSerializer
          spark.kryo.registrator: is.hail.kryo.HailKryoRegistrator
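If editing spark-defaults is not an option (for example, in a notebook kernel where no SparkContext exists yet), the same properties can presumably be passed through hl.init's spark_conf argument instead and let Hail create the Spark context itself. A minimal, untested sketch, assuming the pip install location shown by pip show above (the jar path is illustrative and must match your environment):

import hail as hl

# Path to the backend jar of the pip-installed Hail; adjust to your site-packages location.
hail_jar = "/home/hadoop/.local/lib/python3.9/site-packages/hail/backend/hail-all-spark.jar"

# Let Hail create the SparkContext with the configuration the pip-installed-Hail
# warning above asks for, plus the Kryo settings from spark-defaults.
hl.init(spark_conf={
    "spark.jars": hail_jar,
    "spark.driver.extraClassPath": hail_jar,
    "spark.executor.extraClassPath": "./hail-all-spark.jar",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "is.hail.kryo.HailKryoRegistrator",
})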