spark: Spark can't find the specified DLLs

I am new to .NET for Apache Spark and facing some issues with passing DLLs. Basically, I have some DLL files (from another C# project) which I want to reuse in a UDF in my Spark project.

Error:

[Warn] [AssemblyLoader] Assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0' file not found 'Classes[.dll,.ni.dll]' in '/tmp/spark-e2e6444a-99fc-42c6-ae15-8a5b328e3038/userFiles-aafb5491-4485-46d9-8e17-0849aed7c57a,/home/ubuntu/project/mySparkApp/bin/Debug/net5.0,/opt/Microsoft.Spark.Worker-1.0.0/'
[2021-04-13T11:16:15.1691280Z] [ubuntu-Vostro] [Error] [TaskRunner] [1] ProcessStream() failed with exception: System.IO.FileNotFoundException: Could not load file or assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0'. The system cannot find the file specified.

Here I have copied Classes.dll (an external DLL) into /home/ubuntu/project/mySparkApp. Initially, I was facing the same error with mySparkApp.dll itself, and I resolved that by copying it into my current directory, which worked. But in the case of this third-party DLL, the file still isn't found.

Here is my .csproj file where I have referenced Classes.dll:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Spark" Version="1.0.0" />
  </ItemGroup>

  <ItemGroup>
    <Reference Include="Classes">
      <HintPath>/home/incs83/project/mySparkApp/Classes.dll</HintPath>
    </Reference>
    <Reference Include="CSharpZip">
      <HintPath>/home/incs83/project/mySparkApp/CSharpZip.dll</HintPath>
    </Reference>
  </ItemGroup>

</Project>

Here is my spark-submit command:

spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local bin/Debug/net5.0/microsoft-spark-3-0_2.12-1.0.0.jar dotnet bin/Debug/net5.0/mySparkApp.dll

I have spent a lot of time digging into this, but still no luck.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 15 (4 by maintainers)

Most upvoted comments

.NET for Spark will look for your custom DLLs using the DOTNET_ASSEMBLY_SEARCH_PATHS environment variable. So, just before running spark-submit, you can set that variable to point at the folder containing your DLLs:

set DOTNET_ASSEMBLY_SEARCH_PATHS=absolute_path_to_folder_containing_dlls
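
On Linux (as in the original question) the equivalent would be to export the variable before calling spark-submit; a minimal sketch, where the build output folder used as the search path is an assumption:

# assumed location of the dependency DLLs: the app's build output folder
export DOTNET_ASSEMBLY_SEARCH_PATHS=/home/ubuntu/project/mySparkApp/bin/Debug/net5.0
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local bin/Debug/net5.0/microsoft-spark-3-0_2.12-1.0.0.jar dotnet bin/Debug/net5.0/mySparkApp.dll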

You can also copy these DLLs into the Microsoft.Spark.Worker installation folder. (This is what is done on the Databricks environment:)

# Copy the app's dependency DLLs next to the Microsoft.Spark.Worker binary
APP_DEPENDENCIES=/dbfs/apps/dependencies
WORKER_PATH=`readlink $DOTNET_SPARK_WORKER_INSTALLATION_PATH/Microsoft.Spark.Worker`
if [ -f "$WORKER_PATH" ] && [ -d "$APP_DEPENDENCIES" ]; then
   # copy everything from the dependencies folder into the worker's directory
   sudo cp -fR "$APP_DEPENDENCIES/." `dirname $WORKER_PATH`
fi
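
Outside Databricks, the same idea can be applied by hand to the worker folder that appears in the error's search path (/opt/Microsoft.Spark.Worker-1.0.0 in the question). A minimal sketch, assuming the third-party DLLs end up in the app's build output folder:

# copy the third-party DLLs next to the worker so its AssemblyLoader can find them
sudo cp /home/ubuntu/project/mySparkApp/bin/Debug/net5.0/Classes.dll \
        /home/ubuntu/project/mySparkApp/bin/Debug/net5.0/CSharpZip.dll \
        /opt/Microsoft.Spark.Worker-1.0.0/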