spark: Spark can't find the DLLs specified
I am new to .NET for Apache Spark and am facing some issues with passing DLLs. Basically, I have some DLL files (from another C# project) that I want to reuse in a UDF in my Spark project.
Error:

```
[Warn] [AssemblyLoader] Assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0' file not found 'Classes[.dll,.ni.dll]' in '/tmp/spark-e2e6444a-99fc-42c6-ae15-8a5b328e3038/userFiles-aafb5491-4485-46d9-8e17-0849aed7c57a,/home/ubuntu/project/mySparkApp/bin/Debug/net5.0,/opt/Microsoft.Spark.Worker-1.0.0/'
[2021-04-13T11:16:15.1691280Z] [ubuntu-Vostro] [Error] [TaskRunner] [1] ProcessStream() failed with exception: System.IO.FileNotFoundException: Could not load file or assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0'. The system cannot find the file specified.
```
I have copied Classes.dll (the external DLL) into /home/ubuntu/project/mySparkApp. Initially, I was facing the same error with mySparkApp.dll and resolved it by copying that file into my current directory, which worked. But in the case of this third-party DLL, it still fails to find the file.
Here is my .csproj file where I reference Classes.dll:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Spark" Version="1.0.0" />
  </ItemGroup>
  <ItemGroup>
    <Reference Include="Classes">
      <HintPath>/home/incs83/project/mySparkApp/Classes.dll</HintPath>
    </Reference>
    <Reference Include="CSharpZip">
      <HintPath>/home/incs83/project/mySparkApp/CSharpZip.dll</HintPath>
    </Reference>
  </ItemGroup>
</Project>
```
Here is the spark-submit command:

```
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  bin/Debug/net5.0/microsoft-spark-3-0_2.12-1.0.0.jar \
  dotnet bin/Debug/net5.0/mySparkApp.dll
```
I have spent a lot of time digging into this, but still no luck.
About this issue
- State: open
- Created 3 years ago
- Reactions: 1
- Comments: 15 (4 by maintainers)
.NET for Apache Spark looks for your custom DLLs using the DOTNET_ASSEMBLY_SEARCH_PATHS environment variable. So, just before calling spark-submit, you can set that environment variable to point at your DLL folder, as in the sketch below.
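A minimal sketch of that approach, assuming the third-party DLLs sit in /home/ubuntu/project/mySparkApp (the folder path is illustrative; substitute wherever your DLLs actually live):

```bash
# Point the assembly loader at the folder that contains Classes.dll and
# CSharpZip.dll (illustrative path; adjust to your own layout).
export DOTNET_ASSEMBLY_SEARCH_PATHS=/home/ubuntu/project/mySparkApp

# Then submit as before.
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  bin/Debug/net5.0/microsoft-spark-3-0_2.12-1.0.0.jar \
  dotnet bin/Debug/net5.0/mySparkApp.dll
```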
You can also copy these DLLs into the Microsoft.Spark.Worker installation folder. (This is what is done on the Databricks environment.)
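For example, assuming the worker is installed at /opt/Microsoft.Spark.Worker-1.0.0/ (one of the search paths listed in the error above) and the DLLs are in /home/ubuntu/project/mySparkApp:

```bash
# Place the third-party DLLs next to the worker so its assembly loader
# can resolve them; both paths here are assumptions taken from the question.
cp /home/ubuntu/project/mySparkApp/Classes.dll /opt/Microsoft.Spark.Worker-1.0.0/
cp /home/ubuntu/project/mySparkApp/CSharpZip.dll /opt/Microsoft.Spark.Worker-1.0.0/
```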