dask-cuda: Dask_Cuda error warning
I have got WSL2 up and running on my laptop and installed a conda environment in which CuPy appears to work.
When I run the following:
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
# Create a Dask Cluster with one worker per GPU
cluster = LocalCUDACluster()
client = Client(cluster)
/home/ouetis_khan/miniconda3/envs/img_linux2/lib/python3.8/site-packages/dask_cuda/utils.py:141: UserWarning: Cannot get CPU affinity for device with index 0, setting default affinity
warnings.warn(
/home/ouetis_khan/miniconda3/envs/img_linux2/lib/python3.8/site-packages/dask_cuda/utils.py:141: UserWarning: Cannot get CPU affinity for device with index 1, setting default affinity
warnings.warn(
If I client.close() and rerun
cluster = LocalCUDACluster()
client = Client(cluster)
I get the following error
---------------------------------------------------------------------------
NVMLError_Unknown Traceback (most recent call last)
<ipython-input-4-f36941294e22> in <module>
----> 1 cluster = LocalCUDACluster()
2 client = Client(cluster)
~/miniconda3/envs/img_linux2/lib/python3.8/site-packages/dask_cuda/local_cuda_cluster.py in __init__(self, n_workers, threads_per_worker, processes, memory_limit, device_memory_limit, CUDA_VISIBLE_DEVICES, data, local_directory, protocol, enable_tcp_over_ucx, enable_infiniband, enable_nvlink, enable_rdmacm, ucx_net_devices, rmm_pool_size, rmm_managed_memory, jit_unspill, **kwargs)
160 memory_limit, threads_per_worker, n_workers
161 )
--> 162 self.device_memory_limit = parse_device_memory_limit(
163 device_memory_limit, device_index=0
164 )
~/miniconda3/envs/img_linux2/lib/python3.8/site-packages/dask_cuda/utils.py in parse_device_memory_limit(device_memory_limit, device_index)
478 device_memory_limit = float(device_memory_limit)
479 if isinstance(device_memory_limit, float) and device_memory_limit <= 1:
--> 480 return int(get_device_total_memory(device_index) * device_memory_limit)
481
482 if isinstance(device_memory_limit, str):
~/miniconda3/envs/img_linux2/lib/python3.8/site-packages/dask_cuda/utils.py in get_device_total_memory(index)
159 pynvml.nvmlInit()
160 return pynvml.nvmlDeviceGetMemoryInfo(
--> 161 pynvml.nvmlDeviceGetHandleByIndex(index)
162 ).total
163
~/miniconda3/envs/img_linux2/lib/python3.8/site-packages/pynvml/nvml.py in nvmlDeviceGetHandleByIndex(index)
920 fn = get_func_pointer("nvmlDeviceGetHandleByIndex_v2")
921 ret = fn(c_index, byref(device))
--> 922 check_return(ret)
923 return device
924
~/miniconda3/envs/img_linux2/lib/python3.8/site-packages/pynvml/nvml.py in check_return(ret)
364 def check_return(ret):
365 if (ret != NVML_SUCCESS):
--> 366 raise NVMLError(ret)
367 return ret
368
NVMLError_Unknown: Unknown Error
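The traceback ends inside pynvml rather than in Dask itself, so one way to narrow this down is to make the same NVML calls directly. Below is a minimal sketch, assuming pynvml is importable in the same conda environment; `probe_nvml` is a hypothetical helper name, not part of dask_cuda or pynvml. It reports one line per GPU instead of stopping at the first failure.

```python
def probe_nvml():
    """Return one status line per GPU, or a single line describing the failure.

    probe_nvml is a hypothetical helper, not a dask_cuda or pynvml function.
    """
    try:
        import pynvml
    except ImportError:
        return ["pynvml is not installed in this environment"]

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        return [f"nvmlInit failed: {exc}"]

    lines = []
    try:
        for index in range(pynvml.nvmlDeviceGetCount()):
            try:
                # Same two calls that appear in the traceback above.
                handle = pynvml.nvmlDeviceGetHandleByIndex(index)
                total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
                lines.append(f"GPU {index}: {total / 2**30:.1f} GiB total")
            except pynvml.NVMLError as exc:
                lines.append(f"GPU {index}: NVML query failed: {exc}")
    except pynvml.NVMLError as exc:
        lines.append(f"device enumeration failed: {exc}")
    finally:
        pynvml.nvmlShutdown()
    return lines


for line in probe_nvml():
    print(line)
```

If this fails in the same way outside of Dask, the problem probably sits between NVML and the WSL2 driver rather than in dask_cuda.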
If I import cupy and run cupy.show_config()
CuPy Version : 7.8.0
CUDA Root : /home/ouetis_khan/miniconda3/envs/img_linux2
CUDA Build Version : 11000
CUDA Driver Version : 11030
CUDA Runtime Version : 11000
cuBLAS Version : 11200
cuFFT Version : 10201
cuRAND Version : 10201
cuSOLVER Version : (10, 6, 0)
cuSPARSE Version : 11101
NVRTC Version : (11, 0)
cuDNN Build Version : 8000
cuDNN Version : 8000
NCCL Build Version : 2708
NCCL Runtime Version : 2708
CUB Version : Enabled
cuTENSOR Version : None
I see most of the libraries there except for one (cuTENSOR). Dask_Cuda could be a bit too early for WSL2, but it looks as though it should be possible to get something working.
Any thoughts on what could be happening and on how to get this to work?
My laptop build is as follows (I've deleted a few bits for privacy):
WindowsBuildLabEx : 21277.1000.amd64fre.rs_prerelease.201207-1443
WindowsCurrentVersion : 6.3
WindowsEditionId : Core
WindowsInstallationType : Client
WindowsInstallDateFromRegistry : 1/3/2021 3:47:15 PM
WindowsProductId :
WindowsProductName : Windows 10 Home
WindowsRegisteredOrganization : Razer
WindowsRegisteredOwner :
WindowsSystemRoot : C:\WINDOWS
WindowsVersion : 2004
BiosCharacteristics :
BiosBIOSVersion : {ALASKA - 1072009, 5.00, American Megatrends - 5000C}
BiosBuildNumber :
BiosCaption : 5.00
BiosCodeSet :
BiosCurrentLanguage :
BiosDescription : 5.00
BiosEmbeddedControllerMajorVersion : 4
BiosEmbeddedControllerMinorVersion : 0
BiosFirmwareType : Uefi
BiosIdentificationCode :
BiosInstallableLanguages :
BiosInstallDate :
BiosLanguageEdition :
BiosListOfLanguages :
BiosManufacturer : Razer
BiosName : 5.00
BiosOtherTargetOS :
BiosPrimaryBIOS : True
BiosReleaseDate : 5/3/2018 1:00:00 AM
BiosSeralNumber :
BiosSMBIOSBIOSVersion : 5.00
BiosSMBIOSMajorVersion : 3
BiosSMBIOSMinorVersion : 0
BiosSMBIOSPresent : True
BiosSoftwareElementState : Running
BiosStatus : OK
BiosSystemBiosMajorVersion : 5
BiosSystemBiosMinorVersion : 0
BiosTargetOperatingSystem : 0
BiosVersion : ALASKA - 1072009
CsAdminPasswordStatus : Unknown
CsAutomaticManagedPagefile : True
CsAutomaticResetBootOption : True
CsAutomaticResetCapability : True
CsBootOptionOnLimit :
CsBootOptionOnWatchDog :
CsBootROMSupported : True
CsBootStatus : {0, 0, 0, 0...}
CsBootupState : Normal boot
CsCaption :
CsChassisBootupState : Safe
CsChassisSKUNumber :
CsCurrentTimeZone : 0
CsDaylightInEffect : False
CsDescription : AT/AT COMPATIBLE
CsDNSHostName :
CsDomain : WORKGROUP
CsDomainRole : StandaloneWorkstation
CsEnableDaylightSavingsTime : True
CsFrontPanelResetStatus : Unknown
CsHypervisorPresent : True
CsInfraredSupported : False
CsInitialLoadInfo :
CsInstallDate :
CsKeyboardPasswordStatus : Unknown
CsLastLoadInfo :
CsManufacturer : Razer
CsModel : Blade
CsName :
CsNetworkAdapters : {WiFi, Bluetooth Network Connection, Ethernet 2}
CsNetworkServerModeEnabled : True
CsNumberOfLogicalProcessors : 8
CsNumberOfProcessors : 1
CsProcessors : {Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz}
CsOEMStringArray : {0, , , ...}
CsPartOfDomain : False
CsPauseAfterReset : -1
CsPCSystemType : Mobile
CsPCSystemTypeEx : Mobile
CsPowerManagementCapabilities :
CsPowerManagementSupported :
CsPowerOnPasswordStatus : Unknown
CsPowerState : Unknown
CsPowerSupplyState : Safe
CsPrimaryOwnerContact :
CsPrimaryOwnerName :
CsResetCapability : Other
CsResetCount : -1
CsResetLimit : -1
CsRoles : {LM_Workstation, LM_Server, NT}
CsStatus : OK
CsSupportContactDescription :
CsSystemFamily : 1A586755
CsSystemSKUNumber : RZ09-01953W53
CsSystemType : x64-based PC
CsThermalState : Safe
CsTotalPhysicalMemory : 17068191744
CsPhyicallyInstalledMemory : 16777216
CsUserName :
CsWakeUpType : PowerSwitch
CsWorkgroup : WORKGROUP
OsName : Microsoft Windows 10 Home Insider Preview
OsType : WINNT
OsOperatingSystemSKU : WindowsHome
OsVersion : 10.0.21277
OsCSDVersion :
OsBuildNumber : 21277
OsHotFixes : {KB4587025}
OsBootDevice : \Device\HarddiskVolume2
OsSystemDevice : \Device\HarddiskVolume4
OsSystemDirectory : C:\WINDOWS\system32
OsSystemDrive : C:
OsWindowsDirectory : C:\WINDOWS
OsCountryCode : 44
OsCurrentTimeZone : 0
OsLocaleID : 0809
OsLocale : en-GB
OsLocalDateTime : 1/4/2021 10:35:13 AM
OsLastBootUpTime : 1/4/2021 9:16:15 AM
OsUptime : 01:18:57.5475818
OsBuildType : Multiprocessor Free
OsCodeSet : 1252
OsDataExecutionPreventionAvailable : True
OsDataExecutionPrevention32BitApplications : True
OsDataExecutionPreventionDrivers : True
OsDataExecutionPreventionSupportPolicy : OptIn
OsDebug : False
OsDistributed : False
OsEncryptionLevel : 256
OsForegroundApplicationBoost : Maximum
OsTotalVisibleMemorySize : 16668156
OsFreePhysicalMemory : 7065500
OsTotalVirtualMemorySize : 23483900
OsFreeVirtualMemory : 6431744
OsInUseVirtualMemory : 17052156
OsTotalSwapSpaceSize :
OsSizeStoredInPagingFiles : 6815744
OsFreeSpaceInPagingFiles : 6677792
OsPagingFiles : {C:\pagefile.sys}
OsHardwareAbstractionLayer : 10.0.21277.1000
OsInstallDate : 1/3/2021 3:47:15 PM
OsManufacturer : Microsoft Corporation
OsMaxNumberOfProcesses : 4294967295
OsMaxProcessMemorySize : 137438953344
OsMuiLanguages : {en-GB, en-US}
OsNumberOfLicensedUsers :
OsNumberOfProcesses : 275
OsNumberOfUsers : 2
OsOrganization : Razer
OsArchitecture : 64-bit
OsLanguage : en-GB
OsProductSuites : {TerminalServicesSingleSession, HomeEdition}
OsOtherTypeDescription :
OsPAEEnabled :
OsPortableOperatingSystem : False
OsPrimary : True
OsProductType : WorkStation
OsRegisteredUser :
OsSerialNumber :
OsServicePackMajorVersion : 0
OsServicePackMinorVersion : 0
OsStatus : OK
OsSuites : {TerminalServices, TerminalServicesSingleSession,
HomeEdition}
OsServerLevel :
KeyboardLayout : en-GB
TimeZone : (UTC+00:00) Dublin, Edinburgh, Lisbon, London
LogonServer : \\
PowerPlatformRole : Mobile
HyperVisorPresent : True
HyperVRequirementDataExecutionPreventionAvailable :
HyperVRequirementSecondLevelAddressTranslation :
HyperVRequirementVirtualizationFirmwareEnabled :
HyperVRequirementVMMonitorModeExtensions :
DeviceGuardSmartStatus : Off
DeviceGuardRequiredSecurityProperties : {0}
DeviceGuardAvailableSecurityProperties : {BaseVirtualizationSupport, SecureBoot, DMAProtection,
6...}
DeviceGuardSecurityServicesConfigured : {0}
DeviceGuardSecurityServicesRunning : {0}
DeviceGuardCodeIntegrityPolicyEnforcementStatus :
DeviceGuardUserModeCodeIntegrityPolicyEnforcementStatus :
On top of the above, I have a built-in Nvidia GTX 1060 GPU and a Razer Core eGPU with a GTX1066. It would be great to hear what you think, because CUDA is running on my Ubuntu 18.04 under WSL2.
When I run the BlackScholes example it runs (OK, only on GPU 0, but it runs). So I am a bit confused why it can pick up two GPUs but not really work with them.
GPU Device 0: "Turing" with compute capability 7.5
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 0.543924 msec
Effective memory bandwidth: 147.079411 GB/s
Gigaoptions per second : 14.707941
BlackScholes, Throughput = 14.7079 GOptions/s, Time = 0.00054 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
[BlackScholes] - Test Summary
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Test passed
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (6 by maintainers)
I created a file as the WSL2 docs suggest and added the following:
[wsl2]
memory=12GB # Limits VM memory in WSL 2 up to 13GB (leave 3GB reserved for windows)
processors=3 # Makes the WSL 2 VM use 3 virtual processors (1 core left for windows)
swap=200GB
swapFile=E:\temp\swap.vhdx
localhostForwarding=true
That seems to do the trick. When I run a dask.distributed client I now have 12.6GB versus my previous 8GB in WSL2, so that's an improvement. CuPy is working, and I can at least use a local Dask CUDA cluster, as I have two Nvidia GPUs (not very good, but better than nothing): a 1060 (embedded) and an eGPU with a 1066. So 6GB on each and 13GB of RAM to play with. Thanks @pentschev
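To confirm that a changed memory setting has reached the Linux side, the total RAM visible inside WSL2 can be read from Python without starting Dask. This is a small sketch that assumes a Linux environment such as the WSL2 distro; `total_host_memory_bytes` is a hypothetical helper name.

```python
import os


def total_host_memory_bytes():
    """Total physical memory visible to this Linux environment, in bytes."""
    # Page size multiplied by the number of physical pages, both reported by the OS.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


print(f"Visible host memory: {total_host_memory_bytes() / 1e9:.1f} GB")
```

The figure should change after restarting WSL2 with the new settings; if it does not, the settings file is probably not being picked up.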
Ah yes, sorry, I thought you were referring to device_memory_limit.
That’s right, device_memory_limit will spill from GPU to host, and memory_limit (which is automatically inferred to 8GB in your system) is shared among the workers, and that’s used to control host to disk spilling.
Thanks, I think this issue is closed now and I really appreciate the help. Unless you think I have missed something. Thank you again!
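For anyone landing here later, the two limits discussed above can also be passed explicitly instead of being inferred. The sketch below is only an illustration: the "5GB" and "6GB" values are assumptions chosen for 6GB cards and a 12GB WSL2 VM, and `cluster_kwargs` and `start_cluster` are hypothetical helper names. Only the `device_memory_limit` and `memory_limit` parameter names come from the LocalCUDACluster signature shown in the traceback.

```python
def cluster_kwargs(device_limit="5GB", host_limit="6GB"):
    """Build keyword arguments for LocalCUDACluster (values are illustrative)."""
    return {
        # Threshold for spilling from GPU memory to host memory.
        "device_memory_limit": device_limit,
        # Threshold for spilling from host memory to disk.
        "memory_limit": host_limit,
    }


def start_cluster(**overrides):
    """Start a local CUDA cluster and return (cluster, client).

    Imports are deferred so this file can be loaded on a machine without a GPU.
    """
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    kwargs = {**cluster_kwargs(), **overrides}
    cluster = LocalCUDACluster(**kwargs)
    return cluster, Client(cluster)


print(cluster_kwargs())
```

Calling `start_cluster()` needs a working GPU setup; when finished, close both the client and the cluster so a later restart begins from a clean state.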