azure-sdk-for-python: [AzureML v2] - Unable to create an AzureML compute cluster with No Public IP nodes (on private endpoint)

  • Package Name: azure-ai-ml
  • Package Version: 1.1.0
  • Operating System: Ubuntu 20.04.5 LTS
  • Python Version: Python 3.10.6

Describe the bug: Creating an AzureML compute cluster with no public IPs on its nodes, in a workspace behind a private endpoint, fails with a TypeError. I am using this code snippet from Azure’s docs:

from azure.ai.ml.entities import AmlCompute

# specify aml compute name.
cpu_compute_target = "cpu-cluster"

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print("Creating a new cpu compute target...")
    compute = AmlCompute(
        name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4,
        vnet_name="yourvnet", subnet_name="yoursubnet", enable_node_public_ip=False
    )
    ml_client.compute.begin_create_or_update(compute).result()

Traceback (most recent call last):
  File "/home/azureuser/cloudfiles/code/[REDACTED]/src/aml_utils.py", line 16, in create_compute
    cpu_cluster = ml_client.compute.get(compute_target_name)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/_telemetry/activity.py", line 259, in wrapper
    return f(*args, **kwargs)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/operations/_compute_operations.py", line 78, in get
    rest_obj = self._operation.get(
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/_restclient/v2022_01_01_preview/operations/_compute_operations.py", line 577, in get
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/exceptions.py", line 107, in map_error
    raise error
azure.core.exceptions.ResourceNotFoundError: Operation returned an invalid status 'Not Found'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/entities/_compute/aml_compute.py", line 117, in __init__
    super().__init__(
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/entities/_compute/compute.py", line 55, in __init__
    super().__init__(name=name, description=description, **kwargs)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/ml/entities/_resource.py", line 60, in __init__
    super().__init__(**kwargs)
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
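
The second traceback ends in object.__init__() after three super().__init__() calls that forward **kwargs. That pattern usually means at least one keyword argument was not consumed by any class in the chain. The sketch below reproduces the mechanism with stand-in classes. Resource, Compute, ClusterCompute and try_build are hypothetical names written for illustration, not the azure-ai-ml source, and which keywords AmlCompute accepts in 1.1.0 is not confirmed here.

```python
# Stand-in classes (hypothetical, NOT the azure-ai-ml source) that mirror the
# cooperative __init__ chain visible in the traceback.
class Resource:
    def __init__(self, name=None, description=None, **kwargs):
        self.name = name
        self.description = description
        # Any keyword argument nobody consumed is forwarded to object.__init__,
        # which rejects extra arguments.
        super().__init__(**kwargs)


class Compute(Resource):
    def __init__(self, name=None, description=None, **kwargs):
        super().__init__(name=name, description=description, **kwargs)


class ClusterCompute(Compute):
    def __init__(self, name=None, size=None, enable_node_public_ip=True, **kwargs):
        super().__init__(name=name, **kwargs)
        self.size = size
        self.enable_node_public_ip = enable_node_public_ip


def try_build(**kwargs):
    """Return the constructed object, or the TypeError message if construction fails."""
    try:
        return ClusterCompute(**kwargs)
    except TypeError as exc:
        return str(exc)


# Only keywords that some class in the chain names explicitly: construction succeeds.
ok = try_build(name="cpu-cluster", size="STANDARD_D2_V2", enable_node_public_ip=False)

# Keywords that no class names (assumed here to be the vnet/subnet ones): TypeError.
failed = try_build(
    name="cpu-cluster",
    size="STANDARD_D2_V2",
    vnet_name="yourvnet",
    subnet_name="yoursubnet",
)

print(type(ok).__name__)
print(failed)
```

If this reading is right, the error is raised while the AmlCompute object is being built, before any request about the cluster reaches the workspace, so it would not by itself show that the vnet or subnet names are wrong. My understanding is that the SDK takes virtual network configuration through a separate network settings entity rather than as top-level keywords, but that is an assumption to check against the API reference for the installed version.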

pip list output for the Azure packages:

azure-ai-ml                   1.1.0
azure-common                  1.1.28
azure-core                    1.26.1
azure-graphrbac               0.61.1
azure-identity                1.7.0
azure-mgmt-authorization      2.0.0
azure-mgmt-containerregistry  10.0.0
azure-mgmt-core               1.3.2
azure-mgmt-keyvault           10.1.0
azure-mgmt-resource           21.2.1
azure-mgmt-storage            19.1.0
azure-storage-blob            12.13.0
azure-storage-file-datalake   12.8.0
azure-storage-file-share      12.10.1
azureml-core                  1.47.0
azureml-dataprep              4.5.7
azureml-dataprep-native       38.0.0
azureml-dataprep-rslex        2.11.4
azureml-mlflow                1.47.0
msrestazure                   0.6.4
opencensus-ext-azure          1.1.7

To Reproduce: Steps to reproduce the behavior:

  1. Configure your ml_client against a workspace behind a private endpoint
  2. Create your AML compute with no public IP access
  3. Run the following snippet:
compute = AmlCompute(
    name="tmp-cluster", size="STANDARD_D13", min_instances=0, max_instances=2,
    vnet_name="myvnet", subnet_name="mysubnet", enable_node_public_ip=False
)

Notes: my ml_client is already linked to a private endpoint workspace as mentioned here.

Is it possible I’m using the wrong name for the vnet and subnet?

Expected behavior: The same snippet should create a no-public-IP (NPIP) compute cluster on the specified vnet and subnet.

Screenshots N/A

Additional context N/A

About this issue

  • State: open
  • Created a year ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

@jadhosn Yes, startup scripts have been supported since 1.0.0. Here is an example snippet:

from azure.ai.ml.entities import ComputeInstance, SetupScripts, ScriptReference

creation_script = ScriptReference(path="Users/yourusername/creation.sh", command="echo Created! > startup.txt", timeout_minutes=15)
startup_script = ScriptReference(path="Users/yourusername/setup.sh", command="echo Started! >> startup.txt", timeout_minutes=10)
setup_scripts = SetupScripts(startup_script=startup_script, creation_script=creation_script)

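compute_instance = ComputeInstance(name="my-ci-setup-script", size="STANDARD_DS3_v2", setup_scripts=setup_scripts)

# begin_create_or_update returns a poller; call .result() on it to wait for provisioning to finish.
ml_client.compute.begin_create_or_update(compute_instance)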