katib: Error "Objective metric accuracy is not found in training logs, unavailable value is reported"

/kind bug

What steps did you take and what happened: I have been trying to create a simple Katib experiment with the sklearn iris dataset, but I am facing the error `Objective metric accuracy is not found in training logs, unavailable value is reported. metric:<name:"accuracy" value:"unavailable"`.

Below is my code:

```python
import argparse
import os
import logging

import hypertune
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--neighbors", type=int, default=3,
                        help="value of k")
    parser.add_argument("--log-path", type=str, default="",
                        help="Path to save logs. Print to StdOut if log-path is not set")
    parser.add_argument("--logger", type=str, choices=["standard", "hypertune"],
                        help="Logger", default="standard")
    args = parser.parse_args()

    if args.log_path == "" or args.logger == "hypertune":
        logging.basicConfig(
            format="%(asctime)s %(levelname)-8s %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%SZ",
            level=logging.DEBUG)
    else:
        logging.basicConfig(
            format="%(asctime)s %(levelname)-8s %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%SZ",
            level=logging.DEBUG,
            filename=args.log_path)

    if args.logger == "hypertune" and args.log_path != "":
        os.environ["CLOUD_ML_HP_METRIC_FILE"] = args.log_path

    # For JSON logging
    hpt = hypertune.HyperTune()

    # Load the iris dataset into a DataFrame.
    iris_data = load_iris()
    iris_df = pd.DataFrame(data=iris_data["data"], columns=iris_data["feature_names"])
    iris_df["Iris type"] = iris_data["target"]
    iris_df["Iris name"] = iris_df["Iris type"].apply(
        lambda x: "setosa" if x == 0 else ("versicolor" if x == 1 else "virginica"))

    # Train a KNN classifier and evaluate it on a held-out split.
    X = iris_df[["sepal length (cm)", "sepal width (cm)",
                 "petal length (cm)", "petal width (cm)"]]
    y = iris_df["Iris name"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=args.neighbors)
    knn.fit(X_train, y_train)
    accuracy = knn.score(X_test, y_test)

    logging.info("{{metricName: accuracy, metricValue: {:.4f}}}\n".format(accuracy))

    if args.logger == "hypertune":
        hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag="accuracy",
            metric_value=accuracy)


if __name__ == "__main__":
    main()
```
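
A common cause of this error is a mismatch between the logged line and the pattern the StdOut metrics collector parses: the collector only sees what the training container prints to stdout, and recent Katib versions by default look for `name=value` pairs rather than the `{metricName: ..., metricValue: ...}` style logged above (the hypertune report additionally goes to a JSON metrics file that the StdOut collector never reads). A minimal sketch of printing the objective in `name=value` form, assuming default parsing; verify the expected format against the documentation of your Katib version:

```python
import logging

# Timestamped stdout logging, matching the format used above.
logging.basicConfig(
    format="%(asctime)s %(levelname)-8s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%SZ",
    level=logging.INFO)

accuracy = 0.9737  # placeholder value for illustration

# One metric per line, printed to stdout where the collector sidecar
# can match it against objectiveMetricName.
logging.info("accuracy={:.4f}".format(accuracy))
```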

Below is my YAML file:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: iris-1
spec:
  parallelTrialCount: 1
  maxTrialCount: 2
  maxFailedTrialCount: 3
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
  metricsCollectorSpec:
    collector:
      kind: StdOut
  algorithm:
    algorithmName: random
  parameters:
    - name: neighbors
      parameterType: int
      feasibleSpace:
        min: "3"
        max: "5"
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: neighbors
        description: KNN neighbors
        reference: neighbors
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
          spec:
            containers:
              - name: training-container
                image: e-dpiac-docker-local.docker.lowes.com/katib-sklearn:v3
                command:
                  - "python3"
                  - "/app/iris.py"
                  - "--neighbors=${trialParameters.neighbors}"
                  - "--logger=hypertune"
                resources:
                  requests:
                    memory: "6Gi"
                    cpu: "2"
                  limits:
                    memory: "10Gi"
                    cpu: "4"
            restartPolicy: Never
```
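
Alternatively, if the training code keeps its current log format, the collector can be told what to match. This is a sketch, not a verified config: it assumes the v1beta1 `metricsCollectorSpec.source.filter.metricsFormat` field, which takes regular expressions whose first capture group is the metric name and second is its value; check the Experiment CRD of your Katib version before relying on it:

```yaml
metricsCollectorSpec:
  collector:
    kind: StdOut
  source:
    filter:
      # Hypothetical override matching the script's log line, e.g.
      # "{metricName: accuracy, metricValue: 0.9737}"; the first capture
      # group is the metric name, the second its numeric value.
      metricsFormat:
        - '\{metricName: ([\w|-]+), metricValue: ([+-]?\d+(?:\.\d+)?)\}'
```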

What did you expect to happen: The metrics should have been collected and the trials should have succeeded.

Anything else you would like to add:

Environment:

  • Katib version (check the Katib controller image version): katib-controller:v0.12.0
  • Kubernetes version (kubectl version):
  • OS (uname -a):

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.


Most upvoted comments

That is the correct behaviour since you are using Katib version 0.12. In that version, the default is ResumePolicy=LongRunning, which allows you to restart your Experiment at any time by changing the maxTrialCount parameter. In that case, the Suggestion pod is always running.

In the recent release, we use ResumePolicy=Never as the default resume policy, which does not allow you to restart an Experiment and cleans up the Suggestion pod once the Experiment completes.

You can learn more about it in this doc: https://www.kubeflow.org/docs/components/katib/resume-experiment/#resume-succeeded-experiment
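
For anyone who wants the old cleanup behaviour without upgrading defaults, the policy can also be set explicitly in the Experiment spec. A minimal sketch, assuming the v1beta1 `resumePolicy` field described in the linked doc:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: iris-1
spec:
  # Never: the Experiment cannot be resumed later, and the
  # Suggestion pod is cleaned up once the Experiment completes.
  resumePolicy: Never
```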