ray: [Core][Serve] Log statements missing in Ray v1.12 or newer

What happened + What you expected to happen

This is largely a repost from Log statements missing in docker logs since v1.12 in the forum.

Log and print statements appear in the terminal and docker logs until the Deployment, i.e. Api.deploy() is called. Once our deployment actor, implemented with FastAPI ingress, is started, log and print statements do not appear in terminal output or docker logs with Ray v1.12 or newer. Log statements appear as expected in Ray v1.11.1 or older.

Example terminal output/docker logs with Ray v1.11.1 or older (working):

❯ docker logs api
...
2022-06-23 13:00:16,998 INFO scripts.py:600 -- Local node IP: 192.168.1.98
2022-06-23 13:00:18,747 INFO services.py:1412 -- View the Ray dashboard at http://192.168.1.98:8265
2022-06-23 13:00:19,066 SUCC scripts.py:640 -- --------------------
2022-06-23 13:00:19,066 SUCC scripts.py:641 -- Ray runtime started.
2022-06-23 13:00:19,066 SUCC scripts.py:642 -- --------------------
2022-06-23 13:00:19,066 INFO scripts.py:644 -- Next steps
2022-06-23 13:00:19,066 INFO scripts.py:645 -- To connect to this Ray runtime from another node, run
2022-06-23 13:00:19,066 INFO scripts.py:649 --   ray start --address='192.168.1.98:6379' --redis-password='5241590000000000'
2022-06-23 13:00:19,066 INFO scripts.py:654 -- Alternatively, use the following Python code:
2022-06-23 13:00:19,066 INFO scripts.py:656 -- import ray
2022-06-23 13:00:19,066 INFO scripts.py:660 -- ray.init(address='auto', _redis_password='5241590000000000')
2022-06-23 13:00:19,067 INFO scripts.py:670 -- To connect to this Ray runtime from outside of the cluster, for example to
2022-06-23 13:00:19,067 INFO scripts.py:672 -- connect to a remote cluster from your laptop directly, use the following
2022-06-23 13:00:19,067 INFO scripts.py:674 -- Python code:
2022-06-23 13:00:19,067 INFO scripts.py:676 -- import ray
2022-06-23 13:00:19,067 INFO scripts.py:677 -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-06-23 13:00:19,067 INFO scripts.py:682 -- If connection fails, check your firewall settings and network configuration.
2022-06-23 13:00:19,067 INFO scripts.py:687 -- To terminate the Ray runtime, run
2022-06-23 13:00:19,067 INFO scripts.py:688 --   ray stop
(ServeController pid=224) 2022-06-23 13:00:20,562       INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=224) 2022-06-23 13:00:20,677       INFO http_state.py:98 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:192.168.1.98-0' on node 'node:192.168.1.98-0' listening on '127.0.0.1:7777'
2022-06-23 13:00:21,329 INFO api.py:521 -- Started detached Serve instance in namespace 'serve'.
(HTTPProxyActor pid=258) INFO:     Started server process [258]
2022-06-23 13:00:24,572 DEBUG deploy:run -- Total Resources = {'node:192.168.1.98': 1.0, 'accelerator_type:RTX': 1.0, 'CPU': 32.0, 'object_store_memory': 40384510771.0, 'memory': 84230525133.0, 'GPU': 2.0}
2022-06-23 13:00:24,573 DEBUG deploy:run -- Available Resources = {'node:192.168.1.98': 0.98, 'GPU': 2.0, 'memory': 84230525133.0, 'CPU': 32.0, 'object_store_memory': 40384510771.0, 'accelerator_type:RTX': 1.0}
2022-06-23 13:00:25,503 INFO api.py:262 -- Updating deployment 'Api'. component=serve deployment=Api
(ServeController pid=224) 2022-06-23 13:00:25,602       INFO deployment_state.py:920 -- Adding 1 replicas to deployment 'Api'. component=serve deployment=Api
(Api pid=335) 2022-06-23 13:00:27,181 DEBUG api:most_recent_field_of_view -- records count=0
(Api pid=335) 2022-06-23 13:00:27,181 INFO  api:most_recent_field_of_view -- Creating a default Field of View (FOV)
(Api pid=335) 2022-06-23 13:00:27,197 DEBUG api:most_recent_scale_bar -- records count=0
(Api pid=335) 2022-06-23 13:00:27,197 INFO  api:most_recent_scale_bar -- Creating a default scale bar
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- yolo_v5_weights_dir=/mnt/nvme/theia/YOLO/v5
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- file_read_chunk_size=1024
2022-06-23 13:00:27,254 INFO api.py:274 -- Deployment 'Api' is ready at `http://127.0.0.1:7777/api`. component=serve deployment=Api
2022-06-23 13:00:27,276 DEBUG deploy:run -- Deployments = {'Api': Deployment(name=Api,version=None,route_prefix=/api)}

Example terminal output/docker logs with Ray v1.12.0 or newer (problem):

❯ docker logs api
...
2022-06-23 12:54:38,725 INFO usage_lib.py:320 -- Usage stats collection is enabled by default without user confirmation because this stdin is detected to be non-interactively. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
2022-06-23 12:54:38,725 INFO scripts.py:715 -- Local node IP: 192.168.1.98
2022-06-23 12:54:40,577 INFO services.py:1470 -- View the Ray dashboard at http://192.168.1.98:8265
2022-06-23 12:54:40,898 SUCC scripts.py:757 -- --------------------
2022-06-23 12:54:40,899 SUCC scripts.py:758 -- Ray runtime started.
2022-06-23 12:54:40,899 SUCC scripts.py:759 -- --------------------
2022-06-23 12:54:40,899 INFO scripts.py:761 -- Next steps
2022-06-23 12:54:40,899 INFO scripts.py:762 -- To connect to this Ray runtime from another node, run
2022-06-23 12:54:40,899 INFO scripts.py:765 --   ray start --address='192.168.1.98:6379'
2022-06-23 12:54:40,899 INFO scripts.py:770 -- Alternatively, use the following Python code:
2022-06-23 12:54:40,899 INFO scripts.py:772 -- import ray
2022-06-23 12:54:40,899 INFO scripts.py:776 -- ray.init(address='auto')
2022-06-23 12:54:40,899 INFO scripts.py:788 -- To connect to this Ray runtime from outside of the cluster, for example to
2022-06-23 12:54:40,899 INFO scripts.py:792 -- connect to a remote cluster from your laptop directly, use the following
2022-06-23 12:54:40,899 INFO scripts.py:796 -- Python code:
2022-06-23 12:54:40,900 INFO scripts.py:798 -- import ray
2022-06-23 12:54:40,900 INFO scripts.py:799 -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-06-23 12:54:40,900 INFO scripts.py:808 -- If connection fails, check your firewall settings and network configuration.
2022-06-23 12:54:40,900 INFO scripts.py:816 -- To terminate the Ray runtime, run
2022-06-23 12:54:40,900 INFO scripts.py:817 --   ray stop
(ServeController pid=224) INFO 2022-06-23 12:54:42,497 controller 224 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=224) INFO 2022-06-23 12:54:42,600 controller 224 http_state.py:112 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:192.168.1.98-0' on node 'node:192.168.1.98-0' listening on '127.0.0.1:7777'
(HTTPProxyActor pid=256) INFO:     Started server process [256]
2022-06-23 12:54:46,307 DEBUG deploy:run -- Total Resources = {'accelerator_type:RTX': 1.0, 'object_store_memory': 39170103296.0, 'node:192.168.1.98': 1.0, 'CPU': 32.0, 'GPU': 2.0, 'memory': 85425603584.0}
2022-06-23 12:54:46,308 DEBUG deploy:run -- Available Resources = {'CPU': 32.0, 'node:192.168.1.98': 0.98, 'GPU': 2.0, 'accelerator_type:RTX': 1.0, 'object_store_memory': 39170103296.0, 'memory': 85425603584.0}
2022-06-23 12:54:49,501 DEBUG deploy:run -- Deployments = {'Api': Deployment(name=Api,version=None,route_prefix=/api)}

The container/service is named, “api”. Terminal output, i.e. STDOUT, is the same but there are statements from other services and containers in the multi-container application. Only the api service logs are shown for clarity. Only the api service uses Ray. The following statements are missing from the output in Ray v1.12 or newer compared to Ray v1.11.1 or older:

(Api pid=335) 2022-06-23 13:00:27,181 DEBUG api:most_recent_field_of_view -- records count=0
(Api pid=335) 2022-06-23 13:00:27,181 INFO  api:most_recent_field_of_view -- Creating a default Field of View (FOV)
(Api pid=335) 2022-06-23 13:00:27,197 DEBUG api:most_recent_scale_bar -- records count=0
(Api pid=335) 2022-06-23 13:00:27,197 INFO  api:most_recent_scale_bar -- Creating a default scale bar
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- yolo_v5_weights_dir=/mnt/nvme/theia/YOLO/v5
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- file_read_chunk_size=1024

These missing statements are just part of the missing log statements. It is not just the Api actor/deployment. After the Api.deploy() is called, the log statements from any other task or actor are missing from the terminal output and docker logs. These still appear in separate logs in the Ray Dashboard for each task and actor.

The last three DEBUG statements in the Ray v1.12 or newer output from deploy:run are log statements before the Api.deploy() function is called. The deploy.py file:

logger = logging.get_logger(__name__)

def run(context: Configuration):
    logger.debug("Total Resources = %s", ray.cluster_resources())
    logger.debug("Available Resources = %s", ray.available_resources())
    Api.deploy(context)
    logger.debug("Deployments = %s", ray.serve.list_deployments())


if __name__ == "__main__":
    run(server.start())

and the server.py module for the server.start() process. There is a separate deploy.py module so I can setup a file watch and automatically refresh/redeploy/reload on code changes during development.

def start():
    context = cli.run()
    ray.init(address="auto", namespace="serve")
    return context


def run():
    deploy.run(start())

    while True:
        time.sleep(5)


if __name__ == "__main__":
    run()

FYI, I have PYTHONUNBUFFERED=1 in the Dockerfile based on the Ray docker image, regardless of the Ray version.

Versions / Dependencies

  • Python v3.8
  • Ray v1.11 or older (working) and Ray v1.12 or newer (problem)
  • FROM rayproject/ray:1.11.1-py38 or FROM rayproject/ray:1.12.1-py38 docker base image used
  • Host OS: Ubuntu 20.04 Linux x86, aarch64 (NVidia Jetsons), and macOS M1
  • FastAPI v0.70 or newer

Reproduction script

This is copied-n-pasted from our application code and trimmed of sensitive implementation details. The above example logs are from the application, not from this minimal example. Running docker build -f Dockerfile -t example:latest . followed by docker run -p 8000:8000 example:latest will reproduce the behavior.

Dockerfile:

# FROM rayproject/ray:1.11.1-py38
FROM rayproject/ray:1.13.0-py38

ENV PYTHONFAULTHANDLER=1 \
        PYTHONUNBUFFERED=1

COPY . /app

WORKDIR /app

RUN python3 -m pip install fastapi

COPY --chown=ray:users entrypoint.sh ./
RUN chmod a+x ./entrypoint.sh

ENTRYPOINT ["/app/entrypoint.sh"]

CMD ["python3", "-m", "server"]

Entrypoint script:

#!/usr/bin/env bash

ray start --head
serve start

exec "$@"

Server.py:

#!/usr/bin/env python3

import deploy
import ray
import time


def start():
    ray.init(address="auto", namespace="serve")


def run():
    start()
    deploy.run()

    while True:
        time.sleep(5)


if __name__ == "__main__":
    run()

Deploy.py:

#!/usr/bin/env python3

import ray
import server
import utils

from api import Api

logger = utils.get_logger(__name__)


def run():
    logger.debug("Total Resources = %s", ray.cluster_resources())
    logger.debug("Available Resources = %s", ray.available_resources())
    Api.deploy()
    logger.debug("Deployments = %s", ray.serve.list_deployments())


if __name__ == "__main__":
    server.start()
    run()

Utils.py:

#!/usr/bin/env python3

import logging

logging.addLevelName(logging.CRITICAL, "FATAL")
logging.addLevelName(logging.ERROR, "ERROR")
logging.addLevelName(logging.WARNING, "WARN")
logging.addLevelName(logging.INFO, "INFO")
logging.addLevelName(logging.DEBUG, "DEBUG")


def get_logger(name, level=logging.DEBUG):
    logger = logging.getLogger(name)
    ch = logging.StreamHandler()
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)-5s %(module)s:%(funcName)s -- %(message)s"
    )
    ch.setFormatter(formatter)
    logger.addHandler(ch)
    logger.setLevel(level)
    return logger


def get_middleware_logger(level=logging.DEBUG):
    logger = logging.getLogger("middleware")
    ch = logging.StreamHandler()
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)-5s %(method)-6s %(url_path)s -- %(message)s"
    )
    ch.setFormatter(formatter)
    logger.addHandler(ch)
    logger.setLevel(level)
    return logger

Api.py:

#!/usr/bin/env python3

import time
import utils

from fastapi import FastAPI, Request
from ray import serve

logger = utils.get_logger(__name__)

app = FastAPI()

middleware_logger = utils.get_middleware_logger()

# Middleware logging missing from Ray v1.12 or newer
@app.middleware("http")
async def log_requests(request: Request, call_next):
    request_record = {"method": request.method, "url_path": request.url.path}
    middleware_logger.debug("Begin", extra=request_record)
    start_time = time.time()

    response = await call_next(request)

    process_time = (time.time() - start_time) * 1000
    formatted_process_time = "{0:.2f}".format(process_time)
    middleware_logger.info(f"Latency={formatted_process_time}ms", extra=request_record)
    middleware_logger.debug("End", extra=request_record)

    return response


@serve.deployment(route_prefix="/api")
@serve.ingress(app)
class Api:
    def __init__(self):
        # These statements are missing in Ray v1.12 or newer
        logger.debug("Debug statement")
        logger.info("Info statement")
        logger.warning("Warn statement")
        logger.error("Error statement")
        logger.critical("Fatal statement")
        print("A print statement")

    @app.get("/ping")
    def ping(self):
        # These statements are missing in Ray v1.12 or newer
        logger.debug("Debug statement")
        logger.info("Info statement")
        logger.warning("Warn statement")
        logger.error("Error statement")
        logger.critical("Fatal statement")
        print("A print statement")
        return "pong"

Issue Severity

Medium: It is a significant difficulty but I can work around it.

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 19 (15 by maintainers)

Most upvoted comments

Small update, I have tried adding a filter and various combinations of getting the ray.serve logger. Nothing I do will remove the following example message:

api         | (HTTPProxyActor pid=235) INFO 2022-09-04 22:48:50,916 http_proxy 192.168.1.98 http_proxy.py:315 - GET /api 200 4.3ms

I looked at the Ray v1.13.1 source code. The relevant file is ray/python/ray/serve/http_proxy.py. The log statement is created at Line 310 to 317. The logger is using ray.serve for the name. I verified I am getting the ray.serve logger. I think this might be because the HttpProxy is wrapped in the HttpProxyActor class and logging needs to be configured on a per actor basis, at least based on my understanding and reading of the documentation. However, there is the configure_component_logger function within the actor that appears to configure the logger. So, I am kind of stumped.

Any guidance or suggestions would be greatly appreciated.