ray: [Core][Serve] Log statements missing in Ray v1.12 or newer
What happened + What you expected to happen
This is largely a repost from Log statements missing in docker logs since v1.12 in the forum.
Log and print statements appear in the terminal and docker logs until the Deployment, i.e. Api.deploy() is called. Once our deployment actor, implemented with FastAPI ingress, is started, log and print statements do not appear in terminal output or docker logs with Ray v1.12 or newer. Log statements appear as expected in Ray v1.11.1 or older.
Example terminal output/docker logs with Ray v1.11.1 or older (working):
❯ docker logs api
...
2022-06-23 13:00:16,998 INFO scripts.py:600 -- Local node IP: 192.168.1.98
2022-06-23 13:00:18,747 INFO services.py:1412 -- View the Ray dashboard at http://192.168.1.98:8265
2022-06-23 13:00:19,066 SUCC scripts.py:640 -- --------------------
2022-06-23 13:00:19,066 SUCC scripts.py:641 -- Ray runtime started.
2022-06-23 13:00:19,066 SUCC scripts.py:642 -- --------------------
2022-06-23 13:00:19,066 INFO scripts.py:644 -- Next steps
2022-06-23 13:00:19,066 INFO scripts.py:645 -- To connect to this Ray runtime from another node, run
2022-06-23 13:00:19,066 INFO scripts.py:649 -- ray start --address='192.168.1.98:6379' --redis-password='5241590000000000'
2022-06-23 13:00:19,066 INFO scripts.py:654 -- Alternatively, use the following Python code:
2022-06-23 13:00:19,066 INFO scripts.py:656 -- import ray
2022-06-23 13:00:19,066 INFO scripts.py:660 -- ray.init(address='auto', _redis_password='5241590000000000')
2022-06-23 13:00:19,067 INFO scripts.py:670 -- To connect to this Ray runtime from outside of the cluster, for example to
2022-06-23 13:00:19,067 INFO scripts.py:672 -- connect to a remote cluster from your laptop directly, use the following
2022-06-23 13:00:19,067 INFO scripts.py:674 -- Python code:
2022-06-23 13:00:19,067 INFO scripts.py:676 -- import ray
2022-06-23 13:00:19,067 INFO scripts.py:677 -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-06-23 13:00:19,067 INFO scripts.py:682 -- If connection fails, check your firewall settings and network configuration.
2022-06-23 13:00:19,067 INFO scripts.py:687 -- To terminate the Ray runtime, run
2022-06-23 13:00:19,067 INFO scripts.py:688 -- ray stop
(ServeController pid=224) 2022-06-23 13:00:20,562 INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=224) 2022-06-23 13:00:20,677 INFO http_state.py:98 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:192.168.1.98-0' on node 'node:192.168.1.98-0' listening on '127.0.0.1:7777'
2022-06-23 13:00:21,329 INFO api.py:521 -- Started detached Serve instance in namespace 'serve'.
(HTTPProxyActor pid=258) INFO: Started server process [258]
2022-06-23 13:00:24,572 DEBUG deploy:run -- Total Resources = {'node:192.168.1.98': 1.0, 'accelerator_type:RTX': 1.0, 'CPU': 32.0, 'object_store_memory': 40384510771.0, 'memory': 84230525133.0, 'GPU': 2.0}
2022-06-23 13:00:24,573 DEBUG deploy:run -- Available Resources = {'node:192.168.1.98': 0.98, 'GPU': 2.0, 'memory': 84230525133.0, 'CPU': 32.0, 'object_store_memory': 40384510771.0, 'accelerator_type:RTX': 1.0}
2022-06-23 13:00:25,503 INFO api.py:262 -- Updating deployment 'Api'. component=serve deployment=Api
(ServeController pid=224) 2022-06-23 13:00:25,602 INFO deployment_state.py:920 -- Adding 1 replicas to deployment 'Api'. component=serve deployment=Api
(Api pid=335) 2022-06-23 13:00:27,181 DEBUG api:most_recent_field_of_view -- records count=0
(Api pid=335) 2022-06-23 13:00:27,181 INFO api:most_recent_field_of_view -- Creating a default Field of View (FOV)
(Api pid=335) 2022-06-23 13:00:27,197 DEBUG api:most_recent_scale_bar -- records count=0
(Api pid=335) 2022-06-23 13:00:27,197 INFO api:most_recent_scale_bar -- Creating a default scale bar
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- yolo_v5_weights_dir=/mnt/nvme/theia/YOLO/v5
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- file_read_chunk_size=1024
2022-06-23 13:00:27,254 INFO api.py:274 -- Deployment 'Api' is ready at `http://127.0.0.1:7777/api`. component=serve deployment=Api
2022-06-23 13:00:27,276 DEBUG deploy:run -- Deployments = {'Api': Deployment(name=Api,version=None,route_prefix=/api)}
Example terminal output/docker logs with Ray v1.12.0 or newer (problem):
❯ docker logs api
...
2022-06-23 12:54:38,725 INFO usage_lib.py:320 -- Usage stats collection is enabled by default without user confirmation because this stdin is detected to be non-interactively. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
2022-06-23 12:54:38,725 INFO scripts.py:715 -- Local node IP: 192.168.1.98
2022-06-23 12:54:40,577 INFO services.py:1470 -- View the Ray dashboard at http://192.168.1.98:8265
2022-06-23 12:54:40,898 SUCC scripts.py:757 -- --------------------
2022-06-23 12:54:40,899 SUCC scripts.py:758 -- Ray runtime started.
2022-06-23 12:54:40,899 SUCC scripts.py:759 -- --------------------
2022-06-23 12:54:40,899 INFO scripts.py:761 -- Next steps
2022-06-23 12:54:40,899 INFO scripts.py:762 -- To connect to this Ray runtime from another node, run
2022-06-23 12:54:40,899 INFO scripts.py:765 -- ray start --address='192.168.1.98:6379'
2022-06-23 12:54:40,899 INFO scripts.py:770 -- Alternatively, use the following Python code:
2022-06-23 12:54:40,899 INFO scripts.py:772 -- import ray
2022-06-23 12:54:40,899 INFO scripts.py:776 -- ray.init(address='auto')
2022-06-23 12:54:40,899 INFO scripts.py:788 -- To connect to this Ray runtime from outside of the cluster, for example to
2022-06-23 12:54:40,899 INFO scripts.py:792 -- connect to a remote cluster from your laptop directly, use the following
2022-06-23 12:54:40,899 INFO scripts.py:796 -- Python code:
2022-06-23 12:54:40,900 INFO scripts.py:798 -- import ray
2022-06-23 12:54:40,900 INFO scripts.py:799 -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-06-23 12:54:40,900 INFO scripts.py:808 -- If connection fails, check your firewall settings and network configuration.
2022-06-23 12:54:40,900 INFO scripts.py:816 -- To terminate the Ray runtime, run
2022-06-23 12:54:40,900 INFO scripts.py:817 -- ray stop
(ServeController pid=224) INFO 2022-06-23 12:54:42,497 controller 224 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=224) INFO 2022-06-23 12:54:42,600 controller 224 http_state.py:112 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:192.168.1.98-0' on node 'node:192.168.1.98-0' listening on '127.0.0.1:7777'
(HTTPProxyActor pid=256) INFO: Started server process [256]
2022-06-23 12:54:46,307 DEBUG deploy:run -- Total Resources = {'accelerator_type:RTX': 1.0, 'object_store_memory': 39170103296.0, 'node:192.168.1.98': 1.0, 'CPU': 32.0, 'GPU': 2.0, 'memory': 85425603584.0}
2022-06-23 12:54:46,308 DEBUG deploy:run -- Available Resources = {'CPU': 32.0, 'node:192.168.1.98': 0.98, 'GPU': 2.0, 'accelerator_type:RTX': 1.0, 'object_store_memory': 39170103296.0, 'memory': 85425603584.0}
2022-06-23 12:54:49,501 DEBUG deploy:run -- Deployments = {'Api': Deployment(name=Api,version=None,route_prefix=/api)}
The container/service is named, “api”. Terminal output, i.e. STDOUT, is the same but there are statements from other services and containers in the multi-container application. Only the api service logs are shown for clarity. Only the api service uses Ray. The following statements are missing from the output in Ray v1.12 or newer compared to Ray v1.11.1 or older:
(Api pid=335) 2022-06-23 13:00:27,181 DEBUG api:most_recent_field_of_view -- records count=0
(Api pid=335) 2022-06-23 13:00:27,181 INFO api:most_recent_field_of_view -- Creating a default Field of View (FOV)
(Api pid=335) 2022-06-23 13:00:27,197 DEBUG api:most_recent_scale_bar -- records count=0
(Api pid=335) 2022-06-23 13:00:27,197 INFO api:most_recent_scale_bar -- Creating a default scale bar
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- yolo_v5_weights_dir=/mnt/nvme/theia/YOLO/v5
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- file_read_chunk_size=1024
These missing statements are just part of the missing log statements. It is not just the Api actor/deployment. After the Api.deploy() is called, the log statements from any other task or actor are missing from the terminal output and docker logs. These still appear in separate logs in the Ray Dashboard for each task and actor.
The last three DEBUG statements in the Ray v1.12 or newer output from deploy:run are log statements before the Api.deploy() function is called. The deploy.py file:
logger = logging.get_logger(__name__)
def run(context: Configuration):
logger.debug("Total Resources = %s", ray.cluster_resources())
logger.debug("Available Resources = %s", ray.available_resources())
Api.deploy(context)
logger.debug("Deployments = %s", ray.serve.list_deployments())
if __name__ == "__main__":
run(server.start())
and the server.py module for the server.start() process. There is a separate deploy.py module so I can setup a file watch and automatically refresh/redeploy/reload on code changes during development.
def start():
context = cli.run()
ray.init(address="auto", namespace="serve")
return context
def run():
deploy.run(start())
while True:
time.sleep(5)
if __name__ == "__main__":
run()
FYI, I have PYTHONUNBUFFERED=1 in the Dockerfile based on the Ray docker image, regardless of the Ray version.
Versions / Dependencies
- Python v3.8
- Ray v1.11 or older (working) and Ray v1.12 or newer (problem)
FROM rayproject/ray:1.11.1-py38orFROM rayproject/ray:1.12.1-py38docker base image used- Host OS: Ubuntu 20.04 Linux x86, aarch64 (NVidia Jetsons), and macOS M1
- FastAPI v0.70 or newer
Reproduction script
This is copied-n-pasted from our application code and trimmed of sensitive implementation details. The above example logs are from the application, not from this minimal example. Running docker build -f Dockerfile -t example:latest . followed by docker run -p 8000:8000 example:latest will reproduce the behavior.
Dockerfile:
# FROM rayproject/ray:1.11.1-py38
FROM rayproject/ray:1.13.0-py38
ENV PYTHONFAULTHANDLER=1 \
PYTHONUNBUFFERED=1
COPY . /app
WORKDIR /app
RUN python3 -m pip install fastapi
COPY --chown=ray:users entrypoint.sh ./
RUN chmod a+x ./entrypoint.sh
ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["python3", "-m", "server"]
Entrypoint script:
#!/usr/bin/env bash
ray start --head
serve start
exec "$@"
Server.py:
#!/usr/bin/env python3
import deploy
import ray
import time
def start():
ray.init(address="auto", namespace="serve")
def run():
start()
deploy.run()
while True:
time.sleep(5)
if __name__ == "__main__":
run()
Deploy.py:
#!/usr/bin/env python3
import ray
import server
import utils
from api import Api
logger = utils.get_logger(__name__)
def run():
logger.debug("Total Resources = %s", ray.cluster_resources())
logger.debug("Available Resources = %s", ray.available_resources())
Api.deploy()
logger.debug("Deployments = %s", ray.serve.list_deployments())
if __name__ == "__main__":
server.start()
run()
Utils.py:
#!/usr/bin/env python3
import logging
logging.addLevelName(logging.CRITICAL, "FATAL")
logging.addLevelName(logging.ERROR, "ERROR")
logging.addLevelName(logging.WARNING, "WARN")
logging.addLevelName(logging.INFO, "INFO")
logging.addLevelName(logging.DEBUG, "DEBUG")
def get_logger(name, level=logging.DEBUG):
logger = logging.getLogger(name)
ch = logging.StreamHandler()
formatter = logging.Formatter(
"%(asctime)s %(levelname)-5s %(module)s:%(funcName)s -- %(message)s"
)
ch.setFormatter(formatter)
logger.addHandler(ch)
logger.setLevel(level)
return logger
def get_middleware_logger(level=logging.DEBUG):
logger = logging.getLogger("middleware")
ch = logging.StreamHandler()
formatter = logging.Formatter(
"%(asctime)s %(levelname)-5s %(method)-6s %(url_path)s -- %(message)s"
)
ch.setFormatter(formatter)
logger.addHandler(ch)
logger.setLevel(level)
return logger
Api.py:
#!/usr/bin/env python3
import time
import utils
from fastapi import FastAPI, Request
from ray import serve
logger = utils.get_logger(__name__)
app = FastAPI()
middleware_logger = utils.get_middleware_logger()
# Middleware logging missing from Ray v1.12 or newer
@app.middleware("http")
async def log_requests(request: Request, call_next):
request_record = {"method": request.method, "url_path": request.url.path}
middleware_logger.debug("Begin", extra=request_record)
start_time = time.time()
response = await call_next(request)
process_time = (time.time() - start_time) * 1000
formatted_process_time = "{0:.2f}".format(process_time)
middleware_logger.info(f"Latency={formatted_process_time}ms", extra=request_record)
middleware_logger.debug("End", extra=request_record)
return response
@serve.deployment(route_prefix="/api")
@serve.ingress(app)
class Api:
def __init__(self):
# These statements are missing in Ray v1.12 or newer
logger.debug("Debug statement")
logger.info("Info statement")
logger.warning("Warn statement")
logger.error("Error statement")
logger.critical("Fatal statement")
print("A print statement")
@app.get("/ping")
def ping(self):
# These statements are missing in Ray v1.12 or newer
logger.debug("Debug statement")
logger.info("Info statement")
logger.warning("Warn statement")
logger.error("Error statement")
logger.critical("Fatal statement")
print("A print statement")
return "pong"
Issue Severity
Medium: It is a significant difficulty but I can work around it.
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 19 (15 by maintainers)
Small update, I have tried adding a filter and various combinations of getting the
ray.servelogger. Nothing I do will remove the following example message:I looked at the Ray v1.13.1 source code. The relevant file is
ray/python/ray/serve/http_proxy.py. The log statement is created at Line 310 to 317. The logger is usingray.servefor the name. I verified I am getting theray.servelogger. I think this might be because theHttpProxyis wrapped in theHttpProxyActorclass and logging needs to be configured on a per actor basis, at least based on my understanding and reading of the documentation. However, there is theconfigure_component_loggerfunction within the actor that appears to configure the logger. So, I am kind of stumped.Any guidance or suggestions would be greatly appreciated.