powertools-lambda-python: Bug: AWS Powertools Logger does not display utf-8 encoded (e.g. Japanese) characters consistently with print() or native logging library
Expected Behaviour
{"level":"INFO","location":"<module>:8","message":"スコビルデモ2","timestamp":"2023-12-08 15:21:28,502+0000","service":"service_undefined"}
スコビルデモ2
INFO:root:スコビルデモ2
Current Behaviour
{"level":"INFO","location":"<module>:8","message":"\u30b9\u30b3\u30d3\u30eb\u30c7\u30e2\uff12","timestamp":"2023-12-08 15:21:28,502+0000","service":"service_undefined"}
スコビルデモ2
INFO:root:スコビルデモ2
Code snippet
from aws_lambda_powertools import Logger
import logging
japanese_string = "\u30b9\u30b3\u30d3\u30eb\u30c7\u30e2\uff12"
logger = Logger()
logger.info(japanese_string)
print(japanese_string)
import logging
logging.basicConfig(level=logging.INFO)
logging.info(japanese_string)
Possible Solution
Use a custom formatter via Logger(logger_formatter=formatter)
- although I’d need to do some research as to what would be the correct formatter to fix this issue, it does seem e.g.
class LocalPlainFormatter(LambdaPowertoolsFormatter):
# pylint: disable = no-self-use, too-few-public-methods
"""Converts from the JSON structured data to a plain logging format"""
def serialize(self, log: dict[Any, Any]) -> str:
"""Converts from the JSON structured data to a plain logging format.
:param log: A JSON log record.
:return: Plan logging format version of the input `log`.
"""
timestamp = log["timestamp"]
levelname = log["level"]
func_name_location = log["location"]
message = log["message"]
exception_name = log.get("exception_name", "")
traceback = log.get("exception", "")
local_context = log.get("local_context", "")
if exception_name or traceback:
exception_info = f" - {exception_name} - {traceback}"
else:
exception_info = ""
return f"{levelname} — {timestamp} — {func_name_location} — {message}{local_context}" \
f"{exception_info}"
Can work around this issue, at the expense of losing the structured logging
I also tried
def myserializer(x):
return json.dumps(x, ensure_ascii=False)
logger = Logger(json_serializer=myserial)
Which I thought might be the issue, but it seems no matter what I pass to json_serializer
I get the same output
Steps to Reproduce
- aws_lambda_powertools-2.29.0
- python 3.11
Powertools for AWS Lambda (Python) version
latest
AWS Lambda function runtime
3.10
Packaging format used
Lambda Layers
Debugging logs
No response
About this issue
- Original URL
- State: closed
- Created 7 months ago
- Comments: 21 (15 by maintainers)
@bml1g12 could you try the latest version and let us know, please? 2.29.1 on
pip install
and Lambda Layer version 53@bml1g12 Verdict – we’re going with option 3… changing the default. We did our due diligence as much as possible and consider this to be a long standing bug.
Main reasons but not limited to why we’ve decided to flip the default:
ensure_ascii=False
only affects non-ascii (~99% of the use cases atm).and no performance impact when toggling so. Python 2 and Python 3 would’ve been a different story (unicode obj vs str).hyperfine
test). Largely due to doing less checks typically done for non-ascii that needs to be reverted and escaped.TypeError
shared earlier is related to a non-JSON serializable type which is already handled today via thedefault
method.ensure_ascii=False
as the default tooGreat, thank you @bml1g12 for gathering everyone’s consensus (say hello to team on our behalf!) – We’re running some additional tests and benchmark today for option 3.
Did additional research on weekend in CPython code, Loguru and Requests library, old CPython bug tracking, Python 2 vs Python 3 necessary changes, to make us more comfortable with optional 3 – if our benchmark and additional tests work out nicely today, we’d be comfortable with the risk, provide a quick way to rollback if we must, and if we do need to rollback we’ll make it the default in v3.
I think I found a solution but we’d need some form of mutation testing to put my mind at ease — using ensure_ascii=False + our internal JSON Encoder default method possibly combined with a singledispatch.
I’ll ping Nate from the SDK team who’s got experience with this ~6 years ago in the requests module back in the days.
That should solve the edge cases since we’ve had this working for ~4 years and only discovered it now.
On Fri, 8 Dec 2023 at 20:57, Benjamin Lowe @.***> wrote:
@leandrodamascena got to jump on a meeting I can’t reschedule, could you take over, pretty please? Need to do some extensive exploratory testing besides what we have just in case.
There could be data loss when converting to
str
when getting it back too… hmmm tricky. If we can’t be 99% sure, I’d add under a flag, and document the behaviour