serve: Java index out of bounds exception when running many requests through server
Context
Trying to load test a TorchServe model with a custom handler to gauge its performance.
- torchserve version: 0.2.0
- torch version: 1.6.0
- java version: openjdk 11.0.8
- Operating System and version: Debian, via the python:3.7-buster Docker image.
Your Environment
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: CPU
- Using a default/custom handler? custom
- What kind of model is it e.g. vision, text, audio?: feed forward for custom input.
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? from model store
- Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs: number_of_netty_threads=32 (full config.properties is included further down)
Expected Behavior
I expected TorchServe not to throw this error, or at least to understand which properties of the environment I could change to address it. It only seems to happen under moderate load.
Current Behavior
With a load of ~5 rps, and while varying the batch size and the CPU and memory allocations, the server throws this error on roughly 4% or more of requests.
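For reference, the load generator is roughly the sketch below; the model name and payload are placeholders rather than my actual setup.

```python
# Rough sketch of the load test -- model name and payload are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/predictions/my_model"   # hypothetical model name
PAYLOAD = {"features": [0.0] * 16}                    # hypothetical input body

def send_one():
    return requests.post(URL, json=PAYLOAD, timeout=10).status_code

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = []
    for _ in range(300):              # ~60 seconds of traffic at ~5 requests/sec
        futures.append(pool.submit(send_one))
        time.sleep(0.2)
    codes = [f.result() for f in futures]
    print(sum(c != 200 for c in codes), "of", len(codes), "requests failed")
```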
Failure Logs [if any]
2020-10-17 00:16:41,887 [INFO ] epollEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - 9002 Worker disconnected. WORKER_MODEL_LOADED
2020-10-17 00:16:41,887 [ERROR] epollEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - Unknown exception
io.netty.handler.codec.DecoderException: java.lang.IndexOutOfBoundsException: readerIndex(1021) + length(4) exceeds writerIndex(1024): PooledUnsafeDirectByteBuf(ridx: 1021, widx: 1024, cap: 1024)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:404)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:371)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IndexOutOfBoundsException: readerIndex(1021) + length(4) exceeds writerIndex(1024): PooledUnsafeDirectByteBuf(ridx: 1021, widx: 1024, cap: 1024)
    at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1477)
    at io.netty.buffer.AbstractByteBuf.readInt(AbstractByteBuf.java:810)
    at org.pytorch.serve.util.codec.ModelResponseDecoder.decode(ModelResponseDecoder.java:56)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
Thank you in advance for any help you can provide!
About this issue
- State: closed
- Created 4 years ago
- Comments: 38 (16 by maintainers)
@harshbafna I was debugging this further and observed the following: the Python backend sends the complete response for all of the batched requests, but when the frontend server receives it, it is fragmented. For example, in the scenario below, for a total response size of 500777, the message decoder receives the following fragments:
I suspect the issue is caused by incorrect decoding of these fragments. What are your thoughts on this? Shouldn't the reassembly of these fragments be done at a lower level, with the decoding then happening at the application level?
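To illustrate the failure mode (this is a simplified sketch, not TorchServe's actual ModelResponseDecoder): a length-prefixed decoder that reads a 4-byte length as soon as any bytes arrive will fail exactly like the exception above when the length field or body straddles two fragments; it has to keep buffering until a complete frame is available.

```python
# Simplified sketch (not TorchServe's actual codec) of why a length-prefixed
# decoder has to buffer partial frames instead of reading eagerly.
import struct

class FrameReassembler:
    """Accumulates arbitrary fragments and yields only complete frames."""

    def __init__(self):
        self.buf = bytearray()

    def feed(self, fragment: bytes):
        """Append a fragment; return whatever complete frames are now available."""
        self.buf.extend(fragment)
        frames = []
        while True:
            if len(self.buf) < 4:
                break                               # length prefix incomplete: wait
            (length,) = struct.unpack_from(">i", self.buf, 0)
            if len(self.buf) < 4 + length:
                break                               # frame body incomplete: wait
            frames.append(bytes(self.buf[4:4 + length]))
            del self.buf[:4 + length]
        return frames

# A 500777-byte payload split into 1024-byte fragments decodes cleanly,
# because nothing is read until the whole frame has arrived.
payload = b"x" * 500777
wire = struct.pack(">i", len(payload)) + payload
reassembler, frames = FrameReassembler(), []
for i in range(0, len(wire), 1024):
    frames.extend(reassembler.feed(wire[i:i + 1024]))
assert frames == [payload]
```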
@harshbafna
Input to test:
This should return the following response:
config.properties:
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
So I re-ran with a batch size > 1 and the logging enabled. I am seeing some non-alphanumeric characters in the batch output: “:”, “)”, “(”, “>”, and “-” are all present. However, these are also present in the batch_size=1 run, so I'm not sure whether this could cause the issue. Other than that, the shapes of the outputs all look correct batch-wise.
When I run just a single request through the model, the one-item batch looks the same, with the 0-th dimension having length 1 instead of batch_size.
I am doing the dimension handling myself. For example, when handling the input data, I run a for-loop over the items in the batch in my pre-processing (assuming each item is a request body) and then torch.cat them along dimension 0. The output tensors then have shape [batch_size, D], which I pass along. It would be great if you could help bring this to closure 😃.
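Roughly, the pattern looks like the sketch below; the "features" field, tensor shapes, and model are illustrative placeholders rather than my actual handler.

```python
# Illustrative sketch of the batching pattern described above -- not the actual
# custom handler from this issue. The "features" field and model are placeholders.
import json

import torch

def preprocess(requests):
    """One tensor per request body, concatenated along dim 0 -> [batch_size, D]."""
    rows = []
    for req in requests:                          # each item is one request body
        body = req.get("data") or req.get("body")
        if isinstance(body, (bytes, bytearray)):
            body = json.loads(body)
        rows.append(torch.tensor(body["features"], dtype=torch.float32).unsqueeze(0))
    return torch.cat(rows, dim=0)                 # shape: [batch_size, D]

def postprocess(outputs):
    """Split [batch_size, D_out] back into exactly one response per request."""
    return [row.tolist() for row in outputs]

def handle(requests, model):
    batch = preprocess(requests)                  # [batch_size, D]
    with torch.no_grad():
        outputs = model(batch)                    # [batch_size, D_out]
    return postprocess(outputs)                   # len(result) == len(requests)
```

The one invariant worth highlighting in this pattern is that the returned list must contain exactly one element per request in the batch, since the frontend pairs responses back to requests positionally.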
@harshbafna good call on the batch size check… batch_size=1 appears to completely fix the issue. The model is small enough to run at sub-50 ms latency without batching. I'll test with the logging statement and batch_size > 1 next.
Fine to close this issue if you'd like to stop investigating for now, as this solves my immediate need. However, I'm also more than happy to keep helping to uncover what is causing the netty thread issue for larger batch sizes. Let me know.
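For reference, the batch size discussed above is configured when the model is registered through the TorchServe management API; a minimal sketch, with an illustrative archive name and values:

```python
# Minimal sketch of registering the model with batching enabled via the
# TorchServe management API -- the archive name and values are illustrative.
import requests

params = {
    "url": "my_model.mar",      # hypothetical archive in the model store
    "initial_workers": 1,
    "batch_size": 8,            # setting this to 1 is what avoids the error above
    "max_batch_delay": 50,      # max milliseconds to wait while filling a batch
}
resp = requests.post("http://localhost:8081/models", params=params)
print(resp.status_code, resp.text)
```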