spark: Support for event-based (non-blocking) request processing.
Currently, for request processing, SparkJava relies entirely on Jetty's HTTP thread pool (by default 8 to 200 threads).
That pool is non-blocking on the networking end, but blocking on the request-processing (business logic) side (Handlers/Filters/Matchers…). Any blocking operation (I/O-bound work, JDBC, etc.) can potentially exhaust Jetty's HTTP thread pool.
In that sense, Spark currently does not leverage the asynchronous Servlet 3.1 implementation that Jetty already provides.
To increase Spark's performance potential, the framework needs to support event-based (non-blocking) processing on its own thread pool.
This is easily achievable today with a combination of Servlet 3.1 and the Java 8 CompletableFuture API.
With this combination there is no need to integrate high-level frameworks such as Akka or RxJava.
The following sample code achieves the goal stated above:
/**
 * Simple Jetty Handler
 *
 * @author Per Wendel
 */
public class JettyHandler extends SessionHandler {

    private final Filter filter;

    public JettyHandler(Filter filter) {
        this.filter = filter;
    }

    @Override
    public void doHandle(String target,
                         Request baseRequest,
                         HttpServletRequest request,
                         HttpServletResponse response) throws IOException, ServletException {

        // Spark's wrapper around the raw servlet request
        HttpRequestWrapper wrapper = new HttpRequestWrapper(request);

        if (NOT_ASYNC) {
            // current behaviour: run Handlers/Filters on Jetty's HTTP thread
            filter.doFilter(wrapper, response, null);
        } else {
            // switch to Servlet 3.1 async mode and run the business
            // logic on Spark's own executor instead of Jetty's pool
            AsyncContext asyncContext = wrapper.startAsync();
            asyncContext.setTimeout(60000);
            CompletableFuture
                .runAsync(() -> {
                    try {
                        filter.doFilter(wrapper, response, null);
                    } catch (IOException | ServletException ex) {
                        throw new RuntimeException(ex);
                    }
                }, executor)
                .thenAccept(ignored -> {
                    baseRequest.setHandled(!wrapper.notConsumed());
                    asyncContext.complete();
                });
        }
    }
}
The `executor` above (Spark's own async executor) may look like this:

private static final ThreadPoolExecutor executor =
        new ThreadPoolExecutor(200, 200, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

static {
    executor.allowCoreThreadTimeOut(true);
}
If you run a benchmark with a significant number of simultaneously active client sockets, and monitor the application's threads with the change above, you'll see that Jetty creates several times fewer HTTP threads (you can identify them by the "qtp" prefix) than it typically does, and those it does create alternate nicely between busy and parked. Instead, Spark's own thread pool is created, with all of its threads 100% busy under high load, which is exactly the goal of the event-based approach. And if one of Spark's own threads blocks, it will not degrade Jetty's performance.
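As a hedged aside, the thread counts described above can also be checked from inside the JVM itself. The helper below is a minimal sketch; the class and method names are my own, not part of Spark or Jetty:

```java
public class ThreadCount {

    // Count live JVM threads whose name starts with the given prefix.
    // Jetty's pool threads are named with a "qtp" prefix, so
    // countByPrefix("qtp") approximates the HTTP pool's current size.
    static long countByPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                     .filter(t -> t.getName().startsWith(prefix))
                     .count();
    }

    public static void main(String[] args) {
        System.out.println("qtp threads: " + countByPrefix("qtp"));
    }
}
```

Calling this periodically during the benchmark shows the "qtp" count staying small while a custom executor's threads do the work.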
The idea is to cut the reliance on Jetty's HTTP thread pool, and not force Spark users to do the following:

public static void main(String[] args) {
    get("/benchmark", (request, response) -> {
        AsyncContext ac = request.raw().startAsync();
        CompletableFuture<?> cf = CompletableFuture.supplyAsync(() -> getMeSomethingBlocking());
        cf.thenAccept(result -> ac.complete());
        return "ok";
    });
}
About this issue
- State: open
- Created 8 years ago
- Reactions: 13
- Comments: 51 (21 by maintainers)
Commits related to this issue
- Issue #549 - async future tests — committed to mj1618/spark by mj1618 7 years ago
- Issue #549 - async impl — committed to mj1618/spark by mj1618 7 years ago
- Issue #549 - use correct response — committed to mj1618/spark by mj1618 7 years ago
- Issue #549 - test comment for sleep — committed to mj1618/spark by mj1618 7 years ago
- Issue #549 - remove unnecessary output — committed to mj1618/spark by mj1618 7 years ago
NOT AGAIN! Asynchronous Spark has already been beaten to death in another issue. -1500456 😦
Not all software we create is intended to grow big. Why waste energy in unblocking a piece of software that is blocking to begin with and then use it for stuff that does not grow big? Now THAT is a waste. Use spark for what it is good at. If you grow big you have time to switch to something that fits your future needs. Only using tools and libraries that are geared towards big-bigger-biggest is lunacy. Why would I want to gear up a bunch of replicated hardware with all the difficulties that brings with it for a system that only needs to process let’s say 10 requests per minute? Blocking IO in that case is a much easier to understand model and Spark is covering that excellently. Keep in mind that a lot of systems are small and that only the very lucky few will grow big. And those that do will go through lots of increments and lots of changes in the tooling.
In fact, if you say on the front page that spark can be used as a great replacement for node, you should provide this functionality… otherwise it’s not entirely true, is it?
I haven’t had time to look through it properly, but my first impression from skimming through the code was good. The internal change seems quite big, and it looks like some refactoring is needed. I’ll have a closer look next year (holidays soon). Ultimately it’s not up to me though.
Hey guys - just had a go at a basic implementation, just so we’ve got something concrete to look at. I know this isn’t good to merge and will get plenty of feedback, but at least it’s something to chat about.
https://github.com/perwendel/spark/pull/953
Basically if you return a CompletableFuture, then it will make the request async. Otherwise everything remains the same as before.
I did break up the two quite long functions, though I understand if this is too much change and I’d be happy to rewrite it based on your comments.
Thoughts?
#208 - spark is simple and blocking. that is why it is simple to use.
You know what? I give up.
I encountered this thread while researching the framework. I understand that the spark developer would trade async support for ease of usage (not suitable for some cases, but a good selling point).
What I don't understand is why you let people like @ruurd, who clearly doesn't understand how and why asynchronous processing can mitigate IO latency, keep trolling almost every single issue about async support. Frankly, I thought @ruurd was one of the spark maintainers (which is unbelievable). Read the above and #208, then judge for yourself.
Hehheh… I also have to find some time to do that. After doing my 8-hour shift at work.
@phongphan I disagree about @ruurd trolling and that he “clearly doesn’t understand”. His posts are a bit abrasive and exaggerated, but he makes some good points, and he understands the intention behind Spark. We’re considering async for v3, but we’ll only do it if we can maintain the simplicity Spark currently has.
Only if you can make it work without bothering @ruurd 😁
I know that seems hard since he’s bothered by people even discussing it, but if you manage to introduce it to the framework in a way that non-async people don’t notice it, I think it should be included.
I think @mlengle is right.

> no modern software should be blocking

This is the whole point. Blocking is simple, but it is a kind of waste. All of us agree blocking should be avoided if you want to grow big.

@ruurd @Lewiscowles1986 There's the Play framework, heavily built upon async principles; check it out if you're curious. There are things like streaming and websockets, which create long-living connections by definition. And yes, there are high-latency services, sometimes because of bad architecture, sometimes because of a huge amount of data to process, or heavy load on limited hardware resources, and storage systems can have their own failures. That's reality, and web frameworks have to deal with real-world problems, not with perfectly polished backend code. And we're talking here not about some ugly hack, but about a popular principle in modern software development: "don't block".
hey @ruurd yeah, I read your comments - you make some good points and I really respect your opinion.
Spark is simple and let's keep it that way - 100% agree with you on that; I think most people do. And given that most Java libraries are sync + blocking (e.g. the JDBC protocol itself is sync blocking), most request handlers are going to be the same. However, it does seem the community is keen to at least have the option of async on the odd occasion it comes in handy.
For me at least, I’d like to have the option to use an async library in my request handlers. E.g. an async http client, or a call to a streaming application such as kafka.
The reason is not to offload to another thread pool - I agree thread shifting is just unnecessary overhead. But for truly non-blocking calls (similar to what NIO provides) it can be better to not chew up a thread waiting for bits to go over the wire.
But considering lean principles I think it’s prudent to introduce only the most basic capability though - and to keep the default sync + blocking. Then we can see how the community responds - and see if it actually does open up any opportunities. If not - you can always remove it! A lot of great open source projects remove features that made sense at the time but turned out to be unnecessary, or were surpassed - e.g. GC in Rust.
One downside to `request.async()` is that there is still a required return value, so this might be confusing when doing async. Express gets around this by having the response passed to the `response` object and not having a return value. But that would change the protocol here.

How about we just handle the return value being a Future? That way it would change nothing about the current interfaces, so to the average sync+blocking handlers it would make no difference. And when we see a Future as the return value, we async the request and respond on the future completing?
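That dispatch rule can be sketched in a few lines. The names below are hypothetical stand-ins, not Spark's actual internals; only the "is the return value a Future?" decision matters, and the real change would live inside Spark's route-matching code:

```java
import java.util.concurrent.CompletableFuture;

public class FutureDispatchSketch {

    // stand-in for a Spark route handler
    interface Handler { Object handle(); }

    // If the handler returns a CompletableFuture, go async and respond when
    // it completes; otherwise keep today's sync + blocking behaviour.
    static CompletableFuture<String> dispatch(Handler handler) {
        Object result = handler.handle();
        if (result instanceof CompletableFuture) {
            // async path: the response body is written on future completion
            return ((CompletableFuture<?>) result).thenApply(String::valueOf);
        }
        // sync path: unchanged for existing handlers
        return CompletableFuture.completedFuture(String.valueOf(result));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dispatch(() -> "sync-ok").get());
        System.out.println(dispatch(() ->
                CompletableFuture.supplyAsync(() -> "async-ok")).get());
    }
}
```

Existing handlers hit the second branch and behave exactly as before, which is what keeps the change invisible to sync+blocking users.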
If it’s ok with you @ruurd I’ll code up some sample code for you and @tipsy to review.
I think it’s good to have the option to respond to a request asynchronously. I don’t think it has to compromise the simplicity of the current blocking features.
Simply allowing the `async()` method will let people use libraries that provide callbacks instead of blocking for IO. In this case I couldn't find a way to make Spark allow the response to happen on this callback; `startAsync()` doesn't seem to work for me, the output stream seems to get closed anyway.

I think it's an important feature, as more and more Java libraries out there are moving towards non-blocking IO, and making this feature optional to Spark users means the framework has more flexibility. And as long as it's optional I don't see how it would make anything more complicated.
Tbh I don't see the point of another thread pool for this; that doesn't seem necessary to accomplish async.
Is this something you’re keen to have? If so I don’t mind doing some work on it.
@tipsy Because this is the completely wrong way to a non-blocking web layer 😃 The right way with Jetty is described in the docs: https://wiki.eclipse.org/Jetty/Feature/Continuations#Using_Continuations
So: we should have a separate async handler which suspends the continuation, the async-supporting controller code should return a CompletionStage, and we add a processing stage like this:
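The comment stops short of the actual code. As a hedged illustration of the shape it describes (suspend the request, then complete it when the controller's CompletionStage finishes), here is a self-contained sketch; Jetty's real Continuation needs a running server, so it is replaced by a plain CompletableFuture stand-in, and all names are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class ContinuationSketch {

    // stand-in for the suspended servlet response
    static final CompletableFuture<String> suspendedResponse = new CompletableFuture<>();

    // async-supporting controller: returns a CompletionStage, never blocks here
    static CompletionStage<String> controller() {
        return CompletableFuture.supplyAsync(() -> "hello");
    }

    public static void main(String[] args) throws Exception {
        // handler: the request is "suspended"; when the stage completes,
        // write the result (or an error) and complete the response
        controller().whenComplete((result, error) ->
                suspendedResponse.complete(error == null ? result : "500"));

        // blocking get() here only to demonstrate the eventual response body
        System.out.println(suspendedResponse.get());
    }
}
```

With the real API, `suspendedResponse.complete(...)` corresponds to writing to the suspended response and calling the continuation's complete method.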
@ruurd Nice trolling, but no. And reread the link I've sent; it's not about async IO.