ponyc: Segmentation fault when actor receives a reference to itself via a class created in a different actor.
_EDIT:_ Skip down to https://github.com/ponylang/ponyc/issues/1118#issuecomment-238431412 to see the most reduced example of the issue.
I gutted the HTTP server down to this, which I think is a reproduction of the seg fault in #937. I kept the original type names so that they could be somewhat related back to that package if necessary. It seems to have something to do with the partial application `this~answer()`: at least, if I interrupt anything after that assignment, the seg fault disappears.
```pony
// Range comes from the collections package.
use "collections"

interface val ResponseHandler
  fun val apply(request: Payload val, response: Payload val): Any

interface val RequestHandler
  fun val apply(request: Payload): Any

primitive Handle
  fun val apply(request: Payload) =>
    (consume request).respond(Payload)

actor _ServerConnection
  let _handler: RequestHandler

  new create(h: RequestHandler) => _handler = h

  be dispatch(request: Payload) =>
    // The object built by `this~answer()` captures this actor as its receiver.
    request.handler = recover this~answer() end
    _handler(consume request)

  be answer(request: Payload val, response: Payload val) =>
    None

class iso Payload
  var handler: (ResponseHandler | None) = None

  fun iso respond(response': Payload) =>
    try
      let h = (handler) as ResponseHandler
      h(consume this, consume response')
    end

actor Main
  new create(env: Env) =>
    let t = Test
    for i in Range(0, 10_000) do
      t.do_it()
    end

actor Test
  be do_it() =>
    _ServerConnection(Handle).dispatch(Payload)
```
This could probably be further reduced, but I wanted to maintain at least a slight semblance to the original code… and it’s the middle of the night so I’m going to 😴.
About this issue
- State: closed
- Created 8 years ago
- Comments: 57 (52 by maintainers)
@peterzandbergen - it definitely still needs to be fixed, and it's still marked as high priority. Runtime segmentation faults are not acceptable.
To summarize what we discussed in the Zulip thread:
The current Pony runtime has a correctness bug due to what is usually a valid optimization, but in this case is not.
Specifically, the Pony runtime traces immutable (`val`) objects shallowly; that is, it skips tracing of fields within such objects. This saves time by reducing how much tracing has to happen, and it is described as a safe optimization in the ORCA paper, because the "outer" `val` object acts as an upper bound on the lifetime of the "inner" objects referred to by its fields.
While that optimization is safe within the limited scope of what was considered in the ORCA paper, the reasoning ignores the counting of actor references (which was outside the scope of the ORCA paper).
If a `val` object has references (either directly as its fields, or transitively as fields of its fields) to any actors, those actors need to be traced. Hence, for such an object we cannot keep this optimization in place.
But for `val` objects which are known via static analysis to not possibly refer to any actors, this optimization is safe, and we'd like to keep it in place if possible, to keep part of the benefit of this optimization for some workloads.
As such, we want to add a new kind of static analysis to the compiler that can classify any given data type as "definitely contains no actor references" or "may possibly contain an actor reference". If we can mark a type with the internal designation `contains_no_actors`, then it is valid for that type to participate in the above-mentioned optimization, and the compiler should generate a trace function for that type which uses the optimized path when immutable. Otherwise, it would need to take a new pessimistic path for the sake of correctness, tracing it at runtime so that any actors it may contain are traced.
To determine if a type should be marked as `contains_no_actors`:
- If any field type is an `actor` type, or a composite type (tuple, union, intersection) referring to an `actor` type => return `false`.
- If any field type refers (possibly within a composite type) to a type which is not marked `contains_no_actors` => return `false`.
- … `contains_no_actors`, and every time we recurse into a type we push it onto that list, such that we will surely terminate and no type will mark itself as `contains_no_actors` without some cause which is not itself.
- If the type under consideration is an abstract type (such as an `interface` or `trait`), and reachability analysis shows that, in the reachable program, the abstract type has subsumed any type which is not marked `contains_no_actors` => return `false`.
- Otherwise, the type has been shown to not possibly contain any actors => return `true`.
No one is suggesting not fixing it. The concern is whether we can find a way to fix it without a performance impact.
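The `contains_no_actors` rules summarized above can be sketched in Pony as a recursive check, which also shows why keeping the optimization for such types addresses the performance concern. This is a minimal illustration under stated assumptions, not the ponyc implementation: `TypeInfo`, `ContainsNoActors`, the `*Kind` primitives and the type names `answer-literal`, `Point` and `Node` are hypothetical. Each composite field type is assumed to be pre-flattened into the nominal type names it mentions, an abstract type lists the reachable types it subsumes, unknown names are treated pessimistically, and a type that is already under consideration counts as neutral when it is met again. That last choice is one reading of the "list of types under consideration" rule: a self-referential type can still qualify, but only when nothing else in it can hold an actor.

```pony
use "collections"

// Hypothetical, simplified model of what the compiler knows about a type.
primitive ActorKind
primitive ConcreteKind  // class, struct, primitive or object literal
primitive AbstractKind  // interface or trait
type TypeKind is (ActorKind | ConcreteKind | AbstractKind)

class val TypeInfo
  let kind: TypeKind
  // Concrete type: the nominal names mentioned by its field types, with
  // tuples, unions and intersections already flattened.
  // Abstract type: the reachable types it subsumes.
  let refs: Array[String] val

  new val create(kind': TypeKind, refs': Array[String] val) =>
    kind = kind'
    refs = refs'

primitive ContainsNoActors
  fun apply(types: Map[String, TypeInfo] box, name: String): Bool =>
    _check(types, name, Set[String])

  fun _check(
    types: Map[String, TypeInfo] box,
    name: String,
    visiting: Set[String])
    : Bool
  =>
    // A type already under consideration is not evidence either way, so it
    // cannot mark itself contains_no_actors; its other members decide.
    if visiting.contains(name) then return true end
    // Unknown types are treated pessimistically.
    let info = try types(name)? else return false end
    // A field of actor type is itself an actor reference.
    if info.kind is ActorKind then return false end
    visiting.set(name)
    var result = true
    // The same loop covers field types (concrete) and subsumed types (abstract).
    for r in info.refs.values() do
      if not _check(types, r, visiting) then
        result = false
        break
      end
    end
    visiting.unset(name)
    result

actor Main
  new create(env: Env) =>
    let types = Map[String, TypeInfo]
    let no_refs = recover val Array[String] end
    types("None") = TypeInfo(ConcreteKind, no_refs)
    types("U64") = TypeInfo(ConcreteKind, no_refs)
    types("_ServerConnection") = TypeInfo(ActorKind, no_refs)
    // Hypothetical name for the object literal behind `this~answer()`;
    // it captures its receiver, the _ServerConnection actor.
    types("answer-literal") =
      TypeInfo(ConcreteKind, recover val ["_ServerConnection"] end)
    types("ResponseHandler") =
      TypeInfo(AbstractKind, recover val ["answer-literal"] end)
    // Payload's field: handler: (ResponseHandler | None)
    types("Payload") =
      TypeInfo(ConcreteKind, recover val ["ResponseHandler"; "None"] end)
    // Hypothetical types: a plain value and a self-referential list node.
    types("Point") = TypeInfo(ConcreteKind, recover val ["U64"; "U64"] end)
    types("Node") =
      TypeInfo(ConcreteKind, recover val ["U64"; "Node"; "None"] end)

    for name in ["Payload"; "ResponseHandler"; "Point"; "Node"].values() do
      env.out.print(name + ": " + ContainsNoActors(types, name).string())
    end
```

With this model, `Payload` and `ResponseHandler` come out as possibly containing actors, because the object literal produced by `this~answer()` captures the `_ServerConnection` receiver and is subsumed by `ResponseHandler`; `Point` and `Node` qualify for the shallow-trace path. The sketch recomputes results instead of caching them, which avoids storing a verdict reached while a cycle was still open, at the cost of repeated work.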
I think I got it. This is a premature free issue.
Full tracing of `val` objects sent in messages is deferred until a GC cycle on the owner. This means the `_ServerConnection` assigned to the `Payload` and then sent in a message (via the `Payload`) has a reference count greater than 0 but isn't aware of it until the `Payload` is traced. By then, the `_ServerConnection` might have been GC'd, which of course results in bad stuff happening (a use-after-free).
@SeanTAllen Thanks, and yes, with the `Main` and `Test` actors from the example above, it should reproduce the seg fault. I'll put a full version of the smallest example in this comment.
It'll be interesting to see if this actually fixes the httpserver example. There was some other odd behavior that I noted in issue #937, which is where my investigation began, but this is certainly a piece of the puzzle if not the entire thing.
Here’s the most simplified yet complete code that I could find to reproduce the issue:
Notes:
- … `_ServerConnection` is assigned to the `Payload` via a partial application binding instead of being assigned directly.
- If the `Payload` object is created directly in `dispatch()` instead of at the call location in `do_it()`, there's no seg fault.
- If `None` is assigned to the `.handler` field instead of the `_ServerConnection` object, there's no seg fault.
- … `Range` with `10_000`, than with `100`.