exo: Proposal for Returning Relevant Cursors

Problem:

We have been discussing having scheduling ops support the idea of “returning relevant cursors”. For example, stage_mem generates an allocation (which we might want to schedule further, for example to set its memory) and at least one of a load stage or a store stage:

def foo(n: size, A: i8[n] @ DRAM):
    assert n % 4 == 0
    for io in seq(0, n / 4):
        for ii in seq(0, 4):
            A[4 * io + ii] += 0.2

foo = stage_mem(foo, "for ii in _:_", "A[io*4:io*4+4]", "tile")

def foo(n: size, A: i8[n] @ DRAM):
    assert n % 4 == 0
    for io in seq(0, n / 4):
        tile: i8[4] @ DRAM           # <-- Allocation
        for i0 in seq(0, 4):         # <-- Load stage
            tile[i0] = A[i0 + 4 * io]
        for ii in seq(0, 4):
            tile[ii] += 0.2
        for i0 in seq(0, 4):         # <-- Store stage
            A[i0 + 4 * io] = tile[i0]

It would be convenient if scheduling operations returned cursors to those relevant pieces of the procedure. Currently, the alternative is to use the argument cursor as an anchor and navigate from it to the relevant cursors in the scheduling code. The goal is to offload that work from the user to the tool.

Proposals:

Proposal 1:

Return a dataclass alongside the proc. The fields of the dataclass are the relevant cursors.

The dataclass definition sits next to the scheduling operation's definition and documents what can be returned:

@dataclass
class stage_mem_cursors:
    alloc: AllocCursor = InvalidCursor()
    load_stage: AssignCursor | LoopCursor = InvalidCursor()
    store_stage: AssignCursor | ReduceCursor | LoopCursor = InvalidCursor()

Usage:

foo, cursors = stage_mem(foo, "for ii in _:_", "A[io*4:io*4+4]", "tile")
load_stage = cursors.load_stage # Save for later
foo, cursors = set_memory(foo, cursors.alloc, DRAM_STATIC)

Advantages:

Explicit API
Uniform return values convention (Procedure, dataclass of relevant cursors)
It is easy to stay forward compatible and add more cursors in the future

Disadvantages:

Have to define a dataclass per scheduling op
Might be too verbose: cursors.load_stage
Users may need to name each returned dataclass and keep it in a temporary variable. Alternatively, they can overwrite the previous one, as we already do for procedures, and manually save the cursors they actually care about. So what this really saves is a lot of navigation, which I think is worth it.
It is unclear what to return when there is nothing to return; perhaps some “Null” dataclass.
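
A minimal, self-contained sketch of how Proposal 1 could behave, including the "nothing to return" case. Everything here is a stand-in: `Cursor`, `InvalidCursor`, `NoCursors`, and the toy `stage_mem` / `set_memory` operate on a plain list of statement labels, not on real exo procedures, and the `has_load` / `has_store` flags are hypothetical.

```python
from dataclasses import dataclass, field


class Cursor:
    """Hypothetical stand-in for an exo cursor; stores only a label."""

    def __init__(self, label):
        self.label = label

    def is_valid(self):
        return True


class InvalidCursor(Cursor):
    """Stand-in for a cursor that points at nothing."""

    def __init__(self):
        super().__init__("<invalid>")

    def is_valid(self):
        return False


@dataclass
class NoCursors:
    """Hypothetical 'null' result for ops with nothing to report."""


@dataclass
class StageMemCursors:
    # Defaults are invalid cursors, so a stage that was not generated
    # is still represented by a cursor the caller can test.
    alloc: Cursor = field(default_factory=InvalidCursor)
    load_stage: Cursor = field(default_factory=InvalidCursor)
    store_stage: Cursor = field(default_factory=InvalidCursor)


def stage_mem(proc, has_load=True, has_store=True):
    """Toy op: 'proc' is a list of statement labels (an assumption)."""
    new_proc = list(proc)
    cursors = StageMemCursors(alloc=Cursor("alloc"))
    new_proc.insert(0, "alloc")
    if has_load:
        new_proc.insert(1, "load")
        cursors.load_stage = Cursor("load")
    if has_store:
        new_proc.append("store")
        cursors.store_stage = Cursor("store")
    return new_proc, cursors


def set_memory(proc, alloc_cursor, memory):
    """Toy op with no relevant cursors to return."""
    return list(proc), NoCursors()


foo, cursors = stage_mem(["compute"], has_store=False)
load_stage = cursors.load_stage  # save before `cursors` is overwritten
foo, cursors = set_memory(foo, cursors.alloc, "DRAM_STATIC")
```

Adding a fourth field to `StageMemCursors` later would not break this caller, which is the forward-compatibility point above; the cost is the extra `load_stage = ...` line before `cursors` is rebound.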

Proposal 2:

Similar to Proposal 1, but the return value of primitives stays just a procedure. We then implement a standard library wrapper around each primitive that returns a procedure and a dataclass of relevant cursors.

Advantages:

Keeps the implementation of the primitives as small as possible.

Disadvantages:

Duplicate scheduling operations that perform the same rewrite.
Naming problems; we would have to either:
- use the same name, or
- add a suffix to the wrappers, e.g. _c.
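
As a rough illustration of Proposal 2, the sketch below keeps a toy primitive that returns only the procedure and layers a wrapper with the `_c` suffix on top of it. The procedure is modeled as a list of statement labels and cursors as integer positions; both are simplifying assumptions, and none of this is exo's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class StageMemCursors:
    # Cursors are modeled as positions in the statement list (an assumption).
    alloc: int
    load_stage: int
    store_stage: int


def stage_mem(proc, stmt):
    """Toy primitive: returns only the rewritten procedure."""
    i = proc.index(stmt)
    return proc[:i] + ["alloc", "load", stmt, "store"] + proc[i + 1:]


def stage_mem_c(proc, stmt):
    """Hypothetical wrapper: same rewrite, with the navigation done for the user."""
    new_proc = stage_mem(proc, stmt)
    anchor = new_proc.index(stmt)  # the staged statement serves as the anchor
    cursors = StageMemCursors(
        alloc=anchor - 2, load_stage=anchor - 1, store_stage=anchor + 1
    )
    return new_proc, cursors


foo, cursors = stage_mem_c(["a", "compute", "b"], "compute")
```

The wrapper has to know the shape of the primitive's output (allocation two statements before the anchor, and so on), which is where the duplication listed above comes from.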

Proposal 3:

Instead of returning multiple relevant cursors, we can return exactly one relevant cursor along with the procedure. In the case of stage_mem, this could be a new block cursor covering the load stage, the original block we staged around, and the store stage.

Again, there is the question of whether this would be part of the primitive or a standard library wrapper.

Advantages:

Might be easier to use since you don’t have to think about the name bindings (in a dataclass or a dictionary) anymore.

Disadvantages:

Returning one cursor is not always a viable option when the relevant cursors are not co-located, i.e. they cannot all be grouped in one block. It is especially difficult with operations that could have an arbitrary number of relevant cursors in arbitrary places in the procedure.
This will still require a fair amount of navigation in user code.
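
To make the trade-off in Proposal 3 concrete, here is a small sketch in which a toy `stage_mem` returns a single block cursor. `BlockCursor` is a hypothetical stand-in that models a block as a half-open range of positions in a list of statement labels; it is not exo's block cursor type.

```python
from dataclasses import dataclass


@dataclass
class BlockCursor:
    """Hypothetical block cursor: a half-open range [lo, hi) of positions."""

    lo: int
    hi: int

    def __getitem__(self, i):
        # Position of the i-th statement in the block;
        # a negative i counts from the end of the block.
        n = self.hi - self.lo
        if not -n <= i < n:
            raise IndexError("block index out of range")
        return self.lo + i % n


def stage_mem(proc, stmt):
    """Toy op: 'proc' is a list of statement labels (an assumption)."""
    i = proc.index(stmt)
    new_proc = proc[:i] + ["alloc", "load", stmt, "store"] + proc[i + 1:]
    # The block covers the load stage, the staged statement, and the store stage.
    return new_proc, BlockCursor(i + 1, i + 4)


foo, block = stage_mem(["a", "compute", "b"], "compute")
load_pos = block[0]
store_pos = block[-1]
alloc_pos = block.lo - 1  # the allocation lies outside the block: extra navigation
```

There are no field names to remember, but the caller must know that the load stage is first, the store stage is last, and the allocation sits just before the block.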

About this issue

State: open
Created 6 months ago
Comments: 18 (7 by maintainers)

Most upvoted comments

Re auto-coercion, note that the Argument Processors do this. That is one concept of coercion. Another concept concerns totally standard liftings between different SchedOp signatures, e.g. an op with signature (p,c,…) -> p can be lifted implicitly (a coercion) to an op with signature (p,c,…) -> (p,c) whenever it is supplied as input to repeat or a similar combinator. The argument processors are currently a bit tricky to disentangle from the compiler implementation into a user-level concept, which is why I’m suggesting turning that (and more general coercions, op decorators, etc.) into one or more future proposals.

Gilbert

On Dec 26, 2023, at 5:36 PM, Samir Droubi wrote:

Yeah, I agree with you that there should be a supported scheduling mode that allows the user to ignore anything cursor-related. I think the concise argument you described makes sense: it is not clear what it does from looking at it, but it appears in every operation and will be common enough in code that users will simply learn it.

Agreed. I agree it should be okay to expect the user to unpack the dataclass in that one-off case.

Agreed. I am not sure exactly what you mean by auto-coercion; it would be great if you could elaborate on this. I was generally thinking that we would extend the @sched_op decorator so that it automatically adds the new default argument rc=1. We could potentially have the dataclass generated automatically and associated with a scheduling op via the same decorator or a different one, @returned_cursors("alloc", "load", "block", "store"), which would then serve as documentation. If we go with the two-decorator approach, the sched_op decorator should likely enforce the returned_cursors decorator.

How existing and new decorators can be exported to the user world is a really important discussion as well. The current decorator and argument processors can greatly simplify the implementation of user ops and provide easy documentation. I think simply exporting them as is may not be a terrible idea, since almost all of them return API cursors. There are some exceptions (NewExprA) where the return value is a non-API object. One option is to not export anything that could return a non-API object for now and make the user-level argument signature a string instead, which is what you would do now anyway.

Another remaining point of the discussion is (1.c). How should this be implemented?
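
One way the @returned_cursors idea could be sketched is with `dataclasses.make_dataclass`, so that the field names passed to the decorator both generate the result type and document it. This is only an illustration under assumptions: the `cursors_type` attribute name is invented here, cursors are modeled as list positions, fields default to `None` rather than an invalid cursor, and the `rc` argument and the interaction with @sched_op are not modeled.

```python
from dataclasses import field, fields, make_dataclass


def returned_cursors(*names):
    """Hypothetical decorator: builds the result dataclass from field names."""

    def decorate(op):
        cls_name = op.__name__ + "_cursors"
        spec = [(n, object, field(default=None)) for n in names]
        # Attach the generated type to the op so it doubles as documentation.
        op.cursors_type = make_dataclass(cls_name, spec)
        return op

    return decorate


@returned_cursors("alloc", "load", "block", "store")
def stage_mem(proc, stmt):
    """Toy op: 'proc' is a list of statement labels (an assumption)."""
    i = proc.index(stmt)
    new_proc = proc[:i] + ["alloc", "load", stmt, "store"] + proc[i + 1:]
    cursors = stage_mem.cursors_type(alloc=i, load=i + 1, block=i + 2, store=i + 3)
    return new_proc, cursors


foo, cursors = stage_mem(["a", "compute", "b"], "compute")
documented = [f.name for f in fields(stage_mem.cursors_type)]
```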


gilbo on Dec 27, 2023
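
The signature lifting mentioned in the comment above, from (p,c,…) -> p to (p,c,…) -> (p,c), can be sketched as follows. This is a toy model under assumptions: `lift`, this `repeat`, the `returns_cursor` flag, and `drop_one` are illustrative names rather than exo's API, the "cursor" is just a statement label, and the lifted op hands back the input cursor unchanged instead of forwarding it to the new procedure.

```python
def lift(op):
    """Lift an op (p, c, ...) -> p into one that returns (p, c)."""

    def lifted(proc, cursor, *args):
        # Assumption: the input cursor is returned as-is.
        return op(proc, cursor, *args), cursor

    return lifted


def repeat(op, returns_cursor=False):
    """Toy combinator: apply 'op' until it raises, threading (p, c) through."""
    step = op if returns_cursor else lift(op)  # the implicit coercion

    def repeated(proc, cursor, *args):
        while True:
            try:
                proc, cursor = step(proc, cursor, *args)
            except ValueError:
                return proc, cursor

    return repeated


def drop_one(proc, cursor):
    """Toy op (p, c) -> p: removes one occurrence of the statement 'cursor'."""
    new_proc = list(proc)
    new_proc.remove(cursor)  # raises ValueError once nothing is left to remove
    return new_proc


foo, cur = repeat(drop_one)(["x", "y", "x"], "x")
```

With the coercion inside the combinator, an op written against the simpler signature can be passed to `repeat` without the user wrapping it by hand.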