sqlalchemy: Filtering in async with relationship doesn't work

Describe the bug

Selection with inload works normally, it loads all the relationship, but if I am also trying to filter this selection, then the relationship for some reason stops loading.

To Reproduce

import asyncio
from pprint import pprint

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.future import select
from sqlalchemy.orm import (declarative_base, relationship, selectinload,
                            sessionmaker)

Base = declarative_base()

class Node(Base):
    __tablename__ = "nodes"

    id = Column(Integer, primary_key=True, unique=True)
    parent_id = Column(Integer, ForeignKey('nodes.id'), nullable=True)
    children = relationship("Node")

    name = Column(String)


def serialize(o):
    if isinstance(o, Base):
        return {name: serialize(getattr(o, name))
                for name in dir(o)
                if not name.startswith("_")}
    elif isinstance(o, list):
        return [serialize(i) for i in o]
    return o


async def async_main():
    engine = create_async_engine(
        "sqlite+aiosqlite:///./sqlite.db",
        future=True,
        connect_args={"check_same_thread": False},
    )

    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
        await conn.run_sync(Base.metadata.create_all)

    async_session = sessionmaker(
        bind=engine, expire_on_commit=False, class_=AsyncSession
    )

    root_id = 1
    async with async_session() as session:
        async with session.begin():
            session.add(Node(name="root", children=[
                Node(name="1", children=[Node(name="11"), Node(name="12")]),
                Node(name="2", children=[Node(name="21")]),
                Node(name="3", children=[Node(name="31"), Node(name="32")]),
            ]))

    async with async_session() as session:
        async with session.begin():
            stmt = (select(Node)
                    .options(selectinload(Node.children))
                    # neither of those work
                    # .filter(Node.id == root_id)
                    # .filter_by(id=root_id)
                )
            result = await session.execute(stmt)
            pprint(serialize(result.scalars().first()))


asyncio.run(async_main())

Error

Uncomment one of filter or filter_by:

Traceback (most recent call last):
    ...
    raise exc.MissingGreenlet(
sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/14/xd2s)
sys:1: RuntimeWarning: coroutine 'Connection.cursor' was never awaited

Versions

  • OS: Windows10
  • Python: 3.10.0rc1
  • SQLAlchemy: 1.4.34
  • Database: sqlite
  • DBAPI (eg: psycopg, cx_oracle, mysqlclient): aiosqlite

Additional context

I expect to see the same result both with filtration and without it:

{'children': [{'children': [{'children': [],
                             'id': 3,
                             'metadata': MetaData(),
                             'name': '11',
                             'parent_id': 2,
                             'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>},
                            {'children': [],
                             'id': 4,
                             'metadata': MetaData(),
                             'name': '12',
                             'parent_id': 2,
                             'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>}],
               'id': 2,
               'metadata': MetaData(),
               'name': '1',
               'parent_id': 1,
               'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>},
              {'children': [{'children': [],
                             'id': 6,
                             'metadata': MetaData(),
                             'name': '21',
                             'parent_id': 5,
                             'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>}],
               'id': 5,
               'metadata': MetaData(),
               'name': '2',
               'parent_id': 1,
               'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>},
              {'children': [{'children': [],
                             'id': 8,
                             'metadata': MetaData(),
                             'name': '31',
                             'parent_id': 7,
                             'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>},
                            {'children': [],
                             'id': 9,
                             'metadata': MetaData(),
                             'name': '32',
                             'parent_id': 7,
                             'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>}],
               'id': 7,
               'metadata': MetaData(),
               'name': '3',
               'parent_id': 1,
               'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>}],
 'id': 1,
 'metadata': MetaData(),
 'name': 'root',
 'parent_id': None,
 'registry': <sqlalchemy.orm.decl_api.registry object at 0x0000017F7F1942E0>}

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 31 (23 by maintainers)

Most upvoted comments

I think it’s a typo in mike snippet, since the description talks of set_committed_value.

I’ve updated the snipped

I think if im reading correctly your code is going to create a bunch of pending change events that you dont want. You need to write to a standalone list first, then use set_committed_value(). if you set_committed_value() with an empty list, that does essentially nothing. that list is not used in the object’s state, it’s copied.

def assemble_nodes(in_nodes: List[Node]) -> List[Node]:
    nodes: Dict[int, Tuple[Node, List[Node]]] = {}
    for node in in_nodes:
        nodes[node.id] = (node, [])

    top_ids: Set[int] = set(nodes.keys())
    for node in nodes.values():
        parent = nodes.get(node.parent_id, None)
        if parent is not None:
            top_ids.remove(node.id)
            parent[1].append(node)

    for node, collection in nodes.values():
        set_committed_value(node, "children", collection)

    return [nodes[id] for id in top_ids]


Yeah, ctes are the way ^-^

If I omit some details, then we can get this nice piece of code Works like a charm, correct and fast (now only fractions of seconds)


def assemble_nodes(in_nodes: List[Node]) -> List[Node]:
    nodes: Dict[int, Node] = {}
    for node in in_nodes:
        set_committed_value(node, "children", [])
        nodes[node.id] = node

    top_ids: Set[int] = set(nodes.keys())
    for node in nodes.values():
        parent = nodes.get(node.parent_id, None)
        if parent is not None:
            top_ids.remove(node.id)
            parent.children.append(node)

    return [nodes[id] for id in top_ids]

def one(nodes: List[Node]) -> Node:
    if len(nodes) == 1:
        return nodes[0]
    if len(nodes) == 0:
        raise NoResultFound(
            "No row was found when one was required")
    raise MultipleResultsFound(
        "Multiple rows were found when exactly one was required")

async def node(db: DB, id: int) -> Node:
    cte = select(Node).filter(Node.id == id).cte(recursive=True)
    cte = cte.union_all(select(Node).filter(Node.parent_id == cte.c.id))
    stmt = select(Node).join(cte, Node.id == cte.c.id)
    q = await db.execute(stmt)
    return one(assemble_nodes(q.scalars().all()))

Thaks for help and directions!

OK so , right now, dont use very deep selectinloads because they dont solve your problem anyway, you dont want to run 75 queries deep. we will have to further enhance the new “recusrion” thing to take a numeric depth number and then figure out some way to have it use the same cache entry over and over without actually building a long option.

It’s not taking me 100s of seconds, but there’s definitely something wrong going on. Will open an issue. Thanks for reporting it

there isn’t, but interestingly, as you said that I realized that “selectin” loading is probably the one eager loading where such a thing would be theoretically possible, since it naturally stops as collections are empty. it would also have to detect loops, but that might also resolve naturally as the objects located are already present in the identity map and have loaded collections.

then i was going to say, OK, that’s interesting but it would be a lot of work someday but with 2.0’s new options architecture I can make your test case work with a two line hack in selectinload. that’s interesting. 2.0 is hoped to be early betas by the end of the summer at the earliest so things are a long way away, but I can maybe add an experimental, alpha, not guaranteed to work very well paramter on selectinload called “auto_recurse” and add that to 2.0.