sqlalchemy: Chained joinedload causes duplicate SQLAlchemy objects when run with PyPy

Migrated issue, originally created by Ashley Chaloner

This bug appears when running under PyPy but not CPython. It appears with both the SQLite and PostgreSQL backends (these are the only two I’ve tested with). The bug does not cause a stack trace; instead, the same database row is returned more than once, as distinct (non-identical) SQLAlchemy objects.

Versions

  • pypy: Python 2.7.13 (5.8.0+dfsg-2~ppa2~ubuntu16.04, Jun 17 2017, 18:50:19) [PyPy 5.8.0 with GCC 5.4.0 20160609]
  • cffi==1.10.1
  • greenlet==0.4.12
  • readline==6.2.4.1
  • six==1.10.0
  • SQLAlchemy==1.1.13
  • (if using UUIDType: SQLAlchemy-Utils==0.32.16)

Bash script for venv setup

virtualenv -p /usr/bin/pypy ~/test-venv
source ~/test-venv/bin/activate
pip install \
    cffi==1.10.1 \
    greenlet==0.4.12 \
    readline==6.2.4.1 \
    six==1.10.0 \
    SQLAlchemy==1.1.13

# if using UUIDType:
pip install SQLAlchemy-Utils==0.32.16
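
Because the behaviour differs between interpreters, it can help to confirm which interpreter the venv actually runs before executing the reproduction script. The following is a minimal sketch using only the standard library plus an optional SQLAlchemy import; the function name `describe_environment` is illustrative and not part of the original report.

```python
import platform


def describe_environment():
    """Return a dict describing the interpreter and the SQLAlchemy version."""
    implementation = platform.python_implementation()
    info = {
        "implementation": implementation,
        "python_version": platform.python_version(),
        # The report only reproduces under PyPy, so flag it explicitly.
        "is_pypy": implementation == "PyPy",
    }
    try:
        import sqlalchemy
        info["sqlalchemy_version"] = sqlalchemy.__version__
    except ImportError:
        # SQLAlchemy is not installed in this environment.
        info["sqlalchemy_version"] = None
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

Running this inside the activated venv should report `implementation: PyPy` if the setup above worked as intended.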

Python script to reproduce bug

from sqlalchemy import (
    Column,
    ForeignKey,
    Index,
    Table,
    create_engine
)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import joinedload, relationship, sessionmaker
from sqlalchemy.types import Integer

Base = declarative_base()


class Base2(Base):

    __abstract__ = True
    attrlist = ["id"]

    # for easier debugging
    def __repr__(self):
        attrs = ["%s:%s" % (a, getattr(self, a)) for a in self.attrlist]
        return "<%s.%s object at 0x%016x: %s>" % (
            self.__class__.__module__, self.__class__.__name__,
            id(self), ", ".join(attrs))


class Scene(Base2):

    __tablename__ = "scene"

    id = Column(Integer, primary_key=True)


class ActScenes(Base2):

    __tablename__ = "act_scenes"
    attrlist = ["id", "act_id", "scene_id"]

    id = Column(Integer, primary_key=True)

    act_id = Column(
        Integer(),
        ForeignKey("act.id"),
        nullable=False)

    scene_id = Column(
        Integer(),
        ForeignKey("scene.id"),
        nullable=False)
    scene = relationship(Scene, lazy="joined")


class Act(Base2):

    __tablename__ = "act"

    id = Column(Integer, primary_key=True)

    scenes = relationship(ActScenes)


acts = Table(
    "acts", Base.metadata,
    Column("play_id", Integer(), ForeignKey("play.id")),
    Column("act_id", Integer(), ForeignKey("act.id"))
)
Index("ix_acts", acts.c.play_id, acts.c.act_id, unique=True)


class Play(Base2):

    __tablename__ = "play"

    id = Column(Integer, primary_key=True)

    acts = relationship("Act", secondary=acts)


def run_test(i):
    """Run one test, and return True if something failed."""

    engine = create_engine('sqlite:///:memory:')
    asessionmaker = sessionmaker()
    asessionmaker.configure(bind=engine)
    Base.metadata.create_all(engine)
    session = asessionmaker()

    play1 = Play()
    act = Act()
    act.scenes = [
        ActScenes(
            act_id=act.id,
            scene=Scene()
        )
        for _ in xrange(1000)
    ]
    play1.acts.append(act)

    scenecounts1 = len(play1.acts[0].scenes)

    session.add(play1)
    session.commit()

    # Comment out this block and watch the bug vanish.
    session.query(Play).options(
        joinedload(Play.acts).joinedload(Act.scenes)
    ).filter_by(id=play1.id).first()
    # End block.

    scenecounts1_again = len(play1.acts[0].scenes)
    if scenecounts1 != scenecounts1_again:
        print "Iteration {} failed".format(i)

        seen_scene_ids = set()
        for scene in act.scenes:
            if scene.id in seen_scene_ids:
                # import pdb; pdb.set_trace()
                print "Duplicate scene.id spotted: {}".format(scene.id)
                scenes_with_id = [s for s in act.scenes if s.id == scene.id]
                print "Scenes with this id:\n{}".format(
                    "\n".join(repr(s) for s in scenes_with_id))
            else:
                seen_scene_ids.add(scene.id)

    return scenecounts1 != scenecounts1_again


if __name__ == "__main__":
    N = 25
    results = [run_test(i) for i in xrange(N)]
    print "{} out of {} failed.".format(sum(results), N)

Sample output

(test-venv) user@host:~$ pypy test_chained_joinedload.py
Iteration 0 failed
Duplicate scene.id spotted: 185
Scenes with this id:
<__main__.ActScenes object at 0x00007f77de383948: id:185, act_id:1, scene_id:185>
<__main__.ActScenes object at 0x00007f77de9c4790: id:185, act_id:1, scene_id:185>
1 out of 25 failed.
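
The sample output shows two `ActScenes` objects with the same primary key but different memory addresses. The duplicate check in the script can be generalised roughly as in the sketch below. Note that `find_duplicate_identities` and the stand-in `Row` class are hypothetical helpers, not part of SQLAlchemy or of the original script, and the sketch assumes every object exposes an `id` attribute.

```python
from collections import defaultdict


def find_duplicate_identities(objects, key=lambda obj: obj.id):
    """Map each key to its distinct Python objects, keeping only keys
    that are represented by more than one object."""
    by_key = defaultdict(list)
    for obj in objects:
        # Compare by identity so the same object listed twice counts once.
        if not any(obj is seen for seen in by_key[key(obj)]):
            by_key[key(obj)].append(obj)
    return {k: objs for k, objs in by_key.items() if len(objs) > 1}


class Row(object):
    """Stand-in for a mapped class; it only carries a primary key."""

    def __init__(self, id):
        self.id = id


if __name__ == "__main__":
    a, b, c = Row(185), Row(185), Row(186)
    dupes = find_duplicate_identities([a, b, c, c])
    print(sorted(dupes))  # keys that map to more than one distinct object
```

In the reproduction script this could be called as `find_duplicate_identities(act.scenes)` after the query. Within a single session the identity map is expected to yield one object per primary key, so a non-empty result would indicate the behaviour reported here.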

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 19

Most upvoted comments

Michael Bayer (@zzzeek) wrote:

The fix is in Gerrit at https://gerrit.sqlalchemy.org/#/q/I9f6ae3fe5b078f26146af82b15d16f3a549a9032. A patched version is available for early testing at https://gerrit.sqlalchemy.org/changes/504/revisions/1ceb88eb53bdce1fa98c6b044f996fb995645876/archive?format=tgz

Thanks for the effort on this great bug report; this must have been very difficult to isolate!