GRDB.swift: Wrong association aggregate count when using HasManyThrough, HasMany and Self Joins

I have a Task record which has a HasManyThrough association (many-to-many connection) with a Tag record. The Task record also has a HasMany association (one-to-many connection) to other Task records using Self Joins. Thus, the Task record has two types: project and subtask:

struct Task: Identifiable {
    var id = UUID()
    var projectId: UUID?
    var title: String
    var status: TaskStatus
}

extension Task: TableRecord {
    // Projects and tasks.
    static let subtasks = hasMany(Task.self, key: "subtasks")
    static let project = belongsTo(Task.self, key: "project")
}

I want to select all projects and associated tags and subtasks. The following works as expected:

let request = Task
    .all()
    .including(optional: Task.project)
    .including(all: Task.subtasks)
    .including(all: Task.tags)
    .annotated(with: Task.tags.count)

I also want to include the total count of subtasks. Thus, I have added annotated() below:

let request = Task
    .all()
    .including(optional: Task.project)
    .including(all: Task.subtasks)
    .including(all: Task.tags)
    .annotated(with: Task.subtasks.count.forKey("totalSubtaskCount"))

totalSubtaskCount is correct. However, I also want to include the total count of active subtasks. To achieve this, I have tried to use sum():

let request = Task
    .all()
    .including(optional: Task.project)
    .including(all: Task.subtasks)
    .including(all: Task.tags)
    .annotated(with: Task.subtasks.count.forKey("totalSubtaskCount"))
    .annotated(with: Task.subtasks.sum(Task.Columns.status == TaskStatus.Active).forKey("remainingSubtaskCount"))

If the number of associated tags is equal to 1, totalSubtaskCount is correct. However, if the number of associated tags is > 1, the sum is multiplied by the number of associated tags. For example, if a project has 3 associated subtasks and 2 associated tags, totalSubtaskCount becomes 3*2=6 when the correct number is 3.

Environment

**GRDB flavor(s): GRDB + GRDBCombine **GRDB version: 5.0.0-beta.2 **Installation method: SPM **Xcode version: 11.5 beta 2 (11N605f) **Swift version: 5.2.2 **Platform(s) running GRDB: iOS **macOS version running Xcode: 10.15.4 (19E287)

About this issue

Original URL
State: closed
Created 4 years ago
Comments: 22 (14 by maintainers)

Most upvoted comments

Sorry for being quiet for so long, @groue. Due to a new permanent job, I havn’t had time to do any iOS development since May. Hopefully I will be able to continue on my project (where I use GRDB) later. I’ll promise to give it a new try when I get back in business 😃

simensol on Sep 6, 2020

Sorry for the confusion, @groue! Here is the minimum schema:

Task.swift:

struct Task: Identifiable {
    var id = UUID()
    var projectId: UUID?
    var title: String
    var status: TaskStatus
}

extension Task: TableRecord {
    // Projects and tasks.
    static let subtasks = hasMany(Task.self, key: "subtasks")
    static let project = belongsTo(Task.self, key: "project")
}

extension Task: Codable, FetchableRecord, MutablePersistableRecord {
    // Define database columns from CodingKeys.
    enum Columns {
        static let id = Column(CodingKeys.id)
        static let projectId = Column(CodingKeys.projectId)
        static let title = Column(CodingKeys.title)
        static let status = Column(CodingKeys.status)
    }
}

Tag.swift:

struct Tag: Identifiable {
    var id = UUID()
    var name: String
}

extension Tag: TableRecord {
    static let taskTags = hasMany(TaskTag.self)
    static let tasks = hasMany(Task.self, through: taskTags, using: TaskTag.task)
}

extension Tag: Codable, FetchableRecord, MutablePersistableRecord {
    enum Columns {
        static let id = Column(CodingKeys.id)
        static let name = Column(CodingKeys.name)
    }
}

TaskTag.swift:

struct TaskTag: Codable, FetchableRecord, MutablePersistableRecord {
    var taskId: UUID
    var tagId: UUID
}

extension TaskTag: TableRecord {
    static let task = belongsTo(Task.self)
    static let tag = belongsTo(Tag.self)
}

extension TaskTag {
    enum Columns {
        static let taskId = Column(CodingKeys.taskId)
        static let tagId = Column(CodingKeys.tagId)
    }
}

And here is a a minimum request:

let request = Task
    .all()
    .including(optional: Task.project)
    .including(all: Task.subtasks)
    .including(all: Task.tags)
    .annotated(with: Task.subtasks.sum(Task.Columns.status == TaskStatus.Active).forKey("remainingSubtaskCount"))

My original code is very complex, so this is cut-and-paste (that is, not complied). Let me know if you need anything more!

simensol on May 25, 2020

Hello @simensol. Thanks for investigating a little more.

You are right, I think count better expresses the need of your application. It’s easier to read than the sum of 0/1.

But this sum of 0/1 is equivalent to a filtered count. There is no good reason for this sum to be wrongly computed.

The way count is implemented (using the DISTINCT sql modifier) is a sign that something is wrong in the way aggregates are computed. I know that some tests will break if I remove that DISTINCT. But there is no such equivalent of DISTINCT for sum and other mathematical aggregates. Conclusion: we have a plain bug that needs to be fixed. #778 is an early stage pull request that fixes it.

groue on May 19, 2020

That seems reasonable, @groue. If you make any changes to the beta, I can test it on my database. Here is a simplified version of the sql query produced by GRDB for my initial solution:

SELECT
   "task1"."id",
   "task1"."projectId",
   "task1"."title",
   "task1"."status",
   COUNT(DISTINCT "task3"."id") AS "totalSubtaskCount",
   SUM("task3"."status" = 'Active') AS "remainingSubtaskCount",
   COUNT(DISTINCT "tag"."id") AS "tagCount",
   "task2".* 
FROM
   "task" "task1" 
   LEFT JOIN
      "task" "task2" 
      ON "task2"."id" = "task1"."projectId" 
   LEFT JOIN
      "task" "task3" 
      ON "task3"."projectId" = "task1"."id" 
   LEFT JOIN
      "taskTag" 
      ON "taskTag"."taskId" = "task1"."id" 
   LEFT JOIN
      "tag" 
      ON "tag"."id" = "taskTag"."tagId" 
GROUP BY
   "task1"."id"

The above query produces remainingSubtaskCount which are multiplied by the number of associated tags (tagCount).

simensol on May 15, 2020

I have experienced much more trouble in my years using vanilla sqlite, so when I say that GRDB makes my sqlite life fun, its no understatement 😉 When started with GRDB I often wrote my own queries. However, after learning more, most of these queries are now replaced by GRDB.

As for the mental model, I think the documentation does quite well. At least compared to other libraries I have worked with. Especially the figures in The Types of Associations section have been invaluable to me ❤️

I will keep your 3 suggestions in mind. Thanks, again!

simensol on May 14, 2020