silverstripe-framework: [GRIDFIELD] BUG: pagination in GridField does not work if DataObject default_sort field values are not unique

SUMMARY:

Pagination does not work well when the data is provided by MySQL 5.6 / 5.7 (php 7.1) where the data-set is sorted by a field (or fields) that are not enforced to be unique on a database level. The reason for this is that MySQL may return datasets in a random order (beyond what it has been asked to sort by).

You can possibly replicate this bug by yourself by using the tools listed below:

This means, for example, that:

  • batch processing may not work correctly (you will process some records twice and other ones not at all)
  • grid field pagination is broken (some items listed twice or more, on separate pages, others not at all)
  • pagination on the front-end is broken (some items listed twice or more, on separate pages, others not at all)

There are still a lot of questions about this bug, but I have been able to consistently demonstrate this error on several machines with clean installs of 3.6.1

Details

the issue: when the default_sort value for a dataobject contains non-unique values for one or more rows then the GridField ends up NEVER showing some records while showing other records more than once:

How to replicate:

  1. install SS 3.6.1 using standard installer

  2. add the following dataobject (MyDataObject):


<?php


class MyDataObject extends DataObject
{

    private static $db = array(
        'Title' => 'Varchar(20)',
        'SameSame' => 'Varchar(20)'
    );


    private static $default_sort = array(
        'SameSame' => 'ASC'
    );

    function requireDefaultRecords()
    {
        DB::query('DELETE FROM MyDataObject');
        for($i = 1; $i < 47; $i++) {
            $filter = array('Title' => "item #".$i);
            if(! MyDataObject::get()->filter($filter)->first()) {
                $obj = MyDataObject::create($filter);
                $obj->SameSame = 'same';
                $obj->write();
            }
        }
    }

}

then add modeladmin (MyModelAdmin.php):


<?php

class MyModelAdminTest extends ModelAdmin {

    private static $managed_models = array('MyDataObject');

    private static $url_segment = 'test';

    private static $menu_title = 'Test';
}

You get the following results: image

image

For my client, I was sorting a GridField by EndDate and StartDate. Some of the items has the same EndDate and StartDate and thus were missing. Making it impossible for them to be found by the client.

About this issue

  • Original URL
  • State: open
  • Created 7 years ago
  • Comments: 56 (43 by maintainers)

Most upvoted comments

I think some research may be useful:

  • what are the consequences of adding a sort by ID ASC to all sql statements (after any other sort statements - if any), so that you always have a predictable outcome? This would be my solution and apart from performance reductions, I can’t see any other reason why this is not a good solution. [We may consider only doing this lists that are “limited”).

  • I noticed that dataobjects do sort randomly but pages do not and I wonder why that is. It would be worthwhile to work out why (maybe to do with joins?)

Also - while no one has noticed this issues, this is not a benign one. At least two of my clients noticed it (and were unable to edit data because of it) in gridfields (where I had set the sort order to a field that was something like a date field).

Summary:

  • When an ambiguous sort column is used, the database is free to return the rows in any order it wants, making the operation non-deterministic.
  • This makes pagination unpredictable (e.g. repeated entries across pages)
  • As an implementor of Sortable, DataList should sort deterministically.
  • Therefore, we should patch it
  • Forcing ID ASC into the query is too much magic. Something like default_sort_column would be more appropriate
  • The bug has existed at least since SS2, and no one raised it until now, so impact/low is fitting.

A challenge I’ve considered with a default_sort_column is that there needs to be a way to tell a datalist to reverse it in the case where a non-default sort column (E.g. Title) is reversed, otherwise groups of rows with the duplicate Title column would not reverse. That makes the solution not able to be applied so transparently.

We’ve had a bit of a chat about this internally, and there’s some consensus that a halfway measure to mitigate this is appropriate.

First, the condition of MySQL returning a non deterministic result set when ambiguous sort columns are used is absolutely a feature, not a bug. From the MySQL docs:

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns. One factor that affects the execution plan is LIMIT, so an ORDER BY query with and without LIMIT may return rows in different orders.

While MySQL is providing the “expected result” (albeit non deterministic) in this case, it’s fair to say DataList is not. As a implementor of the Sortable interface, DataList is expected to sort deterministically. Further, it is the job of the ORM to obscure the “how” from the user, and create the list in a consistent, uniform way, whether you’re using MySQL, Postgres, etc, or just plain ArrayData.

All that said, to force ID into the sort is still too much magic, but ensuring default_sort is added seems like a reasonable halfway point. The problem is, with the direction being included as part of the default sort, it makes sort direction toggling (I.e. GridField column headers) unreliable. Therefore, a more sensible setting might be something more like default_sort_column, with null being allowed to disable forced deterministic sorts.

Thoughts?

The best practice, in these scenarios, is to encourage developers to implement composite default_sort values for their models, if they are aware that their sort column is not unique. I concur with @unclecheese on this.

E.g.

private static $default_sort = '"BaseTable"."Sort" ASC, "BaseTable"."ID" ASC';

These can also be specified via array syntax.

private static $default_sort = [
	'Sort' => 'ASC',
	'ID' => 'ASC'
];

I believe the array syntax supports ORM to SQL column mapping under the hood, so that might be preferable to raw SQL fragments. 😃

Shall we make the outcome of this problem a documentation update?

I’m not convinced this is a SilverStripe issue. The way that MySQL resolves an ambiguous sort is highly unpredictable. See (https://dba.stackexchange.com/questions/6051/what-is-the-default-order-of-records-for-a-select-statement-in-mysql).

MySQL sorts the rows any way it wants to, without any guarantee of consistency.

That I can’t replicate it should be an indication that we’re dealing with some kind of unknown. It could be as simple as differing storage engines, MySQL versions, or both. In short, don’t use ambiguous sorts.

Have you tried adding ID as the last item in the list? This does look like a genuine bug, but perhaps adding ID would be a workaround for it