jabref: Duplicate groups in an input file are not detected

JabRef version 5.2, 5.3, master commit 049acb9 on Linuxmint-20.1.

JabRef 100.0.0 Linux 5.8.0-45-generic amd64 Java 14.0.2 JavaFX 16+8

Duplicated groups in different branches of the group tree are not detected when a database that contains them is read in.

But on second thought, and after playing a bit with the JabRef 5.x master branch, I am starting to think that this is not a bug but actually a very useful feature! I’ve opened a detailed discussion at https://discourse.jabref.org/t/hierarchical-groups-with-duplicated-names-are-actually-working-in-jabref-5-x/2619. So don’t fix it!

Steps to reproduce the behavior:

  1. Start a new JabRef process, create a new library, add an entry, save the database as ‘new.bib’ (see the attached ‘biblio.zip’);
  2. Show the groups interface, create two groups ‘Human’ and ‘Computer’, create a subgroup ‘Languages’ in ‘Human’, save over the ‘new.bib’;
  3. Assign the new entry to ‘Human/Languages’;
  4. Copy ‘new.bib’ to ‘new-duplicate-group.bib’;
  5. Open the ‘new-duplicate-group.bib’ with a text editor, and duplicate the ‘Languages’ group in the ‘Computer’ branch:
saulius@starta duplicate-groups-created-by-hand-are-not-detected/ $ diff -u new.bib new-duplicate-group.bib 
--- new.bib	2021-03-20 14:13:07.189182296 +0200
+++ new-duplicate-group.bib	2021-03-20 13:35:47.348262111 +0200
@@ -17,4 +17,5 @@
 1 StaticGroup:Human\;2\;1\;0x8a8a8aff\;\;\;;
 2 StaticGroup:Languages\;0\;1\;0x8a8a8aff\;\;\;;
 1 StaticGroup:Computer\;0\;1\;0x8a8a8aff\;\;\;;
+2 StaticGroup:Languages\;0\;1\;0x6a7a9aff\;\;\;;
 }

The resulting BibTeX files are attached in ‘biblio.zip’: biblio.zip

  6. Load the edited file into JabRef; no error report is produced, and both groups are shown (correctly?), even though the GUI would not allow me to create such a situation; the entry is reported to belong to both groups: Screenshot from 2021-03-20 13-37-01

The manually created duplicate is used only to reproduce a minimal example of this behaviour; in real life, a large number of duplicates emerged when porting group trees from previous revisions of my database.
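
A load-time check for this situation could be sketched roughly as follows. This is only an illustration in Python, not JabRef's actual parser: it assumes that every line of the group tree has the form '<depth> <Type>:<name>\;...' as in the diff above, that depth 1 means a top-level group, and that all remaining fields can be ignored. The function names are hypothetical.

```python
from collections import defaultdict


def parse_group_paths(lines):
    """Return the path (tuple of names) of every group in a serialized tree."""
    paths = []
    stack = []  # names of the groups on the current branch
    for line in lines:
        depth_text, _, rest = line.strip().partition(" ")
        # Skip anything that is not a group line with depth >= 1.
        if not depth_text.isdigit() or int(depth_text) < 1:
            continue
        depth = int(depth_text)
        _, _, fields = rest.partition(":")
        name = fields.split("\\;", 1)[0]  # the name ends at the first '\;'
        del stack[depth - 1:]  # drop siblings and deeper levels
        stack.append(name)
        paths.append(tuple(stack))
    return paths


def find_duplicate_names(paths):
    """Map each group name that occurs more than once to all of its paths."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[path[-1]].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


# The four group lines from 'new-duplicate-group.bib'.
EXAMPLE_LINES = [
    r"1 StaticGroup:Human\;2\;1\;0x8a8a8aff\;\;\;;",
    r"2 StaticGroup:Languages\;0\;1\;0x8a8a8aff\;\;\;;",
    r"1 StaticGroup:Computer\;0\;1\;0x8a8a8aff\;\;\;;",
    r"2 StaticGroup:Languages\;0\;1\;0x6a7a9aff\;\;\;;",
]

if __name__ == "__main__":
    for name, found in find_duplicate_names(parse_group_paths(EXAMPLE_LINES)).items():
        print(name, "appears at:", ", ".join("/".join(p) for p in found))
```

Paths are kept as tuples rather than joined strings so that a group name containing '/' would not be misread as two levels.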

This behaviour actually seems to be very useful: it allows me to create (once again!) groups with identical names in different places of the group hierarchy, and assigning an entry to one such group adds it to all groups with that name (detailed discussion in https://discourse.jabref.org/t/hierarchical-groups-with-duplicated-names-are-actually-working-in-jabref-5-x/2619 ).
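
The sharing effect is what one would expect if membership is decided purely by group name. A minimal sketch in Python, assuming (this is an assumption, not a description of JabRef's code) that an entry's groups field holds a comma-separated list of group names and that a static group matches whenever its own name appears in that list; the function name is hypothetical:

```python
def groups_containing(groups_field, group_paths):
    """Return every group path whose last component is named in the entry's groups field."""
    names = {name.strip() for name in groups_field.split(",") if name.strip()}
    return [path for path in group_paths if path[-1] in names]


# Group tree from 'new-duplicate-group.bib', written as paths.
TREE = [
    ("Human",),
    ("Human", "Languages"),
    ("Computer",),
    ("Computer", "Languages"),
]

if __name__ == "__main__":
    # The entry was assigned only to Human/Languages, but the field stores just the name.
    for path in groups_containing("Languages", TREE):
        print("/".join(path))
```

Because only the name is stored, the lookup cannot tell the two ‘Languages’ groups apart, so both report the entry.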

This is how it looks after some editing of the database: Screenshot from 2021-03-20 16-38-55

The resulting bibtex file is added in ‘experimets-with-group-hierarchy.zip’. experimets-with-group-hierarchy.zip

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 65 (50 by maintainers)

Most upvoted comments

I’m not sure why you would like to search for groups = .... In this case it’s probably easier to just select the group in the tree… Let’s not worry about this too much until users report they use this often for reason xyz. (Except if I’m missing an important use case)

Well, I use this all the time, since finding the group in the tree takes much longer if you have to go through thousands of groups. So, this is a real use case.

Thanks for the ongoing discussion. The Twitter poll clearly showed that many users would like to have groups with the same name. However, it also became clear that it might be confusing if the entries are automatically shared. In the devcall we also discussed that there are scenarios where groups with the same name actually don’t show the same entries (e.g. if one uses the hierarchical modifiers), potentially leading to even more confusion.

For this reason I would strongly propose to go with the id-based solution:

  • Each group gets a unique identifier
  • The identifier is stored in the groups field (and no longer the name of the group)
  • Users can create completely independent groups with the same name (even in the same hierarchy level if they desire, although a warning should be shown in this case)
  • In case they want to have a mirror group, they can use an automatic keyword group pointing to the groups field and searching for the identifier of the other group.
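
To make the proposal concrete, here is a rough Python sketch of how id-based membership could behave. It is an illustration under assumptions, not JabRef code: the class name, the field names, and the use of random UUIDs as identifiers are all hypothetical, and the mirror group is modelled simply as a group that searches for another group's identifier.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def new_group_id():
    """Hypothetical identifier scheme: a random UUID in hex form."""
    return uuid.uuid4().hex


@dataclass
class Group:
    name: str
    id: str = field(default_factory=new_group_id)
    mirror_of: Optional[str] = None  # id of the group this one mirrors, if any

    def contains(self, entry_group_ids):
        """An entry belongs to the group if its groups field lists the relevant id."""
        wanted = self.mirror_of if self.mirror_of is not None else self.id
        return wanted in entry_group_ids


if __name__ == "__main__":
    first = Group("Treatment")
    second = Group("Treatment")  # same name, completely independent
    mirror = Group("Treatment", mirror_of=first.id)

    entry_group_ids = {first.id}  # what the entry's groups field would store
    print(first.contains(entry_group_ids))   # True
    print(second.contains(entry_group_ids))  # False
    print(mirror.contains(entry_group_ids))  # True
```

With identifiers in the groups field, sharing entries between same-named groups becomes an explicit choice (the mirror) rather than a side effect of the name.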

This unique identifier will also be very helpful for other scenarios. For example, we are planning a new feature in connection with JabRef Online, where users can share a group. For this, a unique identifier is essential, since one needs a way to identify the group even across different users.

@sauliusg your work here in this PR is very much appreciated. But I think the id-based system is the more universal way forward. Would you be interested in implementing it? We core developers will of course help where we can!

I agree with you that it sometimes could be advantageous to have the same group in two different places. However,

subgroup can also belong to several different super-groups (together with all papers assigned to it)

is a highly unusual concept from a user-interface perspective. For example, files with the same name but in different folders don’t automatically sync their content. I think we should really stick to the tree nature of the group interface, meaning that groups are independent.

As a way forward, I would propose the following:

  • Allow groups with the same name in different positions, i.e. properly fix https://github.com/JabRef/jabref/issues/1495. Warning: this might be very complicated.
  • Once this is done, we can think about adding a “symlink” group that acts as a copy of a different group.

What do you think? Help on this is of course very much appreciated.

So let’s sum up the current situation:

  1. The current group implementation has useful features (nicely summarised in #629: independence of BibTeX entries, possibility to manually edit the BibTeX file for grouping, among others);
  2. Users want a flexible hierarchy and want to be able to create groups with whatever names they want, without a tool standing in their way.

A possible (easy?) way to reconcile these two requirements could be:

  • JabRef supports (as it already does now!) groups with identical names anywhere in the hierarchy; no change is needed here, just the promise that future versions of JabRef will not “disallow” such use of JabRef 😃;
  • The GUI allows creating such groups if the user wants to (so that I do not have to save the database, close JabRef, edit the file by hand, and restart JabRef every time I want a group whose name already exists in the hierarchy). For this and the following functionality, a change would be needed in JabRef, but the change would be small and localised (affecting, presumably, only the group creation dialogue);
  • The GUI warns the user that the group already exists elsewhere in the hierarchy, and that, if created, this group will contain the same entries as all other groups with the same name (I estimate that in 50% of cases this is what is actually needed);
  • The GUI suggests a unique name based on the hierarchy, by appending previous levels of the tree, and possibly numbering, until the name becomes unique; the user can choose that name with a single click or can continue using the duplicated name (with merged entries).

Example:

Groups:
Bioinformatics
- Drug design
-- Diabetes
--- Treatment
Medicine
- Diabetes mellitus
-- Treatment
- Diabetes
-- Treatment
- Asthma

When creating a subgroup called Treatment in Asthma, the GUI could respond:

Warning:
A group named 'Treatment' already exists in 'Medicine/Diabetes/Treatment', 'Medicine/Diabetes mellitus/Treatment'. 
If you create the group 'Treatment' here, it will contain entries from all mentioned groups.
[x] Use a unique name 'Treatment (Asthma)'
[OK] <- the button is *always* active!

I deliberately suggest using ‘Treatment (Asthma)’ instead of ‘Asthma/Treatment’ to avoid the impression that the group name is strictly bound to the group’s position in the hierarchy. It is not; so moving group ‘Treatment (Asthma)’, say, one level up to ‘Medicine’ would not make its name “incorrect”, in the sense that the group still contains papers about asthma treatment no matter where in the hierarchy it resides. This also means that JabRef does not need to bother about renaming groups when their position in the hierarchy changes (as is implemented now).
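
The name suggestion could work roughly like the Python sketch below. The function name and the exact formats are my assumptions for illustration: further levels are appended inside the parentheses separated by commas, and numbering starts at 2 once all levels are used up. Only the first form, ‘Treatment (Asthma)’, comes from the example above.

```python
def suggest_unique_name(name, ancestors, existing_names):
    """Suggest a group name that is not yet used anywhere in the tree.

    ancestors lists the parent groups, nearest first, e.g. ["Asthma", "Medicine"].
    """
    if name not in existing_names:
        return name
    # Append one more level of the hierarchy at a time.
    for levels in range(1, len(ancestors) + 1):
        candidate = f"{name} ({', '.join(ancestors[:levels])})"
        if candidate not in existing_names:
            return candidate
    # All levels used up: fall back to numbering.
    base = f"{name} ({', '.join(ancestors)})" if ancestors else name
    number = 2
    while f"{base} {number}" in existing_names:
        number += 1
    return f"{base} {number}"


# All group names from the example tree above.
EXISTING = {
    "Bioinformatics", "Drug design", "Diabetes", "Treatment",
    "Medicine", "Diabetes mellitus", "Asthma",
}

if __name__ == "__main__":
    print(suggest_unique_name("Treatment", ["Asthma", "Medicine"], EXISTING))
```

Since the suggestion is computed only once, at creation time, moving the group later leaves its name untouched, in line with the point that the name is not bound to the position.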

The suggested change would:

  • solve most people’s problems with the current group implementation;
  • make the suggested workaround for group uniqueness more usable;
  • add an additional capability to JabRef (at no or very little implementation cost!) that would allow the same group to be placed in different parts of the hierarchy; a use case demonstrated by my SQL example shows that this can be very useful;
  • remove (most of) the confusion, through explicit warnings in the group creation dialogue and complete information about identical group names.