biotite: Altloc ID handling does not work properly

Hi, I just tried to figure out how small molecules are currently handled within the structure objects (AtomArrayStack).

If I extract, for instance, all residues using residues: tuple = biotite.structure.get_residues(mystructure)

then I get a nice tuple containing water, ions and amino acids, but I don’t see the small molecule in this list, for instance in structure 5po6 (residue name 8SS).

How can I access these?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

The filter does not affect insertion codes, so these should always be included. In contrast, altlocs mean that the same atom(s) may be present at two or more different positions. Hence, I think it is not meaningful to have the same atom twice (with different coordinates) in the same model. However, if there is a use case for it, we could add a parameter to get_structure() to skip the filtering entirely. Maybe something like altloc="all"?

The ParmEd package uses a string parameter to select altlocs: https://parmed.github.io/ParmEd/html/api/parmed/parmed.formats.pdb.html#parmed.formats.pdb.CIFFile.write.

  • 'all': return all alternate locations
  • 'first': return only the first alternate location
  • 'occupancy' : return the one with the largest occupancy. If two conformers have the same occupancy, the first one to occur is printed.

alt_loc would be another optional annotation array, that is added to the AtomArray or AtomArrayStack, if 'all' is chosen. Thus the user can select the altloc they want in the same way they would select atoms by any other criteria.

In the case of Biotite, 'first' would be the default, consistent with the current behavior.
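To make the three modes concrete, here is a minimal sketch in plain Python. It is a toy, not Biotite's or ParmEd's implementation: the Atom record and the select_altloc() function are hypothetical names made up for this illustration, and the choice is made per atom, whereas a real implementation may decide per residue.

```python
from collections import namedtuple

# Hypothetical minimal atom record; real libraries store far more fields.
Atom = namedtuple("Atom", ["res_id", "atom_name", "altloc", "occupancy"])


def select_altloc(atoms, mode="first"):
    """Pick altloc variants for each (res_id, atom_name) group.

    mode is 'all', 'first' or 'occupancy', following the list above.
    """
    if mode == "all":
        # No filtering: every alternate location is kept.
        return list(atoms)
    if mode not in ("first", "occupancy"):
        raise ValueError(f"unknown mode: {mode!r}")

    chosen = {}  # (res_id, atom_name) -> Atom; dicts keep insertion order
    for atom in atoms:
        key = (atom.res_id, atom.atom_name)
        if key not in chosen:
            chosen[key] = atom
        elif mode == "occupancy" and atom.occupancy > chosen[key].occupancy:
            # Strict '>' keeps the first conformer when occupancies tie.
            chosen[key] = atom
    return list(chosen.values())


# Made-up example data: one atom with two conformers, one without altloc.
atoms = [
    Atom(10, "CA", "A", 0.4),
    Atom(10, "CA", "B", 0.6),
    Atom(11, "CA", "", 1.0),
]

for mode in ("all", "first", "occupancy"):
    print(mode, [a.altloc for a in select_altloc(atoms, mode)])
```

With 'all', the altloc field stays on every record, so the caller can filter on it afterwards like on any other attribute.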

Do you think this would be a sensible solution to this issue?

Altloc and insertion code handling is really cumbersome in PDB formats, since both are hard to read and interpret in general. They are also sometimes abused to show something they are not intended for, which makes it very hard to find a “jack of all trades” solution. I think @padix-key's suggestion of having an altloc="all" flag that defaults to altloc=None would be a good way to allow users to get every atom and customize in corner cases.

I inspected the PDB file: The filtering in filter_altloc() is actually the place where 8SS is removed. Biotite only allows one altloc ID per residue, to avoid duplicate residues/atoms. However, by default it uses residues with altloc ID A. In this case 8SS has the altloc ID D, which is filtered out.

Usually, alternative altloc IDs can be selected via the altloc parameter, but the parameter currently raises an error:

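import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx
cif_file = pdbx.PDBxFile.read(rcsb.fetch("5po6", "cif"))
# Request altloc ID D for residue 203 in chain A
structure = pdbx.get_structure(cif_file, model=1, altloc=[("A", 203, "D")])
NameError: name 'altloc' is not defined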

In summary, there are two issues to fix in Biotite:

  • By default, use the first altloc ID, instead of altloc ID A
  • Fix NameError in filter_altloc()
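The difference between the two defaults can be sketched with plain Python dictionaries. This is only an illustration of the idea, not Biotite's code: filter_altloc_a() and filter_first_altloc() are hypothetical helper names, and the atom names and residue 100 are made up. Only residue 203 with name 8SS and altloc ID D mirrors the case discussed above.

```python
def filter_altloc_a(atoms):
    """Default as described above: keep atoms without altloc or with altloc 'A'."""
    return [a for a in atoms if a["altloc"] in ("", "A")]


def filter_first_altloc(atoms):
    """Proposed default: for each residue, keep the first altloc ID that occurs."""
    first = {}  # res_id -> first non-empty altloc ID seen in that residue
    for a in atoms:
        if a["altloc"]:
            first.setdefault(a["res_id"], a["altloc"])
    return [
        a for a in atoms
        if not a["altloc"] or a["altloc"] == first[a["res_id"]]
    ]


# Toy records: the ligand exists only with altloc ID 'D'.
atoms = [
    {"res_id": 100, "res_name": "ALA", "atom_name": "CA", "altloc": ""},
    {"res_id": 203, "res_name": "8SS", "atom_name": "C1", "altloc": "D"},
    {"res_id": 203, "res_name": "8SS", "atom_name": "C2", "altloc": "D"},
]

# Filtering on 'A' drops the ligand; taking the first altloc ID keeps it.
print(sorted({a["res_name"] for a in filter_altloc_a(atoms)}))
print(sorted({a["res_name"] for a in filter_first_altloc(atoms)}))
```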