biotite: Altloc ID handling does not work properly
Hi, I just tried to figure out how small molecules are currently handled within the structure objects (`AtomArrayStack`).
If I extract, for instance, all residues using
`residues: tuple = biotite.structure.get_residues(mystructure)`
then I get a nice tuple containing water, ions and amino acids, but I don’t see the small molecule in this list, for instance in structure 5po6 (residue name `8SS`).
How can I access these?
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (7 by maintainers)
The filter does not affect insertion codes, so these should always be included. In contrast, altlocs mean that the same atom(s) may be present at two or more different positions. Hence, I think it is not meaningful to have the same atom twice (with different coordinates) in the same model. However, if there is a use case for it, we could add a parameter to `get_structure()` to omit filtering at all. Maybe, something like `altloc="all"`?

The `ParmEd` package uses a string parameter to select altlocs: https://parmed.github.io/ParmEd/html/api/parmed/parmed.formats.pdb.html#parmed.formats.pdb.CIFFile.write

- `'all'`: return all alternate locations
- `'first'`: return only the first alternate locations
- `'occupancy'`: return the one with the largest occupancy. If two conformers have the same occupancy, the first one to occur is printed.

`alt_loc` would be another optional annotation array, that is added to the `AtomArray` or `AtomArrayStack`, if `'all'` is chosen. Thus the user can select the altloc they want in the same way they would select atoms by any other criteria.

In case of Biotite, `'first'` would be default, in consistency with the current behavior.

Do you think this would be a sensible solution to this issue?
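
To make the three proposed modes concrete, here is a minimal pure-Python sketch. It is not Biotite or ParmEd code: the `Atom` record and the `select_altloc()` helper are hypothetical names, the toy atoms (residue IDs, occupancies) are made up, and it assumes the choice is made per residue, with `'occupancy'` comparing the summed occupancy of each altloc ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    chain_id: str
    res_id: int
    res_name: str
    atom_name: str
    altloc: str        # "" means the atom has no alternate location
    occupancy: float


def select_altloc(atoms, mode="first"):
    """Hypothetical helper illustrating 'all', 'first' and 'occupancy'."""
    if mode == "all":
        return list(atoms)
    if mode not in ("first", "occupancy"):
        raise ValueError(f"unknown altloc mode: {mode!r}")

    # Sum the occupancy per altloc ID for every residue.
    # Dicts keep insertion order, so the first ID seen stays first.
    totals = {}
    for atom in atoms:
        if atom.altloc:
            per_res = totals.setdefault((atom.chain_id, atom.res_id), {})
            per_res[atom.altloc] = per_res.get(atom.altloc, 0.0) + atom.occupancy

    chosen = {}
    for key, per_res in totals.items():
        if mode == "first":
            chosen[key] = next(iter(per_res))
        else:
            # max() returns the first maximal key, so ties go to the first ID
            chosen[key] = max(per_res, key=per_res.get)

    # Atoms without an altloc ID are always kept.
    return [
        a for a in atoms
        if not a.altloc or a.altloc == chosen[(a.chain_id, a.res_id)]
    ]


# Made-up toy data: a serine with two conformers and a ligand with altloc D only
atoms = [
    Atom("A", 10, "SER", "CA", "", 1.0),
    Atom("A", 10, "SER", "OG", "A", 0.4),
    Atom("A", 10, "SER", "OG", "B", 0.6),
    Atom("A", 201, "8SS", "C1", "D", 0.7),
]

for mode in ("all", "first", "occupancy"):
    kept = select_altloc(atoms, mode)
    print(mode, [(a.res_name, a.atom_name, a.altloc) for a in kept])
```

In this sketch a residue that only carries altloc `D` survives both `'first'` and `'occupancy'`, because the ID is chosen per residue rather than fixed globally.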
Altloc and insertion handling is really cumbersome in PDB formats, since both are hard to read and interpret in general. They are also sometimes abused to show something they are not intended for, which makes it very hard to find a “jack of all trades” solution. I think @padix-key's suggestion of having an `altloc="all"` flag that defaults to `altloc=None` would be a good way to allow users to get every atom and customize in corner cases.

I inspected the PDB file: The filtering in `filter_altloc()` is actually the place where `8SS` is removed. Biotite only allows one altloc ID per residue, to avoid duplicate residues/atoms. However, by default it uses residues with altloc ID `A`. In this case `8SS` has the altloc ID `D`, which is filtered out.

Usually, alternative altloc IDs can be selected via the `altloc` parameter, but the parameter currently raises an error:

In summary, there are two issues to fix in Biotite:

- A `NameError` in `filter_altloc()`
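
The reason `8SS` disappears can be reproduced with a toy model of a filter that keeps one fixed altloc ID. This is only a sketch of the behavior described above, not Biotite's actual `filter_altloc()`: the function names `filter_fixed_altloc()` and `residue_names()` and the atom dictionaries are hypothetical, and the data is made up.

```python
def filter_fixed_altloc(atoms, altloc_id="A"):
    """Toy filter: keep atoms without an altloc ID or with the given ID."""
    return [a for a in atoms if a["altloc"] in ("", altloc_id)]


def residue_names(atoms):
    """Return the residue names in order of first appearance."""
    names = []
    for atom in atoms:
        if atom["res_name"] not in names:
            names.append(atom["res_name"])
    return names


# Made-up toy data: the ligand only exists with altloc ID "D"
atoms = [
    {"res_name": "SER", "atom_name": "CA", "altloc": ""},
    {"res_name": "SER", "atom_name": "OG", "altloc": "A"},
    {"res_name": "SER", "atom_name": "OG", "altloc": "B"},
    {"res_name": "8SS", "atom_name": "C1", "altloc": "D"},
    {"res_name": "HOH", "atom_name": "O", "altloc": ""},
]

# With the default ID "A" the ligand is dropped entirely
print(residue_names(filter_fixed_altloc(atoms)))
# Selecting "D" keeps the ligand, but drops the SER side-chain conformers
print(residue_names(filter_fixed_altloc(atoms, altloc_id="D")))
```

A single global altloc ID cannot serve both residues here, which is why a per-residue choice or an option to keep all altlocs is being discussed.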