scanpy: sc.tl.umap numba error when used with init_pos="paga"

Hi Alex,

UMAP throws an error if I call scanpy.tl.umap with initial positions from sc.tl.paga. Based on the error (see below), I thought it was a problem with UMAP itself. However, the error is not thrown when UMAP is called without initial positions from PAGA.

Here is the output / error:

sc.tl.umap(adata, init_pos='paga')

computing UMAP
    using 'X_pca' with n_pcs = 50

---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-35-924452b37e5b> in <module>
----> 1 sc.tl.umap(adata, init_pos='paga')

/opt/conda/lib/python3.7/site-packages/scanpy/tools/_umap.py in umap(adata, min_dist, spread, n_components, maxiter, alpha, gamma, negative_sample_rate, init_pos, random_state, a, b, copy)
    137         neigh_params.get('metric', 'euclidean'),
    138         neigh_params.get('metric_kwds', {}),
--> 139         verbose=max(0, verbosity-3))
    140     adata.obsm['X_umap'] = X_umap  # annotate samples with UMAP coordinates
    141     logg.info('    finished', time=True, end=' ' if _settings_verbosity_greater_or_equal_than(3) else '\n')

/opt/conda/lib/python3.7/site-packages/umap/umap_.py in simplicial_set_embedding(data, graph, n_components, initial_alpha, a, b, gamma, negative_sample_rate, n_epochs, init, random_state, metric, metric_kwds, verbose)
    984         initial_alpha,
    985         negative_sample_rate,
--> 986         verbose=verbose,
    987     )
    988 

/opt/conda/lib/python3.7/site-packages/numba/dispatcher.py in _compile_for_args(self, *args, **kws)
    348                 e.patch_message(msg)
    349 
--> 350             error_rewrite(e, 'typing')
    351         except errors.UnsupportedError as e:
    352             # Something unsupported is present in the user code, add help info

/opt/conda/lib/python3.7/site-packages/numba/dispatcher.py in error_rewrite(e, issue_type)
    315                 raise e
    316             else:
--> 317                 reraise(type(e), e, None)
    318 
    319         argtypes = []

/opt/conda/lib/python3.7/site-packages/numba/six.py in reraise(tp, value, tb)
    656             value = tp()
    657         if value.__traceback__ is not tb:
--> 658             raise value.with_traceback(tb)
    659         raise value
    660 

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of type(CPUDispatcher(<function rdist at 0x7fb3c827f840>)) with parameters (array(float64, 1d, C), array(float64, 1d, C))
Known signatures:
 * (array(float32, 1d, A), array(float32, 1d, A)) -> float32
 * parameterized
[1] During: resolving callee type: type(CPUDispatcher(<function rdist at 0x7fb3c827f840>))
[2] During: typing of call at /opt/conda/lib/python3.7/site-packages/umap/umap_.py (776)


File "../../../opt/conda/lib/python3.7/site-packages/umap/umap_.py", line 776:
def optimize_layout(
    <source elided>

                dist_squared = rdist(current, other)
                ^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
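
Reading the "Known signatures" part of the message, the compiled rdist expects float32 arrays but receives float64 arrays, so this looks like a dtype mismatch in the initial positions. The following is only a minimal sketch of that mismatch and a possible workaround, not a confirmed fix. The helper name to_float32_init and the random stand-in array are hypothetical, and it is an assumption that the PAGA-derived positions arrive as float64.

```python
import numpy as np


def to_float32_init(init_pos):
    """Return init_pos as a C-contiguous float32 array (hypothetical helper)."""
    arr = np.ascontiguousarray(init_pos, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError("expected a 2D array of shape (n_cells, n_dims)")
    return arr


# Stand-in for PAGA-derived initial positions (assumption: NumPy defaults
# to float64, which matches the argument types reported in the error).
rng = np.random.default_rng(0)
paga_like_init = rng.normal(size=(5, 2))
print(paga_like_init.dtype)  # float64

# Cast once before handing the positions to the embedding routine.
init32 = to_float32_init(paga_like_init)
print(init32.dtype)  # float32
```

If sc.tl.umap accepts an array for init_pos in the installed version (an assumption worth checking against its docstring), the cast array could be passed there instead of the string 'paga'.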

What I basically do from raw UMI counts:

  1. total counts normalization / logarithmization
  2. PCA, bbknn, louvain
  3. combat, HVG, PCA, UMAP (works well)
  4. PAGA (with louvain from 2., works well)
  5. UMAP (with positions from 4., does not work)

Any idea? Any further info needed? Best, Jens

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

Although it states so:

UMAP only uses the representation of a data matrix for determining the number of connected components of the graph for the init conditions [if these aren’t explicitly defined (they are if choosing init_pos='paga')]: https://github.com/lmcinnes/umap/blob/948f60ff0caf7ccef0ab68626c7b99a11e66f1bb/umap/umap_.py#L958-L965

In fact, the only place where it enters is for the computation of the mean positions of the disconnected components: https://github.com/lmcinnes/umap/blob/948f60ff0caf7ccef0ab68626c7b99a11e66f1bb/umap/spectral.py#L50
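
To make "mean positions of the disconnected components" concrete, here is a rough sketch of that kind of centroid computation. It is not the code at the linked line. The function name component_centroids and the toy data are made up for illustration, and the component labels are assumed to come from a connected-components routine run on the neighbor graph.

```python
import numpy as np


def component_centroids(data, component_labels):
    """Mean position of the rows of `data` within each graph component."""
    labels = np.unique(component_labels)
    centroids = np.empty((labels.size, data.shape[1]), dtype=np.float64)
    for i, label in enumerate(labels):
        # Average all points that belong to this component.
        centroids[i] = data[component_labels == label].mean(axis=0)
    return centroids


# Toy example: four points in two disconnected components.
data = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array([0, 0, 1, 1])

centroids = component_centroids(data, labels)
# Component 0 has centroid (1, 0); component 1 has centroid (11, 12).
print(centroids)
```

The data matrix is used here only to place each component as a whole, which is consistent with the point that it does not otherwise affect the result.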

Implementation-wise, it’s a bit unfortunate that the data matrix is carried through all these functions just for that reason… But it’s not a problem for the results.

The confusing logging is fixed via https://github.com/theislab/scanpy/commit/a5bd1ecd8ab04ec79369f60d3656f578a4cde40c

Issue #666 👹 🙈