scikit-learn: Fix broken links in the documentation
Below is the list of broken links in the documentation from a make linkcheck run, together with the file each link appears in and the error message.
If you want to work on this, please:
- do one Pull Request per link
- add a comment in this issue saying which link you want to tackle so that different people can work on this issue in parallel
- mention this issue (#23631) in your Pull Request description so that progress on this issue can more easily be tracked
Possible solutions for a broken link include:
- find a replacement for the broken link. In case of links to articles, being able to link to a resource where the article is openly accessible (rather than behind a paywall) would be nice.
- The link can be added to the linkcheck_ignore variable: https://github.com/scikit-learn/scikit-learn/blob/59473a91d4528503c63d71ad5843dac1b20a3d67/doc/conf.py#L590. This is the only thing to do, for example, when:
  - the link is broken with no replacement (for example, in testimonials some companies were acquired and their website no longer exists)
  - the link works fine in a browser but is flagged as broken by the make linkcheck tool. This may happen because some websites try to prevent bots from scraping their content
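As a rough illustration of the linkcheck_ignore option, here is a minimal sketch. It assumes that Sphinx treats each entry as a regular expression matched against the start of the URL; the two entries and the helper `is_ignored` are illustrative only and are not copied from scikit-learn's doc/conf.py.

```python
import re

# Hypothetical excerpt in the style of a Sphinx conf.py.
# Each entry is a regular expression, so literal dots are escaped.
linkcheck_ignore = [
    r"https://www\.jstor\.org/stable/2984099",
    r"https://www\.researchgate\.net/publication/.*",
]


def is_ignored(url, patterns=linkcheck_ignore):
    """Illustrative helper: True if any pattern matches the start of the URL."""
    return any(re.match(pattern, url) for pattern in patterns)


print(is_ignored("https://www.jstor.org/stable/2984099"))
print(is_ignored("https://scikit-learn.org/stable/"))
```

A narrow pattern (one exact URL) is usually safer than a whole-domain wildcard, since a wildcard also hides links on that domain that are genuinely dead.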
Something that may be useful in the complicated cases is to search on the Internet Archive for the broken link. You may be able to look at the old content and it may help you to find an appropriate link replacement.
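One possible way to script the Internet Archive lookup is sketched below. It assumes the Wayback Machine availability endpoint at https://archive.org/wayback/available and a JSON response with an `archived_snapshots.closest` entry; both should be checked against the Internet Archive documentation before relying on them. The helper names are hypothetical, and the code only builds the query URL and parses a response, it does not perform the request.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the query URL asking whether a snapshot of `url` exists."""
    return f"{WAYBACK_API}?{urlencode({'url': url})}"


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded JSON response, or None.

    Assumes the shape {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}.
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


print(availability_query("https://pythonhosted.org/joblib/memory.html"))
```

The decoded JSON from a real request (for example via `json.load(urllib.request.urlopen(...))`) can be passed to `closest_snapshot`; the archived page may then help in finding a live replacement link.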
- http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf
  File: modules/generated/sklearn.linear_model.OrthogonalMatchingPursuit.rst
  Error: 403 Client Error: Forbidden for url: http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf
- http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf
  File: modules/decomposition.rst
  Error: 404 Client Error: Not Found for url: https://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf
- http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py
  File: modules/generated/sklearn.datasets.make_swiss_roll.rst
  Error: 403 Client Error: Forbidden for url: http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py
- http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf
  File: modules/linear_model.rst
  Error: HTTPConnectionPool(host='users.jyu.fi', port=80): Max retries exceeded with url: /~samiayr/pdf/ayramo_eurogen05.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02da35c340>, 'Connection to users.jyu.fi timed out. (connect timeout=10)'))
  Related: https://github.com/scikit-learn/scikit-learn/pull/23679
- http://www.ats.ucla.edu/stat/r/dae/rreg.htm
  File: modules/linear_model.rst
  Error: HTTPConnectionPool(host='www.ats.ucla.edu', port=80): Max retries exceeded with url: /stat/r/dae/rreg.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f02dfd53a60>, 'Connection to www.ats.ucla.edu timed out. (connect timeout=10)'))
  Related: #23660
- http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
  File: datasets/real_world.rst
  Error: 404 Client Error: Not Found for url: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
- http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
  File: modules/decomposition.rst
  Error: 404 Client Error: Not Found for url: http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
- http://www.iucnredlist.org/apps/redlist/details/3038/0
  File: auto_examples/neighbors/plot_species_kde.rst
  Error: 404 Client Error: Not Found for url: https://www.iucnredlist.org/apps/redlist/details/3038/0
- http://www.recognition.mccme.ru/pub/papers/SVM/sch99estimating.pdf
  File: modules/outlier_detection.rst
  Error: HTTPSConnectionPool(host='www.recognition.mccme.ru', port=443): Max retries exceeded with url: /pub/papers/SVM/sch99estimating.pdf (Caused by SSLError(SSLCertVerificationError("hostname 'www.recognition.mccme.ru' doesn't match 'kvant.ras.ru'")))
- http://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf
  File: modules/generated/sklearn.metrics.hinge_loss.rst
  Error: 404 Client Error: Not Found for url: https://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf
  File: modules/decomposition.rst
  Error: HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf
  File: modules/kernel_approximation.rst
  Error: HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.227.1802&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf
  File: modules/linear_model.rst
  Error: HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.392.8794&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf
  File: modules/decomposition.rst
  Error: HTTPSConnectionPool(host='citeseerx.ist.psu.edu', port=443): Max retries exceeded with url: /viewdoc/download?doi=10.1.1.68.5164&rep=rep1&type=pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://dev.pandas.io/docs/development/maintaining.html
  File: developers/bug_triaging.rst
  Error: HTTPSConnectionPool(host='dev.pandas.io', port=443): Max retries exceeded with url: /docs/development/maintaining.html (Caused by SSLError(SSLCertVerificationError("hostname 'dev.pandas.io' doesn't match either of '*.numericable.fr', 'numericable.fr'")))
- https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html
  File: developers/contributing.rst
  Error: 404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/dev/contributor/development_workflow.html
- https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html
  File: modules/grid_search.rst
  Error: 404 Client Error: Not Found for url: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.reciprocal.html
  Related: #23697
- https://doi.org/10.13140/RG.2.2.35280.02565
  File: modules/generated/sklearn.cluster.spectral_clustering.rst
  Error: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/354448354?channel=doi&linkId=6138e932a3a397270a8f1300&showFulltext=true
  Related: #23739
- https://imageio.readthedocs.io/en/latest/userapi.html
  File: datasets/loading_other_datasets.rst
  Error: 404 Client Error: Not Found for url: https://imageio.readthedocs.io/en/latest/userapi.html
- https://newcircle.com/s/post/1152/scikit-learn_machine_learning_in_python
  File: presentations.rst
  Error: HTTPSConnectionPool(host='newcircle.com', port=443): Max retries exceeded with url: /s/post/1152/scikit-learn_machine_learning_in_python (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f02da1007c0>, 'Connection to newcircle.com timed out. (connect timeout=10)'))
- https://pythonhosted.org/joblib/memory.html
  File: modules/compose.rst
  Error: 404 Client Error: Not Found for url: https://pythonhosted.org/joblib/memory.html
- https://staff.washington.edu/jakevdp
  File: presentations.rst
  Error: 404 Client Error: for url: https://staff.washington.edu/jakevdp
- https://trevorhastie.github.io
  File: modules/generated/sklearn.metrics.d2_absolute_error_score.rst
  Error: 404 Client Error: Not Found for url: https://trevorhastie.github.io/
- https://users.soe.ucsc.edu/~optas/papers/jl.pdf
  File: modules/generated/sklearn.random_projection.SparseRandomProjection.rst
  Error: 404 Client Error: Not Found for url: https://users.soe.ucsc.edu/~optas/papers/jl.pdf
- https://www.cs.technion.ac.il/~mic/doc/skl-ip.pdf
  File: modules/generated/sklearn.decomposition.IncrementalPCA.rst
  Error: HTTPSConnectionPool(host='mic.net.technion.ac.il', port=443): Max retries exceeded with url: //doc/skl-ip.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
- https://www.datascience-paris-saclay.fr/
  File: about.rst
  Error: HTTPSConnectionPool(host='www.datascience-paris-saclay.fr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
- https://www.frs-fnrs.be/-fnrs
  File: about.rst
  Error: 404 Client Error: Not Found for url: https://www.frs-fnrs.be/fr/-fnrs
- https://www.jstor.org/stable/2984099
  File: modules/generated/sklearn.impute.IterativeImputer.rst
  Error: 403 Client Error: Forbidden for url: https://www.jstor.org/stable/2984099
- https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
  File: modules/svm.rst
  Error: HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)
  Note: This link is working in a browser, it should be added to linkcheck_ignore similarly to what was done in #23737
- https://www.numfocus.org/support-numfocus.html
  File: about.rst
  Error: 403 Client Error: Forbidden for url: https://www.flipcause.com/secure/cause_pdetails/MjM2OA==
- https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis
  File: modules/clustering.rst
  Error: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/233096619_A_Dendrite_Method_for_Cluster_Analysis
- https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air
  File: modules/generated/sklearn.datasets.load_boston.rst
  Error: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air
  Note: This link is working in a browser, it should be added to linkcheck_ignore similarly to what was done in #23737
- https://www.sri.com/sites/default/files/publications/ransac-publication.pdf
  File: modules/generated/sklearn.linear_model.RANSACRegressor.rst
  Error: 404 Client Error: Not Found for url: https://www.sri.com/sites/default/files/publications/ransac-publication.pdf
- https://www.stat.washington.edu/research/reports/2000/tr371.pdf
  File: modules/cross_decomposition.rst
  Error: HTTPSConnectionPool(host='www.stat.washington.edu', port=443): Max retries exceeded with url: /research/reports/2000/tr371.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
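The error messages in the list fall into a few recurring families, which can hint at the likely fix. The sketch below is only a heuristic: the function `triage` and its category names are hypothetical, and a 403 does not prove that a link works in a browser, so each link still needs a manual check.

```python
def triage(error):
    """Map a linkcheck error message to a rough, hypothetical next step."""
    if error.startswith("403"):
        # Often bot blocking: verify in a browser, then consider linkcheck_ignore.
        return "check-in-browser"
    if error.startswith("404"):
        # The page is gone: look for a replacement or an archived copy.
        return "find-replacement"
    if "SSLError" in error:
        # Expired or mismatched certificate on the remote host.
        return "certificate-problem"
    if "timed out" in error:
        # Host unreachable or slow: retry, then verify in a browser.
        return "timeout"
    return "other"


print(triage("403 Client Error: Forbidden for url: https://www.jstor.org/stable/2984099"))
print(triage("404 Client Error: Not Found for url: https://pythonhosted.org/joblib/memory.html"))
```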
About this issue
- State: closed
- Created 2 years ago
- Comments: 44 (40 by maintainers)
Commits related to this issue
- fixed issue ##23631 donate link for numfocus.org — committed to bkhanal4351/scikit-learn by bkhanal4351 2 years ago
- provided new link for `SVD based initialization` link on line 959, regarding issue #23631 — committed to ShehanAT/scikit-learn by ShehanAT 2 years ago
- Updated URL for scikit-learn-Jake Vanderplas-tut **Reference Issues/PRs** Fixes [#23631](https://github.com/scikit-learn/scikit-learn/issues/23631) **What does this implement/fix? Explain your ch... — committed to varunjain3/scikit-learn by varunjain3 2 years ago
- doc-matrix-tensor-fac-alg-#23631 — committed to omtarful/scikit-learn by omtarful 2 years ago
Alright @lesteve
I will be working on:
This link is working in a browser, it should be added to linkcheck_ignore similarly to what was done in https://github.com/scikit-learn/scikit-learn/pull/23737
https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf modules/svm.rst
HTTPSConnectionPool(host='www.microsoft.com', port=443): Read timed out. (read timeout=10)
modules/generated files are automatically generated during the documentation build; you should look at the corresponding .py file: sklearn/cluster/spectral_clustering.py. The link is likely part of a docstring.
This link is working fine in a browser actually (at least for me, but please double-check); you should add it to linkcheck_ignore as in https://github.com/scikit-learn/scikit-learn/pull/23737
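For the modules/generated case, one way to locate the Python file that holds the docstring is to resolve the dotted name in the .rst file name and ask Python where the object is defined. This is a minimal sketch with hypothetical helper names; it assumes the package is importable in the current environment, and the demonstration uses a standard-library name so that it runs without scikit-learn installed.

```python
import importlib
import inspect
from pathlib import PurePosixPath


def dotted_name_from_rst(rst_path):
    """'modules/generated/pkg.mod.Obj.rst' -> 'pkg.mod.Obj'."""
    name = PurePosixPath(rst_path).name
    if not name.endswith(".rst"):
        raise ValueError(f"not an .rst path: {rst_path}")
    return name[: -len(".rst")]


def source_file_for(dotted_name):
    """Import the longest importable module prefix, then walk the attributes."""
    parts = dotted_name.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return inspect.getsourcefile(obj)
    raise ImportError(f"cannot resolve {dotted_name}")


# Stand-in path using a standard-library class instead of a scikit-learn name.
name = dotted_name_from_rst("modules/generated/json.decoder.JSONDecoder.rst")
print(source_file_for(name))
```

With scikit-learn installed, passing a name such as sklearn.cluster.spectral_clustering should print the file where that function is defined, which is where the link in the docstring would be edited.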