pulsar: [Bug] It's hard to find latest docs using search engines like Google

Search before asking

  • I searched in the issues and found nothing similar.

Version

Minimal reproduce step

[Screenshot, 2022-10-25 11:39:54 AM] [Screenshot, 2022-10-25 11:40:17 AM]

What did you expect to see?

Documentation for the latest version of Pulsar.

What did you see instead?

The latest version of Pulsar is 2.10.x, but I see docs for 2.3.2. I constantly see the documentation for random old versions of Pulsar.

Anything else?

It should be possible to fix this by adding the following HTML meta tag to every docs page for old Pulsar versions.

<head>
  <meta name="robots" content="noindex, nofollow" />
</head>

If you are using Docusaurus, it should be doable by conditionally adding the required meta tag in its config file:

https://docusaurus.io/docs/seo

After a couple of weeks, Google should reindex these pages.
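
A rough sketch of the "conditional" part, in plain JavaScript rather than real Docusaurus config. The /docs/<version>/ URL layout and the names versionOf and robotsMetaFor are assumptions made up for illustration; they are not taken from the pulsar-site source or from the Docusaurus API. It only shows the decision: emit the tag for old versions, emit nothing for the latest version or for unversioned pages.

```javascript
// Hypothetical helpers: the /docs/<version>/ URL layout and all names here
// are assumptions for illustration, not part of pulsar-site or Docusaurus.
const VERSION_SEGMENT = /^\/docs\/(\d+\.\d+\.(?:\d+|x))(?:\/|$)/;

// Returns the version found in a docs path, or null for unversioned paths.
function versionOf(path) {
  const match = VERSION_SEGMENT.exec(path);
  return match ? match[1] : null;
}

// Returns the robots meta tag for pages of old versions, or an empty string
// for the latest version and for unversioned pages.
function robotsMetaFor(path, latestVersion) {
  const version = versionOf(path);
  if (version === null || version === latestVersion) {
    return "";
  }
  return '<meta name="robots" content="noindex, nofollow" />';
}

// Example: an old-version page gets the tag, a latest-version page does not.
console.log(robotsMetaFor("/docs/2.3.2/client-libraries-java", "2.10.x"));
console.log(JSON.stringify(robotsMetaFor("/docs/2.10.x/client-libraries-java", "2.10.x")));
```

How the returned tag gets injected into the page head depends on the site setup, so that part is left out here.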

More info

Are you willing to submit a PR?

  • I’m willing to submit a PR!

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (18 by maintainers)

Most upvoted comments

Here is a patch apache/pulsar-site#481

So nice, and very cool. In that case, I think my PR can be closed.

@urfreespace perhaps you can eject something like DocHead with yarn swizzle.

I encountered the same issue and, before reporting it here, found that it had already been filed in this thread.

Is there a way to configure canonical URLs globally across the site? @urfreespace, do you have any idea? I think that would be better than configuring the front matter of each Markdown file.
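
To make the canonical-URL idea concrete, here is a minimal sketch of a single rule that could be applied to every page instead of editing front matter file by file. The function name canonicalLinkFor, the /docs/<version>/ layout, and the assumption that the unversioned /docs/ path serves the latest docs are all hypothetical; nothing here is an existing Docusaurus or pulsar-site API.

```javascript
// Hypothetical helper: the name, the /docs/<version>/ layout, and the idea
// that the unversioned URL serves the latest docs are all assumptions.
function canonicalLinkFor(path, origin) {
  // Drop the version segment so every version of a page shares one canonical URL.
  const unversioned = path.replace(/^\/docs\/\d+\.\d+\.(?:\d+|x)(?=\/|$)/, "/docs");
  return `<link rel="canonical" href="${origin}${unversioned}" />`;
}

// Example with a placeholder origin.
console.log(canonicalLinkFor("/docs/2.3.2/admin-api-overview", "https://example.org"));
```

Paths without a version segment pass through unchanged, so the same rule could run on every page.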

This is an important issue to solve. Besides hurting discoverability, it means we often expose old docs. For example, when I search for "pulsar java admin client", the second result is for the incubator docs.

[Screenshot, 2022-10-28 10:25:31 AM]

Note that the first link is for the java client, not the java admin client. If you dig into the java client page, it points to the admin client. However, I have heard from users that it was hard to find the admin client documentation.

@ishu-thakur I think you should conditionally configure this in the Docusaurus config instead of adding the meta tag to every HTML file.

If you take it, it’s your task to find out how to do it in a better way.

Cool! I think you can open an issue against https://github.com/apache/pulsar-site under site2/website-next where we place the source code of the website.

cc @urfreespace @michaeljmarshall @Anonymitaet @momo-jun