metascraper: [metascraper-media-provider] YouTube randomly fails under heavy load

  • I’m using the latest version.
  • My Node version matches the one declared in package.json.

YouTube scraping returns null for most fields.

The package runs without errors, but author, date, description, and logo are all null. The title is always “Youtube”.

The only field returned correctly is the image (see attached screenshot).

NOTE: I have added the metascraper-youtube package.

Steps to reproduce

function meta(id, targetUrl, body, DataSource) {
  // Compose metascraper from rule bundles; metascraper-youtube adds
  // YouTube-specific rules on top of the generic ones.
  const metascraper = require("metascraper")([
    require("metascraper-author")(),
    require("metascraper-date")(),
    require("metascraper-description")(),
    require("metascraper-image")(),
    require("metascraper-logo")(),
    require("metascraper-clearbit")(),
    require("metascraper-publisher")(),
    require("metascraper-title")(),
    require("metascraper-url")(),
    require("metascraper-youtube")()
  ]);

  (async () => {
    try {
      const metadata = await metascraper({ html: body, url: targetUrl });
      const imageURL = metadata.image ? sanitizeUrl(metadata.image) : "";

      const linkMetaData: any = {
        title: formatTitleField(metadata.title),
        description: processSummaryField(metadata.description, 200),
        image: imageURL,
        url: formatInputFields(metadata.url),
        publisher: formatInputFields(metadata.publisher)
      };

      // First pass: store the metadata without image dimensions.
      DataSource.update(id, {
        $set: { linkMetaData }
      });

      if (imageURL) {
        const dimensions = getImageDimensionsFromUrlRequest(imageURL);
        if (dimensions) {
          linkMetaData.imageWidth = dimensions.width;
          linkMetaData.imageHeight = dimensions.height;
        }
        // Second pass: add the dimensions once they are known.
        DataSource.update(id, {
          $set: { linkMetaData }
        });
      }
    } catch (error) {
      log.error(error);
    }
  })();
}

Expected behaviour

Title and Description fields should have data.

Actual behaviour

These fields are null.
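Until the root cause is addressed, one consumer-side mitigation is to treat such a degraded result as a failed scrape. A minimal sketch, assuming the symptoms described above (bare “Youtube” title, null author/date/description); the `isDegradedYouTubeResult` name is hypothetical:

```javascript
// Hypothetical guard: a scrape whose title is just the "Youtube" fallback
// and whose other text fields are all empty is treated as a failure, so
// the caller can retry later or skip the database update.
function isDegradedYouTubeResult(metadata) {
  return (
    /^youtube$/i.test(metadata.title || "") &&
    !metadata.author &&
    !metadata.date &&
    !metadata.description
  );
}
```

The caller would check this before persisting, e.g. `if (isDegradedYouTubeResult(metadata)) return;`.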

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

I’m going to close this because there is nothing we can do here: YouTube is clearly blocking consecutive requests in a production scenario.

I want to suggest two workarounds:

  1. Use a proxy when making consecutive requests to fetch the HTML.
  2. Use metascraper-iframe, a new package with oEmbed support (cc @iamandyk).
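For workaround 1, the key point is that consecutive, back-to-back requests are what trigger the blocking. A minimal sketch of serializing scrapes through a queue with a delay between requests; `fetchHtml` is a hypothetical injectable fetcher (e.g. got or axios, possibly configured with a proxy agent), passed in as a parameter so the queue itself is self-contained:

```javascript
// Minimal sketch: run requests one at a time with a pause between them,
// to reduce the chance of YouTube rate-limiting consecutive hits.
function createThrottledScraper(fetchHtml, delayMs = 1000) {
  let chain = Promise.resolve();
  return function scrape(url) {
    const result = chain.then(() => fetchHtml(url));
    // Queue the next request only after this one settles, plus a delay.
    chain = result
      .catch(() => {}) // a failed request must not block the queue
      .then(() => new Promise((resolve) => setTimeout(resolve, delayMs)));
    return result;
  };
}
```

The returned HTML can then be passed to metascraper as `{ html, url }`, exactly as in the repro snippet above.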