metascraper: [metascraper-media-provider] YouTube randomly fails under heavy load
- I’m using the latest version.
- My Node.js version matches the one declared in package.json.
The YouTube scrape returns null for most fields.
The package runs without errors, but author, date, description and logo are all null, and the title is always “Youtube”.
The only field returned correctly is the image (see attached image).
NOTE: I have added the metascraper-youtube package.
Steps to reproduce
function meta(id, targetUrl, body, DataSource) {
  const metascraper = require("metascraper")([
    require("metascraper-author")(),
    require("metascraper-date")(),
    require("metascraper-description")(),
    require("metascraper-image")(),
    require("metascraper-logo")(),
    require("metascraper-clearbit")(),
    require("metascraper-publisher")(),
    require("metascraper-title")(),
    require("metascraper-url")(),
    require("metascraper-youtube")()
  ]);

  // Fire-and-forget async block; the IIFE previously took unused
  // (request, response, next) parameters, dropped here.
  (async () => {
    try {
      const metadata = await metascraper({ html: body, url: targetUrl });
      const imageURL = metadata.image ? sanitizeUrl(metadata.image) : "";
      // Note: the original snippet had a TypeScript annotation (`: any`)
      // in otherwise plain JavaScript; removed for consistency.
      const linkMetaData = {
        title: formatTitleField(metadata.title),
        description: processSummaryField(metadata.description, 200),
        image: imageURL,
        url: formatInputFields(metadata.url),
        publisher: formatInputFields(metadata.publisher)
      };
      DataSource.update(id, {
        $set: { linkMetaData: linkMetaData }
      });
      if (imageURL) {
        const dimensions = getImageDimensionsFromUrlRequest(imageURL);
        if (dimensions) {
          linkMetaData.imageWidth = dimensions.width;
          linkMetaData.imageHeight = dimensions.height;
        }
        DataSource.update(id, {
          $set: { linkMetaData: linkMetaData }
        });
      }
    } catch (error) {
      log.error(error);
    }
  })();
}
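The failure signature described above (title stuck at “Youtube”, author/date/description null) is quite specific, so a caller could detect it and decide to retry or queue the URL. The helper below is a hypothetical sketch, not part of metascraper, and the field checks are assumptions based on the symptoms reported in this issue:

```javascript
// Hypothetical helper: detect the signature of a blocked YouTube scrape,
// where generic rules fall back to the bare page shell ("Youtube" title,
// null author/description). Not part of metascraper itself.
function looksLikeBlockedYouTubeScrape(metadata) {
  return (
    metadata.title === "Youtube" &&
    metadata.author == null &&
    metadata.description == null
  );
}

// Example: the image field often still resolves even when the rest is null.
const blocked = looksLikeBlockedYouTubeScrape({
  title: "Youtube",
  author: null,
  date: null,
  description: null,
  image: "https://i.ytimg.com/vi/abc123/hqdefault.jpg"
});
console.log(blocked); // true
```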
Expected behaviour
Title and Description fields should have data.
Actual behaviour
These fields are null.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 16 (7 by maintainers)
I’m going to close this because there is nothing we can do: YouTube is clearly banning consecutive requests under a production scenario.
I want to suggest two workarounds:
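(The maintainer’s two workarounds are not quoted in this excerpt. As a generic illustration of one common mitigation for this kind of rate limiting, here is a sketch of retrying with exponential backoff and jitter; `withBackoff`, `sleep`, and the wrapped `scrape` function are all hypothetical names, not part of metascraper.)

```javascript
// Generic sketch: retry an async operation with exponential backoff and
// jitter. This is a common mitigation for rate limiting, not the
// maintainer's actual workaround (which is not quoted in this excerpt).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withBackoff(fn, { retries = 3, baseMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      // Double the delay each attempt, plus a little random jitter.
      const delay = baseMs * 2 ** attempt + Math.random() * 100;
      await sleep(delay);
    }
  }
  throw lastError;
}

// Usage (assuming `scrape` performs the metascraper call and throws when
// the response looks blocked):
// const metadata = await withBackoff(() => scrape(targetUrl), { retries: 4 });
```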