gatsby: [gatsby-source-contentful] downloadLocal broken by gatsby-source-filesystem
Description
#20843 introduced a timeout for createRemoteFileNode. I’m almost certain this breaks localFile for Contentful projects with roughly 15 or more assets.
I pinned transitive dependencies on gatsby-source-filesystem to 2.1.47 (the release right before #20843) and the issue went away.
Steps to reproduce
Use gatsby-source-contentful with downloadLocal enabled. If gatsby develop takes more than 30 seconds, createRemoteFileNode will silently time out. The build will complete, but most localFile fields in GraphiQL will be null.
Expected result
localFile fields are populated.
Actual result
localFile fields are null.
Other Notes
I think all of the createRemoteFileNode calls are actually completing, but the timeout has some nasty side effect.
I’d love to see this reverted, as I currently have to resort to the very hacky npm-force-resolutions.
Environment
System:
  OS: Linux 4.4 Ubuntu 18.04.4 LTS (Bionic Beaver)
  CPU: (8) x64 Intel® Core™ i7-8550U CPU @ 1.80GHz
  Shell: 4.4.20 - /bin/bash
Binaries:
  Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
  Yarn: 1.22.1 - /usr/bin/yarn
  npm: 6.13.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
  Python: 2.7.17 - /usr/bin/python
npmPackages:
  gatsby: ^2.17.4 => 2.20.20
  gatsby-image: ^2.2.30 => 2.3.2
  gatsby-plugin-brotli: ^1.3.1 => 1.3.1
  gatsby-plugin-emotion: ^4.1.18 => 4.2.1
  gatsby-plugin-manifest: ^2.2.41 => 2.3.3
  gatsby-plugin-netlify: ^2.1.32 => 2.2.1
  gatsby-plugin-postcss: ^2.1.16 => 2.2.1
  gatsby-plugin-prefetch-google-fonts: 1.4.3 => 1.4.3
  gatsby-plugin-react-helmet: ^3.1.13 => 3.2.2
  gatsby-plugin-react-svg: ^3.0.0 => 3.0.0
  gatsby-plugin-remove-fingerprints: 0.0.2 => 0.0.2
  gatsby-plugin-resolve-src: ^2.0.0 => 2.0.0
  gatsby-plugin-sharp: ^2.2.32 => 2.5.4
  gatsby-source-contentful: ^2.1.73 => 2.2.7
  gatsby-transformer-remote-filesystem: ^0.2.0 => 0.2.0
  gatsby-transformer-sharp: ^2.3.0 => 2.4.4
npmGlobalPackages:
  gatsby-cli: 2.11.8
About this issue
- State: closed
- Created 4 years ago
- Reactions: 5
- Comments: 23 (6 by maintainers)
Got the same result here as well:
success Downloading remote files - 30.130s - 56/93 3.09/s
The localFile data is missing for a random subset of images. The download gets cut off right at the 30-second mark.
I did some further digging:
It seems the timeout is created in requestRemoteNode.
https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L152-L157
All the request promises are created at the same time, but the actual requests are only processed in order. This causes all the requests at the bottom of the stack to time out and fail.
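To make that hypothesis concrete, here is a small simulation. It is a simplified model under stated assumptions, not Gatsby's actual code: every request's timeout clock starts when its promise is created (time 0), only a fixed number of requests make progress at once, and each takes the same time. The function name and its parameters are made up for illustration.

```javascript
// Simplified model (an assumption, not gatsby-source-filesystem's code):
// all timeout clocks start at time 0, but only `parallel` requests make
// progress at once and each one takes `durationMs` to finish.
function simulateQueue({ count, parallel, durationMs, timeoutMs }) {
  let completed = 0;
  let timedOut = 0;
  for (let i = 0; i < count; i++) {
    // Requests are served in batches of `parallel`, in submission order.
    const startMs = Math.floor(i / parallel) * durationMs;
    const finishMs = startMs + durationMs;
    if (finishMs <= timeoutMs) {
      completed++;
    } else {
      timedOut++; // the clock ran out while the request was still waiting or loading
    }
  }
  return { completed, timedOut };
}

const result = simulateQueue({ count: 10, parallel: 2, durationMs: 10, timeoutMs: 30 });
console.log(result); // { completed: 6, timedOut: 4 }
```

In this model the requests that are queued last never get a chance to finish, which would match a progress line that stops short of the total right after the 30-second mark.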
The timeout error is not handled in download-contentful-assets and therefore fails silently.
https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-contentful/src/download-contentful-assets.js#L88-L90
When actually logging the error you get the following trace:
Because the error is not handled the result is still seen as a success even though the website will not run properly due to missing data. So this error needs to be handled appropriately.
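As a rough illustration of the difference between swallowing and surfacing these errors, here is a self-contained sketch. It does not reproduce the plugin's code: fakeCreateRemoteFileNode, downloadSwallowing and downloadReporting are hypothetical names, and the fake download simply rejects for any URL containing "slow".

```javascript
// Hypothetical stand-in for createRemoteFileNode: rejects for "slow" URLs.
async function fakeCreateRemoteFileNode({ url }) {
  if (url.includes("slow")) {
    throw new Error(`Timeout downloading ${url}`);
  }
  return { id: `file-for-${url}` };
}

// Roughly the pattern described above: the catch block does nothing, so a
// failed download is dropped and the caller still sees a "success".
async function downloadSwallowing(urls) {
  const nodes = [];
  for (const url of urls) {
    try {
      nodes.push(await fakeCreateRemoteFileNode({ url }));
    } catch (err) {
      // ignored: nothing is logged or rethrown
    }
  }
  return nodes;
}

// One possible alternative: wait for every download, then fail loudly if
// any of them was rejected.
async function downloadReporting(urls) {
  const results = await Promise.allSettled(
    urls.map((url) => fakeCreateRemoteFileNode({ url }))
  );
  const failures = results.filter((r) => r.status === "rejected");
  if (failures.length > 0) {
    const reasons = failures.map((f) => f.reason.message).join("; ");
    throw new Error(`${failures.length} of ${urls.length} downloads failed: ${reasons}`);
  }
  return results.map((r) => r.value);
}
```

With the second variant a build would stop with a message naming the failed assets instead of finishing with null localFile fields.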
As for the requests failing, the issue seems to be that too many of them are fired off at the same time but are only loaded in order.
https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L76
It seems gatsby-source-filesystem assumes you can download 200 files concurrently. But this might not work with the Contentful API? I don’t know what the limit is here. This is, however, adjustable via an environment variable.
Setting the following config seems to fix the timeout issue for me:
gatsby-config.js
new output:
As for the timeout, 30 seconds is a good default, but it might not be enough for larger files (or slow internet). Perhaps this should be adjustable as well, possibly also through an environment variable?
@mjmaurer Maybe reopen this issue as more people seem to encounter this problem?
I’ve looked into this a bit and from what I can see there are a few issues at play here:
1. The gatsby-source-contentful plugin swallows exceptions from createRemoteFileNode. These will most likely be networking errors if the downloads time out or something else unexpected happens, like a TCP connection being reset. gatsby-source-contentful then assumes the file has been downloaded successfully when it hasn’t, and errors crop up later in the build when null references are hit.
2. The 30s timeout that got is configured with. It’s possible this will be hit if you’re downloading a large asset and you don’t have the bandwidth to complete the download in 30s. The easiest way to reproduce this is throttling your network connection and running a build. On macOS, I used the Network Link Conditioner. Note: the asset size limit in Contentful is 1GB.
3. The default number of concurrent downloads in create-remote-file-node. The default is 200 and this seems to cause all sorts of problems for me running a local build in a large Contentful space. Since there are more downloads happening concurrently, a timeout is more likely for any individual file, plus I’m also seeing the occasional connection reset before a timeout happens. It’s likely this is less of an issue if you’ve got a high-bandwidth connection to Contentful’s asset CDN (aka CloudFront), but I wonder if this is a sensible default from a reliability standpoint. Maybe this could be determined more intelligently, e.g. if network errors are encountered, perform some kind of exponential backoff.
I’m going to start working on a PR to fix point 1 immediately. I don’t think the Contentful source plugin should ever swallow errors. I would love to get someone’s thoughts on points 2 & 3. Happy to work on these as well.
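The exponential backoff idea mentioned above could look roughly like the following. This is only a sketch of the general technique, not something either plugin does according to this thread; backoffDelayMs, downloadWithBackoff and their default values are hypothetical.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Default sleep helper; can be swapped out (e.g. for a no-op in tests).
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `download` (any async function) and retries on failure, waiting
// longer after each failed attempt. After `retries` retries the last error
// is rethrown so it is never silently dropped.
async function downloadWithBackoff(
  download,
  { retries = 3, baseMs = 500, sleep = defaultSleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await download();
    } catch (err) {
      if (attempt >= retries) {
        throw err;
      }
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}
```

With the defaults, the waits between attempts are 500 ms, 1 s and 2 s. A real implementation would probably also reduce concurrency when errors show up, and retry only on network errors.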
I am very open for a PR, but it should contain:
This should be fixed with improved network error handling in the latest release v5.3
A backport to v4 (gatsby v2) should happen soon. (https://github.com/gatsbyjs/gatsby/projects/25)
oh well my bad sorry… I’ve passed it down at the wrong place. I need a rubber duck
@jayhostan it works fine for me on my local machine (adding downloadLocal: true and it downloads the files). Are you sure you pass the options correctly in gatsby-config.js? 🙈
Running into this as well - I have 217 assets in my Contentful (including some larger video files, all below 50MB). It claims to have completed the download and all of that. However, when I query, I’m getting this:
File exists on Contentful, file does not exist on my local server. There’s no notice or warning that stuff is failing, it’s just in the background not completing.
Getting this in the terminal:
success Downloading remote files - 30.672s - 174/217 7.07/s
Quick update: removed the plugin, installed it again, cleaned, etc, and got this:
success Downloading remote files - 30.591s - 164/217 7.09/s
- so looks like there’s definitely something with the timeout where it gets right over 30.5s and decides to fail.