amplify-js: AppSync GraphQL (backed by DynamoDB) fails to return all filtered data in a single call / AppSync GraphQL fails for large data
Before opening, please confirm:
- I have searched for duplicate or closed issues and discussions.
- I have read the guide for submitting bug reports.
- I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.
JavaScript Framework
React
Amplify APIs
GraphQL API
Amplify Categories
auth, storage, function, hosting
Environment information
# Put output below this line
System:
OS: macOS 12.4
CPU: (8) x64 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Memory: 37.61 MB / 16.00 GB
Shell: 5.8.1 - /bin/zsh
Binaries:
Node: 16.15.1 - /usr/local/bin/node
Yarn: 1.22.19 - /usr/local/bin/yarn
npm: 8.11.0 - /usr/local/bin/npm
Browsers:
Chrome: 103.0.5060.114
Safari: 15.5
npmPackages:
@asseinfo/react-kanban: 2.2.0 => 2.2.0
@aws-amplify/analytics: ^5.2.11 => 5.2.11
@aws-amplify/api: ^4.0.44 => 4.0.44
@aws-amplify/auth: ^4.5.8 => 4.5.8
@aws-amplify/cli: ^9.1.0 => 9.1.0
@aws-amplify/core: ^4.5.8 => 4.5.8
@aws-amplify/interactions: ^4.0.44 => 4.0.44
@aws-amplify/storage: ^4.4.27 => 4.4.27
@aws-amplify/ui: ^3.12.1 => 3.12.1 (2.0.5)
@aws-amplify/ui-react: ^3.0.4 => 3.0.4
@aws-amplify/ui-react-internal: undefined ()
@aws-amplify/ui-react-legacy: undefined ()
@aws-amplify/xr: ^3.0.44 => 3.0.44
@emotion/cache: 11.7.1 => 11.7.1 (11.9.3)
@emotion/react: 11.7.1 => 11.7.1
@emotion/styled: 11.6.0 => 11.6.0
@fullcalendar/daygrid: 5.10.0 => 5.10.0 (5.10.1)
@fullcalendar/interaction: 5.10.0 => 5.10.0
@fullcalendar/react: 5.10.0 => 5.10.0
@fullcalendar/timegrid: 5.10.0 => 5.10.0
@material-ui/core: ^4.12.4 => 4.12.4
@material-ui/icons: ^4.11.3 => 4.11.3
@material-ui/lab: ^4.0.0-alpha.61 => 4.0.0-alpha.61
@mui/icons-material: 5.4.1 => 5.4.1
@mui/material: 5.4.1 => 5.4.1
@mui/styled-engine: 5.4.1 => 5.4.1 (5.8.0)
@react-jvectormap/core: 1.0.1 => 1.0.1
@react-jvectormap/world: 1.0.0 => 1.0.0
@testing-library/jest-dom: 5.16.2 => 5.16.2
@testing-library/react: 12.1.2 => 12.1.2
@testing-library/user-event: 13.5.0 => 13.5.0
@types/chroma-js: 2.1.3 => 2.1.3
@types/dropzone: 5.7.4 => 5.7.4
@types/jest: 27.4.0 => 27.4.0 (28.1.4)
@types/node: 16.11.21 => 16.11.21 (18.0.0)
@types/react: 17.0.38 => 17.0.38 (17.0.14)
@types/react-dom: 17.0.11 => 17.0.11
@types/react-flatpickr: 3.8.5 => 3.8.5
@types/react-table: 7.7.9 => 7.7.9
@types/uuid: 8.3.4 => 8.3.4
aws-amplify: ^4.3.26 => 4.3.26
chart.js: 3.4.1 => 3.4.1
chart.js-auto: undefined ()
chart.js-helpers: undefined ()
chroma-js: 2.4.2 => 2.4.2
dropzone: 5.9.2 => 5.9.2
flatpickr: 4.6.9 => 4.6.9 (4.6.13)
formik: 2.2.9 => 2.2.9
html-react-parser: 1.4.8 => 1.4.8
prettier: 2.5.1 => 2.5.1
query-string: ^7.1.1 => 7.1.1
react: 17.0.2 => 17.0.2
react-chartjs-2: 3.0.4 => 3.0.4
react-dom: 17.0.2 => 17.0.2
react-flatpickr: 3.10.7 => 3.10.7
react-github-btn: 1.2.1 => 1.2.1
react-images-viewer: 1.7.1 => 1.7.1
react-quill: 1.3.5 => 1.3.5
react-router-dom: 6.2.1 => 6.2.1
react-scripts: 5.0.0 => 5.0.0
react-spinners: ^0.13.3 => 0.13.3
react-table: 7.7.0 => 7.7.0
stylis: 4.0.13 => 4.0.13
stylis-plugin-rtl: 2.0.2 => 2.0.2
typescript: ^4.7.4 => 4.7.4
uuid: 8.3.2 => 8.3.2 (3.4.0, 3.3.2)
web-vitals: 2.1.4 => 2.1.4
yup: 0.32.11 => 0.32.11
npmGlobalPackages:
@aws-amplify/cli: 8.5.1
corepack: 0.10.0
npm: 8.11.0
typescript: 4.7.4
yarn: 1.22.19
Describe the bug
AppSync, backed by DynamoDB, does not return all the filtered data in a single call.
- AppSync APIs have filter conditions to search for specific data sets. However, an AppSync GraphQL filter is applied only to the first 1000 records, and only the matching records within those 1000 are returned. So if a table holds 1 million (1,000,000) records, the worst case is 1,000,000 / 1000 = 1000 calls, pagination after pagination with nextToken!!! This is NOT PERFORMANT AT ALL. No one will use this kind of API, and the app's performance will suffer badly.
- The solution that the AWS support team offered is to use @searchable on the entity. This configures Amazon OpenSearch Service (the successor to Amazon Elasticsearch Service).
- IT IS VERY EXPENSIVE. Depending on the data size of your app's table you will need differently sized instances (EC2?? not sure). It cost $12 in just 4 days in a test environment with hardly any data at all!!!
- Even if we were to ignore the cost (NOT possible, the bill will burn a hole in the pocket!!!), there is a limit: the search size limit with @searchable is 10,000. For the same table of 1 million records that means 1,000,000 / 10,000 = 100 (yes, one hundred!!!) calls, pagination after pagination with nextToken. Again not performant at all. No one can use searches this poor, which also come at a very high financial cost.
@AWS team, please advise on how to handle cases like this, where there are 1 million+ records in different tables. This seems like a very real-world scenario. In a traditional RDBMS, good indexes get this job done without hiccups, and the search/where condition is applied to the entire data set rather than to a limited subset of the data as in AppSync. Please recommend the best solution.
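To make the behaviour described above concrete, here is a minimal TypeScript sketch of the nextToken pagination loop. It assumes the backend reads a fixed number of rows per call and applies the filter only afterwards. The names `Page`, `FetchPage`, `fetchAllPages` and `makeScanBackedQuery` are hypothetical and are not part of the Amplify or AppSync API; the in-memory "table" only stands in for a scan-backed list query.

```typescript
// Hypothetical shapes for illustration; not the Amplify or AppSync API.
interface Page<T> {
  items: T[];
  nextToken: string | null;
}

type FetchPage<T> = (nextToken: string | null) => Promise<Page<T>>;

// Follows nextToken until the backend reports that no further pages exist.
async function fetchAllPages<T>(
  fetchPage: FetchPage<T>
): Promise<{ items: T[]; calls: number }> {
  const items: T[] = [];
  let nextToken: string | null = null;
  let calls = 0;
  do {
    const page: Page<T> = await fetchPage(nextToken);
    items.push(...page.items);
    nextToken = page.nextToken;
    calls += 1;
  } while (nextToken !== null);
  return { items, calls };
}

// In-memory stand-in for a scan-backed list query (an assumption, not real
// AppSync code): it reads `limit` rows first and only then applies the
// filter, so a page can contain few or no matching items.
function makeScanBackedQuery<T>(
  table: T[],
  limit: number,
  matches: (row: T) => boolean
): FetchPage<T> {
  return async (nextToken) => {
    const start = nextToken === null ? 0 : Number(nextToken);
    const end = Math.min(start + limit, table.length);
    return {
      items: table.slice(start, end).filter(matches),
      nextToken: end < table.length ? String(end) : null,
    };
  };
}
```

Under these assumptions, a 10,000-row table queried with a limit of 1,000 and a filter that matches four rows takes ten calls to return those four items, which is the same ratio as the 1,000,000 / 1000 estimate above.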
Expected behavior
Should be able to get all the data matching a filter/search condition in one or a couple of calls, to keep the performance of an application using AppSync optimal.
Reproduction steps
NA
Code Snippet
// Put your code below this line.
Log output
// Put your logs below this line
aws-exports.js
No response
Manual configuration
No response
Additional configuration
No response
Mobile Device
No response
Mobile Operating System
No response
Mobile Browser
No response
Mobile Browser Version
No response
Additional information and screenshots
No response
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 21 (7 by maintainers)
Wow @iartemiev - that answer is very comprehensive - one of the best answers ever on GitHub imo.
Very clear @iartemiev - we will try this on our app and post an update.
If there is any way to promote this answer or add this use case to your Amplify AppSync docs - many, many more folks will find it very useful.
Sincere gratitude for such a detailed answer! Damn!
I would prefer that we communicate via GitHub. That way, other customers who have similar questions can benefit from the discussion.
Alternatively, feel free to post your questions in the #graphql-help channel on the AWS Amplify Discord Server. It’s actively monitored by experienced community members as well as AWS engineers.
@BBopanna for large data sets we recommend you utilize indexes based on your data access patterns to avoid having DynamoDB perform scan operations against your entire table. Using indexes will allow AppSync to function much more efficiently.
Please see these guides:
https://docs.amplify.aws/cli/graphql/examples-and-solutions/
https://docs.amplify.aws/cli/graphql/data-modeling/#configure-a-primary-key
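As a rough illustration of why an index helps here, the TypeScript sketch below compares a scan-plus-filter with a keyed lookup over the same in-memory data and counts how many rows each one reads. It is only an analogy: the `Order` shape and the functions `scanWithFilter`, `buildIndex` and `queryByIndex` are hypothetical, and nothing here calls DynamoDB or AppSync.

```typescript
// Hypothetical record shape for illustration only.
interface Order {
  id: number;
  customerId: string;
}

// Scan-style access: every row is read, then the filter is applied.
function scanWithFilter(table: Order[], customerId: string) {
  let rowsRead = 0;
  const items: Order[] = [];
  for (const row of table) {
    rowsRead += 1;
    if (row.customerId === customerId) items.push(row);
  }
  return { items, rowsRead };
}

// Index-style access: rows are grouped by key ahead of time, so a lookup
// touches only the rows stored under that key.
function buildIndex(table: Order[]): Map<string, Order[]> {
  const index = new Map<string, Order[]>();
  for (const row of table) {
    const bucket = index.get(row.customerId) ?? [];
    bucket.push(row);
    index.set(row.customerId, bucket);
  }
  return index;
}

function queryByIndex(index: Map<string, Order[]>, customerId: string) {
  const items = index.get(customerId) ?? [];
  return { items, rowsRead: items.length };
}
```

In this sketch the scan reads the whole table on every request, while the keyed lookup reads only the rows for the requested `customerId`. The trade-off is that the key has to be chosen up front to match how the data will be queried.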