aws-sdk-js-v3: S3.GetObject no longer returns the result as a string
Describe the bug
I’m using the GetObjectCommand with an S3Client to pull a file down from S3. In v2 of the SDK I can write response.Body.toString('utf-8') to turn the response into a string. In v3 of the SDK response.Body is a complex object that does not seem to expose the result of reading from the socket.
It’s not clear if the SDK’s current behaviour is intentional, but the change in behaviour since v2 is significant and undocumented.
SDK version number 3.1.0
Is the issue in the browser/Node.js/ReactNative? Node.js
Details of the browser/Node.js/ReactNative version
v12.18.0
To Reproduce (observed behavior)
```js
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

export async function getFile() {
  const client = new S3Client({ region: 'eu-west-1' });
  const cmd = new GetObjectCommand({
    Bucket: 'my-bucket',
    Key: '/readme.txt',
  });
  const data = await client.send(cmd);
  console.log(data.Body.toString('utf-8'));
}
```
Expected behavior
It should print the text of the file.
Additional context
data.Body is a complex object with circular references. Object.keys(data.Body) returns the following:
[
"_readableState",
"readable",
"_events",
"_eventsCount",
"_maxListeners",
"socket",
"connection",
"httpVersionMajor",
"httpVersionMinor",
"httpVersion",
"complete",
"headers",
"rawHeaders",
"trailers",
"rawTrailers",
"aborted",
"upgrade",
"url",
"method",
"statusCode",
"statusMessage",
"client",
"_consuming",
"_dumped",
"req"
]
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 93
- Comments: 109 (10 by maintainers)
Links to this issue
Commits related to this issue
- fix(api): correct return type See https://github.com/aws/aws-sdk-js-v3/issues/1877. — committed to dargmuesli/creal by dargmuesli 2 years ago
- Handle S3 stream response See https://github.com/aws/aws-sdk-js-v3/issues/1877 — committed to pureskillgg/awsjs by razor-x 2 years ago
This happens as data.Body is now of type Readable | ReadableStream | Blob:
https://github.com/aws/aws-sdk-js-v3/blob/25cb359e69966c549eb505956c2aeee809819245/clients/client-s3/models/models_0.ts#L6560

For your specific example, you can write a streamToString function to convert a ReadableStream to a string.
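A minimal sketch of such a helper for Node.js (collecting the stream into buffers and decoding as UTF-8); the exact snippet from the original comment is not reproduced here:

```ts
import { Readable } from 'stream';

// Collect a Node.js Readable into a single UTF-8 string.
const streamToString = (stream: Readable): Promise<string> =>
  new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')));
  });

// Usage with the reproduction snippet above:
// const data = await client.send(cmd);
// const body = await streamToString(data.Body as Readable);
```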
@igilham Does this resolve your query?
Thanks, @trivikr. This works in my application but raises a few concerns about the library that are worth sharing: My IDE can’t tell what the type of response.Body is. It tells me that it’s any. Perhaps the library configuration could be improved to export the correct type information. Node.js doesn’t implement ReadableStream and Blob, so it appears to be impossible to make this code type-safe. For reference, I’ve rewritten the streamToString function with the missing types added back in to comply with my team’s linter settings.

A bit late to the party, but I just cannot hold it:
Guys, when you designed this API - did you really try it yourself? I understand why it was improved in the way it was, but this improvement shouldn’t come at the cost of practicality. Like for real, do you think it is ok to write this every time I simply need to read an object into memory:

Should I now just memorize it? Keep it in my personal list of handy AWS snippets? Add an entire 3rd-party dependency that does it in one line?

I bet this is how the API design session goes: – Hey Dave, we’re doing a new v3 API and it’s really not a simple thing to read a file; looks like 99.999999% of our users will suffer from that. Whaddaya think, should we improve it? – Nah.

Like seriously, the most complex and obscure thing in the API of the file storage… is the file read itself. Come on, guys.
I also wasted lots of time on GetObject and the trifecta of its types. Also, the fact that ReadableStream | Blob is only for the browser, and Readable only for Node, made it extremely annoying 😃

The streamToString solution posted above works for Node. For the browser, I found that using the Response object from fetch seems a shorter solution. This will return a Response object which will then allow us to use any of the helper methods it has to convert to String, Buffer, JSON, etc. See more at https://developer.mozilla.org/en-US/docs/Web/API/Response#methods.

Full example:
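A sketch of what that full example could look like in the browser (the bucket, key, and region are illustrative):

```ts
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

const client = new S3Client({ region: 'eu-west-1' });
const { Body } = await client.send(
  new GetObjectCommand({ Bucket: 'my-bucket', Key: 'readme.txt' }),
);

// In the browser, Body is a ReadableStream (or a Blob on older browsers).
// Wrapping it in a fetch Response gives access to .text(), .json(), .blob(), etc.
const text = await new Response(Body as ReadableStream).text();
console.log(text);
```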
It’s quite unfortunate that everybody has to go through these hoops to get the content out of the response, though. Especially considering that we have to do type checking with things like if (resp.Body instanceof Readable), or declare special interfaces to avoid differences between browser/Node.

Documentation for the getObject operation lists that GetObjectOutput.Body is Readable | ReadableStream | Blob.
API Reference: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/s3.html#getobject

Screenshot

I’m using Visual Studio Code, and it shows the type of response.Body as internal.Readable | ReadableStream<any> | Blob on hover. Please create a new issue with details of your IDE and code if the problem persists.

Screenshot
I’m also very confused about how to read S3 Body responses with SDK v3. The SDK documentation for GetObjectCommand does not describe how to do it, and the SDK examples are also missing it (https://github.com/awsdocs/aws-doc-sdk-examples/issues/1677).
I would ask the AWS SDK team to include in the SDK a simple way to read S3 Body responses. We don’t want to re-implement complicated event handlers and helper functions for this simple purpose every time we use GetObject in a project.
In v2 we could just say something like JSON.parse(response.Body?.toString()). Please make it as simple in v3. Stream-based processing is also important, but it should only be an alternative to the simple case of parsing small JSON objects.

For reference, I was able to do this in Node.js by utilizing node-fetch. I would like something like this to be included in the AWS SDK.
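A sketch of that node-fetch approach, assuming node-fetch v2 (whose Response class accepts a Node.js Readable as its body):

```ts
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { Response } from 'node-fetch';
import { Readable } from 'stream';

const client = new S3Client({ region: 'eu-west-1' });
const data = await client.send(
  new GetObjectCommand({ Bucket: 'my-bucket', Key: 'config.json' }),
);

// node-fetch's Response wraps the Readable body and exposes .json()/.text()/.buffer().
const parsed = await new Response(data.Body as Readable).json();
```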
Starting from Node 16.7, you can simply use the utility consumer functions:
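For example (a minimal sketch, reusing the client and command from the reproduction snippet; stream/consumers also provides json, buffer, and arrayBuffer):

```ts
import consumers from 'node:stream/consumers';
import { Readable } from 'node:stream';

// `client` and `cmd` as in the reproduction snippet above.
const data = await client.send(cmd);
const text = await consumers.text(data.Body as Readable);
```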
Edit: Added await to the consumers.text result. Thanks @AHaydar.

Reopening as a lot of customers have raised questions. Tagging @AllanZhengYP for comment.
I’ve been running into these pain points as well, including Lambda invocation. The payload returned is now a Uint8Array, so it takes a few hoops to get it into a usable format:
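Something along these lines (a sketch using @aws-sdk/client-lambda; the function name and region are illustrative):

```ts
import { InvokeCommand, LambdaClient } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({ region: 'eu-west-1' });
const data = await lambda.send(new InvokeCommand({ FunctionName: 'my-function' }));

// data.Payload is a Uint8Array in v3, so it has to be decoded and parsed manually.
const result = JSON.parse(Buffer.from(data.Payload!).toString('utf-8'));
```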
Whereas in the previous JS SDK, it was simply:
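Roughly this, in v2 (a sketch for comparison):

```ts
import { Lambda } from 'aws-sdk';

const lambda = new Lambda({ region: 'eu-west-1' });
const data = await lambda.invoke({ FunctionName: 'my-function' }).promise();

// data.Payload already arrives as a string here.
const result = JSON.parse(data.Payload as string);
```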
I don’t understand this new direction with the SDK. I can’t say I’m a fan. Maybe @trivikr can weigh in.
As this code is run on Node.js, you can pass Body as Readable as follows:
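For instance (a minimal sketch of the cast, reusing the streamToString helper and the client/cmd from earlier snippets):

```ts
import { Readable } from 'stream';

const data = await client.send(cmd);
// In Node.js the Body is a Readable, so the union type can be narrowed with a cast.
const body = await streamToString(data.Body as Readable);
```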
@all following this issue:

IMPORTANT UPDATE
I have received news that Bezos officially defunded the AWS core services. Do not fret! He’s reallocated these funds to lawyers fees. These fees will be used to prevent SpaceX from progressing. Please, rest assured that the lack of basic, simple, progress on these core AWS libraries is clearly being put to good use!
Hope this update is found useful and promising to all engineers that rely on AWS S3.
A one-line alternative is to use get-stream package, as posted here: https://github.com/aws/aws-sdk-js-v3/issues/1096#issuecomment-616743375
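For reference, that one-liner looks roughly like this (a sketch using the get-stream package, which resolves with the whole stream as a UTF-8 string):

```ts
import getStream from 'get-stream';
import { Readable } from 'stream';

const data = await client.send(cmd);
const body = await getStream(data.Body as Readable);
```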
I understand the reason for returning a ReadableStream, but a built-in helper method would be nice. Reading the whole body into a string in memory is probably good for 99% of cases. If such a helper method were part of the SDK, we could just call it as readStream(response.Body) and everyone would be happy, not having to add another dependency or 10 lines of boilerplate code to every new project.

I can’t believe a commercial SDK like AWS’s would provide such useless interfaces.
Seriously? Are we in year 2022 or 2000?
Please showcase how your AWS dev teams use such an interface without writing quite a few lines of helper functions. Why would you assume SDK customers want to use such an interface?? You can of course provide a low-level interface, but the design of good interfaces 101 already tells us: “make it simple for your users”. Interface design is not about you – the author; it’s about customers. Go back and check the SDKs of other platforms such as .NET and Java – see how they work with customers. No one really wants to fiddle around with internal.Readable | ReadableStream | Blob, no matter how accurate it is in Computer Science. You may think it’s a technically precise and perfect response type – it doesn’t matter, because it’s useless to customers.

How hard is it to implement something more useful like below??

P.S. In case anyone simply wants to download a text object:
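A sketch of what such a one-call text download could look like (getObjectAsString is a hypothetical helper name, not an SDK API):

```ts
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import consumers from 'node:stream/consumers';
import { Readable } from 'node:stream';

const client = new S3Client({ region: 'eu-west-1' });

// Hypothetical convenience wrapper: fetch an object and return its body as a UTF-8 string.
async function getObjectAsString(Bucket: string, Key: string): Promise<string> {
  const { Body } = await client.send(new GetObjectCommand({ Bucket, Key }));
  return consumers.text(Body as Readable);
}

const text = await getObjectAsString('my-bucket', 'readme.txt');
```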
You can also use async iterators to consume readable streams if you’re on Node.js 11.14.0 or above: https://nodejs.org/api/stream.html#readablesymbolasynciterator
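For example (a sketch, reusing the client/cmd from earlier; for await works because Readable implements Symbol.asyncIterator):

```ts
import { Readable } from 'stream';

const data = await client.send(cmd);

const chunks: Buffer[] = [];
// Consume the body chunk by chunk via the async iterator.
for await (const chunk of data.Body as Readable) {
  chunks.push(Buffer.from(chunk));
}
const body = Buffer.concat(chunks).toString('utf-8');
```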
Guess it’s back to AWS SDK v2.
Can we please get an official comment from someone at AWS about this? Getting an object from S3 should NOT be this difficult…
It’s ridiculous that people have to resort to this GitHub thread to parse their data. Why does a company like Amazon have such shitty and obscure documentation?
For any lost souls trying to parse simple JSON data using GetObjectCommand and don’t want to mess with filestreams, readers, or buffers, https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-793028742 worked like a charm on my NodeJS/Express server.
I’m sure there’s a lot of really good suggestions in this thread too, hope that there’s a solution for this sometime soon.
It boggles my mind that I had to read this open issue to find what I was looking for: how to read objects in v3. Really, this is what everyone would expect to find in the information hierarchy:

There’s no mention of the word "object" in this entire list. This should not be just a section many pages below a headline.
This is throwing an error in a Node.js app, because the TS config does not load the DOM libs. This results in the Body being set to any.

The use of Response looks like the neatest solution right now, for JSON and text payloads.
Reading the answers above, in my case (to resize an image to a specific size and do other little things with sharp) I just needed to remove the .toString call from the streamToString function. And everything works great:
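Roughly like this (a sketch: the helper collects the stream into a Buffer instead of a string, and the sharp options are illustrative):

```ts
import { Readable } from 'stream';
import sharp from 'sharp';

// Same accumulation as streamToString, but without the final .toString().
const streamToBuffer = (stream: Readable): Promise<Buffer> =>
  new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks)));
  });

const data = await client.send(cmd);
const resized = await sharp(await streamToBuffer(data.Body as Readable))
  .resize(800)
  .toBuffer();
```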
For anyone that got stuck trying to download the S3 item to a local file (I’m doing so in Lambda), you can reference this code; it utilizes Node.js readable streams.
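A sketch of that approach (the /tmp path, file name, and tmpFileSuffix value are placeholders):

```ts
import { createWriteStream } from 'fs';
import { Readable } from 'stream';

const tmpFileSuffix = '.csv'; // replace with the correct file suffix
const tmpFilePath = `/tmp/download${tmpFileSuffix}`;

const data = await client.send(cmd);

// Stream the body straight to disk instead of buffering it in memory.
await new Promise<void>((resolve, reject) => {
  (data.Body as Readable)
    .pipe(createWriteStream(tmpFilePath))
    .on('finish', () => resolve())
    .on('error', reject);
});
```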
Be sure to replace tmpFileSuffix with the correct file suffix.

This is confusing as hell. Even if you’re not going to fix it right away, just put a massive note in the documentation that it does NOT return a Blob if you are using Node.js, regardless of what the TypeScript declaration says. That would at least allow people to figure out what’s wrong without having to spend an hour reading through the whole of this thread.
Turns out the body (ReadableStream) property of fetch responses was implemented slightly later than the other methods (like blob(), text(), etc.), so it looks like AWS is trying to support a few older browsers by falling back on Blob when body/ReadableStream isn’t available.

Relevant code: https://github.com/aws/aws-sdk-js-v3/blob/608e606c20b3bb1614518de6de313a184db6129f/packages/fetch-http-handler/src/fetch-http-handler.ts#L75-L94

So if you want to support those older browsers, you need to be prepared to handle both a Blob and a ReadableStream in the browser.

This kind of thing should really be in the docs…
The code snippet works in the Node.js environment; in the browser, you would have a ReadableStream instead of a Readable. Here is my implementation of handling the ReadableStream:
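A minimal sketch of such a browser-side converter, using getReader() and TextDecoder (the function name is illustrative):

```ts
// Convert a browser ReadableStream of bytes into a single string.
async function readableStreamToString(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder('utf-8');
  let result = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    result += decoder.decode(value, { stream: true });
  }
  return result + decoder.decode();
}

// Usage: const text = await readableStreamToString(data.Body as ReadableStream<Uint8Array>);
```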
Okay, so after spending a few hours I got it right. This way we can pipe our S3 response body into sharp and later use .toBuffer() to push it to a bucket.

But AWS team, please, please, you need to update your docs. I know there is a lot to update, but as a developer it’s just so much of a struggle to use AWS services because of insufficient docs.
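A sketch of that flow (piping the body through sharp and uploading the result; the resize options and destination key are illustrative):

```ts
import { GetObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { Readable } from 'stream';
import sharp from 'sharp';

const client = new S3Client({ region: 'eu-west-1' });
const { Body } = await client.send(
  new GetObjectCommand({ Bucket: 'my-bucket', Key: 'photo.jpg' }),
);

// sharp instances are duplex streams, so the response body can be piped straight into one.
const transformer = sharp().resize(256, 256);
(Body as Readable).pipe(transformer);

await client.send(
  new PutObjectCommand({
    Bucket: 'my-bucket',
    Key: 'thumbnails/photo.jpg',
    Body: await transformer.toBuffer(),
  }),
);
```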
Bumping because of how much time I wasted trying to fix this issue…
AWS provides a migration guide that is clearly not thorough enough to simplify the process: https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/migrating-to-v3.html
If there’s going to be a migration guide at all, details like this need to be covered.
This is now documented in the root readme with an example: https://github.com/kuhe/aws-sdk-js-v3/tree/main#streams
You do not need to import sdkStreamMixin explicitly. As of that version, it is applied to stream objects in command outputs.

If it helps someone who doesn’t want to use a library for converting streams to buffers, here’s the custom function I’m using:
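A sketch of such a hand-rolled converter (this variant collects chunks via async iteration rather than event handlers):

```ts
import { Readable } from 'stream';

// Collect a Readable into a single Buffer without any third-party helpers.
export async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}
```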
Moreover, it seems to be a little bit faster than the lib.
Then you can use it like this:
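For instance (reusing the client and command from the earlier snippets):

```ts
import { Readable } from 'stream';

const data = await client.send(cmd);
const buffer = await streamToBuffer(data.Body as Readable);
```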
I couldn’t get the stream.on solution to work under React. I kept getting the error ‘stream.on is not a function’. It turns out that in this environment AWS returns a ReadableStream, not a Readable. I ended up having to write my own converter to handle ReadableStream and work under TypeScript with ESLint. No extra packages needed to be installed or used.

I thought I’d written my last do...while loop a decade ago, but it turned out I still needed one here. I couldn’t get an iterator solution to work, and I refused to do the tail recursion I’d seen in the getReader() examples, which wouldn’t compile under TypeScript anyway.
And I call it like this:
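A sketch of what that converter and call site might look like (a do...while loop over getReader(), accumulating bytes and decoding once at the end; names are illustrative):

```ts
// Convert a browser/React ReadableStream into a string using a do...while loop.
export async function readableStreamToText(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let done = false;
  do {
    const result = await reader.read();
    done = result.done;
    if (result.value) chunks.push(result.value);
  } while (!done);

  // Concatenate the chunks and decode them as UTF-8 in one pass.
  const size = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const merged = new Uint8Array(size);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return new TextDecoder('utf-8').decode(merged);
}

// `data` being the GetObjectCommand output:
const text = await readableStreamToText(data.Body as ReadableStream<Uint8Array>);
```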
AWS team: it is just about November 2021, and, still, compared to SDK v2, reading in a simple JSON object from S3 is way too convoluted. If this is your idea of a private joke, it’s not funny. Fix it already.
Took two solid years, but hey, we have an official solution…
Hard to understand why getting a string from a simple storage service is unnecessarily complicated.
@porteron This might help too. This is what I use in my code:
If you find a method that works that’s shorter than mine, let me know!
Here is an example of how to download an object from S3 and write that as a file to disk, while keeping it as a stream. This example is typescript targeting node.
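A sketch of that kind of example (using the stream/promises pipeline helper; paths and names are illustrative):

```ts
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { createWriteStream } from 'fs';
import { Readable } from 'stream';
import { pipeline } from 'stream/promises';

const client = new S3Client({ region: 'eu-west-1' });

async function downloadToFile(bucket: string, key: string, destination: string): Promise<void> {
  const { Body } = await client.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  if (!(Body instanceof Readable)) {
    throw new Error('Expected a Readable body in Node.js');
  }
  // pipeline handles back-pressure and error propagation while streaming to disk.
  await pipeline(Body, createWriteStream(destination));
}
```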
It seems silly to me that, if we’re going to all this trouble of having a stream coming from AWS, we then convert it to a buffer or string to write to disk.
I also agree with the sentiments expressed by others in this thread. It is crazy that getObject has become such a complicated operation in the V3 SDK compared with the V2 SDK and is going to trip many people up for years to come.
Why should everyone have to copy and paste the same boilerplate everywhere? Have you guys not heard of encapsulation? If you are making a new version that you want everyone to use, then it must be better, not worse. I’ve never seen something so obtuse.
The fact people are using a TypeScript type assertion in many of these examples stems from the fact that AWS has typed this thing poorly. Why is body even a union type? That is basically AWS telling us "we don’t know the shape of the response we’re going to give you, only that it will be one of these three things, but you need to write code to check which of these we gave you". I think someone at AWS should look into making this an intersection type instead of a union type. Why should I have to check the return type? Shouldn’t the library know what the return type is already?

It seems to be the case that the returned object can be used as any of the three interfaces (intersection type), and it does not seem to be the case that it can only be used as one of the interfaces (as the types would suggest with a union type).

If a union type is in fact correct, it begs the question under what scenario I should expect the library to not return this interface to me. This is not documented. Why not just add an example to your docs that writes a "hello world" string and reads it back and outputs it, at the very least, using TypeScript so you dogfood your own types (if you did this you’d realize your current types are "not optimal").

Because AWS has this as a union type, people’s editors will suggest .toString() only (assuming the user has not discriminated the union). However, this just prints "[object]", which is pretty poor DX, especially considering toString() used to work in v2 of the API, from the sounds of it. Perhaps consider structuring your types so that the "trigger suggest" command in VSCode guides the user into a working "hello world" command without all of this fuss.

TypeScript version for node (not for browser):
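A sketch of such a Node-only, TypeScript-friendly version (narrowing the union with an instanceof check instead of a cast):

```ts
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { Readable } from 'stream';

const client = new S3Client({ region: 'eu-west-1' });

export async function getObjectText(Bucket: string, Key: string): Promise<string> {
  const { Body } = await client.send(new GetObjectCommand({ Bucket, Key }));
  if (!(Body instanceof Readable)) {
    // In Node.js the body should always be a Readable; anything else is unexpected.
    throw new Error('Unexpected Body type');
  }
  const chunks: Buffer[] = [];
  for await (const chunk of Body) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('utf-8');
}
```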
For browser usage, the reasons behind this making-things-complicated decision in SDK v3, and the explanation of the typecasting, I explain in my blog post.
@dzmitry-kankalovich definitely summed it up correctly, I believe. A high level manager that doesn’t actually care about usability is in charge of the documentation specs as well as things like reading an object to memory (pretty sure it’s all about total lines of docs, not whether or not they matter?). No doubt this SDK is really powerful, and well written (props to the devs)… but reading an object to memory clearly has slipped through the cracks, as well as the entirety of dev UX concerning docs. Been using AWS for years, but this issue (existence, duration, lack of quick response time from AWS team) is pretty jaw dropping - to say the least.
There are 3 clear implementations/packages stated from the community. Adopt one. Ensure type safety. Patch it. Make hundreds, if not thousands, of other dev’s lives easier… that or have us devs asking our companies to never have to use AWS S3 with node again, I guess.
Also, as far as documentation goes… take a look at the Google/Angular team and take some notes @aws
Found an easy solution using transformToString if wanting to parse a JSON file in S3.

So it looks like, going by the latest PR: https://github.com/aws/aws-sdk-js-v3/pull/3977/files
Edit: See below post for official answer
The recommended way to do this is now:
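That is, roughly (a sketch; recent SDK versions mix transformToString, transformToByteArray, and transformToWebStream into the Body object):

```ts
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

const client = new S3Client({ region: 'eu-west-1' });
const { Body } = await client.send(
  new GetObjectCommand({ Bucket: 'my-bucket', Key: 'config.json' }),
);

// Collect the whole body as a UTF-8 string, then parse it.
const parsed = JSON.parse(await Body!.transformToString());
```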
Of course we can all write our own wrappers and functions, but then we are all repeating the same work and re-inventing the same wheel everywhere and we will all have to change it again when there is another API change. That’s why in my mind it makes sense to centralise very common tasks in the library itself. We’re not talking about something esoteric, here.
Imagine a 5-liner copied and pasted everywhere (such things tend to be posted on discussions such as this) and then someone realises there is a bug in it. It’s easier to maintain things in one place.
Even if a bunch of different people decide to publish their solutions (i.e. as libraries), you end up with n solutions and potentially many bugs, so there is no shared benefit across the board when one is fixed, as there would be if we all subscribed to one central solution.

Also, PutObject lets you send a string, so the API is not symmetrical. I agree about the build sizes, though; it’s why I decided to use v3.
Throwing into the pot here, I lost 4 hours of time trying to track this down.
@igilham has it right in his earlier message.
It’s still not noted in the docs that the latter two shapes are from DOM and only apply in browsers. I agree it’s frustrating that helper methods for common scenarios aren’t included with the client.
We’ve run into similar issues so often we’ve taken to wrapping all SDK clients in our own classes to extend their operations to handle situations like this. S3 shouldn’t be one of the services we should have to do this for. With such a large library of offerings, I know its hard to keep up with everything, but S3 is hands down one of the most used resources Amazon has available, so it should be receiving the most attention when it comes to DX. Bad experiences on the most common use cases definitely sour the impression of your products, and lowers the likelihood of developer evangelization.
Here’s to hoping 2021 closes out with a packaged implementation for this scenario.
I think you can now get s3.GetObject to return the result as a string: https://newbedev.com/how-to-get-response-from-s3-getobject-in-node-js
@berenddeboer you are right, it was added to the documentation, but the documentation itself perfectly matches the poor attention to detail this whole issue is about. It is poorly indented (not a problem in itself) and does not behave as one would expect, as there’s a return statement that might work for the guy responsible for the unit tests (and apparently also for the documentation), but not for the other poor lost guy who had to face the complex use case of reading a text file and after some time may overlook a return data; // For unit tests.

So, @dzmitry-kankalovich, feel free to add to the script: – Ok, so whaddaya think, should we improve its documentation? – Nah…
Maybe it’s just our fault for expecting a usable SDK for the biggest object storage product from the biggest cloud provider.
Thanks. I think that covers my remaining frustrations. I appreciate that it can take time for documentation elsewhere to catch up when a major version is released.
A shorter version is:
stream didn’t work for me. This is how I ended up getting a JSON file from the s3 bucket
I just noticed that a lot of the comments here are also covered in the v3 documentation, see section “Getting a file from an Amazon S3 bucket”.
@HerrSchwarz I imagine it was for flexibility, so the dev can decide if they want to stream data into another process or just grab a string directly. This saves having to come up with multiple API functions (getObject, getStream).
But I agree, it is a bit of a pain. There might’ve been a simpler way to accomplish this and have better DX (developer experience).
I’m also seeing a TypeScript compilation error when using this via "aws-sdk": "^2.1175.0" on Node 14.x. Isn’t ReadableStream a Node 16+ only interface?

Also ended up here after some time wasted trying to figure out the API. A year later, and there is still no proper documentation on docs.aws.amazon.com.
Will be reverting to v2. Despite the monolith design the API is simpler, smaller, and it only pulls in 10 dependencies.
@sinsys where do you get data.Payload from? I do not see that in the GetObject model. I only see data.Body.

I’m generally a bit concerned about this SDK being maintained over time. There are very serious issues like this one that have been open since May with no resolution.
Oh yes, I use Blobs a lot in browser. I’m just wondering in what circumstances the AWS SDK will return a Blob rather than a ReadableStream since it has typings for both. I’m guessing it might be if the browser doesn’t support ReadableStreams…
Does anyone know in what circumstances body will be a Blob or a ReadableStream (for the browser env)? My code is currently returning ReadableStreams, but it makes me wonder what APIs or configs result in Blobs being returned.
@dzmitry-kankalovich Amazon is well-known for not having a solid focus on DX. Oftentimes they build tools that work, but are not great experiences to work with.
IMO, every internal team at Amazon should be looking at Amplify as a model of how a library or SDK should be designed and documented.
This was by far the easiest solution with the least overhead. Worked for me at least! Much appreciated!
With Node 18 introducing the Web Streams API, will this affect S3 download streams in any way? To my knowledge, if you were working in Node you could assume the Body was always gonna be a Readable. Will it now also support ReadableStream?
Poking around in the SDK a bit, it looks like there are some stream consumers already available for browser and node environments, respectively:
These methods appear to already be used in some of the protocol implementations, like https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-s3/src/protocols/Aws_restXml.ts#L12024
That being said, @AllanZhengYP’s change will be a nice usability improvement when it makes it in. Keep up the good work, Allan and the SDK team! Can you comment on whether these above interfaces are stable enough to depend on?
This answer makes a lot of sense but for some reason the import is failing for me here. The line:
fails with the error:
Module not found: Error: Can't resolve 'stream/consumers'
This feels like such a dumb error, but I can’t find a way around it. I’ve tried all of the following to no avail:
- npm install stream: this works, but my error does not go away
- npm install "stream/consumers": this fails since the module does not exist
- npm install stream-consumers: this works, but it’s a different package, so my error still doesn’t go away
The documentation on the Node.js website is not helpful here, since they import it in a totally different way:
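For reference, the Node.js docs use the node: scheme and named imports, along these lines:

```ts
// Named imports from the built-in module, as shown in the Node.js docs.
import { text } from 'node:stream/consumers';

const body = await text(stream); // `stream` being the Readable S3 Body
```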
Could someone please tell me the correct way to install and import the ‘stream/consumers’ package. Thank you so much for the help!
That’s awesome. Thanks for sharing. Please note that consumers.text returns a promise, so we’d need to await it as well 🙌

I agree. I love AWS, but finding this thread took a bit of research…
I heard you want to save the result of getObject to disk? Who would want to do that anyway? Sounds crazy. Anyway, here is how I managed to do it.
Oh hmm, I see it was kind of mentioned here: https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-886308913
Has anyone gotten this to work with binary data in a React Native environment? I get a Blob from the SDK, but I’m having a doozy of a time getting that Blob saved to disk.

The Response method above (https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-776187712) works great for text, but doesn’t seem to do the trick for binary data.

RN Blobs don’t implement stream (https://github.com/facebook/react-native/blob/main/Libraries/Blob/Blob.js), and the rn-fetch-blob library the open source project I’m using expects a stream. (I’m new to modern JS and just wanna help fix this bug 😞)

I’ve tried my hand at a few different methods of turning the blob into a stream, but none seem to work on React Native…
That is awesome @berenddeboer, it has been added recently to the documentation. For me, the biggest problem is that I can’t be sure my own ad-hoc stream implementations are 100% correct and error-resilient in every situation. When AWS provides the implementation, I will trust it.
Thanks, but that’s not the issue I’m facing. I correctly fixed my S3 output Body stream retrieval using get-stream to convert it to a Buffer. The problem I have now is that I have the TypeScript compiler set to check definition files for libraries. This creates the errors I cited above, because, rightly so, it doesn’t find the ReadableStream and Blob types, among others (see File from the DynamoDB native attribute types). I don’t know how to manage a scenario like this one, where a library supports multiple environments but I only need type definitions for the Node.js env. Obviously, this was not an issue with AWS SDK v2.

@kristiannotari rather than adding "DOM" to tsconfig, you should use one of the solutions suggested above - they require additional code, but work in Node+TS. You can use the assembleStream function from @ffxsam’s comment or the get-stream package I posted.

If you’re ok with using a (very popular) 3rd party library, this is shorter - https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-799697205
@porteron That was my code snippet - apologies for the confusion. In my comment, I was calling out Lambda, which uses data.Payload to return a response.

@trivikr Thanks for that link to the docs! I didn’t even know they existed till just now.
I’ve created a documentation update request at https://github.com/aws/aws-sdk-js-v3/issues/1878