aws-sdk-js-v3: S3.GetObject no longer returns the result as a string

Describe the bug I’m using the GetObjectCommand with an S3Client to pull a file down from S3. In v2 of the SDK I can write response.Body.toString('utf-8') to turn the response into a string. In v3 of the SDK response.Body is a complex object that does not seem to expose the result of reading from the socket.

It’s not clear if the SDK’s current behaviour is intentional, but the change in behaviour since v2 is significant and undocumented.

SDK version number 3.1.0

Is the issue in the browser/Node.js/ReactNative? Node.js

Details of the browser/Node.js/ReactNative version v12.18.0

To Reproduce (observed behavior)

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

export async function getFile() {
  const client = new S3Client({ region: 'eu-west-1' });
  const cmd = new GetObjectCommand({
    Bucket: 'my-bucket',
    Key: '/readme.txt',
  });
  const data = await client.send(cmd);

  console.log(data.Body.toString('utf-8'));
}

Expected behavior It should print the text of the file.

Additional context

data.Body is a complex object with circular references. Object.keys(data.Body) returns the following:

[
  "_readableState",
  "readable",
  "_events",
  "_eventsCount",
  "_maxListeners",
  "socket",
  "connection",
  "httpVersionMajor",
  "httpVersionMinor",
  "httpVersion",
  "complete",
  "headers",
  "rawHeaders",
  "trailers",
  "rawTrailers",
  "aborted",
  "upgrade",
  "url",
  "method",
  "statusCode",
  "statusMessage",
  "client",
  "_consuming",
  "_dumped",
  "req"
]

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 93
  • Comments: 109 (10 by maintainers)

Most upvoted comments

This happens because data.Body is now of type Readable | ReadableStream | Blob: https://github.com/aws/aws-sdk-js-v3/blob/25cb359e69966c549eb505956c2aeee809819245/clients/client-s3/models/models_0.ts#L6560

For your specific example, you can write a streamToString function to convert ReadableStream to a string.

const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

(async () => {
  const region = "us-west-2";
  const client = new S3Client({ region });

  const streamToString = (stream) =>
    new Promise((resolve, reject) => {
      const chunks = [];
      stream.on("data", (chunk) => chunks.push(chunk));
      stream.on("error", reject);
      stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    });

  const command = new GetObjectCommand({
    Bucket: "test-aws-sdk-js-1877",
    Key: "readme.txt",
  });

  const { Body } = await client.send(command);
  const bodyContents = await streamToString(Body);
  console.log(bodyContents);
})();

@igilham Does this resolve your query?

Thanks, @trivikr. This works in my application but raises a few concerns about the library that are worth sharing:

  • There is no documentation for clients and the GetObjectCommand is not documented in the user guide or sample code. The project Readme file implies I could expect the same behaviour as SDKv2.
  • My IDE can’t tell me what the type of response.Body is. It tells me that it’s any. Perhaps the library configuration could be improved to export the correct type information.
  • It’s nice to have options for data processing, but I shouldn’t be forced to write boilerplate I/O code for the most common use case.
  • As noted below, I can’t find an export of ReadableStream and Blob so it appears to be impossible to make this code type-safe.

For reference, I’ve rewritten the streamToString with the missing types added back in to comply with my team’s linter settings.

import { Readable } from 'stream';

// Apparently the stream parameter should be of type Readable|ReadableStream|Blob
// The latter 2 don't seem to exist anywhere.
async function streamToString (stream: Readable): Promise<string> {
  return await new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')));
  });
}

A bit late to the party, but I just cannot hold it:

Guys, when you designed this API, did you really try it yourselves? I understand why it was changed the way it was, but that improvement shouldn’t come at the cost of practicality. Like, for real, do you think it is OK to write this every time I simply need to read an object into memory:

const streamToString = (stream) =>
      new Promise((resolve, reject) => {
        const chunks = [];
        stream.on("data", (chunk) => chunks.push(chunk));
        stream.on("error", reject);
        stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
      });

Should I now just memorize it? Keep it in my personal list of handy AWS snippets? Add an entire 3rd party dependency that does it in one line?

I bet this is how the API design session goes: – Hey Dave, we’re doing a new v3 API and it’s really not a simple thing to read a file, looks like 99.999999% of our users will suffer from that. Whaddaya think, should we improve it? – Nah.

Like seriously, the most complex and obscure thing in the API of the file storage… is the file read itself. Come on, guys.

The code snippet works in the Node.js environment; in the browser, you would get a ReadableStream instead of a Readable.

Here is my implementation of handling the ReadableStream:

const streamToString = (stream) => {
  return new Promise((resolve, reject) => {
    if (stream instanceof ReadableStream === false) {
      reject(
        "Expected stream to be instance of ReadableStream, but got " +
          typeof stream
      );
      return;
    }
    let text = "";
    const decoder = new TextDecoder("utf-8");

    const reader = stream.getReader();
    const processRead = ({ done, value }) => {
      if (done) {
        // all chunks read; resolve with the accumulated text
        resolve(text);
        return;
      }

      text += decoder.decode(value);

      // Not done, keep reading
      reader.read().then(processRead);
    };

    // start read
    reader.read().then(processRead);
  });
};

I also wasted lots of time on GetObject and the trifecta of its types. Also, the fact that ReadableStream | Blob is browser-only and Readable Node-only made it extremely annoying 😃

The streamToString solution posted above works for Node. For the browser, I found that using the Response object from fetch seems a shorter solution:

new Response(response!.Body, {});

This will return a Response object which will then allow us to use any of the helper methods it has to convert to String, Buffer, Json, etc. See more at https://developer.mozilla.org/en-US/docs/Web/API/Response#methods.

Full example:

const s3 = new S3({
  region: "us-east-1",
  credentials: {
    accessKeyId: "replace-it",
    secretAccessKey: "replace-it",
  },
});
const resp = await s3.getObject({
  Bucket: "your-bucket",
  Key: "your-object-key",
});
console.info(await new Response(resp.Body, {}).text())

It’s quite unfortunate that everybody has to go through these hoops to get the content out of the response though. Especially considering that we have to do type checking with things like if (resp.Body instanceof Readable), or declare special interfaces to avoid differences between browser/Node.
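For what it’s worth, that per-environment branching can be pushed into a single helper. Here is a minimal sketch (the helper name bodyToString is mine, not the SDK’s), assuming Node.js 18+ or a modern browser, i.e. an environment where Response is global and the DOM lib types for ReadableStream and Blob are available:

import { Readable } from 'stream';

async function bodyToString(
  body: Readable | ReadableStream | Blob | undefined
): Promise<string> {
  if (body === undefined) {
    throw new Error('Response has no Body');
  }
  if (body instanceof Readable) {
    // Node.js: drain the stream into Buffers and decode as UTF-8.
    const chunks: Buffer[] = [];
    for await (const chunk of body) {
      chunks.push(Buffer.from(chunk));
    }
    return Buffer.concat(chunks).toString('utf-8');
  }
  // Browser: wrap the web stream or Blob in a Response and reuse its text() helper.
  return new Response(body as ReadableStream | Blob).text();
}

With something like that in place, const text = await bodyToString(response.Body); reads the same in either environment.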

There is no documentation for clients and the GetObjectCommand is not documented in the user guide or sample code. The project Readme file implies I could expect the same behaviour as SDKv2.

Documentation for the getObject operation lists GetObjectOutput.Body as Readable | ReadableStream | Blob. API Reference: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/s3.html#getobject

My IDE can’t tell me what the type of response.Body is. It tells me that it’s any. Perhaps the library configuration could be improved to export the correct type information.

I’m using Visual Studio Code, and it shows type of response.Body as internal.Readable | ReadableStream<any> | Blob on hover.

Please create a new issue with details of your IDE and code if problem persists.

I’m also very confused about how to read S3 Body responses with SDK v3. The SDK documentation for GetObjectCommand does not describe how to do it, and the SDK examples are also missing it (https://github.com/awsdocs/aws-doc-sdk-examples/issues/1677).

I would ask the AWS SDK team to include in the SDK a simple way to read S3 Body responses. We don’t want to re-implement complicated event handlers and helper functions for this simple purpose every time we use GetObject in a project.

In v2 we could just say something like JSON.parse(response.Body?.toString()). Please make it as simple in v3. Stream-based processing is also important, but it should only be an alternative; the simple case of parsing small JSON objects should stay simple.

For reference, I was able to do this in Node.js by utilizing node-fetch. I would like something like this to be included in the AWS SDK.

npm install node-fetch
npm install --save-dev @types/node-fetch
import { Response } from 'node-fetch'

const response = new Response(s3Response.Body)
const data = await response.json()

Starting from Node 16.7, you can simply use the utility consumer functions:

import consumers from 'stream/consumers'

const { Body: stream } = await s3.getObject({
  Bucket: bucket,
  Key: key
})
const objectText = await consumers.text(stream)

Edit: Added await to consumers.text result. Thanks @AHaydar.

Reopening as a lot of customers have raised questions. Tagging @AllanZhengYP for comment.

I’ve been running into these pain points as well, including Lambda invocation. The payload returned is now a Uint8Array, so it takes a few hoops to get it into a usable format:

const payload = JSON.parse(Buffer.from(data.Payload).toString());

Whereas in the previous JS SDK, it was simply:

const payload = JSON.parse(data.Payload);

I don’t understand this new direction with the SDK. I can’t say I’m a fan. Maybe @trivikr can weigh in.

  • As noted below, I can’t find an export of ReadableStream and Blob so it appears to be impossible to make this code type-safe.

For reference, I’ve rewritten the streamToString with the missing types added back in to comply with my team’s linter settings.

import { Readable } from 'stream';

// Apparently the stream parameter should be of type Readable|ReadableStream|Blob
// The latter 2 don't seem to exist anywhere.
async function streamToString (stream: Readable): Promise<string> {
  return await new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')));
  });
}

As this code is run on Node.js, you can pass Body as Readable as follows:

const bodyContents = await streamToString(Body as Readable);

@all following this issue:

IMPORTANT UPDATE

I have received news that Bezos officially defunded the AWS core services. Do not fret! He’s reallocated these funds to lawyers fees. These fees will be used to prevent SpaceX from progressing. Please, rest assured that the lack of basic, simple, progress on these core AWS libraries is clearly being put to good use!

Hope this update is found useful and promising to all engineers that rely on AWS S3.

A one-line alternative is to use get-stream package, as posted here: https://github.com/aws/aws-sdk-js-v3/issues/1096#issuecomment-616743375
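For reference, a rough sketch of what the get-stream route looks like (assuming the get-stream v6-style API, where the default export drains a Node stream and resolves with a string; bucket and key are placeholders):

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import getStream from 'get-stream';
import { Readable } from 'stream';

async function getObjectText(bucket: string, key: string): Promise<string> {
  const client = new S3Client({});
  const { Body } = await client.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  // get-stream drains the Readable and resolves with the full contents as a string.
  return getStream(Body as Readable);
}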

I understand the reason for returning a ReadableStream, but a built-in helper method would be nice. Reading the whole body into a string in memory is probably good for 99% of cases.

If a helper method were part of the SDK, we could just call readStream(response.Body) and everyone would be happy, not having to add another dependency or 10 lines of boilerplate code to every new project.

I can’t believe a commercial SDK like AWS’s would provide such useless interfaces.

Seriously? Are we in year 2022 or 2000?

internal.Readable | ReadableStream | Blob

Please showcase how your AWS dev teams use such an interface without writing quite a few lines of helper functions. Why would you assume SDK customers want to use such an interface?? You can of course provide a low-level interface, but interface design 101 already tells us: “make it simple for your users”. Interface design is not about you, the author; it’s about customers. Go back and check the SDKs of other platforms such as .NET and Java and see how they work with customers. No one really wants to fiddle around with internal.Readable | ReadableStream | Blob, no matter how accurate it is in Computer Science. You may think it’s a technically precise and perfect response type; it doesn’t matter, because it’s useless to customers.

How hard is it to implement something more useful like below??

const response = await GetS3Object();
// don't want to play with response.Body? Fine, use helper functions:
const str = await response.asString()
const bytes = await response.asBytes()
const theDataTypeMostCustomersWant = await response.asMostOfYouWant()

P.S. In case anyone simply wants to download a text object:

import { Readable } from "stream";
import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";

export async function getObjectAsString(
  bucket: string,
  key: string
): Promise<string> {
  const client = new S3Client()
  const response = await client.send(
    new GetObjectCommand({
      Bucket: bucket,
      Key: key
    })
  )

  // Code like the below should really be provided as a nice interface by the SDK itself.
  return new Promise((resolve, reject) => {
    if (!response.Body) {
      reject("No Body on response.");
    } else {
      const chunks: Uint8Array[] = [];
      const bodyStream = response.Body as Readable;
      bodyStream.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
      bodyStream.on("error", reject);
      bodyStream.on("end", () =>
        resolve(Buffer.concat(chunks).toString("utf-8"))
      );
    }
  });
}

You can also use async iterators to consume readable streams if you’re on Node.js 11.14.0 or above: https://nodejs.org/api/stream.html#readablesymbolasynciterator

const s3Response = await s3Client.send(
  new GetObjectCommand({
    Bucket: bucket,
    Key: key,
  })
);
let s3ResponseBody = "";
for await (const chunk of s3Response.Body) {
  s3ResponseBody += chunk;
}
// const result = JSON.parse(s3ResponseBody)

Guess it’s back to AWS SDK v2.

Can we please get an official comment from someone at AWS about this? Getting an object from S3 should NOT be this difficult…

It’s ridiculous that people have to resort to this GitHub thread to parse their data. Why does a company like Amazon have such shitty and obscure documentation?

For any lost souls trying to parse simple JSON data using GetObjectCommand and don’t want to mess with filestreams, readers, or buffers, https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-793028742 worked like a charm on my NodeJS/Express server.

I’m sure there’s a lot of really good suggestions in this thread too, hope that there’s a solution for this sometime soon.

I just noticed that a lot of the comments here are also covered in the v3 documentation, see section “Getting a file from an Amazon S3 bucket”.

It boggles my mind that I had to read this open issue to find what I was looking for: how to read objects in v3. Really, this is what everyone would expect to find in the information hierarchy:

  • Amazon S3 examples
    • Amazon S3 browser examples
    • Amazon S3 Node.js examples
      • Creating and using Amazon S3 buckets
      • Configuring Amazon S3 buckets
      • Getting a file from an Amazon S3 bucket (this oughta be the #1 reference search for the API!)
      • Managing Amazon S3 bucket access permissions
      • Working with Amazon S3 bucket policies
      • Using an Amazon S3 bucket as a static web host

There’s no mention of the word “object” in this entire list. This should not be just a section many pages below a headline.

export interface GetObjectOutput {
    /**
     * <p>Object data.</p>
     */
    Body?: Readable | ReadableStream | Blob;

  // ... snip
}

This is throwing an error in a NodeJS app, because TS config does not load DOM libs.

This results in the Body being set to any.

The use of Response looks like the neatest solution right now, for JSON and text payloads.

Reading the answers above: in my case, to resize an image to a specific size and do other little things with sharp, I just needed to remove the .toString call from the streamToString function.

const streamToString = (stream) => {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on("data", (chunk) => chunks.push(chunk));
    stream.on("error", reject);
    // I removed the .toString here
    stream.on("end", () => resolve(Buffer.concat(chunks)));
  });
};

And everything works great:

 const getObjectCommand = new GetObjectCommand({
    Bucket: bucket,
    Key: key,
 });
 const getObjectResponse = await s3Client.send(getObjectCommand);
 const body = await streamToString(getObjectResponse.Body);

 const image = sharp(body);
 const resizedImage = await image.resize({ ... }).toBuffer();

For anyone who got stuck trying to download the S3 object to a local file (I’m doing so in Lambda), you can reference this code; it utilizes Node.js Readable streams.

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { Readable } from "stream";
import fs from "fs";
import path from "path";

const command = new GetObjectCommand(params);
const object = await s3Client.send(command);
const { ContentType, Body } = object;

const body = Body as Readable;
const tempFileName = path.join("/tmp", "downloadedimage." + tmpFileSuffix);
const tempFile = fs.createWriteStream(tempFileName);
body.pipe(tempFile);

Be sure to replace tmpFileSuffix with the correct file suffix.

This is confusing as hell. Even if you’re not going to fix it right away, just put a massive note in the documentation that it does NOT return a Blob if you are using Node.js, regardless of what the TypeScript declaration says.

That would at least allow people to figure out what’s wrong without having to spend an hour reading through the whole of this thread.

Turns out the body (ReadableStream) property of fetch responses was implemented slightly later than the other methods (like blob(), text(), etc). So it looks like AWS is trying to support a few older browsers by falling back on Blob when body/ReadableStream isn’t available.

Relevant code: https://github.com/aws/aws-sdk-js-v3/blob/608e606c20b3bb1614518de6de313a184db6129f/packages/fetch-http-handler/src/fetch-http-handler.ts#L75-L94

So if you want to support those older browsers you need to be prepared to handle both a Blob and a ReadableStream in browser.

This kind of thing should really be in the docs…
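If you do need to cover that older-browser fallback, a small helper along these lines should do; this is only a sketch and the name is illustrative:

// Browser-side helper that accepts either shape the fetch handler may return.
async function browserBodyToString(body: ReadableStream | Blob): Promise<string> {
  if (body instanceof Blob) {
    // Blob.text() is available in current browsers; truly old ones would need
    // a FileReader-based fallback instead.
    return body.text();
  }
  // Wrap the web stream in a Response to reuse its text() helper.
  return new Response(body).text();
}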

Okay, so after spending a few hours I got it right. This way we can pipe our S3 response body into sharp and later use .toBuffer() to push it to the bucket.

  const getObj = new GetObjectCommand({
    Bucket,
    Key: objectKey,
  });

  const s3ImgRes = await s3Client.send(getObj);

  const sharpImg = sharp().resize({ width: 500 }).toFormat("webp");

  // pipe the body to sharp img
  s3ImgRes.Body.pipe(sharpImg);

  const putObj = new PutObjectCommand({
    Bucket,
    Key: `converted/${objectKey.replace(/[a-zA-Z]+$/, "webp")}`,
    Body: await sharpImg.toBuffer(),
  });

  await s3Client.send(putObj);

But AWS team, please, please update your docs. I know there is a lot to update, but as a developer it’s just so much of a struggle to use AWS services because of insufficient docs.

Bumping because of how much time I wasted trying to fix this issue…

AWS provides a migration guide that is clearly not thorough enough to simplify the process: https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/migrating-to-v3.html

If there’s going to be a migration guide at all, details like this need to be covered.

This is now documented in the root readme with an example: https://github.com/kuhe/aws-sdk-js-v3/tree/main#streams

You do not need to import sdkStreamMixin explicitly. As of that version, it is applied to stream objects in command outputs.

import { S3 } from "@aws-sdk/client-s3";

const client = new S3({});

const getObjectResult = await client.getObject({
  Bucket: "...",
  Key: "...",
});

// env-specific stream with added mixin methods.
const bodyStream = getObjectResult.Body; 

// one-time transform.
const bodyAsString = await bodyStream.transformToString();

// throws an error on 2nd call, stream cannot be rewound.
const __error__ = await bodyStream.transformToString();

If it helps someone who doesn’t want to use a library for converting streams to buffers, here’s the custom function I’m using:

import { Stream } from 'stream';

export async function stream2buffer(stream: Stream): Promise<Buffer> {

    return new Promise<Buffer>((resolve, reject) => {

        const _buf = Array<any>();

        stream.on('data', chunk => _buf.push(chunk));
        stream.on('end', () => resolve(Buffer.concat(_buf)));
        stream.on('error', err => reject(`error converting stream - ${err}`));

    });
}

Moreover, it seems to be a little bit faster than the lib.

Then you can use it like that:

const data = await this.client.getObject({
  Key: path.replace(/^\//g, ''),
  Bucket: this.bucket
});

const file_stream = data.Body;
let content_buffer: Buffer | null = null;

if (file_stream instanceof Readable) {
  content_buffer = await stream2buffer(file_stream); // Here's the buffer
} else {
  throw new Error('Unknown object stream type.');
}
...

I couldn’t get the stream.on solution to work under React. I kept getting the error ‘stream.on is not a function’. It turns out in this environment AWS returns a ReadableStream, not a Readable. I ended up having to write my own converter to handle ReadableStream and work under TypeScript with ESLint. No extra packages needed to be installed or used.

I thought I’d written my last do-while loop a decade ago, but it turned out I still needed to use it here. I couldn’t get an iterator solution to work, and I refused to do the tail recursion I’d seen in the getReader() examples, which wouldn’t compile under TypeScript anyway.

async function readableStreamToString(stream: ReadableStream): Promise<string> {
  const chunks: Buffer[] = [];

  const reader = stream.getReader();

  let moreData = true;
  do {
    // eslint-disable-next-line no-await-in-loop
    const { done, value } = await reader.read();
    if (done) {
      moreData = false;
    } else {
      chunks.push(Buffer.from(value as Uint8Array));
    }
  } while (moreData);

  return Buffer.concat(chunks).toString('utf-8');
}

And I call it like this:

async function loadJsonFileFromS3(bucket: string, key: string): Promise<[]> {
  const s3Response = await s3Client.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  if (!s3Response.Body) {
    const errorMessage = `${key} returned undefined`;
    throw new Error(errorMessage);
  }

  const fileContents = await readableStreamToString(
    s3Response.Body as ReadableStream
  );
  const contentsAsJson = JSON.parse(fileContents);
  return contentsAsJson;
}

AWS team: it is just about November 2021, and, still, compared to SDK v2, reading in a simple JSON object from S3 is way too convoluted. If this is your idea of a private joke, it’s not funny. Fix it already.

So it looks like going by the latest PR: https://github.com/aws/aws-sdk-js-v3/pull/3977/files

The recommended way to do this is now:

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { sdkStreamMixin } from '@aws-sdk/util-stream-node';

const s3Client = new S3Client({});
const { Body } = await s3Client.send(
  new GetObjectCommand({
    Bucket: 'your-bucket',
    Key: 'your-key',
  }),
);
const objectString = await sdkStreamMixin(Body).transformToString(); // this throws if Body is undefined

Took two solid years, but hey, we have an official solution…

Hard to understand why getting a string from a simple storage service is unnecessarily complicated.

@porteron This might help too. This is what I use in my code:

    const data = await S3Object.s3.getObject({
      Bucket: this.bucket,
      Key: this.key,
    });

    return assembleStream(data.Body);

import { Readable } from 'stream';

interface AssembleStreamOptions {
  string?: boolean;
}

export async function assembleStream(
  stream: Readable,
  options: AssembleStreamOptions = {}
): Promise<string | Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    stream.on('data', chunk => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => {
      const result = Buffer.concat(chunks);

      resolve(options.string ? result.toString('utf-8') : result);
    });
  });
}

If you find a method that works that’s shorter than mine, let me know!

Here is an example of how to download an object from S3 and write that as a file to disk, while keeping it as a stream. This example is typescript targeting node.

It seems silly to me if we’re going to all this trouble of having a stream coming from AWS that we then convert that to a buffer or string to write to disk.

I also agree with the sentiments expressed by others in this thread. It is crazy that getObject has become such a complicated operation in the V3 SDK compared with the V2 SDK and is going to trip many people up for years to come.

import type { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import fs from 'node:fs'
import { GetObjectCommand, S3 } from '@aws-sdk/client-s3';

async function downloadFile() {
  const s3 = new S3({});
  const s3Result = await s3.send(new GetObjectCommand({ Bucket: sourceBucket, Key: sourceKey }));
  if (!s3Result.Body) {
    throw new Error('received empty body from S3');
  }
  await pipeline(s3Result.Body as Readable, fs.createWriteStream('/tmp/filedownload.zip'));
}

Why should everyone have to copy and paste the same boilerplate everywhere? Have you guys not heard of encapsulation? If you are making a new version that you want everyone to use, then it must be better, not worse. I’ve never seen something so obtuse.

The fact that people are using a TypeScript type assertion in many of these examples stems from the fact that AWS has typed this thing poorly. Why is Body even a union type? That is basically AWS telling us “we don’t know the shape of the response we’re going to give you, only that it will be one of these three things, but you need to write code to check which of these we gave you”. I think someone at AWS should look into making this an intersection type instead of a union type?

Why should I have to check the return type? Shouldn’t the library know what the return type is already?


  if (res.Body && res.Body instanceof Readable) {
    const payload = await stream2buffer(res.Body);
    console.log(payload.toString('utf-8'));
  }

It seems it is the case that the returned object can be used as any of the three interfaces (intersection type), and it does not seem to be the case that it can only be used as one of the interfaces (as the types would suggest with union type)

If a union type is in fact correct, it begs the question under what scenario I should expect the library to not return this interface to me. This is not documented. Why not just add an example to your docs that writes a “hello world” string and reads it back and outputs it, at the very least, using Typescript so you dogfood your own types (if you did this you’d realize your current types are “not optimal”)

Because AWS has this as a union type, people’s editors will suggest .toString() only (assuming the user has not discriminated the union). However this just prints “[object]” which is pretty poor DX especially considering toString() used to work for v2 of the API from the sounds of it. Perhaps consider structuring your types so that the “trigger suggest” command in VSCode guides the user into a working “hello world” command without all of this fuss.
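Until the typing changes, the practical workaround is to discriminate the union yourself before calling anything on Body. A minimal TypeScript sketch for Node (bucket and key are placeholders; consumers.text needs Node 16.7+):

import { Readable } from 'stream';
import consumers from 'stream/consumers';
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

async function readObjectAsString(bucket: string, key: string): Promise<string> {
  const client = new S3Client({});
  const { Body } = await client.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  // Narrow the Readable | ReadableStream | Blob union so the compiler and the
  // runtime agree on which shape was actually returned.
  if (!(Body instanceof Readable)) {
    throw new Error('Expected a Node.js Readable body');
  }
  return consumers.text(Body);
}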

Typescript version for node (not for browser).

import {GetObjectCommand, S3Client} from '@aws-sdk/client-s3'
import type {Readable} from 'stream'

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
const response = await s3Client
    .send(new GetObjectCommand({
        Key: '<key>',
        Bucket: '<bucket>',
    }))
const stream = response.Body as Readable
// if you are using node version < 17.5.0
return new Promise<Buffer>((resolve, reject) => {
    const chunks: Buffer[] = []
    stream.on('data', chunk => chunks.push(chunk))
    stream.once('end', () => resolve(Buffer.concat(chunks)))
    stream.once('error', reject)
})

// if you are using node version >= 17.5.0
return Buffer.concat(await stream.toArray())

For browser usage, the reasons behind this make-things-complicated decision in SDK v3, as well as an explanation of the typecasting, are covered in my blog post.

@dzmitry-kankalovich definitely summed it up correctly, I believe. A high level manager that doesn’t actually care about usability is in charge of the documentation specs as well as things like reading an object to memory (pretty sure it’s all about total lines of docs, not whether or not they matter?). No doubt this SDK is really powerful, and well written (props to the devs)… but reading an object to memory clearly has slipped through the cracks, as well as the entirety of dev UX concerning docs. Been using AWS for years, but this issue (existence, duration, lack of quick response time from AWS team) is pretty jaw dropping - to say the least.

There are 3 clear implementations/packages suggested by the community. Adopt one. Ensure type safety. Patch it. Make hundreds, if not thousands, of other devs’ lives easier… that, or have us devs asking our companies to never have to use AWS S3 with Node again, I guess.

Also, as far as documentation goes… take a look at the Google/Angular team and take some notes @aws

Found an easy solution using transformToString if you want to parse a JSON file in S3.

import { S3, GetObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3({});

const getObjectParams = {
  Bucket: 'my-bucket',
  Key: 'my-object',
};
const getObjectCommand = new GetObjectCommand(getObjectParams);
const s3Object = await s3.send(getObjectCommand);

const dataStr = await s3Object.Body?.transformToString();

let data;
if (dataStr) {
  data = JSON.parse(dataStr);
}

So it looks like going by the latest PR: https://github.com/aws/aws-sdk-js-v3/pull/3977/files

Edit: See below post for official answer

The recommended way to do this is now:

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { sdkStreamMixin } from '@aws-sdk/util-stream-node';

const s3Client = new S3Client({});
const { Body } = await s3Client.send(
  new GetObjectCommand({
    Bucket: 'your-bucket',
    Key: 'your-key',
  }),
);
const objectString = await sdkStreamMixin(Body).transformToString(); // this throws if Body is undefined

Of course we can all write our own wrappers and functions, but then we are all repeating the same work and re-inventing the same wheel everywhere and we will all have to change it again when there is another API change. That’s why in my mind it makes sense to centralise very common tasks in the library itself. We’re not talking about something esoteric, here.

Imagine a 5-liner copied and pasted everywhere (such things tend to be posted on discussions such as this) and then someone realises there is a bug in it. It’s easier to maintain things in one place.

Even if a bunch of different people decide to publish their solutions (i.e. as libraries), you end up with n solutions and potentially many bugs, so there is no shared benefit across the board when one is fixed, as there would be if we all subscribed to one central solution.

Also, PutObject lets you send a string, so the API is not symmetrical.
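To illustrate the asymmetry: uploading accepts a plain string, while reading the same object back hands you a stream that still has to be drained manually (a sketch; bucket/key are placeholders, and consumers.text needs Node 16.7+):

import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { Readable } from 'stream';
import consumers from 'stream/consumers';

async function roundTrip(bucket: string, key: string): Promise<string> {
  const client = new S3Client({});

  // Writing: a plain string is a valid Body.
  await client.send(
    new PutObjectCommand({ Bucket: bucket, Key: key, Body: 'hello world' })
  );

  // Reading: the Body comes back as a stream and must be consumed manually.
  const { Body } = await client.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  return consumers.text(Body as Readable); // 'hello world'
}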

I agree about the build sizes, though, it’s why I decided to use v3.

Throwing into the pot here, I lost 4 hours of time trying to track this down.

@igilham has it right in his earlier message.

It’s still not noted in the docs that the latter two shapes are from DOM and only apply in browsers. I agree it’s frustrating that helper methods for common scenarios aren’t included with the client.

We’ve run into similar issues so often we’ve taken to wrapping all SDK clients in our own classes to extend their operations to handle situations like this. S3 shouldn’t be one of the services we should have to do this for. With such a large library of offerings, I know it’s hard to keep up with everything, but S3 is hands down one of the most used resources Amazon has available, so it should be receiving the most attention when it comes to DX. Bad experiences on the most common use cases definitely sour the impression of your products, and lower the likelihood of developer evangelization.

Here’s to hoping 2021 closes out with a packaged implementation for this scenario.

I think you can now get s3.GetObject to return the result as a string: https://newbedev.com/how-to-get-response-from-s3-getobject-in-node-js

@berenddeboer you are right, it was added to the documentation, but the documentation itself perfectly matches the poor attention to detail this whole issue is about. It is poorly indented (not a problem in itself) and does not behave as one would expect: there’s a return statement that might work for the person responsible for the unit tests (and apparently also for the documentation), but not for the poor lost soul who had to face the complex use case of reading a text file and might overlook a return data; // For unit tests.

so, @dzmitry-kankalovich feel free to add to the script – Ok, so whaddaya think, should we improve its documentation? – Nah…

maybe it’s just our fault for expecting a usable SDK for the biggest object storage product from the biggest cloud provider

Thanks. I think that covers my remaining frustrations. I appreciate that it can take time for documentation elsewhere to catch up when a major version is released.

A shorter version is:

const body: stream.Readable|undefined = s3result.Body as stream.Readable|undefined;
if (!body) return;
const payload = await body.read();
const output = Buffer.from(payload).toString();

stream didn’t work for me. This is how I ended up getting a JSON file from the s3 bucket

export const getFileData = (Bucket: string, Key: string) =>
  new Promise(async (res, rej) => {
    const decoder = new TextDecoder()
    const client = new S3Client({ region: 'us-west-3' }) 
    const command = new GetObjectCommand({ Bucket, Key })
    const response = await client?.send(command)
    // @ts-ignore - thanks aws
    const reader = await response.Body.getReader()
    await reader.read().then((resp: any) => {
      res(JSON.parse(decoder.decode(resp.value)))
    })
  })

I just noticed that a lot of the comments here are also covered in the v3 documentation, see section “Getting a file from an Amazon S3 bucket”.

@HerrSchwarz I imagine it was for flexibility, so the dev can decide if they want to stream data into another process or just grab a string directly. This saves having to come up with multiple API functions (getObject, getStream).

But I agree, it is a bit of a pain. There might’ve been a simpler way to accomplish this and have better DX (developer experience).

I’m also seeing a TypeScript compilation error when using this via "aws-sdk": "^2.1175.0" on node 14.x. Isn’t ReadableStream a Node 16+-only interface?

node_modules/@aws-sdk/types/dist-types/serde.d.ts:58:33 - error TS2304: Cannot find name 'ReadableStream'.

58     transformToWebStream: () => ReadableStream;
                                   ~~~~~~~~~~~~~~
Found 1 error in node_modules/@aws-sdk/types/dist-types/serde.d.ts:58

Also ended up here after some time wasted trying to figure out the API. A year later, and there is still no proper documentation in docs.aws.amazon.com.

Will be reverting to v2. Despite the monolith design the API is simpler, smaller, and it only pulls in 10 dependencies.

I’ve been running into these pain points as well, including Lambda invocation. The payload returned is now a Uint8Array, so it takes a few hoops to get it into a usable format:

const payload = JSON.parse(Buffer.from(data.Payload).toString());

Whereas in the previous JS SDK, it was simply:

const payload = JSON.parse(data.Payload);

I don’t understand this new direction with the SDK. I can’t say I’m a fan. Maybe @trivikr can weigh in.

This was by far the easiest solution with the least overhead. Worked for me at least! Much appreciated!

@sinsys where do you get data.Payload from? I do not see that in the GetObject model.

I only see data.Body.

I’m generally a bit concerned about this SDK being maintained over time. There are very serious issues like this one that have been open since May with no resolution.

Oh yes, I use Blobs a lot in browser. I’m just wondering in what circumstances the AWS SDK will return a Blob rather than a ReadableStream since it has typings for both. I’m guessing it might be if the browser doesn’t support ReadableStreams…

Does anyone know in what circumstances body will be a Blob or a ReadableStream (for browser env)?

My code is currently returning ReadableStreams, but it makes me wonder what APIs or configs result in blobs being returned.

@dzmitry-kankalovich Amazon is well-known to not have a solid focus on DX. Often times they build tools that work, but are not great experiences to work with.

IMO, every internal team at Amazon should be looking at Amplify as a model of how a library or SDK should be designed and documented.

With Node 18 introducing the Web Streams API, will this affect S3 download streams in any way? To my knowledge, if you were working in Node you could assume the Body was always gonna be a Readable. Will it now also support ReadableStream?

Poking around in the SDK a bit, it looks like there are some stream consumers already available for browser and node environments, respectively:

These methods appear to already be used in some of the protocol implementations, like https://github.com/aws/aws-sdk-js-v3/blob/main/clients/client-s3/src/protocols/Aws_restXml.ts#L12024

That being said, @AllanZhengYP 's change will be a nice usability improvement when it makes it in. Keep up the good work Allan and the SDK team! Can you comment on if these above interfaces are stable enough to depend on?

Starting from node 16.7, you can simply use the utility consumer functions :

import consumers from 'stream/consumers'

const { Body: stream } = await s3.getObject({
  Bucket: bucket,
  Key: key
})
const objectText = consumers.text(stream)

This answer makes a lot of sense but for some reason the import is failing for me here. The line:

import consumers from 'stream/consumers'

fails with the error:

Module not found: Error: Can't resolve 'stream/consumers'

This feels like such a dumb error, but I can’t find a way around it. I’ve tried all of the following to no avail:

npm install stream

This works, but my error does not go away

npm install "stream/consumers"

This fails since the module does not exist

npm install stream-consumers

This works, but it’s a different package so my error still doesn’t go away.

The documentation on the Node.js website is not helpful here since they import it in a totally different way:

import {
  arrayBuffer,
  blob,
  buffer,
  json,
  text,
} from 'node:stream/consumers';

Could someone please tell me the correct way to install and import the ‘stream/consumers’ package? Thank you so much for the help!
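For what it’s worth, stream/consumers is a Node.js built-in module (added in 16.7), not an npm package, so there is nothing to install; a “Module not found” error there usually means the code is being compiled or bundled for a non-Node target, or run on an older Node version. A minimal sketch of the two equivalent import styles on Node 16.7+:

import { Readable } from 'stream';
// Either specifier resolves to the same built-in module on Node >= 16.7.
import consumers from 'stream/consumers';
import { text } from 'node:stream/consumers';

async function demo(): Promise<void> {
  const stream = Readable.from(['hello ', 'world']);
  console.log(await text(stream)); // "hello world"
  // consumers.text(...) is the same function reached via the namespace-style import.
}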

Starting from node 16.7, you can simply use the utility consumer functions :

import consumers from 'stream/consumers'

const { Body: stream } = await s3.getObject({
  Bucket: bucket,
  Key: key
})
const objectText = consumers.text(stream)

That’s awesome. Thanks for sharing. Please note that consumers.text returns a promise. So we’d need to await it as well 🙌

I agree. I love AWS but finding this thread took a bit of research…

Has anyone gotten this to work with binary data in a React Native environment? I get a Blob from the SDK, but I’m having a doozy of a time getting that blob saved on disk.

The response method above #1877 (comment) works great for text, but doesn’t seem to do the trick for binary data.

RN Blobs don’t implement stream https://github.com/facebook/react-native/blob/main/Libraries/Blob/Blob.js and the rn-fetch-blob library the open source project I’m using expects a stream. (I’m new to modern JS and just wanna help fix this bug 😞)

I’ve tried my hand at a few different methods of taking the blob to a stream but all seem to not work on React Native…

I heard you want to save the result of getObject to disk? Who would want to do that anyway? Sounds crazy. Anyway, here is how I managed to do it.

import { S3 } from '@aws-sdk/client-s3';
import stream from 'stream';
import fs from 'fs';

const params = {
    Bucket: bucketName,
    Key: filePath,
};
const client = new S3({
    region: 'us-east-1',
});
const s3Object = await client.getObject(params);

// if, to make node typescript happier
if (s3Object.Body && s3Object.Body instanceof stream.Readable) {
    const writeStream = fs.createWriteStream('/tmp/file.txt');
    s3Object.Body.pipe(writeStream);
    // magic
}

Oh hmm i see it was kind of mentioned here https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-886308913

Has anyone gotten this to work with binary data a react native environment? I get a Blob from the SDK, but I’m having a doozy of a time to get that blob saved on disk.

The response method above https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-776187712 works great for text, but doesn’t seem to do the trick for binary data.

RN Blobs don’t implement stream https://github.com/facebook/react-native/blob/main/Libraries/Blob/Blob.js and the rn-fetch-blob library the open source project I’m using expects a stream. (I’m new to modern JS and just wanna help fix this bug 😞)

I’ve tried my hand at a few different methods of taking the blob to a stream but all seem to not work on React Native…

That is awesome @berenddeboer, it has been added recently to the documentation. For me, the biggest problem is that I can’t be sure my own ad-hoc stream implementations are 100% correct and error-resilient in every situation. When AWS provides the implementation, I will trust it.

@kristiannotari rather than adding “DOM” to tsconfig, you should use one of the solutions suggested above - they require additional code, but work in Node+TS.

You can use the assembleStream function from @ffxsam’s comment or the get-stream package I posted.

Thanks, but that’s not the issue I’m facing. I correctly fixed my s3 output Body stream retrieval using get-stream to convert it to a Buffer. The problem I have now is that I have the TypeScript compiler set to check definition files for libraries. This creates the errors I cited above, because, rightly so, it doesn’t find ReadableStream and Blob types, among others (see File from the DynamoDB native attribute types). I don’t know how to manage a scenario like this one where a library supports multiple environments but I only need type definitions for the Node.js env. Obviously, this was not an issue with aws sdk v2.

@kristiannotari rather than adding “DOM” to tsconfig, you should use one of the solutions suggested above - they require additional code, but work in Node+TS.

You can use assembleStream function from @ffxsam comment or get-stream package I posted.

If you’re ok with using a (very popular) 3rd party library, this is shorter - https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-799697205

@porteron That was my code snippet - apologies for the confusion. In my comment, I was calling out Lambda, which uses data.Payload to return a response.

@trivikr Thanks for that link to the docs! I didn’t even know they existed till just now.

I didn’t realise the methods and types were documented. I took the description on the client landing page (go to the README) to mean it was a dead-end. Perhaps improving the wording should be a separate issue.

I’ve created documentation update request at https://github.com/aws/aws-sdk-js-v3/issues/1878