google-cloud-node: Can't store entities with embedded entities which have properties > 1500 bytes

Seems that excluding a top-level property doesn’t exclude an embedded entity’s properties from being indexed. On top of that, I don’t believe there’s a way to exclude an embedded entity’s individual properties, which would solve my problem too.

Steps to reproduce

Just use this script (I’m running the datastore emulator)

const Datastore = require('@google-cloud/datastore');
const datastore = new Datastore({ projectId: 'datastore-test' });

datastore.insert({
  key: datastore.key('User'),
  data: [
    {
      name: 'description',
      value: {
        text: Buffer.alloc(1501, 'a').toString(),
      },
      excludeFromIndexes: true
    }
  ]
}, (err, result) => {
  if (err) {
    console.error(err);
  } else {
    console.log(result.data);
  }
});

I’d expect there to be no error - the embedded entity ({ text: '...' }) shouldn’t be indexed, I believe? But it throws an error saying:

Error: The value of property "text" is longer than 1500 bytes.
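
A side note on the limit in that message: it counts bytes rather than characters. Assuming the value is measured in its UTF-8 encoding (an assumption here, the error itself doesn’t say), a quick local check might look like the sketch below. exceedsIndexLimit is a made-up helper name, not part of the library.

```javascript
// Hypothetical helper: not part of @google-cloud/datastore.
// Assumes the 1500-byte limit from the error applies to the UTF-8 encoding.
const MAX_INDEXED_BYTES = 1500;

function exceedsIndexLimit(text) {
  return Buffer.byteLength(text, 'utf8') > MAX_INDEXED_BYTES;
}

console.log(exceedsIndexLimit('a'.repeat(1500)));      // false
console.log(exceedsIndexLimit('a'.repeat(1501)));      // true
console.log(exceedsIndexLimit('\u00e9'.repeat(751)));  // true: 751 characters, 1502 bytes
```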

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 41 (18 by maintainers)

Most upvoted comments

Could I ask what the status of this issue is? I have some entities that have embedded entities that have properties that are >1500 Bytes. When I try setting excludeFromIndexes: true on the embedded entity I still get an error thrown for its sub-properties that are >1500 Bytes. Is there some way I can store them?

@stephenplusplus thanks for the options - I think data makes the most sense to me… i.e. value is for more ‘basic’ types, and data for embedded objects.

It’s a pretty fringe edge case at this point - and I’m not sure if it’s even possible to do, but if you had an array of objects, could you index them?

e.g.

var rawPersonEntity = {
  key: datastore.key('Person'),
  data: [
    {
      name: 'name',
      value: 'Mary'
    },
    {
      name: 'metadata',
      data: [
        {
          excludeFromIndexes: true,
          name: 'bio',
          value: '...'
        }
      ]
    },
    {
      name: 'subjects',
      value: [
        {
          excludeFromIndexes: true, // Is this something that can be done? Is there a reason why it would be done?
          name: 'history',
          credits: 2
        },
        // ...
      ]
    }
  ]
}

Other than that, I can’t see any issue with them… Another option would be making a datastore.value style factory - like datastore.key - where you can specify types and the data associated with them, e.g.

datastore.insert({
  key: datastore.key('Person'),
  data: datastore.value({
    type: datastore.Entity, // Defaults to 'auto'
    data: {
      name: datastore.value({
        type: String,
        value: 'Mary'
      }),
      metadata: datastore.value({
        type: datastore.Entity,
        value: {
          bio: datastore.value({
            type: String,
            excludeFromIndexes: true,
            value: '...'
          })
        }
      }),
      subjects: datastore.value({
        type: Array,
        value: [
          datastore.value({
            type: datastore.Entity,
            value: {
              name: datastore.value({
                type: String,
                value: 'history',
              }),
              credits: datastore.value({
                type: Number,
                value: 2
              })
            }
          }),
          datastore.value({
            type: datastore.Entity,
            value: {
              name: datastore.value({
                type: String,
                value: 'french',
              }),
              credits: datastore.value({
                type: Number,
                value: 4
              })
            }
          }),
          datastore.value({
            type: datastore.Entity,
            value: {
              name: datastore.value({
                type: String,
                value: 'maths',
              }),
              credits: datastore.value({
                type: Number,
                value: 4
              })
            }
          })
        ]
      })
    }
  })
});

Ha - after writing that all out, I realise how incredibly over the top that looks, but it feels robust. Perhaps not as a candidate for quickly writing out simple entities, but perhaps as a way that will work well with a helper utility? e.g. for my use case, I’m converting incoming JSON to an entity, and I want to recursively step through it and add excludeFromIndexes as needed…thoughts?

Here’s the issue for the catch-all feature request: #2510

How do you exclude properties that are not written in camel case?

This worked for me:

datastore.save({
  key: ...,
  data: {
    test: {
      'This is a key': 'This is a string > 1500 bytes ...'
    }
  },
  excludeFromIndexes: [
    'test.This is a key'
  ]
}, ...)
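
For anyone converting arbitrary JSON, the dotted-path form above lends itself to a small recursive helper. This is only a sketch: collectLongPaths and its limit / arrayMarker options are made-up names, the input is assumed to be plain JSON-like data, and how array elements should be written in a path is an assumption (the proposal in this thread writes subjects.name; check the library docs for the notation your version expects and pass it as arrayMarker if it differs).

```javascript
// Hypothetical helper: not part of @google-cloud/datastore.
// Walks a plain JSON-like value and returns the dotted path of every string
// whose UTF-8 length exceeds `limit`, e.g. ['metadata.bio'].
function collectLongPaths(value, options = {}, prefix = '', paths = new Set()) {
  const { limit = 1500, arrayMarker = '' } = options;

  if (typeof value === 'string') {
    if (prefix && Buffer.byteLength(value, 'utf8') > limit) {
      paths.add(prefix);
    }
  } else if (Array.isArray(value)) {
    // By default array elements share their parent's path (assumption, see above).
    for (const item of value) {
      collectLongPaths(item, options, prefix + arrayMarker, paths);
    }
  } else if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      collectLongPaths(child, options, prefix ? `${prefix}.${key}` : key, paths);
    }
  }

  return [...paths];
}

const data = {
  name: 'Mary',
  metadata: { bio: 'a'.repeat(1501) },
  subjects: [
    { name: 'history', notes: 'b'.repeat(2000) },
    { name: 'french', notes: 'short' }
  ]
};

console.log(collectLongPaths(data));
// [ 'metadata.bio', 'subjects.notes' ]
```

The result could then be passed as excludeFromIndexes next to data in the save call shown above.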

I just bumped into this issue. We have a data structure where, under the main entity, there is an embedded entity (or actually an array of entities) that has a property longer than 1500 bytes. As we can’t exclude this >1500-byte property from indexing, we can’t store even the main entity.

Is there any progress on this? Basically datastore is unusable for us at this point.

Just piping in that I’d opened an issue for this internally several days ago, but I’d routed it to the wrong place. Corrected, and looks like @stephenplusplus is aware.

@lukesneeringer @stephenplusplus Thanks for the “fix”, but can this please be updated in the documentation? Personally, I think it’s weird that I have to exclude every single subproperty from indexing but the bigger issue is that the documentation currently says that “If you exclude this value from indexing, then all subproperties are also excluded from indexing”. This is very misleading (at least it was for me) and it took quite a while to find this issue.

EDIT: Another question that came up. How do you exclude properties that are not written in camel case?

data: {
  test: {
    'This is a key': 'This is a string > 1500 bytes ...'
  }
}

@stephenplusplus thanks for this! Looks good to me, I like the general concept.

So two potential things:

  1. Is your proposed solution also allowing for exclusion for all nested properties? e.g. Will excluding metadata exclude all properties / nested entities beneath it? Alternatively could a metadata.* be also supported? Otherwise we’ll still run into the current problem of having to explicitly ignore all properties, no matter how large the entity.

  2. Suppose you hold an array of values which don’t all follow the same schema:

{
  data: [{
    foo: '...'
  }, {
    bar: '...'
  }]
}

Should this solution also try to address not indexing one particular entity’s property? e.g. a syntax like: data.0.foo? I’m not sure if this is even supported by datastore’s indexing…? But a scenario where this might be appropriate is something like storing a series of metadata objects, and searching for arbitrary values inside them. I’m not sure how valid a use case this is, but just flagging it.

Other than that, I think it looks good 👍
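
On point 1: until something like metadata.* is supported, the expansion can be approximated on the client. A rough sketch, where expandWildcards and leafPaths are hypothetical helpers and the .* suffix is just the syntax floated above, not something the library is known to accept:

```javascript
// Hypothetical helpers: not part of @google-cloud/datastore.

// Lists the dotted path of every leaf value beneath `value`.
function leafPaths(value, prefix) {
  if (Array.isArray(value)) {
    // Array elements share their parent's path, as in 'subjects.name'.
    return value.flatMap((item) => leafPaths(item, prefix));
  }
  if (value !== null && typeof value === 'object') {
    return Object.entries(value).flatMap(([key, child]) =>
      leafPaths(child, `${prefix}.${key}`)
    );
  }
  return [prefix];
}

// Expands entries such as 'metadata.*' into the explicit leaf paths found in `data`.
// Limitations of this sketch: keys containing dots, and base paths that pass
// through an array, are not resolved.
function expandWildcards(data, patterns) {
  const expanded = patterns.flatMap((pattern) => {
    if (!pattern.endsWith('.*')) return [pattern];
    const base = pattern.slice(0, -2);
    const target = base
      .split('.')
      .reduce((node, key) => (node == null ? undefined : node[key]), data);
    return target === undefined ? [] : leafPaths(target, base);
  });
  return [...new Set(expanded)];
}

const data = {
  description: '...',
  metadata: { bio: '...', links: { home: '...' } },
  subjects: [{ name: '...', credits: 2 }]
};

console.log(expandWildcards(data, ['description', 'metadata.*']));
// [ 'description', 'metadata.bio', 'metadata.links.home' ]
```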

Thanks for your patience, everyone. There are a couple of solutions we’ve talked about, but I’d like to get feedback on one more that might be the simplest. Can you see any “gotchas” using an “excludeFromIndexes” array on the top level to define where the unindexed properties are?

(cc @bedeoverend)

datastore.insert({
  key: datastore.key('Person'),
  excludeFromIndexes: [
    'description',
    'metadata.bio',
    'subjects.name'
  ],
  value: {
    description: '...',
    metadata: {
      bio: '...'
    },
    subjects: [
      {
        name: '...',
        credits: 2
      }
    ]
  }
})

This seems like it would also be simpler than the current “data”/verbose definition syntax for non-nested properties, i.e.

// before
datastore.insert({
  key: datastore.key('Person'),
  data: [
    {
      name: 'description',
      value: '...',
      excludeFromIndexes: true
    }
  ]
})

// after
datastore.insert({
  key: datastore.key('Person'),
  excludeFromIndexes: ['description'],

  // `data` remains a plain JavaScript object:
  data: {
    description: '...'
  }
})
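
To make the before/after mapping concrete, here is a rough converter from the verbose array form to the proposed shape. toPathSyntax is a hypothetical name, and this sketch only handles flat (non-nested) property lists.

```javascript
// Hypothetical helper: not part of @google-cloud/datastore.
// Turns [{ name, value, excludeFromIndexes }] into { data, excludeFromIndexes }.
function toPathSyntax(properties) {
  const data = {};
  const excludeFromIndexes = [];

  for (const { name, value, excludeFromIndexes: excluded } of properties) {
    data[name] = value;
    if (excluded === true) {
      excludeFromIndexes.push(name);
    }
  }

  return { data, excludeFromIndexes };
}

console.log(toPathSyntax([
  { name: 'description', value: '...', excludeFromIndexes: true },
  { name: 'title', value: 'Hello' }
]));
// { data: { description: '...', title: 'Hello' }, excludeFromIndexes: [ 'description' ] }
```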

  1. Let’s say we have an entity with a property of type “text”, i.e. not “string”.
  2. When the property is updated (with an unindexed string), the type of the property changes from “text” to “string”.
  3. This breaks other software that expects a “text” property.

The question is: how do I write more than 1500 chars as “text”?

FWIW I’ve had a play with the datastore save method to support the explicit syntax on nested properties e.g.

datastore.insert({
  key: datastore.key('Stuff'),
  data: [{
    name: 'description',
    value: [{
      name: 'text',
      value: Buffer.alloc(1501, 'a').toString(),
      excludeFromIndexes: true
    }, {
      name: 'long',
      value: true
    }]
  }]
}, handler);

Obviously, though, that wouldn’t work as an interface, as it would break any properties that have array values, e.g. this would break:

datastore.insert({
  key: datastore.key('Stuff'),
  data: [{
    name: 'foo',
    value: [ 1, 2, 3 ]
  }]
}, handler);

For reference, this is the hacked reduction (previously here) function to make this work:

function encodeValues(acc, data) {
  var value;

  if (is.array(data.value)) {
    value = {
      entityValue: {
        properties: data.value.reduce(encodeValues, {})
      }
    };
  } else {
    value = entity.encodeValue(data.value);

    if (is.boolean(data.excludeFromIndexes)) {
      var excluded = data.excludeFromIndexes;
      var values = value.arrayValue && value.arrayValue.values;

      if (values) {
        values = values.map(propAssign('excludeFromIndexes', excluded));
      } else {
        value.excludeFromIndexes = data.excludeFromIndexes;
      }
    }
  }

  acc[data.name] = value;

  return acc;
}