GraphEngine: Error: MemoryTrunk failed to expand on Ubuntu after (re)loading storage

I am trying to deploy services that use GraphEngine inside Ubuntu docker containers, but I am getting the following error:

CommittedMemoryExpand: MemoryTrunk 110 failed to expand.
[ ERROR   ] CellAlloc: MemoryTrunk 110 is out of Memory.

The trunk number (110 in this example) can be any value between 0 and 255.

After some experiments I see no relation between this error and the cell ID generation. I have tried both random hashing and plain sequential numbers, as suggested in https://github.com/Microsoft/GraphEngine/issues/123, so I think this issue is different.

I am however able to consistently reproduce the issue outside docker on an Ubuntu virtual machine. The issue occurs when I save a store, restart the application, reload the store, and start inserting again.

I can reproduce the issue with the following small application, built against the latest version of the master branch of this repository:

Program.cs:

using System;
using System.Linq;
using Trinity;
// Also import the namespaces of CellIdFactory and of the TSL-generated SampleEntity types.

public class Program
{
    private static readonly Random random = new Random();

    static void Main(string[] args)
    {
        // Load whatever was saved by a previous run (if anything).
        Console.WriteLine("Loading storage");
        Global.LocalStorage.LoadStorage();
        var count = Global.LocalStorage.GenericCellAccessor_Selector().Count();
        Console.WriteLine($"Storage loaded. {count} in store.");

        Console.WriteLine("How many entities would you like to insert?");
        int intTemp = Convert.ToInt32(Console.ReadLine());
        Console.WriteLine($"Generating {intTemp} entities");

        // Generate some sample data
        for (var i = 0; i < intTemp; i++)
        {
            Global.LocalStorage.SaveSampleEntity(RandomEntity());
        }
        count = Global.LocalStorage.GenericCellAccessor_Selector().Count();
        Console.WriteLine($"Entities generated. {count} in store");

        // Persist the store so that the next run can reload it.
        Global.LocalStorage.SaveStorage();
        Console.WriteLine("Saved storage");
    }

    // Builds an entity with a fresh cell ID and a random 16-character name.
    private static SampleEntity RandomEntity()
    {
        return new SampleEntity()
        {
            CellId = CellIdFactory.NewCellId(),
            name = RandomString(16)
        };
    }

    // Returns a random string of upper-case letters and digits.
    public static string RandomString(int length)
    {
        const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        return new string(Enumerable.Repeat(chars, length)
            .Select(s => s[random.Next(s.Length)]).ToArray());
    }
}

Model.tsl:

cell struct SampleEntity {
	String name;
}

I start the application with the command dotnet TrinityDemo.dll (my project is called TrinityDemo).

When I first start the program without an existing store and let it create 10 000 000 sample entities, it does so correctly and saves the storage. However, when I then start the application again and try to insert 1000 more entities, I get the out-of-memory errors shown above. Meanwhile the VM has plenty of memory left and hasn’t even touched the swap file yet.

When I start the same application on the same VM and insert 100 000 000 entities (10 times more than before) in a single go without loading the storage, the error does not occur. I therefore don’t think the amount of available memory on the VM is related to the problem. The error also does not occur when I perform the same steps on a Windows machine (also using .NET Core). I therefore suspect the issue lies somewhere in the native library.

Please let me know if there is any more information I can provide to help!

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 23 (23 by maintainers)

Most upvoted comments

Thanks! The issue is no longer occurring in the sample project (neither on WSL nor on an Ubuntu VM), so I think the fix solved it!

I will need some time to get the docker container back up and running, but I will give an update when I know the results 😃

Problem identified. Linux requires the buffer to be aligned to the system page size when committing memory – overlooked this one back then…

a fix is on the way.
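
For readers unfamiliar with the alignment requirement: Linux memory calls such as mprotect reject an address that is not a multiple of the page size, so a range has to be widened to whole pages before it is committed. The snippet below is only a sketch of that arithmetic, not the actual fix in the native library. The class and method names (PageAlign, AlignDown, AlignUp, CoveringRange) are made up for illustration, and it assumes the page size is a power of two.

```csharp
using System;

public static class PageAlign
{
    // Round an address down to the start of the page that contains it.
    // Assumes pageSize is a power of two.
    public static long AlignDown(long address, long pageSize)
        => address & ~(pageSize - 1);

    // Round an address up to the next page boundary (unchanged if already aligned).
    public static long AlignUp(long address, long pageSize)
        => (address + pageSize - 1) & ~(pageSize - 1);

    // Smallest page-aligned range that covers [address, address + length).
    public static (long Start, long Length) CoveringRange(long address, long length, long pageSize)
    {
        long start = AlignDown(address, pageSize);
        long end = AlignUp(address + length, pageSize);
        return (start, end - start);
    }

    public static void Main()
    {
        // Page size reported by the runtime for the current machine.
        long pageSize = Environment.SystemPageSize;

        // Hypothetical unaligned buffer: 100 bytes starting at 0x10010.
        var (start, length) = CoveringRange(0x10010, 100, pageSize);
        Console.WriteLine($"page size {pageSize}: commit 0x{start:X} + {length} bytes");
    }
}
```

With a 4096-byte page, the hypothetical 100-byte buffer at 0x10010 would be committed as the single page starting at 0x10000.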

@nvankaam the problem is reproduced! I tried with your sample, and also observed OOM when I expand the cell layout in my repro to this:

cell TC
{
    byte[36] data; // 16char string takes 36 bytes
}

Initially it was:

cell TC
{
    int data; // 4 bytes
}

That accounts for the difference in total size (approx. 1.34MB vs 149KB per trunk).
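
The per-trunk figures can be roughly reproduced with a back-of-the-envelope calculation. This is a sketch under assumptions that are not stated in the comment itself: 10 000 000 cells (the number from the original report), spread evenly over 256 trunks (trunk numbers 0 to 255), counting payload bytes only and ignoring any per-cell overhead. TrunkSizeEstimate and PayloadBytesPerTrunk are illustrative names.

```csharp
using System;
using System.Globalization;

public static class TrunkSizeEstimate
{
    // Payload bytes held by one trunk if cells are spread evenly across all trunks.
    public static double PayloadBytesPerTrunk(long cellCount, int trunkCount, int bytesPerCell)
        => (double)cellCount / trunkCount * bytesPerCell;

    public static void Main()
    {
        const long cells = 10_000_000;   // assumption: cell count from the original report
        const int trunks = 256;          // assumption: trunk numbers run from 0 to 255
        const double MiB = 1024.0 * 1024.0;

        double large = PayloadBytesPerTrunk(cells, trunks, 36); // byte[36] data
        double small = PayloadBytesPerTrunk(cells, trunks, 4);  // int data

        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "36-byte cells: {0:F3} MiB per trunk", large / MiB));
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "4-byte cells:  {0:F3} MiB per trunk", small / MiB));
    }
}
```

Under these assumptions the payload comes to 1 406 250 bytes (about 1.34 MiB) per trunk for the 36-byte layout and 156 250 bytes (about 0.149 MiB, or roughly 153 KiB) for the 4-byte layout.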

hi @nvankaam ! let me try to reproduce the problem on a VM. Looks like the storage’s got “read-only-ish” after a restart? 10M is not a big number and the default configuration should handle it pretty easily.