GraphEngine: Error: MemoryTrunk failed to expand on Ubuntu after (re)loading storage
I am trying to deploy services that use GraphEngine inside Ubuntu Docker containers, but I am getting the following error:
CommittedMemoryExpand: MemoryTrunk 110 failed to expand.
[ ERROR ] CellAlloc: MemoryTrunk 110 is out of Memory.
The trunk number (110 here) can be any number between 0 and 255.
After some experiments I see no relation between this error and the cell id generation. I have tried both random hashing and plain sequential numbers, as suggested in https://github.com/Microsoft/GraphEngine/issues/123, so I think this issue is different.
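For reference, the two id strategies I tried look roughly like this (a simplified sketch; the comment about trunk selection is an assumption on my part, not something I verified in the Trinity source):

    using System.Threading;
    using Trinity.Core.Lib; // CellIdFactory (namespace as I recall it)

    static class CellIdStrategies
    {
        private static long _next;

        // Strategy 1: random ids from the built-in factory (what the sample below uses).
        public static long Random() => CellIdFactory.NewCellId();

        // Strategy 2: plain sequential ids.
        public static long Sequential() => Interlocked.Increment(ref _next);
    }

    // Either way, the id presumably selects one of the 256 memory trunks (e.g. by its
    // low byte), which would match the trunk number in the error ranging from 0 to 255.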
I am, however, able to consistently reproduce the issue outside Docker on an Ubuntu virtual machine. The issue occurs when I save a store, restart the application, reload the store, and start inserting again.
I can reproduce the issue with the following small application, built against the latest version of the master branch of this repository:
Program.cs:
using System;
using System.Linq;
using Trinity;
using Trinity.Core.Lib; // CellIdFactory

public class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Loading storage");
        Global.LocalStorage.LoadStorage();
        var count = Global.LocalStorage.GenericCellAccessor_Selector().Count();
        Console.WriteLine($"Storage loaded. {count} in store.");

        Console.WriteLine("How many entities would you like to insert?");
        int intTemp = Convert.ToInt32(Console.ReadLine());
        Console.WriteLine($"Generating {intTemp} entities");

        // Generate some sample data
        for (var i = 0; i < intTemp; i++)
        {
            Global.LocalStorage.SaveSampleEntity(RandomEntity());
        }

        count = Global.LocalStorage.GenericCellAccessor_Selector().Count();
        Console.WriteLine($"Entities generated. {count} in store");

        Global.LocalStorage.SaveStorage();
        Console.WriteLine("Saved storage");
    }

    private static SampleEntity RandomEntity()
    {
        return new SampleEntity()
        {
            CellId = CellIdFactory.NewCellId(),
            name = RandomString(16)
        };
    }

    private static Random random = new Random();

    public static string RandomString(int length)
    {
        const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        return new string(Enumerable.Repeat(chars, length)
            .Select(s => s[random.Next(s.Length)]).ToArray());
    }
}
Model.tsl:
cell struct SampleEntity {
    String name;
}
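As a side note, SaveSampleEntity and GenericCellAccessor_Selector in Program.cs above are not hand-written; the TSL compiler generates them from this cell definition. Minimal usage for reference (just restating the calls already shown above):

    // Generated strongly-typed save for the SampleEntity cell:
    Global.LocalStorage.SaveSampleEntity(new SampleEntity { CellId = CellIdFactory.NewCellId(), name = "example" });

    // Generic selector over all cells in the local storage, used for the counts:
    var total = Global.LocalStorage.GenericCellAccessor_Selector().Count();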
I start the application with the command dotnet TrinityDemo.dll (my project is called TrinityDemo).
When I initially start the program without an existing store and let it create 10 000 000 sample entities, it does so correctly and saves the storage. However, when I then start the application again and try to insert 1000 more entities, I get the out-of-memory errors. Meanwhile the VM has plenty of memory left and hasn't even touched the swap file yet.
When I start the same application on the same VM and insert 100 000 000 entities (10 times more than before) in a single go without loading the storage, the error does not occur. I therefore don't think the amount of available memory on the VM is related to the problem. The error also does not occur when I perform the same steps on a Windows machine (also using .NET Core), so I suspect the issue lies somewhere in the native library.
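For completeness, the single-run test is essentially the same program with LoadStorage() skipped and a fixed count; a trimmed sketch, not the exact code I ran:

    // Single-run variant: fresh store, no LoadStorage(), bulk insert in one process lifetime.
    static void BulkInsertOnly(int total)
    {
        for (var i = 0; i < total; i++)
        {
            Global.LocalStorage.SaveSampleEntity(RandomEntity()); // RandomEntity() as in Program.cs above
        }
        Global.LocalStorage.SaveStorage();
        Console.WriteLine($"{Global.LocalStorage.GenericCellAccessor_Selector().Count()} cells in store");
    }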
Please let me know if there is any more information I can provide to help!
About this issue
- State: closed
- Created 6 years ago
- Comments: 23 (23 by maintainers)
Commits related to this issue
- Trinity.C: fix #212 — committed to microsoft/GraphEngine by yatli 6 years ago
Thanks! The issue is no longer occurring in the sample project (neither on WSL nor on an Ubuntu VM), so I think the fix solved it!
I will need some time to get the docker container back up and running, but I will give an update when I know the results 😃
Problem identified. Linux requires the buffer to be aligned to the system page size when committing memory – overlooked this one back then…
A fix is on the way.
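Conceptually the alignment rule looks like this (a C# sketch for illustration only; the real fix lives in the native Trinity.C memory-commit path, and the helper below is not actual GraphEngine code):

    // Round a [start, start + length) range out to page boundaries before committing.
    static (ulong alignedStart, ulong alignedLength) AlignToPage(ulong start, ulong length)
    {
        var page = (ulong)Environment.SystemPageSize;                 // typically 4096 on Linux
        var alignedStart = start & ~(page - 1);                       // round the start down
        var alignedEnd   = (start + length + page - 1) & ~(page - 1); // round the end up
        return (alignedStart, alignedEnd - alignedStart);
    }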
@nvankaam the problem is reproduced! I tried with your sample, and also observed OOM when I expand the cell layout in my repro to this:
Initially it was:
So that results in the difference in total size (approx. 1.34MB vs 149KB per trunk).
Hi @nvankaam! Let me try to reproduce the problem on a VM. Looks like the storage's got "read-only-ish" after a restart? 10M is not a big number and the default configuration should handle it pretty easily.