
Assistance with adjusting default Arena Allocator C/C++ API #23768

Open
dsakharuk opened this issue Feb 20, 2025 · 1 comment
@dsakharuk
Describe the issue

Hello All,

I am trying to figure out how to adjust the parameters of the default CPU arena allocator in ONNX Runtime. The only example code I found explains how to create a shared allocator, but since I am running on a single core and thread, that seems like overkill.

My thought process is as follows:

  1. The default arena allocator already works well.
  2. My model tensors continuously grow in size, so arena expansion is frequent and expensive.
  3. I would like to use the default allocator, but with the ability to adjust the initial chunk size in bytes to reduce arena growth during inference.

Can anyone provide some starter C++ code or point me to any tutorial that does something similar?

Thanks in advance.

To reproduce

Using C/C++ API on Linux

Urgency

No response

Platform

Linux

OS Version

CentOS 7

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.21.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@yuslepukhin
Member

yuslepukhin commented Feb 21, 2025

The arena allocator was primarily created for GPU memory allocations. It is not very performant in multi-threaded CPU scenarios.

For CPU, you can first try disabling the arena altogether and see whether this changes your performance. The snippet below makes tensor allocations go directly to the OS heap; freed memory is returned to the OS, not to the arena.

To disable it you can use:

#include <onnxruntime_cxx_api.h>

Ort::SessionOptions sess_opts;
sess_opts.DisableCpuMemArena();  // tensor allocations now bypass the arena entirely

If this negatively affects performance in your scenario, you have the option to change the way the arena allocates memory. The default strategy is kNextPowerOfTwo (value 0), which doubles the allocated region whenever more memory is requested. The other strategy, kSameAsRequested (value 1), extends the arena by exactly the amount requested and is more conservative.
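As an illustration, here is a minimal sketch using the four-argument Ort::ArenaCfg constructor from the C++ API, where 0 and -1 mean "let ORT choose the default":

// ArenaCfg(max_mem, arena_extend_strategy, initial_chunk_size_bytes, max_dead_bytes_per_chunk)
Ort::ArenaCfg doubling_cfg(0, 0, -1, -1);          // 0 = kNextPowerOfTwo: each extension doubles the region
Ort::ArenaCfg same_as_requested_cfg(0, 1, -1, -1); // 1 = kSameAsRequested: extend by exactly what is needed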

However, there is no direct way to change this on the InferenceSession itself.
To accomplish it, you need to create a shared allocator within the environment and configure it with an arena configuration (OrtArenaCfg in the C API, wrapped as Ort::ArenaCfg in the C++ API).
The arena configuration has multiple useful fields you can explore and adjust.

[Screenshot: the fields of the arena configuration struct]
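
For reference, these are the knobs documented for OrtApi::CreateArenaCfgV2 in the C API header (descriptions paraphrased; -1 or 0 generally means "use the default"):

max_mem                          // maximum bytes the arena may allocate; 0 = default
arena_extend_strategy            // 0 = kNextPowerOfTwo, 1 = kSameAsRequested
initial_chunk_size_bytes         // size of the arena's first memory region
max_dead_bytes_per_chunk         // threshold controlling whether a chunk is split
initial_growth_chunk_size_bytes  // size of the first region allocated after the initial one
max_power_of_two_extend_bytes    // cap on a single power-of-two extension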

You can set arena_extend_strategy to 1, plus anything else you wish to experiment with.
The last step is to set the session options to use the shared allocators from the environment, and then create the inference session.
To put this all together, the code below illustrates the approach (I have not tried to compile it).

#include <onnxruntime_cxx_api.h>
#include <onnxruntime_session_options_config_keys.h>  // defines kOrtSessionOptionsConfigUseEnvAllocators

Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE); // Logging for awareness

auto arena_memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);

// ArenaCfg(max_mem, arena_extend_strategy, initial_chunk_size_bytes, max_dead_bytes_per_chunk)
// arena_extend_strategy = 1 (kSameAsRequested); raise initial_chunk_size_bytes from -1 (default)
// if you want a larger first region.
Ort::ArenaCfg arena_config(0, 1, -1, -1);

env.CreateAndRegisterAllocator(arena_memory_info, arena_config);

// Make the session use environment-level allocators
Ort::SessionOptions sess_opts;
sess_opts.AddConfigEntry(kOrtSessionOptionsConfigUseEnvAllocators, "1");

// Use the above sess_opts to create your Ort::Session, e.g.:
// Ort::Session session(env, "model.onnx", sess_opts);
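If you need knobs that the four-argument Ort::ArenaCfg constructor does not expose (for example initial_growth_chunk_size_bytes), the C API's CreateArenaCfgV2 accepts them as string keys. A sketch, with 256 MB as a purely illustrative initial chunk size:

const char* keys[] = {"arena_extend_strategy", "initial_chunk_size_bytes"};
const size_t values[] = {1, 256 * 1024 * 1024};  // example value: 256 MB first region

OrtArenaCfg* arena_cfg = nullptr;
Ort::ThrowOnError(Ort::GetApi().CreateArenaCfgV2(keys, values, 2, &arena_cfg));
Ort::ThrowOnError(Ort::GetApi().CreateAndRegisterAllocator(env, arena_memory_info, arena_cfg));
Ort::GetApi().ReleaseArenaCfg(arena_cfg);  // the registered allocator reads the settings at creation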

Hope this helps.
