
NumPy/pathplannerlib memory issues #149

Open

virtuald opened this issue Feb 4, 2025 · 7 comments
virtuald commented Feb 4, 2025

RoboRIO 1 with the current config. Here's the python process memory map via pmap:

396:   python3
00010000      4K r-x-- python3.13
00020000      4K r---- python3.13
00021000      4K rw--- python3.13
0111d000   1500K rw---   [ anon ]
b5f69000    108K r-x-- libncurses.so.5.9
b5f84000     64K ----- libncurses.so.5.9
b5f94000      4K r---- libncurses.so.5.9
b5f95000      4K rw--- libncurses.so.5.9
b5f96000     16K r-x-- _random.cpython-313-arm-linux-gnueabi.so
b5f9a000     60K ----- _random.cpython-313-arm-linux-gnueabi.so
b5fa9000      4K r---- _random.cpython-313-arm-linux-gnueabi.so
b5faa000      4K rw--- _random.cpython-313-arm-linux-gnueabi.so
b5fab000     16K r-x-- _bisect.cpython-313-arm-linux-gnueabi.so
b5faf000     60K ----- _bisect.cpython-313-arm-linux-gnueabi.so
b5fbe000      4K r---- _bisect.cpython-313-arm-linux-gnueabi.so
b5fbf000      4K rw--- _bisect.cpython-313-arm-linux-gnueabi.so
b5fc0000    492K rw---   [ anon ]
b603b000     80K r-x-- math.cpython-313-arm-linux-gnueabi.so
b604f000     60K ----- math.cpython-313-arm-linux-gnueabi.so
b605e000      4K r---- math.cpython-313-arm-linux-gnueabi.so
b605f000      4K rw--- math.cpython-313-arm-linux-gnueabi.so
b6060000     16K r-x-- _posixsubprocess.cpython-313-arm-linux-gnueabi.so
b6064000     60K ----- _posixsubprocess.cpython-313-arm-linux-gnueabi.so
b6073000      4K r---- _posixsubprocess.cpython-313-arm-linux-gnueabi.so
b6074000      4K rw--- _posixsubprocess.cpython-313-arm-linux-gnueabi.so
b6075000    256K rw---   [ anon ]
b60b5000    124K r-x-- liblzma.so.5.2.3
b60d4000     60K ----- liblzma.so.5.2.3
b60e3000      4K r---- liblzma.so.5.2.3
b60e4000      4K rw--- liblzma.so.5.2.3
b60e5000     28K r-x-- _lzma.cpython-313-arm-linux-gnueabi.so
b60ec000     60K ----- _lzma.cpython-313-arm-linux-gnueabi.so
b60fb000      4K r---- _lzma.cpython-313-arm-linux-gnueabi.so
b60fc000      4K rw--- _lzma.cpython-313-arm-linux-gnueabi.so
b60fd000     56K r-x-- libbz2.so.1.0.6
b610b000     60K ----- libbz2.so.1.0.6
b611a000      4K r---- libbz2.so.1.0.6
b611b000      4K rw--- libbz2.so.1.0.6
b611c000     16K r-x-- _bz2.cpython-313-arm-linux-gnueabi.so
b6120000     60K ----- _bz2.cpython-313-arm-linux-gnueabi.so
b612f000      4K r---- _bz2.cpython-313-arm-linux-gnueabi.so
b6130000      4K rw--- _bz2.cpython-313-arm-linux-gnueabi.so
b6131000     76K r-x-- libz.so.1.2.11
b6144000     60K ----- libz.so.1.2.11
b6153000      4K r---- libz.so.1.2.11
b6154000      4K rw--- libz.so.1.2.11
b6155000     32K r-x-- zlib.cpython-313-arm-linux-gnueabi.so
b615d000     60K ----- zlib.cpython-313-arm-linux-gnueabi.so
b616c000      4K r---- zlib.cpython-313-arm-linux-gnueabi.so
b616d000      4K rw--- zlib.cpython-313-arm-linux-gnueabi.so
b616e000    108K r-x-- libgcc_s.so.1
b6189000     60K ----- libgcc_s.so.1
b6198000      4K r---- libgcc_s.so.1
b6199000      4K rw--- libgcc_s.so.1
b619a000     24K r-x-- libffi.so.6.0.4
b61a0000     60K ----- libffi.so.6.0.4
b61af000      4K r---- libffi.so.6.0.4
b61b0000      4K rw--- libffi.so.6.0.4
b61b1000    104K r-x-- _ctypes.cpython-313-arm-linux-gnueabi.so
b61cb000     60K ----- _ctypes.cpython-313-arm-linux-gnueabi.so
b61da000      4K r---- _ctypes.cpython-313-arm-linux-gnueabi.so
b61db000      8K rw--- _ctypes.cpython-313-arm-linux-gnueabi.so
b61dd000    256K rw---   [ anon ]
b621d000     16K r-x-- fcntl.cpython-313-arm-linux-gnueabi.so
b6221000     60K ----- fcntl.cpython-313-arm-linux-gnueabi.so
b6230000      4K r---- fcntl.cpython-313-arm-linux-gnueabi.so
b6231000      4K rw--- fcntl.cpython-313-arm-linux-gnueabi.so
b6232000     16K r-x-- termios.cpython-313-arm-linux-gnueabi.so
b6236000     60K ----- termios.cpython-313-arm-linux-gnueabi.so
b6245000      4K r---- termios.cpython-313-arm-linux-gnueabi.so
b6246000      4K rw--- termios.cpython-313-arm-linux-gnueabi.so
b6247000     36K r-x-- _struct.cpython-313-arm-linux-gnueabi.so
b6250000     60K ----- _struct.cpython-313-arm-linux-gnueabi.so
b625f000      4K r---- _struct.cpython-313-arm-linux-gnueabi.so
b6260000      4K rw--- _struct.cpython-313-arm-linux-gnueabi.so
b6261000     20K r-x-- select.cpython-313-arm-linux-gnueabi.so
b6266000     60K ----- select.cpython-313-arm-linux-gnueabi.so
b6275000      4K r---- select.cpython-313-arm-linux-gnueabi.so
b6276000      4K rw--- select.cpython-313-arm-linux-gnueabi.so
b6277000    256K rw---   [ anon ]
b62b7000    672K r-x-- unicodedata.cpython-313-arm-linux-gnueabi.so
b635f000     64K ----- unicodedata.cpython-313-arm-linux-gnueabi.so
b636f000      4K r---- unicodedata.cpython-313-arm-linux-gnueabi.so
b6370000      4K rw--- unicodedata.cpython-313-arm-linux-gnueabi.so
b6371000    512K rw---   [ anon ]
b63f1000     12K r-x-- _opcode.cpython-313-arm-linux-gnueabi.so
b63f4000     60K ----- _opcode.cpython-313-arm-linux-gnueabi.so
b6403000      4K r---- _opcode.cpython-313-arm-linux-gnueabi.so
b6404000      4K rw--- _opcode.cpython-313-arm-linux-gnueabi.so
b6405000   1024K rw---   [ anon ]
b6505000    104K r-x-- libtinfo.so.5.9
b651f000     64K ----- libtinfo.so.5.9
b652f000      8K r---- libtinfo.so.5.9
b6531000      4K rw--- libtinfo.so.5.9
b6532000    216K r-x-- libreadline.so.7.0
b6568000     64K ----- libreadline.so.7.0
b6578000      4K r---- libreadline.so.7.0
b6579000     16K rw--- libreadline.so.7.0
b657d000      4K rw---   [ anon ]
b6585000     24K r-x-- readline.cpython-313-arm-linux-gnueabi.so
b658b000     60K ----- readline.cpython-313-arm-linux-gnueabi.so
b659a000      4K r---- readline.cpython-313-arm-linux-gnueabi.so
b659b000      4K rw--- readline.cpython-313-arm-linux-gnueabi.so
b659c000   1280K rw---   [ anon ]
b66dc000      8K r-x-- ISO8859-1.so
b66de000     60K ----- ISO8859-1.so
b66ed000      4K r---- ISO8859-1.so
b66ee000      4K rw--- ISO8859-1.so
b66ef000    220K r---- LC_CTYPE
b6726000   1924K r---- locale-archive
b6907000   1172K r-x-- libc-2.24.so
b6a2c000     64K ----- libc-2.24.so
b6a3c000      8K r---- libc-2.24.so
b6a3e000      4K rw--- libc-2.24.so
b6a3f000     12K rw---   [ anon ]
b6a42000    436K r-x-- libm-2.24.so
b6aaf000     60K ----- libm-2.24.so
b6abe000      4K r---- libm-2.24.so
b6abf000      4K rw--- libm-2.24.so
b6ac0000      8K r-x-- libutil-2.24.so
b6ac2000     60K ----- libutil-2.24.so
b6ad1000      4K r---- libutil-2.24.so
b6ad2000      4K rw--- libutil-2.24.so
b6ad3000      8K r-x-- libdl-2.24.so
b6ad5000     60K ----- libdl-2.24.so
b6ae4000      4K r---- libdl-2.24.so
b6ae5000      4K rw--- libdl-2.24.so
b6ae6000     88K r-x-- libpthread-2.24.so
b6afc000     60K ----- libpthread-2.24.so
b6b0b000      4K r---- libpthread-2.24.so
b6b0c000      4K rw--- libpthread-2.24.so
b6b0d000      8K rw---   [ anon ]
b6b0f000   3748K r-x-- libpython3.13.so.1.0
b6eb8000     60K ----- libpython3.13.so.1.0
b6ec7000    160K r---- libpython3.13.so.1.0
b6eef000    272K rw--- libpython3.13.so.1.0
b6f33000    164K rw---   [ anon ]
b6f5c000    128K r-x-- ld-2.24.so
b6f80000      8K rw---   [ anon ]
b6f85000     24K rw---   [ anon ]
b6f8b000      4K r---- ld-2.24.so
b6f8c000      4K rw--- ld-2.24.so
be8f6000    132K rw---   [ stack ]
bec99000      4K r-x--   [ anon ]
ffff0000      4K r-x--   [ anon ]
 total    18124K
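For reference, snapshots like the one above can be captured and compared with pmap and diff; a sketch, assuming the interpreter's PID is 396 as in the listing:

```shell
# Snapshot the interpreter's memory map before and after an import,
# then diff the two captures (PID 396 is taken from the listing above)
pmap 396 > step1
# ... run `import numpy` inside the interpreter ...
pmap 396 > step2
diff -u step1 step2
```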

Same process, now I import numpy. Here's the diff.

--- step1	2025-02-03 23:04:45.149074482 -0500
+++ step2	2025-02-03 23:04:45.154074518 -0500
@@ -2,7 +2,43 @@
 00010000      4K r-x-- python3.13
 00020000      4K r---- python3.13
 00021000      4K rw--- python3.13
-0111d000   1500K rw---   [ anon ]
+0111d000   3068K rw---   [ anon ]
+b3c4f000    512K rw---   [ anon ]
+b3ccf000    128K r-x-- _umath_linalg.cpython-313-arm-linux-gnueabi.so
+b3cef000     64K ----- _umath_linalg.cpython-313-arm-linux-gnueabi.so
+b3cff000      4K r---- _umath_linalg.cpython-313-arm-linux-gnueabi.so
+b3d00000      4K rw--- _umath_linalg.cpython-313-arm-linux-gnueabi.so
+b3d01000   1024K rw---   [ anon ]
+b3e01000    100K r-x-- _pickle.cpython-313-arm-linux-gnueabi.so
+b3e1a000     60K ----- _pickle.cpython-313-arm-linux-gnueabi.so
+b3e29000      4K r---- _pickle.cpython-313-arm-linux-gnueabi.so
+b3e2a000      4K rw--- _pickle.cpython-313-arm-linux-gnueabi.so
+b3e2b000    768K rw---   [ anon ]
+b3eeb000      4K r-x-- _contextvars.cpython-313-arm-linux-gnueabi.so
+b3eec000     60K ----- _contextvars.cpython-313-arm-linux-gnueabi.so
+b3efb000      4K r---- _contextvars.cpython-313-arm-linux-gnueabi.so
+b3efc000      4K rw--- _contextvars.cpython-313-arm-linux-gnueabi.so
+b3efd000    108K r-x-- _datetime.cpython-313-arm-linux-gnueabi.so
+b3f18000     60K ----- _datetime.cpython-313-arm-linux-gnueabi.so
+b3f27000      4K r---- _datetime.cpython-313-arm-linux-gnueabi.so
+b3f28000      8K rw--- _datetime.cpython-313-arm-linux-gnueabi.so
+b3f2a000   4692K r-x-- _multiarray_umath.cpython-313-arm-linux-gnueabi.so
+b43bf000     64K ----- _multiarray_umath.cpython-313-arm-linux-gnueabi.so
+b43cf000      4K r---- _multiarray_umath.cpython-313-arm-linux-gnueabi.so
+b43d0000     92K rw--- _multiarray_umath.cpython-313-arm-linux-gnueabi.so
+b43e7000    304K rw---   [ anon ]
+b4433000      4K -----   [ anon ]
+b4434000    508K rwx--   [ anon ]
+b44b3000  16384K rw---   [ anon ]
+b54b3000    952K r-x-- libgfortran.so.5
+b55a1000     60K ----- libgfortran.so.5
+b55b0000      4K r---- libgfortran.so.5
+b55b1000      4K rw--- libgfortran.so.5
+b55b2000   9844K r-x-- libopenblas.so.0
+b5f4f000     60K ----- libopenblas.so.0
+b5f5e000     12K r---- libopenblas.so.0
+b5f61000     24K rw--- libopenblas.so.0
+b5f67000      8K rw---   [ anon ]
 b5f69000    108K r-x-- libncurses.so.5.9
 b5f84000     64K ----- libncurses.so.5.9
 b5f94000      4K r---- libncurses.so.5.9
@@ -141,7 +177,7 @@
 b6f85000     24K rw---   [ anon ]
 b6f8b000      4K r---- ld-2.24.so
 b6f8c000      4K rw--- ld-2.24.so
-be8f6000    132K rw---   [ stack ]
+be8f6000    132K rwx--   [ stack ]
 bec99000      4K r-x--   [ anon ]
 ffff0000      4K r-x--   [ anon ]
- total    18124K
+ total    55636K

Same process, import wpimath:

--- step2	2025-02-03 23:04:45.154074518 -0500
+++ step3	2025-02-03 23:04:45.159074555 -0500
@@ -2,8 +2,52 @@
 00010000      4K r-x-- python3.13
 00020000      4K r---- python3.13
 00021000      4K rw--- python3.13
-0111d000   3068K rw---   [ anon ]
-b3c4f000    512K rw---   [ anon ]
+0111d000   4916K rw---   [ anon ]
+b2c19000    256K rw---   [ anon ]
+b2c59000   4076K r-x-- _controls.cpython-313-arm-linux-gnueabi.so
+b3054000     60K ----- _controls.cpython-313-arm-linux-gnueabi.so
+b3063000     24K r---- _controls.cpython-313-arm-linux-gnueabi.so
+b3069000     20K rw--- _controls.cpython-313-arm-linux-gnueabi.so
+b306e000    312K r-x-- _spline.cpython-313-arm-linux-gnueabi.so
+b30bc000     60K ----- _spline.cpython-313-arm-linux-gnueabi.so
+b30cb000      4K r---- _spline.cpython-313-arm-linux-gnueabi.so
+b30cc000      4K rw--- _spline.cpython-313-arm-linux-gnueabi.so
+b30cd000   1468K r-x-- _kinematics.cpython-313-arm-linux-gnueabi.so
+b323c000     64K ----- _kinematics.cpython-313-arm-linux-gnueabi.so
+b324c000     12K r---- _kinematics.cpython-313-arm-linux-gnueabi.so
+b324f000      8K rw--- _kinematics.cpython-313-arm-linux-gnueabi.so
+b3251000    132K r-x-- _wpimath.cpython-313-arm-linux-gnueabi.so
+b3272000     60K ----- _wpimath.cpython-313-arm-linux-gnueabi.so
+b3281000      4K r---- _wpimath.cpython-313-arm-linux-gnueabi.so
+b3282000      4K rw--- _wpimath.cpython-313-arm-linux-gnueabi.so
+b3283000    844K r-x-- _geometry.cpython-313-arm-linux-gnueabi.so
+b3356000     64K ----- _geometry.cpython-313-arm-linux-gnueabi.so
+b3366000      8K r---- _geometry.cpython-313-arm-linux-gnueabi.so
+b3368000      8K rw--- _geometry.cpython-313-arm-linux-gnueabi.so
+b336a000    256K rw---   [ anon ]
+b33aa000   2180K r-x-- libwpimath.so
+b35cb000     60K ----- libwpimath.so
+b35da000      4K r---- libwpimath.so
+b35db000      8K rw--- libwpimath.so
+b35dd000   1192K r-x-- _wpiutil.cpython-313-arm-linux-gnueabi.so
+b3707000     64K ----- _wpiutil.cpython-313-arm-linux-gnueabi.so
+b3717000      8K r---- _wpiutil.cpython-313-arm-linux-gnueabi.so
+b3719000     12K rw--- _wpiutil.cpython-313-arm-linux-gnueabi.so
+b371c000   1724K r-x-- libstdc++.so.6.0.30
+b38cb000     60K ----- libstdc++.so.6.0.30
+b38da000     20K r---- libstdc++.so.6.0.30
+b38df000      8K rw--- libstdc++.so.6.0.30
+b38e1000      8K rw---   [ anon ]
+b38e3000     16K r-x-- libatomic.so.1.2.0
+b38e7000     60K ----- libatomic.so.1.2.0
+b38f6000      4K r---- libatomic.so.1.2.0
+b38f7000      4K rw--- libatomic.so.1.2.0
+b38f8000      4K rw---   [ anon ]
+b38f9000   3312K r-x-- libwpiutil.so
+b3c35000     60K ----- libwpiutil.so
+b3c44000     16K r---- libwpiutil.so
+b3c48000     20K rw--- libwpiutil.so
+b3c4d000    520K rw---   [ anon ]
 b3ccf000    128K r-x-- _umath_linalg.cpython-313-arm-linux-gnueabi.so
 b3cef000     64K ----- _umath_linalg.cpython-313-arm-linux-gnueabi.so
 b3cff000      4K r---- _umath_linalg.cpython-313-arm-linux-gnueabi.so
@@ -180,4 +224,4 @@
 be8f6000    132K rwx--   [ stack ]
 bec99000      4K r-x--   [ anon ]
 ffff0000      4K r-x--   [ anon ]
- total    55636K
+ total    74084K
@virtuald virtuald transferred this issue from robotpy/roborio-openblas Feb 4, 2025

virtuald commented Feb 4, 2025

Currently, we recommend disabling the niwebserver via robotpy installer niweb disable. There are other places where we could conserve memory:

  • The OpenBLAS buffer is set to 8MB (and there is one per core); we could reduce it further to 4MB or 2MB, but I'm not sure whether that would break anything
  • crond on my RIO takes up 8MB of RAM; in theory, disabling it could be a low-effort win. It seems to only run logrotate every 5 minutes -- and I think it fails at doing even that because of a config issue. (To disable it, SSH to the RIO and run /etc/init.d/crond stop followed by update-rc.d -f crond remove.)
  • vsftp takes up 2MB of RAM, but I think the driver station uses it for something
  • lvrt takes up a whopping 40MB of RAM, and we've talked in the past about replacing it with a C executable that does the same thing. This is mildly risky, but I've heard it can work

@virtuald
Copy link
Member Author

virtuald commented Feb 4, 2025

I guess a different option is to build numpy without OpenBLAS support, which would save 28MB. It's unknown whether this would decrease performance; in theory, OpenBLAS accelerates matrix multiplication.

Another alternative would be to reduce the number of threads that OpenBLAS uses:

>>> import os
>>> os.environ["OMP_NUM_THREADS"] = "1"
>>> import numpy

This reduces memory usage by about 10MB. The performance impact is unknown.
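For the setting to take effect, the environment variable must be set before numpy's first import; a minimal sketch (OPENBLAS_NUM_THREADS is OpenBLAS's own override, included on the assumption that the wheel honors it):

```python
import os

# Must be set before numpy is imported -- OpenBLAS creates its
# per-thread buffers when the library is first loaded.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS-specific equivalent

import numpy as np

# Sanity check: linear algebra still works single-threaded
a = np.ones((64, 64))
b = a @ a
print(b[0, 0])  # 64.0
```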

@virtuald virtuald changed the title Memory issues NumPy/pathplannerlib memory issues Feb 4, 2025

virtuald commented Feb 4, 2025

I built a version of numpy that doesn't use OpenBLAS. For now, you need to explicitly enable it in your pyproject.toml:

[tool.robotpy]
requires = [
    "numpy==2.2.1.dev0; platform_machine == 'roborio'",
]

Run robotpy sync, and on the next deploy it should install that specific version of numpy onto your robot.

With this version of numpy, it seems to save around 15MB of RAM.
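A quick way to confirm which BLAS a given numpy wheel actually links is to print its build configuration (a sketch; the exact output format varies between numpy versions):

```python
import numpy as np

# Print the build configuration: an OpenBLAS-linked wheel lists
# openblas under the BLAS/LAPACK sections; a no-BLAS build does not.
np.show_config()
```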


roberttuttle commented Feb 14, 2025

Chiming in here ... 2881 has been tracking memory issues across CD and GitHub issue threads.

  • We use roboRIO 1 (team choice on avoiding v2 if at all possible because of rampant SD card issues reported across seasons)
  • We use PathPlanner, PhotonVision, rev, navx, and commands-v2
  • We have followed guidance to disable NI web server to gain mem space
  • We have updated our toml config to use requires only and load just what we need, per the above
  • We have updated our toml config to load the non-OpenBLAS numpy dev version, per @virtuald's post above
  • Current code with all the things above pushed to our main is here: https://github.com/frc2881/2025-Robot
  • We also only load / generate one auto command with PathPlanner using the chooser onchange handler pattern in pre-match setup (as opposed to loading all auto selections at init and/or generating one at start of auto which obviously creates delay). This pattern has helped keep our mem footprint down.

Data point: with all of the config updates above, we have been able to sync/deploy and run full match sessions on a "fresh" roboRIO (rebooted/power-cycled) with no memory allocation exceptions/crashes. So far, so good.

However, around our 5th or so deploy with a code update during dev/testing, we hit the memory limit and the deploy crashes in midflight. We then have to reboot or power-cycle the roboRIO and deploy fresh, and all is good again for the next ~5 run/deploy cycles. Not a showstopper, of course, as long as we can ultimately run a match on a clean boot-up at our events.

It seems that something is not being GC'd and/or is leaking each time we deploy, and with our current footprint it takes about 5 deploys to hit the wall. Just a dev/test nuisance at this point. Hope this helps.

virtuald commented

That's really interesting; there shouldn't be any residual memory usage across deploys. When deploying, there are outputs like "RoboRIO disk usage 238.2M/386.3M (62% full)" and "RoboRIO memory 203.6M/250.2M (19% full)"... are those values going up or staying the same?

If it happens reliably, running robotpy installer sshcmd 'ps aux' will capture some memory-related information. You can pipe it to a file, and maybe we can use that data to figure out which process is eating up the memory.
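Once captured, the output can be compared across deploys; a minimal sketch (the top_rss helper is hypothetical, written for illustration) that ranks processes by resident set size:

```python
# Hypothetical helper: rank processes in saved `ps aux` output by RSS.
# Capture with e.g.: robotpy installer sshcmd 'ps aux' > ps-deploy-1.txt
def top_rss(ps_output: str, n: int = 5):
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        cols = line.split(None, 10)  # 11 standard `ps aux` columns
        if len(cols) == 11 and cols[5].isdigit():
            rows.append((int(cols[5]), cols[10]))  # (RSS in KB, command)
    return sorted(rows, reverse=True)[:n]

sample = """USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
lvuser 396 1.0 22.0 74084 55636 ? S 12:00 0:01 python3 robot.py
admin 120 0.1 3.2 40000 8192 ? S 12:00 0:00 crond
"""
print(top_rss(sample, 2))  # [(55636, 'python3 robot.py'), (8192, 'crond')]
```

Diffing the top entries from two such captures should show which process grows between deploys.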

I feel like that sort of thing has been reported previously.

roberttuttle commented

Will try to repro and grab both the exception details and the data points on disk/mem usage to share here this weekend. There will be lots of code updates and deploys for sure heading toward our week 1 event.


roberttuttle commented Feb 18, 2025

Following up for 2881 ... after all of the memory optimization updates, the team was able to work all weekend over many (100s of) code deployments and was not able to repro the memory allocation fault observed earlier. Heading toward a week 1 event with no time to go back and methodically reverse each optimization and/or RobotPy build to trace the issue, but all seems good now.

Also, no noticeable/measurable performance issues with the custom (non-OpenBLAS) numpy version ... as expected, since real parallelism is not a thing in this environment. 😉
