Improve memory efficiency of seen cache #1073
Conversation
The `seen` cache is currently a significant memory usage hotspot due to its inefficient implementation: for every entry, two copies of the message id + timing data + `seq` overhead cause it to use much more memory than it has to. In addition, each check involves several layers of allocations as the computed message id gets salted.

This PR improves the situation by:

* using a hash of the message id with the salt instead of joining strings
* computing the salted id only once per message
* storing one digest instead of two message ids
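The digest-instead-of-strings idea is language-agnostic. As a rough illustration (Python here, with hypothetical names and salt; the PR itself is Nim), a keyed BLAKE2 hash folds the salt into the message id in a single pass, so the cache can store one fixed-size digest per entry instead of two full copies of the id:

```python
import hashlib

# Hypothetical salt for illustration; the real salt is internal to the router.
SALT = b"example-salt"

def salted_digest(message_id: bytes) -> bytes:
    # One keyed hash instead of allocating a joined salt + message-id string:
    # the result is a fixed 16 bytes, however long the message id is.
    return hashlib.blake2b(message_id, key=SALT, digest_size=16).digest()

# The seen cache then keys on the digest once, not on string copies of the id.
seen: dict[bytes, float] = {}
```

Because the digest is computed once per message and is fixed-size, per-entry memory no longer scales with message-id length.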
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##           master    #1073   +/- ##
=========================================
  Coverage        ?   84.53%
=========================================
  Files           ?       91
  Lines           ?    15517
  Branches        ?        0
=========================================
  Hits            ?    13118
  Misses          ?     2399
  Partials        ?        0
=========================================
```
On Holesky, this PR reduces memory usage of the seen cache by ~100 MB.
```nim
let
  previous = t.del(k) # Refresh existing item
  addedAt = if previous.isSome():
    previous[].addedAt
```
We had a long PR in the past to remove this pattern from the codebase and decrease the risk of raising defects. You can use https://github.com/vacp2p/nim-libp2p/blob/unstable/libp2p/utility.nim#L125
`valueOr` is not applicable in this case because we're accessing a field of `previous[]`, not `previous` itself.
True, but `withValue` can be used in this case.
`withValue` doesn't work in generic code, due to similar problems as arnetheduck/nim-results#34
this seems to work fine:

```nim
addedAt = block:
  previous.withValue(p):
    p[].addedAt
  else:
    now
```
LGTM