Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize length instead of end offset #873

Merged

Conversation

eregon
Copy link
Member

@eregon eregon commented Apr 18, 2023

This currently fails the added assert:

..............................................................................................................................
.....ruby: src/serialize.c:28: serialize_location: Assertion `location->start <= location->end' failed.
rake aborted!
SignalException: SIGABRT

I found with gdb --args ruby -Ilib -Itest $f test/parse_test.rb:

(gdb) bt
#0  0x00007ffff76afe5c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff765fa76 in raise () from /lib64/libc.so.6
#2  0x00007ffff76497fc in abort () from /lib64/libc.so.6
#3  0x00007ffff764971b in __assert_fail_base.cold () from /lib64/libc.so.6
#4  0x00007ffff7658656 in __assert_fail () from /lib64/libc.so.6
#5  0x00007fffe57a2d5a in serialize_location (parser=0x7fffffffc520, location=0x859848, buffer=0x7fffffffc500) at src/serialize.c:28
#6  0x00007fffe57a2e3f in yp_serialize_node (parser=0x7fffffffc520, node=0x859840, buffer=0x7fffffffc500) at src/serialize.c:41
#7  0x00007fffe57a6d9b in yp_serialize_node (parser=0x7fffffffc520, node=0x7378e0, buffer=0x7fffffffc500) at src/serialize.c:1104
#8  0x00007fffe57a6513 in yp_serialize_node (parser=0x7fffffffc520, node=0x735060, buffer=0x7fffffffc500) at src/serialize.c:953
#9  0x00007fffe57da796 in yp_serialize (parser=0x7fffffffc520, node=0x735060, buffer=0x7fffffffc500) at src/yarp.c:11802
#10 0x00007ffff75c3277 in dump_source (filepath=<optimized out>, source=0x7fffffffc750) at ../../../../ext/yarp/extension.c:79
#11 0x00007ffff75c3a29 in dump (self=<optimized out>, string=<optimized out>, filepath=<optimized out>) at ../../../../ext/yarp/extension.c:100
...

(gdb) up
#1  0x00007ffff765fa76 in raise () from /lib64/libc.so.6
(gdb) 
#2  0x00007ffff76497fc in abort () from /lib64/libc.so.6
(gdb) 
#3  0x00007ffff764971b in __assert_fail_base.cold () from /lib64/libc.so.6
(gdb) 
#4  0x00007ffff7658656 in __assert_fail () from /lib64/libc.so.6
(gdb) 
#5  0x00007fffe57a2d5a in serialize_location (parser=0x7fffffffc520, location=0x859848, buffer=0x7fffffffc500) at src/serialize.c:28
28	  assert(location->start <= location->end);
(gdb) p location
$1 = (yp_location_t *) 0x859848
(gdb) p *location
$2 = {
  start = 0x7fffe593503e "foo[bar, baz] = 1, 2, 3\n\n[a: [:b, :c]]\n\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d,\n]\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d\n\n\n]\n\n[foo => bar]\n\nfoo[bar][baz] = qux\n\nfoo[bar][baz]\n\n[\n]\n\nfoo[bar, baz]\n\nfoo[bar, baz] = qux\n\nfoo[0], bar[0] "..., 
  end = 0x7fffe5935038 "[*a]\n\nfoo[bar, baz] = 1, 2, 3\n\n[a: [:b, :c]]\n\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d,\n]\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d\n\n\n]\n\n[foo => bar]\n\nfoo[bar][baz] = qux\n\nfoo[bar][baz]\n\n[\n]\n\nfoo[bar, baz]\n\nfoo[bar, baz] = qux\n\nfoo[0], b"...}
(gdb) p node
No symbol "node" in current context.
(gdb) up
#6  0x00007fffe57a2e3f in yp_serialize_node (parser=0x7fffffffc520, node=0x859840, buffer=0x7fffffffc500) at src/serialize.c:41
41	  serialize_location(parser, &node->location, buffer);
(gdb) p node
$3 = (yp_node_t *) 0x859840
(gdb) p *node
$4 = {type = YP_NODE_CALL_NODE, location = {
    start = 0x7fffe593503e "foo[bar, baz] = 1, 2, 3\n\n[a: [:b, :c]]\n\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d,\n]\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d\n\n\n]\n\n[foo => bar]\n\nfoo[bar][baz] = qux\n\nfoo[bar][baz]\n\n[\n]\n\nfoo[bar, baz]\n\nfoo[bar, baz] = qux\n\nfoo[0], bar[0] "..., 
    end = 0x7fffe5935038 "[*a]\n\nfoo[bar, baz] = 1, 2, 3\n\n[a: [:b, :c]]\n\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d,\n]\n\n\n[:a, :b,\n:c,1,\n\n\n\n:d\n\n\n]\n\n[foo => bar]\n\nfoo[bar][baz] = qux\n\nfoo[bar][baz]\n\n[\n]\n\nfoo[bar, baz]\n\nfoo[bar, baz] = qux\n\nfoo[0], b"...}}

i.e. that call node has a start > end.

@eregon
Copy link
Member Author

eregon commented Apr 18, 2023

It'd be great if someone can investigate why start > end for that call node and fix it, I'm not familiar with that part of the code.

@@ -113,7 +115,8 @@ public class Loader {
int type = buffer.get() & 0xFF;
int length = buffer.getInt();
int startOffset = buffer.getInt();
int endOffset = buffer.getInt();
int length = buffer.getInt();
int endOffset = startOffset + length;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would most likely change Java nodes/tokens/locations to store length instead of endOffset, but first we need to fix that failing assert. Also doing so will temporarily break the Loader CI check of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that separately in a later PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an issue for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, #872 is about that

@haldun
Copy link
Collaborator

haldun commented Apr 18, 2023

@eregon I think there is a bug in calculating the location of the call nodes. For this code:

foo[bar, baz] = 1, 2, 3

we end up with

allNode(0...0)(
  CallNode(0...3)(nil, nil, IDENTIFIER(0...3)("foo"), nil, nil, nil, nil, "foo"),
  nil,
  BRACKET_LEFT_RIGHT_EQUAL(3...4)("["),
  BRACKET_LEFT(3...4)("["),
  ArgumentsNode(4...0)(
    [CallNode(4...7)(nil, nil, IDENTIFIER(4...7)("bar"), nil, nil, nil, nil, "bar"),
     CallNode(9...12)(nil, nil, IDENTIFIER(9...12)("baz"), nil, nil, nil, nil, "baz"),
     ArrayNode(0...0)([IntegerNode(16...17)(), IntegerNode(19...20)(), IntegerNode(22...23)()], nil, nil)]
  ),
  BRACKET_RIGHT(12...13)("]"),
  nil,
  "[]="
)

where it is obvious that the inner ArgumentsNode has incorrect value for the location. The reason seems to be the inner ArrayNode where we don't have any opening and closing nodes hence its location is missing.

@haldun
Copy link
Collaborator

haldun commented Apr 18, 2023

I hope the issue with the locations will be fixed at least for the array nodes due to #877

@eregon
Copy link
Member Author

eregon commented Apr 24, 2023

With #879 it goes further, thanks!
We still have at least one other case of start < end:

(gdb) p *node
$4 = {type = YP_NODE_ARGUMENTS_NODE, location = {start = 0xde9f3e "\r\n    baz\r\n  EOF\r\n", 
    end = 0xde9f3b ")\r\n\r\n    baz\r\n  EOF\r\n"}}

@eregon eregon force-pushed the serialize-length-instead-of-end-offset branch from 3f91023 to efd78ef Compare April 24, 2023 18:12
@eregon
Copy link
Member Author

eregon commented Apr 24, 2023

Let's track the remaining failures of start <= end with #891 and #902, the latter gives much more info when the check fails.

@haldun
Copy link
Collaborator

haldun commented Apr 25, 2023

TBH I am not sure what should the location of the call node be in this case:

<<~EOF.chop
    baz
  EOF

@eregon
Copy link
Member Author

eregon commented May 1, 2023

I would say it needs to cover at least .chop.
For the receiver I think it's a question of how we treat a heredoc in general for location (I don't know).
I think it would make sense to only consider the opening tag of the heredoc as the expression, the rest is somewhat similar to extra text in a comment or so.
So I'd say it should be <<~EOF.chop.

I'm not against covering the entire heredoc (the whole code here) for the call node location, it just seems more complicated.

@enebo
Copy link
Collaborator

enebo commented May 1, 2023

If you are using this as a library to replace syntax then I think it needs to be union of receiver, method, args. I am not making up a compelling example but if you wanted to eliminate this call then how could you do it? If call was not the union of locations then you would have to process all calls by ignoring the calls offsets and calculate the unions of them by studying the parts.

The other side of the same coin is what would you use the call offsets for when sometimes it does not capture the receiver offsets?

@eregon
Copy link
Member Author

eregon commented May 2, 2023

I was wondering what the parser gem does and here it is:

$ ruby -rparser/current -e 'pp Parser::CurrentRuby.parse(File.read(ARGV.first)).loc' -- heredoc_call.rb
#<Parser::Source::Map::Send:0x00007fddfd1095b8
 @begin=nil,
 @dot=#<Parser::Source::Range (string) 6...7>,
 @end=nil,
 @expression=#<Parser::Source::Range (string) 0...11>,
 @node=s(:send,
  s(:str, "baz\n"), :chop),
 @selector=#<Parser::Source::Range (string) 7...11>>

$ ruby -rparser/current -e 'pp Parser::CurrentRuby.parse(File.read(ARGV.first)).loc.expression.source' -- heredoc_call.rb
"<<~EOF.chop"

So the location of the call node for the parser gem is <<~EOF.chop.


Yeah, that's a good point, maybe it should indeed cover everything.
Logic-wise I guess that would basically mean it needs a { min(...), max(...) } location then as the receiver (or any argument) can go past the last argument and optional closing paren.

@eregon eregon force-pushed the serialize-length-instead-of-end-offset branch from efd78ef to 37cc284 Compare May 19, 2023 15:15
@eregon
Copy link
Member Author

eregon commented May 19, 2023

I'd like to measure the total serialized size for the top 100 gems, but that's blocked by one more case of start<end offset: #891 (comment)

@eregon eregon marked this pull request as ready for review May 20, 2023 17:30
@eregon eregon requested a review from jemmaissroff May 20, 2023 17:30
@eregon
Copy link
Member Author

eregon commented May 20, 2023

This is now ready to be merged.

@@ -113,7 +115,8 @@ public class Loader {
int type = buffer.get() & 0xFF;
int length = buffer.getInt();
int startOffset = buffer.getInt();
int endOffset = buffer.getInt();
int length = buffer.getInt();
int endOffset = startOffset + length;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an issue for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants