Parser options for either name_locs or names #1407

jemmaissroff · 2023-09-05T20:44:39Z

Some nodes (e.g. ClassVariableAndWriteNode) have both a name field and a name_loc field. This is because it's helpful for some consumers (e.g. linters) to have the location, while other consumers (e.g. compilers) want the name itself. Consumers need one or the other, not both.

We could eventually expose a way for consumers to specify which field type they need. This would reduce the overall memory footprint of the parse tree by not storing all of these fields twice.
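For illustration, here is a minimal C sketch of one way such an option could actually save memory. It assumes a per-parse option plus a union, so each node stores only the requested field. The names constant_id_t, location_t, name_option_t, node_with_both_t, node_with_one_t and make_node are hypothetical, loosely modeled on the fields above. This is not YARP's actual API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for an interned name and a source location. */
typedef uint32_t constant_id_t;
typedef struct {
    const char *start;
    const char *end;
} location_t;

/* Hypothetical parser option: which of the two fields consumers want. */
typedef enum { WANT_NAMES, WANT_NAME_LOCS } name_option_t;

/* Current shape: every node carries both fields. */
typedef struct {
    constant_id_t name;
    location_t name_loc;
} node_with_both_t;

/* Sketch: a node stores only one of them. Which union member is valid is
 * decided once per parse by name_option_t, so no per-node tag is stored. */
typedef struct {
    union {
        constant_id_t name;   /* valid when the option is WANT_NAMES */
        location_t name_loc;  /* valid when the option is WANT_NAME_LOCS */
    } as;
} node_with_one_t;

static node_with_one_t make_node(name_option_t option, constant_id_t id,
                                 const char *start, const char *end) {
    assert(option == WANT_NAMES || option == WANT_NAME_LOCS);
    node_with_one_t node = {{0}};
    if (option == WANT_NAMES) {
        node.as.name = id;
    } else {
        node.as.name_loc.start = start;
        node.as.name_loc.end = end;
    }
    return node;
}
```

On a typical 64-bit target, this shrinks the name storage from 24 bytes (a 4-byte id, 4 bytes of padding, and two pointers) to 16. The trade-off is that a consumer needing both fields can't use this layout.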

This isn't necessary for V1. It's mainly worth doing if we decide we need to reduce memory, since it would introduce some otherwise unnecessary complexity.

enebo · 2023-09-05T21:01:03Z

@jemmaissroff I had suggested the notion of a profile for each of the two use cases back when we discussed how syntax tools want offsets but compilers want line numbers. The linting use case wants the original source kept in memory and the compiler doesn't. There are a number of other differing preferences between the two use cases. It feels like two tools that overlap 80% trying to consume the same format.
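To make the offsets-vs-line-numbers split concrete: if nodes store only byte offsets, a compiler can still recover line numbers from a table of line-start offsets built once per file. Below is a minimal C sketch of that technique. The line_table_t type and its helper functions are hypothetical and are not YARP's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: byte offset at which each line starts. */
typedef struct {
    uint32_t *line_starts;  /* line_starts[0] is always 0 */
    size_t count;
    size_t capacity;
} line_table_t;

static void line_table_push(line_table_t *table, uint32_t offset) {
    if (table->count == table->capacity) {
        size_t capacity = table->capacity ? table->capacity * 2 : 16;
        uint32_t *grown = realloc(table->line_starts, capacity * sizeof(uint32_t));
        if (grown == NULL) abort();  /* keep the sketch simple: no recovery */
        table->line_starts = grown;
        table->capacity = capacity;
    }
    table->line_starts[table->count++] = offset;
}

/* One pass over the source, recording the offset just after every '\n'. */
static line_table_t line_table_build(const char *source, size_t length) {
    line_table_t table = {NULL, 0, 0};
    line_table_push(&table, 0);
    for (size_t i = 0; i < length; i++) {
        if (source[i] == '\n') line_table_push(&table, (uint32_t)(i + 1));
    }
    return table;
}

/* Binary search for the last line start <= offset; lines are 1-based. */
static size_t line_table_line(const line_table_t *table, uint32_t offset) {
    assert(table->count > 0);
    size_t lo = 0, hi = table->count;  /* answer index is in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (table->line_starts[mid] <= offset) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return lo + 1;
}

static void line_table_free(line_table_t *table) {
    free(table->line_starts);
    table->line_starts = NULL;
    table->count = table->capacity = 0;
}
```

The table costs one uint32_t per line, and each lookup is a binary search. A compiler that only needs line numbers can therefore derive them on demand from offsets.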

eregon · 2023-09-06T10:01:31Z

Related to #807, which aims to remove all location fields for serialization and Java nodes (but keep start+length on the node itself; I consider that a different concern, and TruffleRuby needs those anyway).

name_loc is just one of the many location fields; none of them should be needed for compilation/execution.

It could be useful to do the same for the C structs too, but that seems far more involved, as e.g. yarp.c would then need to check some macro to decide whether to assign those location fields. OTOH that might speed up the parser a bit too. If done at templating time, then that specific parser (yarp.c) would never be able to give locations and so wouldn't be usable for the Ruby API (we would need to generate 2 files from yarp.c, one tracking location fields and one not, by defining the macro differently).
If it's done with if (set_location_fields) conditions instead, then it would work for either case, but it wouldn't reduce the size of the node structs at all (so no real value) and could add some overhead.
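The following minimal C sketch contrasts the two options and shows why the macro version shrinks the struct while the runtime flag does not. Only set_location_fields comes from the comment above. TRACK_LOCATION_FIELDS, location_t, node_compile_time_t, node_runtime_t and node_runtime_init are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *start;
    const char *end;
} location_t;

/* Option A, compile time: build with -DTRACK_LOCATION_FIELDS to get a parser
 * whose nodes have name_loc; build without it and the field (and its bytes)
 * disappears. One compiled parser can only serve one kind of consumer. */
typedef struct {
    uint32_t name;
#ifdef TRACK_LOCATION_FIELDS
    location_t name_loc;
#endif
} node_compile_time_t;

/* Option B, run time: one build serves everyone, but the field is always
 * present, so the struct is no smaller; the flag only skips the stores. */
typedef struct {
    uint32_t name;
    location_t name_loc;
} node_runtime_t;

static void node_runtime_init(node_runtime_t *node, bool set_location_fields,
                              uint32_t name, const char *start, const char *end) {
    assert(start <= end);
    node->name = name;
    if (set_location_fields) {
        node->name_loc.start = start;
        node->name_loc.end = end;
    } else {
        node->name_loc.start = NULL;  /* bytes still allocated, just unused */
        node->name_loc.end = NULL;
    }
}
```

Built without the macro, node_compile_time_t holds only the name. By contrast, node_runtime_t always reserves space for both pointers, whatever the flag says.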

A more immediate gain for shrinking the C structs could be to change yp_location_t to be uint32_t start, length; (8 bytes) instead of 2 const char* (16 bytes). That requires extensive changes in yarp.c, so I'm not volunteering, but it's probably worth it footprint-wise.
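Here is a minimal C sketch of the proposed layout next to the current pointer-based one. It assumes the parser keeps a pointer to the start of the source buffer, so offsets can be converted back to pointers when needed. The type and function names are hypothetical, not YARP's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Current shape described above: two pointers into the source. */
typedef struct {
    const char *start;
    const char *end;
} pointer_location_t;   /* 16 bytes on a 64-bit target */

/* Proposed shape: offsets relative to the start of the source buffer. */
typedef struct {
    uint32_t start;
    uint32_t length;
} offset_location_t;    /* 8 bytes on common targets */

/* Converting needs the base pointer of the source buffer. */
static offset_location_t location_to_offsets(const char *source, pointer_location_t loc) {
    assert(loc.start >= source && loc.end >= loc.start);
    assert((size_t)(loc.end - source) <= UINT32_MAX);  /* 4 GiB source limit */
    offset_location_t out = {
        (uint32_t)(loc.start - source),
        (uint32_t)(loc.end - loc.start),
    };
    return out;
}

static pointer_location_t offsets_to_location(const char *source, offset_location_t loc) {
    pointer_location_t out = { source + loc.start, source + loc.start + loc.length };
    return out;
}
```

With 32-bit offsets, a single source buffer is capped at 4 GiB. The second assert in location_to_offsets makes that limit explicit.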

kddnewton · 2023-12-07T16:14:14Z

Unfortunately a tool like rubocop needs both (I found out the hard way through https://github.com/kddnewton/parser-prism). Since we're no longer serializing the locations for JRuby/TruffleRuby, and we could instead shrink the location struct, I'd like to close this one as not planned.

kddnewton added the enhancement (New feature or request) label on Sep 6, 2023

eregon mentioned this issue on Sep 20, 2023:

consider using 32-bit offsets for start and end in yp_location_t #1566 (Open)

kddnewton closed this as not planned on Dec 7, 2023
