Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store startOffset + length instead of startOffset + endOffset #872

Closed
eregon opened this issue Apr 18, 2023 · 7 comments
Closed

Store startOffset + length instead of startOffset + endOffset #872

eregon opened this issue Apr 18, 2023 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@eregon
Copy link
Member

eregon commented Apr 18, 2023

From #741 (comment)

Doing so is simply better for the serialized size, because length is always <= endOffset, and often much smaller.

I think it might make sense to change it everywhere, but we could also just change it for serialization.
@jemmaissroff WDYT?

@eregon
Copy link
Member Author

eregon commented Apr 18, 2023

Actually it seems much easier to only change it for serialization for now, as that has much less potential for conflicts, etc.
And performance-wise it's likely no big deal either way.

So it's more about consistency and what's more useful to track inside C structs.

eregon added a commit to eregon/yarp that referenced this issue Apr 18, 2023
eregon added a commit to eregon/yarp that referenced this issue Apr 24, 2023
eregon added a commit to eregon/yarp that referenced this issue May 19, 2023
eregon added a commit to eregon/yarp that referenced this issue May 20, 2023
@eregon
Copy link
Member Author

eregon commented May 22, 2023

Now that #873 is merged, I think we should change it for Java Nodes and Ruby Locations, because that avoids extra work like #873 (comment) and also some places need to recompute the length from end_offset - start_offset which feels inefficient.

I'm not sure about C nodes structs though, WDYT @jemmaissroff ?
That's probably a lot more involved to change, so if we do that, I think best to do it separately.

@eregon eregon self-assigned this May 22, 2023
@jemmaissroff
Copy link
Collaborator

I think we should prioritize keeping it consistent. I'm worried we'll make the changes for Java Nodes and Ruby Locations, and then add the C Nodes to the backlog without committing to doing it.

@eregon
Copy link
Member Author

eregon commented Jun 1, 2023

Actually there is an advantage to store it as (start, length) for C structs too because sizeof(char* end) = 8 but sizeof(uint32_t length) = 4.
Potentially we could also store the start as a uint32_t offset from parser->start instead of char* and that would again save 4 bytes.
Currently:

typedef struct {
  const char *start;
  const char *end;
} yp_location_t;

@kddnewton kddnewton added the enhancement New feature or request label Jun 5, 2023
@eregon
Copy link
Member Author

eregon commented Jul 3, 2023

@jemmaissroff Should we change it for C structs too? How do we do that without having huge conflicts?

@eregon
Copy link
Member Author

eregon commented Jul 3, 2023

I gave it a shot at main...eregon:yarp:c-struct-length, that compiles but many offsets are wrong, there must be some mistranslation.
Given the frequent usage of end while computing locations in yarp.c, I think it might be best to just keep it as-is, it's far more convenient.

@kddnewton
Copy link
Collaborator

This works in the serialization, but I think it would not be worth it in the actual library. I'm going to close this now that we have it in the serialization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants