Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overloading child_nodes in config.yml #1268

Closed
jemmaissroff opened this issue Aug 16, 2023 · 8 comments
Closed

Overloading child_nodes in config.yml #1268

jemmaissroff opened this issue Aug 16, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@jemmaissroff
Copy link
Collaborator

I’m looking at adding a flag to nodes, and realizing how much we’re overloading the term child_nodes in config.yml . We’re using child_nodes currently to mean params. A small, simple improvement would be renaming this. But overloading the term still leads to quite a bit of filtering and casing when actually generating the code.

I think it would be cleaner to separate out actual child nodes from other fields on a node (eg have fields on a node for child_nodes, locations, flags) to match how the templates are being used. We could define a params method which returns all of these when needed, but in most cases just use the filtered down nodes.

For example, something like this

  - name: WhileNode
    child_nodes:
      - name: keyword_loc
        type: location
      - name: predicate
        type: node
      - name: statements
        type: node?
        kind: StatementsNode
      - name: flags
        type: flags
        kind: LoopFlags

could become:

  - name: WhileNode
    locations:
      - name: keyword_loc
        type: location
    child_nodes:
      - name: predicate
        type: node
      - name: statements
        type: node?
        kind: StatementsNode
    flags:
      - name: flags
        type: flags
        kind: LoopFlags

depending on how we’d implement it, we could even get rid of the type on some of the less complicated fields.

@eregon
Copy link
Member

eregon commented Aug 16, 2023

What about string and integer fields?

Also if it's separated by category it wouldn't be possible to choose the order of fields belonging to different categories.
For instance the current ordering for CallNode seems nice as it's in order of things appearing in the source:
https://github.com/ruby/yarp/blob/8a0a7a1d28b82df26a614378b2007076033ff5d0/config.yml#L585-L607

I think renaming child_nodes to fields would be an easy fix here with most of the advantages.

Regarding iterating child params, there could be a convenience method for that, that's indeed done a lot manually right now.

@enebo
Copy link
Collaborator

enebo commented Aug 17, 2023

@jemmaissroff in JRuby for the current AST we differentiate between child nodes and fields in that we have a childNodes() which only returns the child nodes of the current node. Non-nodes are fields (and actually we also save the child nodes as fields) but childNodes() is necessary for implementing a proper visitor.

Ultimately though the only need to know child nodes from other fields (or types) is mostly to give the list of children to a visitor right? That must be done by looking for a type of node{,?,[]}. If we guaranteed child_nodes was only nodes I think that could allow type to be removed so long as cardinality specifier was still there somewhere. So:

- name: PostExecutionNode
    child_nodes:
      - name: statements
        type: node?
        kind: StatementsNode

could just become:

name: PostExecutionNode
    child_nodes:
      - name: statements
        kind: StatementsNode?

and so long as we are changing format we could go big (:smile:):

name: PostExecutionNode
    child_nodes:
      - child: statements: StatementsNode?

I guess this is maybe a good way to condense the size of config.yml.

The thing I don't like about the idea is the notion that the current child_node order represents the syntactical order of all the elements. This is mostly helpful in looking up nodes in this file as a reference. It is trivial from order to know which element is which on a node (although the names are pretty good too so this is maybe not that important).

@kddnewton kddnewton added the enhancement New feature or request label Aug 17, 2023
@kddnewton
Copy link
Collaborator

I think order is not as helpful as you might think. For example on IfNode, it's in the right order only if it's in the block form. If it's in the modifier it's in the wrong order. Similarly for CallNode, depending on the kind of call it could be in the wrong order. (-foo vs foo.-@ would change this.) I think ultimately it's more beneficial for us to put them in the order that makes sense for us reading it.

The other implication for ordering is the order in the struct. Ideally I think we should sort these based on the most efficient packing mechanism to make sure they don't cause any holes in the struct. I think this isn't an issue now, and certainly isn't something we need to think about for a while, but I figured I'd mention it.

Overall, I'm very 👍 on this idea. Anything that makes it more readable.

@enebo
Copy link
Collaborator

enebo commented Aug 17, 2023

@kddnewton ah yeah I just looked at how if works between regular and mod_if. I see what you mean. I could argue we could add even more nodes so then order did match syntax but I am not serious about that (although mod_if cannot have an else branch so we lose a byte serializing in that case by sharing the same node entry). I withdraw that reservation.

@kddnewton
Copy link
Collaborator

There are 2 things in this issue: renaming child_nodes and splitting up the fields. The first is done. I think we can leave the second one until we provide the option to drop certain fields from the API when we don't need them. (Sort of a "full" parse versus a "minimal" parse.) I think at that point we could split up the fields like:

  - name: CallNode
    fields:
      minimal:
        - name: receiver
          type: node?
        - name: arguments
          type: node?
          kind: ArgumentsNode
        - name: block
          type: node?
          kind: BlockNode
        - name: flags
          type: flags
          kind: CallNodeFlags
        - name: name
          type: string
      full:
        - name: call_operator_loc
          type: location?
        - name: message_loc
          type: location?
        - name: opening_loc
          type: location?
        - name: closing_loc
          type: location?

There might end up being some locations that we actually want to keep, so I think this delineation might be more useful. Closing for now, revisit later.

@enebo
Copy link
Collaborator

enebo commented Sep 7, 2023

@kddnewton two things for your last comment design:

  1. maybe use semantic for minimal and syntactic for full?
  2. All nodes currently still have offsets of the node itself and we still need it for line calcs but it would be nice to make lines for semantic and offsets for syntactic. line would be a big savings for compile serialization because it is only one number and a smaller one (I think there are one or two nodes which require endLine as well but that is just a detail).

@enebo
Copy link
Collaborator

enebo commented Sep 7, 2023

I think TR wants both offsets AND line so there likely is a wrinkle there.

@eregon
Copy link
Member

eregon commented Sep 12, 2023

I think it's enough to remove location/location? fields for serialization (#807).
We should be able to do it soon now, I think we have everything needed or very close to it.
If some location field turns out to be needed we can either expose it in another way (e.g. flag, other field), or until that's done have some extra YAML key to preserve it.
I think a flat list of fields is easier to handle for all consumers of this YAML config file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants