-
Notifications
You must be signed in to change notification settings - Fork 355
Add Formatter #2830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: canary
Are you sure you want to change the base?
Add Formatter #2830
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
✅ Latest build complete • View logs |
|
🐑 BEPs Preview Ready Preview URL: https://dj7ggjkp4tlhz.cloudfront.net/formatter/ Commit: |
|
🐑 BEPs Preview Ready Preview URL: https://dj7ggjkp4tlhz.cloudfront.net/formatter/ Commit: |
|
🐑 BEPs Preview Ready Preview URL: https://dj7ggjkp4tlhz.cloudfront.net/formatter/ Commit: |
ae6e6a7 to
0d4b081
Compare
CodSpeed Performance ReportMerging #2830 will not alter performanceComparing Summary
Footnotes
|
| while current_pos < start { | ||
| let token = self.root.token_at_offset(current_pos).right_biased(); | ||
|
|
||
| if let Some(token) = token { | ||
| // check if token is within our target range and fix trivia if necessary | ||
| if token.text_range().start() < start { | ||
| match token.kind() { | ||
| SyntaxKind::NEWLINE => on_same_line = false, | ||
| SyntaxKind::LINE_COMMENT | SyntaxKind::BLOCK_COMMENT => { | ||
| if !on_same_line { | ||
| self.push_text(format!("\n{}", self.gen_indent())); | ||
| } else { | ||
| self.push_text(" ".to_string()); | ||
| } | ||
|
|
||
| self.push_text(token.text().to_string()); | ||
| } | ||
| _ => (), // throw away all other tokens | ||
| } | ||
| current_pos = token.text_range().end(); | ||
| } else { | ||
| break; | ||
| } | ||
| } else { | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I highly suggest rewriting this using let-else early exits to avoid so much code nesting:
while current_pos < start {
let token = self.root.token_at_offset(current_pos).right_biased();
let Some(token) = token else {
break;
};
// check if token is within our target range and fix trivia if necessary
if token.text_range().start() >= start {
break;
}
match token.kind() {
SyntaxKind::NEWLINE => on_same_line = false,
SyntaxKind::LINE_COMMENT | SyntaxKind::BLOCK_COMMENT => {
if !on_same_line {
self.push_text(format!("\n{}", self.gen_indent()));
} else {
self.push_text(" ".to_string());
}
self.push_text(token.text().to_string());
}
_ => {} // throw away all other tokens
}
current_pos = token.text_range().end();
}It's a nit but flat code is much easier to follow than nested.
| while parser.current < parser.tokens.len() { | ||
| let token = &parser.tokens[parser.current]; | ||
| let kind = token_kind_to_syntax_kind(token.kind); | ||
| parser.events.push(Event::Token { | ||
| kind, | ||
| text: token.text.clone(), | ||
| }); | ||
| parser.current += 1; | ||
| // let token = &parser.tokens[parser.current]; | ||
| // let kind = token_kind_to_syntax_kind(token.kind); | ||
| // parser.events.push(Event::Token { | ||
| // kind, | ||
| // text: token.text.clone(), | ||
| // }); | ||
| // parser.current += 1; | ||
| parser.bump(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a temporary comment or code here was broken? bump() impl seems to add the Event so I suppose this piece was wrong. We can remove the commented code if so.
591c1f2 to
f5aa084
Compare
|
🐑 BEPs Preview Ready Preview URL: https://dj7ggjkp4tlhz.cloudfront.net/formatter/ Commit: |
|
🐑 BEPs Preview Ready Preview URL: https://dj7ggjkp4tlhz.cloudfront.net/formatter/ Commit: |
|
🐑 BEPs Preview Ready Preview URL: https://dj7ggjkp4tlhz.cloudfront.net/formatter/ Commit: |
| @skip | ||
| // Stream done attribute | ||
| summary string | ||
| @stream.done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do attributes always go to the next line? I think for cases where you have a few attributes it's better to put them in the same line as the field, cases where you have many attributes and one line becomes unreadable then jumping to the next one does help. Maybe it could even be indented:
class Example {
f1 string @one_attr
f2 string
@attr_1
@attr_2
@attr_3
@attr_4
}Hard case is with trailing comments:
// Original
class Example {
f2 string @attr_1 @attr_2 @attr_3 // Trailing comment| self.format_node( | ||
| children, | ||
| SyntaxKind::TYPE_EXPR, | ||
| false, | ||
| false, | ||
| Self::format_type_expr, | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit here, these types of functions that take 4+ parameters are kinda hard to read when literal booleans are passed in, LSP can help with inlay hints for param names but otherwise reading false, false doesn't make it clear what is false. Naming the parameters would make it clear at every call site and also helps LLM codegen these calls:
self.format_node(FormatArgs {
children,
syntax_kind: SyntaxKind::TYPE_EXPR,
prepend_newline: false,
should_collect_preceding_trivia: false,
f: Self::format_type_expr
});You could also use defaults:
self.format_node(FormatArgs {
children,
syntax_kind: SyntaxKind::TYPE_EXPR,
f: Self::format_type_expr,
..FormatArgs::default()
});But being explicit makes it more clear.
Not important though, we merge as is and we may refactor it later.
| if should_collect_preceding_trivia { | ||
| f.format_missing(token.text_range().start()); | ||
| } | ||
| f.push_text("\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one appears multiple times, and since line breaks can be different in each OS / encoding, I'd make it a function just in case: f.push_line_break().
Same applies for f.push_white_space().
| === ORIGINAL === | ||
| function Foo() -> int { | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| x + 1;;;;;; | ||
| } | ||
|
|
||
| === FORMATTED === | ||
| function Foo() -> int { | ||
| x + 1; | ||
| x + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this break on C style for loops with no expressions? They're usually used for infinite loops:
for (;;) {
// Infinite loop
}| // get rid of duplicate semicolons | ||
| SyntaxKind::SEMICOLON => { | ||
| if !already_pushed_semicolon { | ||
| self.push_text(";"); | ||
| already_pushed_semicolon = true; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to special case infinite for-loops? for (;;) {}
Adds a BEP for a BAML formatter, as well as a prototype CST-based formatter implementation with LSP integration.
The code for the formatter is currently contained in just a single file, and is a bit verbose in some places. It could definitely be simplified with some further abstractions. One possibility for simplifying this could be creating known-valid AST/CST wrappers for SyntaxNodes that are only constructed if the code parses without errors. With this it would be far easier to get specific tokens from CST nodes, and could possibly provide an interface for future code transformation tooling (see: rust-clippy).
Known Issues/Limitations
Some of these are due to limitations of/possible issues with the current parser, such as:
How it works
At a basic level this traverses the CST top-down, calling a specific format function for each type of SyntaxNode. Each node-specific formatting function generally iterates over its child nodes and tokens, selectively parsing semantics and outputting equivalent code.
All formatted code is pushed to the output alongside its span with
push_format. The end of the last span that was formatted is also stored internally.push_formatfirst searches for unprocessed trivia in the range between the end of the last formatted span by callingformat_missing, which has logic for reattaching comments and other trivia. Once this preceding trivia is processed, it then will add the formatted span to the final output. This approach ensures that we do not lose any comments.format_node, andformat_tokenare helper functions that automatically iterate over a provided children iterator, consuming until it reaches a specified type of SyntaxNode or SyntaxToken. Once it reaches this specified type, it will call a provided format function.Sometimes it's not possible to use these helper functions because of various extra context that may be needed for formatting. In this case, the children are manually iterated over for formatting.