Add Formatter #2830

codeshaunted · 2025-12-17T21:30:26Z

Adds a BEP for a BAML formatter, as well as a prototype CST-based formatter implementation with LSP integration.

The code for the formatter is currently contained in just a single file, and is a bit verbose in some places. It could definitely be simplified with some further abstractions. One possibility for simplifying this could be creating known-valid AST/CST wrappers for SyntaxNodes that are only constructed if the code parses without errors. With this it would be far easier to get specific tokens from CST nodes, and could possibly provide an interface for future code transformation tooling (see: rust-clippy).

Known Issues/Limitations

Comments that are on their own lines are not attached properly in some cases
User-intended whitespace is ignored currently
For-loop initializer, condition, and update are printed verbatim
Minor spacing issues on config blocks
No current implementation for any sort of line wrapping
Comments are not allowed in some places where we may want to respect comments in the future
No current tests for transforming specific constructs, tests are limited in scope
Does not handle dedenting on raw strings

Some of these are due to limitations of/possible issues with the current parser, such as:

expressions are not properly parsed for config values, they are just parsed as tokens verbatim
string literals are not allowed as keys in config blocks
void functions are not allowed

How it works

At a basic level this traverses the CST top-down, calling a specific format function for each type of SyntaxNode. Each node-specific formatting function generally iterates over its child nodes and tokens, selectively parsing semantics and outputting equivalent code.

All formatted code is pushed to the output alongside its span with push_format. The formatter also stores the end of the last span it formatted. push_format first calls format_missing, which searches for unprocessed trivia in the range between the end of the last formatted span and the start of the new span, and which contains the logic for reattaching comments and other trivia. Once this preceding trivia is processed, push_format adds the formatted span to the final output. This approach ensures that we do not lose any comments.
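To make the span bookkeeping concrete, here is a minimal sketch of the idea on a flat token list. It is not the code in this PR: the `Kind` enum, the `Token` struct, and the `tok` helper are hypothetical stand-ins for the real CST types, only line comments are modelled, and the caller is assumed to include any leading newline in the formatted text it pushes.

```rust
/// Token kinds in this simplified model (hypothetical, not the parser's SyntaxKind).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Code,
    Whitespace,
    Newline,
    LineComment,
}

/// A token together with the byte offset where it starts in the source.
#[derive(Debug, Clone)]
struct Token {
    kind: Kind,
    start: usize,
    text: String,
}

/// Hypothetical helper for building tokens in examples.
fn tok(kind: Kind, start: usize, text: &str) -> Token {
    Token { kind, start, text: text.to_string() }
}

struct Formatter {
    tokens: Vec<Token>,
    output: String,
    /// End offset of the last span that was formatted.
    last_end: usize,
}

impl Formatter {
    fn new(tokens: Vec<Token>) -> Self {
        Self { tokens, output: String::new(), last_end: 0 }
    }

    /// Re-emit comments that start in `last_end..start`; all other trivia is dropped.
    fn format_missing(&mut self, start: usize) {
        let mut on_same_line = true;
        let mut pending = String::new();
        for token in &self.tokens {
            if token.start < self.last_end || token.start >= start {
                continue;
            }
            match token.kind {
                Kind::Newline => on_same_line = false,
                Kind::LineComment => {
                    // Trailing comments stay on the line, others get their own line.
                    pending.push_str(if on_same_line { " " } else { "\n" });
                    pending.push_str(&token.text);
                }
                _ => {} // throw away whitespace and anything else in the gap
            }
        }
        self.output.push_str(&pending);
        self.last_end = start;
    }

    /// Flush skipped trivia first, then append the formatted text for `start..end`.
    fn push_format(&mut self, start: usize, end: usize, formatted: &str) {
        self.format_missing(start);
        self.output.push_str(formatted);
        self.last_end = end;
    }
}
```

Formatting the source `a // c\nb` by pushing `a` for span 0..1 and `\nb` for span 7..8 yields `a // c\nb`: the comment in the gap is picked up even though no node-specific formatter asked for it.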

format_node and format_token are helper functions that iterate over a provided children iterator, consuming elements until they reach a specified kind of SyntaxNode or SyntaxToken. Once they reach it, they call a provided format function.
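The shape of such a helper can be sketched as follows. This is an illustration under simplifying assumptions, not the PR's signature: `Child`, `Kind`, `child`, and `format_child` are hypothetical names, and children are plain structs rather than CST elements.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Name,
    Whitespace,
    TypeExpr,
}

/// A simplified stand-in for a child node or token.
#[derive(Debug, Clone)]
struct Child {
    kind: Kind,
    text: String,
}

/// Hypothetical helper for building children in examples.
fn child(kind: Kind, text: &str) -> Child {
    Child { kind, text: text.to_string() }
}

#[derive(Default)]
struct Formatter {
    output: String,
}

impl Formatter {
    /// Advance `children` until an element of `kind` is found, then format it with `f`.
    /// Returns false if the iterator is exhausted first. Skipped children are simply
    /// dropped in this sketch.
    fn format_child<I>(&mut self, children: &mut I, kind: Kind, f: fn(&mut Self, &Child)) -> bool
    where
        I: Iterator<Item = Child>,
    {
        while let Some(c) = children.next() {
            if c.kind == kind {
                f(self, &c);
                return true;
            }
        }
        false
    }

    fn format_name(&mut self, c: &Child) {
        self.output.push_str(&c.text);
    }

    fn format_type_expr(&mut self, c: &Child) {
        // Normalize to exactly one space before the type.
        self.output.push(' ');
        self.output.push_str(c.text.trim());
    }
}
```

Because the iterator is passed by mutable reference, consecutive calls continue where the previous one stopped, so a field like `summary   string` can be formatted with one call for the name and one for the type.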

Sometimes it's not possible to use these helper functions because of various extra context that may be needed for formatting. In this case, the children are manually iterated over for formatting.

vercel · 2025-12-17T21:30:31Z

Project	Deployment	Review	Updated (UTC)
promptfiddle	Ready	Preview, Comment	Dec 22, 2025 9:27pm

github-actions · 2025-12-17T21:34:00Z

✅ Latest build complete • View logs

github-actions · 2025-12-17T21:34:32Z

🐑 BEPs Preview Ready

Preview URL: https://dj7ggjkp4tlhz.cloudfront.net/formatter/

Commit: 4e71b9417b911e2ebfbe36702b4e9e3b6a5a8fb9

codspeed-hq · 2025-12-19T20:08:43Z

CodSpeed Performance Report

Merging #2830 will not alter performance

Comparing formatter (89f66af) with canary (1f09944)

Summary

✅ 15 untouched
⏩ 14 skipped¹

¹ 14 benchmarks were skipped, so the baseline results were used instead.

antoniosarosi · 2025-12-20T01:01:03Z

baml_language/crates/baml_fmt/src/lib.rs

+            while current_pos < start {
+                let token = self.root.token_at_offset(current_pos).right_biased();
+
+                if let Some(token) = token {
+                    // check if token is within our target range and fix trivia if necessary
+                    if token.text_range().start() < start {
+                        match token.kind() {
+                            SyntaxKind::NEWLINE => on_same_line = false,
+                            SyntaxKind::LINE_COMMENT | SyntaxKind::BLOCK_COMMENT => {
+                                if !on_same_line {
+                                    self.push_text(format!("\n{}", self.gen_indent()));
+                                } else {
+                                    self.push_text(" ".to_string());
+                                }
+
+                                self.push_text(token.text().to_string());
+                            }
+                            _ => (), // throw away all other tokens
+                        }
+                        current_pos = token.text_range().end();
+                    } else {
+                        break;
+                    }
+                } else {
+                    break;
+                }
+            }


I highly suggest rewriting this using let-else early exits to avoid so much code nesting:

while current_pos < start {
    let token = self.root.token_at_offset(current_pos).right_biased();

    let Some(token) = token else {
        break;
    };

    // check if token is within our target range and fix trivia if necessary
    if token.text_range().start() >= start {
        break;
    }

    match token.kind() {
        SyntaxKind::NEWLINE => on_same_line = false,
        SyntaxKind::LINE_COMMENT | SyntaxKind::BLOCK_COMMENT => {
            if !on_same_line {
                self.push_text(format!("\n{}", self.gen_indent()));
            } else {
                self.push_text(" ".to_string());
            }

            self.push_text(token.text().to_string());
        }
        _ => {} // throw away all other tokens
    }

    current_pos = token.text_range().end();
}

It's a nit but flat code is much easier to follow than nested.

antoniosarosi · 2025-12-20T01:08:12Z

baml_language/crates/baml_parser/src/parser.rs

    while parser.current < parser.tokens.len() {
-        let token = &parser.tokens[parser.current];
-        let kind = token_kind_to_syntax_kind(token.kind);
-        parser.events.push(Event::Token {
-            kind,
-            text: token.text.clone(),
-        });
-        parser.current += 1;
+        // let token = &parser.tokens[parser.current];
+        // let kind = token_kind_to_syntax_kind(token.kind);
+        // parser.events.push(Event::Token {
+        //     kind,
+        //     text: token.text.clone(),
+        // });
+        // parser.current += 1;
+        parser.bump();
    }


Is this a temporary comment, or was the code here broken? The bump() impl seems to add the Event, so I suppose this piece was wrong. If so, we can remove the commented-out code.

antoniosarosi · 2025-12-22T20:31:35Z

...rates/baml_tests/snapshots/basic_types/baml_tests__basic_types__06_5_formatter__classes.snap

+  @skip
+  // Stream done attribute
+  summary string
+  @stream.done


Do attributes always go to the next line? I think when a field has only a few attributes it's better to put them on the same line as the field. When there are many attributes and one line becomes unreadable, jumping to the next line does help. Maybe it could even be indented:

class Example {
  f1 string @one_attr
  f2 string
    @attr_1
    @attr_2
    @attr_3
    @attr_4
}

Hard case is with trailing comments:

// Original
class Example {
  f2 string @attr_1 @attr_2 @attr_3 // Trailing comment

antoniosarosi · 2025-12-23T14:33:45Z

baml_language/crates/baml_format/src/lib.rs

+        self.format_node(
+            children,
+            SyntaxKind::TYPE_EXPR,
+            false,
+            false,
+            Self::format_type_expr,
+        );


Small nit here: functions like this that take 4+ parameters are kinda hard to read when literal booleans are passed in. The LSP can help with inlay hints for param names, but otherwise reading false, false doesn't make it clear what is false. Naming the parameters would make it clear at every call site and also helps LLMs codegen these calls:

self.format_node(FormatArgs {
    children,
    syntax_kind: SyntaxKind::TYPE_EXPR,
    prepend_newline: false,
    should_collect_preceding_trivia: false,
    f: Self::format_type_expr,
});

You could also use defaults:

self.format_node(FormatArgs {
    children,
    syntax_kind: SyntaxKind::TYPE_EXPR,
    f: Self::format_type_expr,
    ..FormatArgs::default()
});

But being explicit makes it more clear.

Not important though, we can merge as is and refactor it later.
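For reference, a self-contained sketch of what the named-arguments struct could look like. The field names follow the comment above; everything else (`FormatArgs::new`, the slice-of-tuples children, this simplified `SyntaxKind`) is a hypothetical stand-in. Since a function pointer field has no obvious default value, the sketch uses a constructor that takes the required fields and defaults both flags to false, instead of `Default`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    Name,
    TypeExpr,
}

struct Formatter {
    output: String,
}

/// Named arguments for `format_node`, so call sites spell out what each flag means.
struct FormatArgs<'a> {
    children: &'a [(SyntaxKind, &'a str)],
    syntax_kind: SyntaxKind,
    prepend_newline: bool,
    should_collect_preceding_trivia: bool,
    f: fn(&mut Formatter, &str),
}

impl<'a> FormatArgs<'a> {
    /// Required fields only; both flags default to false (an assumption of this sketch).
    fn new(
        children: &'a [(SyntaxKind, &'a str)],
        syntax_kind: SyntaxKind,
        f: fn(&mut Formatter, &str),
    ) -> Self {
        Self {
            children,
            syntax_kind,
            prepend_newline: false,
            should_collect_preceding_trivia: false,
            f,
        }
    }
}

impl Formatter {
    /// Find the first child of the requested kind and format it with `args.f`.
    fn format_node(&mut self, args: FormatArgs<'_>) -> bool {
        // Trivia collection is out of scope here; the flag is carried but not acted on.
        let _ = args.should_collect_preceding_trivia;

        let found = args.children.iter().find(|(kind, _)| *kind == args.syntax_kind);
        let Some((_, text)) = found else {
            return false;
        };
        if args.prepend_newline {
            self.output.push('\n');
        }
        (args.f)(self, text);
        true
    }

    fn format_type_expr(&mut self, text: &str) {
        self.output.push_str(text.trim());
    }
}
```

A call site that only needs one flag can then use struct update syntax, for example `FormatArgs { prepend_newline: true, ..FormatArgs::new(children, kind, f) }`, while the fully explicit form stays available where clarity matters more.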

antoniosarosi · 2025-12-23T14:36:36Z

baml_language/crates/baml_format/src/lib.rs

+                if should_collect_preceding_trivia {
+                    f.format_missing(token.text_range().start());
+                }
+                f.push_text("\n");


This one appears multiple times, and since line breaks can be different in each OS / encoding, I'd make it a function just in case: f.push_line_break().

Same applies for f.push_white_space().
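A rough sketch of what those two helpers could look like, assuming the formatter carries a configurable line ending. The `LineEnding` enum and the `line_ending` field are hypothetical; the PR as shown pushes "\n" directly.

```rust
/// Line ending used for output (hypothetical configuration, not part of the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LineEnding {
    Lf,
    CrLf,
}

struct Formatter {
    output: String,
    line_ending: LineEnding,
}

impl Formatter {
    fn new(line_ending: LineEnding) -> Self {
        Self { output: String::new(), line_ending }
    }

    fn push_text(&mut self, text: &str) {
        self.output.push_str(text);
    }

    /// Single place that decides how a line break is written.
    fn push_line_break(&mut self) {
        let line_break = match self.line_ending {
            LineEnding::Lf => "\n",
            LineEnding::CrLf => "\r\n",
        };
        self.push_text(line_break);
    }

    /// Single place that decides how separating whitespace is written.
    fn push_white_space(&mut self) {
        self.push_text(" ");
    }
}
```

With this in place, switching the output to CRLF is a one-line change rather than a search for every "\n" literal.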

antoniosarosi · 2025-12-23T14:42:42Z

.../baml_tests/snapshots/formatter/baml_tests__formatter__06_5_formatter__extra_semicolons.snap

+=== ORIGINAL ===
+function Foo() -> int {
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+  x + 1;;;;;;
+}
+
+=== FORMATTED ===
+function Foo() -> int {
+  x + 1;
+  x + 1;


Does this break on C style for loops with no expressions? They're usually used for infinite loops:

for (;;) {
    // Infinite loop
}

antoniosarosi · 2025-12-23T14:47:52Z

baml_language/crates/baml_format/src/lib.rs

+                    // get rid of duplicate semicolons
+                    SyntaxKind::SEMICOLON => {
+                        if !already_pushed_semicolon {
+                            self.push_text(";");
+                            already_pushed_semicolon = true;
+                        }
+                    }


Does this need to special case infinite for-loops? for (;;) {}
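To illustrate the concern, here is one hypothetical way a deduplication pass could leave for-loop headers alone. It works on a flat list of token strings and treats any semicolon inside parentheses as significant; this is an assumption made for the sketch, not how the PR's CST-based formatter is structured, and `collapse_semicolons` is an invented name.

```rust
/// Collapse runs of `;` into a single `;`, except inside parentheses, where
/// every semicolon is kept (so `for (;;)` survives unchanged).
fn collapse_semicolons(tokens: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut paren_depth = 0usize;
    let mut already_pushed_semicolon = false;

    for &tok in tokens {
        match tok {
            "(" => paren_depth += 1,
            ")" => paren_depth = paren_depth.saturating_sub(1),
            _ => {}
        }

        if tok == ";" && paren_depth == 0 {
            if already_pushed_semicolon {
                continue; // duplicate statement terminator, drop it
            }
            already_pushed_semicolon = true;
        } else {
            // Any other token ends the current run of semicolons.
            already_pushed_semicolon = false;
        }

        out.push(tok.to_string());
    }
    out
}
```

In a CST-based formatter the same effect would more likely come from the for-loop header being its own node kind, so the statement-level deduplication never sees those semicolons.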

codeshaunted changed the title ~~Add Formatter BEP~~ Add Formatter Dec 19, 2025

codeshaunted force-pushed the formatter branch from ae6e6a7 to 0d4b081 Compare December 19, 2025 20:02

antoniosarosi reviewed Dec 20, 2025

codeshaunted force-pushed the formatter branch from 591c1f2 to f5aa084 Compare December 20, 2025 21:14

Avery Townsend added 10 commits December 20, 2025 15:07

add formatter bep

2fcbc19

fix bullet points

ffec20e

add context and change suggestion

0a21a32

add broken fmt impl for enums

beb0f11

fixed basic enum fmt with nicer newline handling

f4d5c56

add stub class fmt impl and prototype enum impl

3cd85c0

add basic class support

b89b700

minor cleanup = more robust type printing

dabfade

add more type cases

ba0c9f5

remove push_format_indent, replace with indent param

fb9d95f

Avery Townsend added 3 commits December 21, 2025 11:29

fix enum and class defs without block attrs

8911ce0

fix some spacing

cfd1423

rename to baml_format

d0bf036

Avery Townsend added 4 commits December 21, 2025 12:21

various fixes for edge case syntax

49be55b

fixes for parameters

88a2d10

add formatter to snapshot test generation

f6faf44

fix attribute with path formatting

1acde20

Avery Townsend added 2 commits December 21, 2025 13:09

fix comment attachment

1becb33

update snapshots

28c9aa1

Avery Townsend added 10 commits December 21, 2025 15:51

fix hanging line comments

4d584a3

enable formatting in the lsp

6ec2939

updates to comment removal

ab5985c

update snapshots

f799ad7

update BEP

6ed78ed

add some doc strings

d7f5f19

add extra snapshot tests

6252098

clippy pass

e8cccb2

new snapshot test

d0490f8

update bep readme

3c583bf

update BEP

0e36893

add cargo lock

1c463eb

antoniosarosi reviewed Dec 23, 2025
