perf: move html entity decoding to server side by danielroe · Pull Request #1561 · npmx-dev/npmx.dev

danielroe · 2026-02-22T08:31:50Z

follow on from #1550 and #591 - we should be doing this on server-side

vercel · 2026-02-22T08:31:54Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
npmx.dev	Ready	Preview, Comment	Feb 22, 2026 9:26am

2 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
docs.npmx.dev	Ignored	Preview	Feb 22, 2026 9:26am
npmx-lunaria	Ignored		Feb 22, 2026 9:26am

codecov · 2026-02-22T08:33:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

server/utils/readme.ts

coderabbitai · 2026-02-22T08:39:42Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

Walkthrough

HTML entity decoding was moved out of app/utils/formatters.ts into a new shared/utils/html.ts (adds decodeHtmlEntities) and removed from the former module. Components app/components/Package/Playgrounds.vue and app/components/ReadmeTocDropdown.vue stopped decoding entity-encoded labels and now render text as-is. server/utils/readme.ts gained robust HTML-tag stripping and now uses the new decoder for TOC items and playground labels. app/composables/useMarkdown.ts updated its import path. The readme API cache key was bumped from v8 to v9. Tests were added/moved for the new decoder.

Possibly related PRs

fix: decode HTML entities in TOC dropdown labels #1550: Modifies TOC label decoding in ReadmeTocDropdown.vue, directly overlapping component-level decoding logic.
test: add formatters utils tests #892: Touches decodeHtmlEntities usage and tests in utils, overlapping the relocation/removal of the decoder.
perf: load markdown source code on copy instead of on main request #1386: Changes server/utils/readme.ts readme processing/response shape, intersecting server-side readme rendering changes made here.

🚥 Pre-merge checks | ✅ 1

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description check	✅ Passed	The pull request description clearly relates to the changeset, referencing prior PRs and explaining the objective to move HTML entity decoding to server-side for performance improvement.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/entity-server

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/utils/readme.ts (1)
376-383: ⚠️ Potential issue | 🟠 Major

Decoded plainText inserted into heading HTML template causes content loss for headings containing < or >.

The order of operations at line 377 is:

Strip HTML tags: text.replace(/<[^>]*>/g, '')

Decode entities: decodeHtmlEntities(...) — turns < → <, & → &

The decoded plainText (which may now contain raw < and &) is then string-interpolated directly into the heading HTML at line 383:
return `<h${semanticLevel} ...><a href="#${uniqueSlug}">${plainText}</a></h${semanticLevel}>\n`
sanitize-html's underlying htmlparser2 is fairly tolerant, but when a bare < appears in text content it begins tag parsing — confirmed reports show the string is truncated to the text before the < symbol, so the remainder of the text is discarded. For example, a heading ## Types <T> would lose T> in the final rendered README HTML.

Before this PR the entity-encoded < was inserted into the template (valid HTML); the client decoded it for display. The decodeHtmlEntities call should only apply to the TOC entry, not to the value embedded in the HTML template.

The CodeQL "Incomplete multi-character sanitization" warning at line 377 tracks exactly this dataflow.
🐛 Proposed fix — separate decoded text (TOC) from entity-safe text (HTML template)
-    // Collect TOC item with plain text (HTML stripped, entities decoded)
-    const plainText = decodeHtmlEntities(text.replace(/<[^>]*>/g, '').trim())
+    // Strip HTML tags for heading template (keep entities so the HTML stays valid)
+    const strippedText = text.replace(/<[^>]*>/g, '').trim()
+    // Decode entities only for the plain-text TOC entry
+    const plainText = decodeHtmlEntities(strippedText)
     if (plainText) {
       toc.push({ text: plainText, id, depth })
     }

     /** The link href uses the unique slug WITHOUT the 'user-content-' prefix, because that will later be added for all links. */
-    return `<h${semanticLevel} id="${id}" data-level="${depth}"><a href="#${uniqueSlug}">${plainText}</a></h${semanticLevel}>\n`
+    return `<h${semanticLevel} id="${id}" data-level="${depth}"><a href="#${uniqueSlug}">${strippedText}</a></h${semanticLevel}>\n`

🧹 Nitpick comments (1)

shared/utils/html.ts (1)

1-13: Minimal but fit-for-purpose utility.

The regex alternatives and the htmlEntities map are in perfect correspondence, so the || match fallback is technically unreachable at runtime (every match the regex captures has a corresponding map entry). It serves as a compile-time hint when noUncheckedIndexedAccess is enabled, which is fine; just worth noting it cannot fire under normal operation.

coderabbitai

Actionable comments posted: 1

test/unit/shared/utils/html.spec.ts

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/utils/readme.ts (1)
388-395: ⚠️ Potential issue | 🟠 Major

Decoded plainText used in the HTML template causes content loss via sanitizeHtml.

After decoding, plainText may contain raw < and > characters. When these are interpolated into the heading HTML template at line 395 and then processed by sanitizeHtml, the resulting markup is treated as unknown HTML tags and stripped — not preserved as text.

Concrete regression trace for a heading such as ## `Promise<T>`:

text (from marked) = "<code>Promise<T></code>"

stripHtmlTags(text) → "Promise<T>"

decodeHtmlEntities(...) → "Promise<T>" — plainText now contains raw angle brackets

Line 395 template: <a href="#...">Promise<T></a>

sanitizeHtml sees <T> as an unknown tag (not in ALLOWED_TAGS), strips it

Rendered anchor text: "Promise" — <T> is silently lost

Before this PR, plainText was not decoded, so Promise<T> was embedded as HTML entities, which sanitizeHtml preserves and the browser renders correctly as Promise<T>.

The fix is to separate the stripped-but-not-decoded text (safe for HTML templates) from the fully decoded text (correct for TOC display, where Vue renders it as plain text via interpolation):
🐛 Proposed fix
-    // Collect TOC item with plain text (HTML stripped, entities decoded)
-    const plainText = decodeHtmlEntities(stripHtmlTags(text).trim())
+    // Collect TOC item with plain text (HTML stripped, entities decoded)
+    const strippedText = stripHtmlTags(text).trim()
+    const plainText = decodeHtmlEntities(strippedText)
     if (plainText) {
       toc.push({ text: plainText, id, depth })
     }

     /** The link href uses the unique slug WITHOUT the 'user-content-' prefix, because that will later be added for all links. */
-    return `<h${semanticLevel} id="${id}" data-level="${depth}"><a href="#${uniqueSlug}">${plainText}</a></h${semanticLevel}>\n`
+    return `<h${semanticLevel} id="${id}" data-level="${depth}"><a href="#${uniqueSlug}">${strippedText}</a></h${semanticLevel}>\n`

coderabbitai · 2026-02-22T09:19:15Z

server/utils/readme.ts

+ * Strip all HTML tags from a string, looping until stable to prevent
+ * incomplete sanitization from nested/interleaved tags
+ * (e.g. `<scr<script>ipt>` → `<script>` after one pass).
+ */


⚠️ Potential issue | 🟡 Minor

Comment example does not match the actual regex behaviour.

The doc comment illustrates <scr<script>ipt> → <script> after one pass, which is the failure mode of the narrower regex /<[^<>]*>/g (which excludes both < and >). The actual regex used here is /<[^>]*>/g, whose greedy [^>]* also matches <, so the first pass strips <scr<script> (from the opening < to the first >), leaving ipt> — not <script>.

📝 Proposed fix for the comment

/** * Strip all HTML tags from a string, looping until stable to prevent * incomplete sanitization from nested/interleaved tags - * (e.g. `<scr<script>ipt>` → `<script>` after one pass). + * (e.g. `<<script>>` → `<script>` after one pass, then `''` after the second). */

coderabbitai

🧹 Nitpick comments (2)

shared/utils/html.ts (2)
1-13: Derive the regex from htmlEntities keys to eliminate desync risk.

The regex alternation /&(?:amp|lt|gt|quot|apos|nbsp|#39);/g is a manual duplicate of the map's keys. Adding a new entity to the map without updating the regex will silently no-op.
♻️ Proposed refactor — derive regex from map keys
 const htmlEntities: Record<string, string> = {
   '&amp;': '&',
   '&lt;': '<',
   '&gt;': '>',
   '&quot;': '"',
   '&#39;': "'",
   '&apos;': "'",
   '&nbsp;': '\u00A0',
 }
 
+const htmlEntityRegex = new RegExp(
+  Object.keys(htmlEntities)
+    .map(k => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
+    .join('|'),
+  'g',
+)
+
 export function decodeHtmlEntities(text: string): string {
-  return text.replace(/&(?:amp|lt|gt|quot|apos|nbsp|#39);/g, match => htmlEntities[match] || match)
+  return text.replace(htmlEntityRegex, match => htmlEntities[match] || match)
 }
1-9: Consider extending entity coverage for common typographic characters.

README headings and package labels frequently contain entities outside this set — — (—), – (–), … (…), © (©), ® (®) — that will now appear as literal entity strings in the server-rendered output. Worth auditing whether the TOC/playground label paths encounter these in practice.

perf: move html entity decoding to server side

69a473c

vercel bot deployed to Preview – npmx.dev February 22, 2026 08:32 View deployment

github-advanced-security bot found potential problems Feb 22, 2026

View reviewed changes

server/utils/readme.ts Fixed Show fixed Hide fixed

coderabbitai bot reviewed Feb 22, 2026

View reviewed changes

test: move test to shared dir

6a99a32

vercel bot deployed to Preview – npmx.dev February 22, 2026 08:42 View deployment

coderabbitai bot reviewed Feb 22, 2026

View reviewed changes

test/unit/shared/utils/html.spec.ts Show resolved Hide resolved

fix: improve html tag stripping

90313e5

vercel bot deployed to Preview – npmx.dev February 22, 2026 09:12 View deployment

chore: use explicit non-breaking space unicode

1015911

vercel bot deployed to Preview – npmx.dev February 22, 2026 09:18 View deployment

coderabbitai bot reviewed Feb 22, 2026

View reviewed changes

test: update expectation

a9fb76d

vercel bot deployed to Preview – npmx.dev February 22, 2026 09:26 View deployment

43081j approved these changes Feb 22, 2026

View reviewed changes

danielroe added this pull request to the merge queue Feb 22, 2026

Merged via the queue into main with commit 54aa86a Feb 22, 2026
20 checks passed

danielroe deleted the fix/entity-server branch February 22, 2026 09:35

coderabbitai bot mentioned this pull request Feb 22, 2026

fix(ui): escape special characters in description #1582

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comments

perf: move html entity decoding to server side#1561

perf: move html entity decoding to server side#1561
danielroe merged 5 commits intomainfrom
fix/entity-server

danielroe commented Feb 22, 2026

Uh oh!

vercel bot commented Feb 22, 2026 •

edited

Loading

Uh oh!

codecov bot commented Feb 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

coderabbitai bot commented Feb 22, 2026 •

edited

Loading

Reviews paused

Walkthrough

Possibly related PRs

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 22, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Comments

Conversation

danielroe commented Feb 22, 2026

Uh oh!

vercel bot commented Feb 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Feb 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

coderabbitai bot commented Feb 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Possibly related PRs

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel bot commented Feb 22, 2026 •

edited

Loading

codecov bot commented Feb 22, 2026 •

edited

Loading

coderabbitai bot commented Feb 22, 2026 •

edited

Loading