@dereuromark
Contributor

Summary

Adds a new Profile::excerpt() static method for generating plain text previews/snippets from Djot content.

Use Cases

1. Search Result Snippets

When indexing content for search, you want to show a preview of the actual content without quoted text or code blocks:

$converter = new DjotConverter(profile: Profile::excerpt());
$html = $converter->convert($document->content);
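$text = trim(strip_tags($html));
// Append the ellipsis only when the text was actually truncated.
$snippet = mb_strlen($text) > 200 ? mb_substr($text, 0, 200) . '...' : $text;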
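
strip_tags() leaves HTML entities such as `&amp;` encoded and keeps the newlines between blocks, so a reusable post-processing step can help. A minimal sketch, assuming the converter output is ordinary HTML; the function name `plainTextSnippet` is hypothetical and not part of the library:

```php
<?php

/**
 * Turn rendered HTML into a single-line plain-text snippet.
 * Hypothetical helper for illustration; not part of the Djot library.
 */
function plainTextSnippet(string $html, int $limit = 200): string
{
    // Remove tags, then decode entities such as &amp; left behind by strip_tags().
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace (including newlines between blocks) to one space.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? '');

    if (mb_strlen($text) <= $limit) {
        return $text;
    }

    // Only add the ellipsis when text was actually cut off.
    return rtrim(mb_substr($text, 0, $limit)) . '...';
}
```

Because entities are decoded, the result is plain text and would need escaping again (for example with htmlspecialchars()) before being echoed into an HTML page.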

2. Social Media Previews (Open Graph)

For meta descriptions, you want the author's text, not quoted material:

// Blog post with quote at the start
$content = <<<'DJOT'
> "The best code is no code at all." — Jeff Atwood

In this article, I'll explain why simplicity matters...
DJOT;

$converter = new DjotConverter(profile: Profile::excerpt());
$text = trim(strip_tags($converter->convert($content)));
// Result: "In this article, I'll explain why simplicity matters..."
// (Quote is stripped)

3. Forum/Comment Thread Previews

Show the user's actual response, not what they're quoting:

// Forum reply that starts by quoting another user
$reply = <<<'DJOT'
> @user123 wrote:
> I think the API needs improvement.

Actually, the API is well-designed. Here's why...
DJOT;

$converter = new DjotConverter(profile: Profile::excerpt());
$preview = trim(strip_tags($converter->convert($reply)));
// Result: "Actually, the API is well-designed. Here's why..."

4. RSS Feed Summaries

Generate clean summaries without code blocks or images:

$converter = new DjotConverter(profile: Profile::excerpt());
$html = $converter->convert($article->body);
$summary = html_entity_decode(trim(strip_tags($html)));
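
Once entities are decoded, the summary is plain text again and may contain characters such as `&` or `<` that are not valid inside an XML feed as-is. A small sketch of the escaping step, assuming the summary is written into an RSS `<description>` element by hand rather than through an XML writer; the sample string is made up for illustration:

```php
<?php

// Plain-text summary after strip_tags() and html_entity_decode() (sample value).
$summary = 'Tips & tricks for x < y comparisons';

// Escape for XML so the decoded "&" and "<" do not break the feed.
$escaped = htmlspecialchars($summary, ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo "<description>{$escaped}</description>", PHP_EOL;
// <description>Tips &amp; tricks for x &lt; y comparisons</description>
```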

5. Email Subject Lines from Content

Extract meaningful text for automated emails:

$converter = new DjotConverter(profile: Profile::excerpt());
$text = trim(strip_tags($converter->convert($notification->message)));
$subject = mb_substr($text, 0, 50);
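
A subject line has to be a single line, and a hard cut at 50 characters can end mid-word. One possible refinement, sketched under the assumption that the input is already plain text; `truncateAtWord` is a hypothetical helper, not a library function:

```php
<?php

/**
 * Shorten text to at most $limit characters, preferring to cut at a space.
 * Hypothetical helper for illustration.
 */
function truncateAtWord(string $text, int $limit = 50): string
{
    // Fold all whitespace (including newlines) so the result is one line.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? '');

    if (mb_strlen($text) <= $limit) {
        return $text;
    }

    // Look one character past the limit so a space right at the boundary counts.
    $window = mb_substr($text, 0, $limit + 1);
    $lastSpace = mb_strrpos($window, ' ');

    if ($lastSpace !== false && $lastSpace > 0) {
        // Cut at the last space so the subject does not end mid-word.
        return mb_substr($window, 0, $lastSpace);
    }

    // No space found: fall back to a hard cut at the limit.
    return mb_substr($window, 0, $limit);
}
```

With this sketch, `truncateAtWord("Your order has shipped\nTracking follows", 20)` yields `Your order has` instead of a string that ends in `shipp`.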

What Gets Stripped

| Element Type | Reason |
|---|---|
| Blockquotes | Quoted content represents others' text, not the author's |
| Code blocks | Technical details not suitable for previews |
| Images | Cannot be represented in plain text |
| Tables | Complex structure doesn't translate to excerpts |
| Footnotes | Reference material not needed in previews |
| Raw HTML | Security concern and doesn't render as text |
| Math/Symbols | Don't translate well to plain text |

What Gets Kept

  • Paragraphs, headings, lists (structure for text extraction)
  • Inline formatting (bold, italic, etc.) - rendered as HTML tags, which strip_tags() then removes while keeping the text inside them
  • Links - link text is preserved after strip_tags()
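
The effect of strip_tags() on kept elements can be seen in isolation. The HTML below is hand-written to stand in for converter output, which is an assumption for illustration; the actual markup the converter emits may differ:

```php
<?php

// Hand-written HTML standing in for converter output (an assumption for illustration).
$html = '<p>Some <strong>bold</strong> text and a <a href="https://example.com">link</a>.</p>';

// strip_tags() drops the markup but keeps the text nodes, including link text.
$text = trim(strip_tags($html));

echo $text, PHP_EOL; // Some bold text and a link.
```

Note that the link target is lost along with the tag; only the visible link text survives.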

Test Plan

  • Added 13 new test cases covering excerpt profile functionality
  • Added use case tests for search snippets and Open Graph
  • Updated existing profile name/description tests
  • All 94 profile tests pass

Adds Profile::excerpt() for extracting plain text snippets from Djot content.
This is useful for:
- Search result snippets
- Open Graph/social media descriptions
- RSS feed summaries
- Forum/comment thread previews

The profile strips elements that don't belong in excerpts:
- Blockquotes (quoted content isn't the author's text)
- Code blocks (technical details not suitable for previews)
- Images (can't be represented in plain text)
- Tables (complex structure doesn't translate well)
- Footnotes (reference material not needed)
- Raw HTML (security and doesn't render as text)

Uses ACTION_STRIP to completely remove these elements rather than
converting to text representation.

Example usage:
```php
$converter = new DjotConverter(profile: Profile::excerpt());
$html = $converter->convert($djot);
$text = trim(strip_tags($html));
$excerpt = mb_strlen($text) > 160 ? mb_substr($text, 0, 160) . '...' : $text;
```
@codecov

codecov bot commented Dec 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.44%. Comparing base (1ea357f) to head (32f1724).

Additional details and impacted files
@@             Coverage Diff              @@
##             master      #55      +/-   ##
============================================
+ Coverage     93.40%   93.44%   +0.04%     
- Complexity     1928     1929       +1     
============================================
  Files            74       74              
  Lines          5183     5221      +38     
============================================
+ Hits           4841     4879      +38     
  Misses          342      342              


@dereuromark
Contributor Author

Closing this - the existing API already supports this use case perfectly with Profile::minimal()->denyBlock([...])->onDisallowed(Profile::ACTION_STRIP). A built-in profile isn't necessary for such a simple configuration.
