Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,14 @@
import com.linkedin.urls.detection.UrlDetector;
import com.linkedin.urls.detection.UrlDetectorOptions;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/**
* Utility class to detect links.
Expand Down Expand Up @@ -58,6 +63,43 @@ public static boolean containsLink(String content) {
return !(new UrlDetector(content, UrlDetectorOptions.BRACKET_MATCH).detect().isEmpty());
}

public static CompletableFuture<Boolean> isLinkBroken(String url) {
HttpClient client = HttpClient.newHttpClient();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define the HttpClient as a static reference in this class please?

i.e.

private static final HttpClient CLIENT = HttpClient.newHttpClient();

public static CompletableFuture<Boolean> isLinkBroken(String url) {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
            .method("HEAD", HttpRequest.BodyPublishers.noBody())
            .build();

    return CLIENT.sendAsync(request, HttpResponse.BodyHandlers.discarding())
            .thenApply(response -> response.statusCode() >= 400)
            .exceptionally(ignored -> true);
}

Creating a new HttpClient per link is unnecessary and inefficient. HttpClient is thread-safe and designed to be reused across concurrent requests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the detailed review, I really appreciate it @tj-wazei

I’ve updated the code to reuse a private static final HttpClient

I see the issue with mutating a shared StringBuilder from async callbacks — I’m currently reworking replaceDeadLinks(..) to separate the async link checks from the sequential text replacement so it’s thread-safe.

I’ll push an updated version shortly when am done. once again thanks for pointing those out!


HttpRequest request = HttpRequest.newBuilder(URI.create(url))
.method("HEAD", HttpRequest.BodyPublishers.noBody())
.build();

return client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
.thenApply(response -> response.statusCode() >= 400)
.exceptionally(ignored -> true);
}

public static CompletableFuture<String> replaceDeadLinks(String text, String replacement) {
Set<LinkFilter> filters = Set.of(LinkFilter.SUPPRESSED, LinkFilter.NON_HTTP_SCHEME);

List<String> links = extractLinks(text, filters);

if (links.isEmpty()) {
return CompletableFuture.completedFuture(text);
}

StringBuilder result = new StringBuilder(text);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is not thread-safe. StringBuilder is mutated from multiple async callbacks, causing data races and invalid index calculations. Async HTTP completion order is nondeterministic, so index-based replacements are unsafe.

The correct approach is to separate concurrent link checks from sequential string mutation: perform all HTTP checks first, then apply replacements in a single thread once all futures complete.


List<CompletableFuture<Void>> checks =
links.stream().map(link -> isLinkBroken(link).thenAccept(isDead -> {
if (isDead) {
int index = result.indexOf(link);
if (index != -1) {
result.replace(index, index + link.length(), replacement);
}
}
})).toList();

return CompletableFuture.allOf(checks.toArray(new CompletableFuture[0]))
.thenApply(v -> result.toString());
}

private static Optional<String> toLink(Url url, Set<LinkFilter> filter) {
String raw = url.getOriginalUrl();
if (filter.contains(LinkFilter.SUPPRESSED) && raw.contains(">")) {
Expand All @@ -76,7 +118,6 @@ private static Optional<String> toLink(Url url, Set<LinkFilter> filter) {
// Remove trailing punctuation
link = link.substring(0, link.length() - 1);
}

return Optional.of(link);
}

Expand Down