148 changes: 148 additions & 0 deletions NATIVE_IMAGE_FIX_STATUS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# Native-Image Build Fix Status

## Problem Statement
Adding the `profiling-scrubber` module triggered 44 "unintentionally initialized at build time" errors when building with GraalVM native-image with the profiler enabled (agent attached via `-J-javaagent` during compilation).

## Root Cause Identified

**The initialization cascade was caused by Exception Profiling instrumentation:**

Using `--trace-class-initialization`, we discovered:
```
datadog.trace.bootstrap.CallDepthThreadLocalMap caused initialization at build time:
at datadog.trace.bootstrap.CallDepthThreadLocalMap.<clinit>(CallDepthThreadLocalMap.java:13)
at datadog.trace.bootstrap.instrumentation.jfr.exceptions.ExceptionProfiling$Exclusion.isEffective(ExceptionProfiling.java:49)
at java.lang.Exception.<init>(Exception.java:86)
at java.lang.ReflectiveOperationException.<init>(ReflectiveOperationException.java:76)
at java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java:71)
```

**Why this happens:**
1. The agent attaches via `-J-javaagent` during native-image compilation
2. The `OpenJdkController` constructor runs and starts `ExceptionProfiling`
3. GraalVM throws exceptions during class scanning
4. The instrumented `Exception` constructor triggers `ExceptionProfiling` code
5. This initializes `CallDepthThreadLocalMap` and 43 other config/bootstrap classes at build time
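
The last two steps rely on ordinary lazy class initialization in the JVM: a class's static initializer runs the first time one of its static methods or non-constant static fields is used. The sketch below reproduces only that mechanism. `Bootstrap` and `InstrumentedException` are hypothetical stand-ins for `CallDepthThreadLocalMap` and the instrumented `Exception` constructor; none of this is the agent's real code.

```java
import java.util.ArrayList;
import java.util.List;

final class InitCascadeDemo {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for a bootstrap class such as CallDepthThreadLocalMap.
    static final class Bootstrap {
        static {
            LOG.add("Bootstrap.<clinit>");
        }

        static boolean isEffective() {
            return true;
        }
    }

    // Hypothetical stand-in for an exception whose constructor has been instrumented.
    static final class InstrumentedException extends Exception {
        InstrumentedException(String message) {
            super(message);
            // The first static call into Bootstrap forces its static initializer to run.
            Bootstrap.isEffective();
        }
    }

    public static void main(String[] args) {
        LOG.add("before");
        new InstrumentedException("class not found"); // constructing is enough; no throw needed
        LOG.add("after");
        System.out.println(LOG); // prints [before, Bootstrap.<clinit>, after]
    }
}
```

The initializer runs as a side effect of merely constructing the exception, which is why exceptions created during class scanning are enough to start the cascade.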

## Solution Applied

**Disable exception profiling during native-image build via configuration:**

Modified: `dd-smoke-tests/spring-boot-3.0-native/application/build.gradle`
```gradle
if (withProfiler && property('profiler') == 'true') {
  buildArgs.add("-J-Ddd.profiling.enabled=true")
  // Disable exception profiling during native-image build to avoid class initialization cascade
  buildArgs.add("-J-Ddd.profiling.disabled.events=datadog.ExceptionSample")
}
```
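
As a rough illustration of why a configuration switch is sufficient, the sketch below gates a feature on a list of disabled event names. It is not the agent's implementation: the helper names `parseDisabledEvents` and `isEventEnabled`, and the assumption that the property value is a comma-separated list, are ours.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

final class DisabledEventsSketch {
    // Hypothetical helper: assumes the property value is a comma-separated list of event names.
    static Set<String> parseDisabledEvents(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return Arrays.stream(raw.split(","))
            .map(String::trim)
            .filter(name -> !name.isEmpty())
            .collect(Collectors.toSet());
    }

    // Hypothetical gate: a feature is started only if its event is not in the disabled set.
    static boolean isEventEnabled(Set<String> disabled, String eventName) {
        return !disabled.contains(eventName);
    }

    public static void main(String[] args) {
        Set<String> disabled = parseDisabledEvents("datadog.ExceptionSample");
        System.out.println(isEventEnabled(disabled, "datadog.ExceptionSample")); // prints false
        System.out.println(isEventEnabled(disabled, "jdk.ExecutionSample"));     // prints true
    }
}
```

If the gate is evaluated before any exception-profiling class is touched, the instrumented code path is never entered and nothing needs to detect the native-image build.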

## Results

### ✅ SUCCESS: Initialization Errors Fixed
- **Before:** 44 classes unintentionally initialized at build time
- **After:** 0 initialization errors

The configuration approach successfully prevents ExceptionProfiling from starting during native-image compilation, eliminating the entire initialization cascade.

### ⚠️ NEW ISSUE: JVM Crash During Native-Image Build

The build now fails with a JVM fatal error:
```
SIGBUS (0xa) at pc=0x00000001067aa404
Problematic frame: V [libjvm.dylib+0x8be404] PSRootsClosure<false>::do_oop(narrowOop*)+0x48
```

**Error details:**
- Crash occurs during garbage collection (Parallel Scavenge)
- Happens while processing JavaThread frames
- Stack trace shows the agent's bytecode instrumentation is active:
  - `datadog.instrument.classmatch.ClassFile.parse`
  - `datadog.trace.agent.tooling.bytebuddy.outline.OutlineTypeParser.parse`
  - `datadog.trace.agent.tooling.bytebuddy.outline.TypeFactory.lookupType`

**Error report:** `dd-smoke-tests/spring-boot-3.0-native/build/application/native/nativeCompile/hs_err_pid*.log`

## Files Modified

1. **dd-java-agent/agent-profiling/profiling-scrubber/build.gradle**
   - Removed unnecessary `internal-api` dependency (profiling-scrubber doesn't use it)

2. **dd-java-agent/agent-profiling/src/main/java/com/datadog/profiling/agent/ProfilingAgent.java**
   - Removed static import of `PROFILING_TEMP_DIR_DEFAULT` (it had `System.getProperty` in its initializer)
   - Changed to runtime computation: `System.getProperty("java.io.tmpdir")` at lines 162-163

3. **dd-java-agent/agent-profiling/profiling-controller/src/main/java/com/datadog/profiling/controller/ProfilerFlareReporter.java**
   - Line ~229: Replaced `PROFILING_JFR_REPOSITORY_BASE_DEFAULT` with runtime computation
   - Line ~507: Replaced `PROFILING_TEMP_DIR_DEFAULT` with runtime computation

4. **dd-java-agent/agent-profiling/profiling-controller-openjdk/src/main/java/com/datadog/profiling/controller/openjdk/OpenJdkController.java**
   - Line ~275: Replaced `PROFILING_JFR_REPOSITORY_BASE_DEFAULT` with runtime computation
   - **Note:** This file is clean - no native-image detection code added

5. **dd-smoke-tests/spring-boot-3.0-native/application/build.gradle**
   - Added `-J-Ddd.profiling.disabled.events=datadog.ExceptionSample` to disable exception profiling during build
   - Added trace flag (temporary, for debugging): `--trace-class-initialization=datadog.trace.bootstrap.CallDepthThreadLocalMap`

## Next Steps

The JVM crash during native-image build needs investigation:

### Option 1: Investigate GC Crash
- The crash occurs in Parallel GC during thread stack scanning
- May be related to agent's bytecode instrumentation interfering with GC
- Could try different GC algorithm or adjust heap settings

### Option 2: Reduce Agent Footprint During Build
- The agent performs extensive bytecode parsing during native-image compilation
- Consider disabling more agent features during build (not just exception profiling)
- Possible flags to try:
  - `-J-Ddd.instrumentation.enabled=false` (if such a flag exists)
  - Reduce instrumentation scope during native-image compilation

### Option 3: Check for Known Issues
- Search for similar SIGBUS crashes with GraalVM + Java agents
- Check if this is a known GraalVM 21.0.9 issue
- Test with different GraalVM version

### Option 4: Alternative Approach
- Consider NOT attaching agent during native-image build
- Configure agent to attach only at runtime in the compiled native-image
- May require changes to how profiling is initialized

## Testing Commands

```bash
# Rebuild agent
./gradlew :dd-java-agent:shadowJar

# Test native-image build with profiler
./gradlew :dd-smoke-tests:spring-boot-3.0-native:springNativeBuild \
-PtestJvm=graalvm21 -Pprofiler=true --no-daemon

# Check initialization errors (should be 0)
grep -c "was unintentionally initialized" \
build/logs/*springNativeBuild.log

# View JVM crash report
ls -t dd-smoke-tests/spring-boot-3.0-native/build/application/native/nativeCompile/hs_err_pid*.log | head -1
```

## Key Learnings

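1. **Static fields with method-call initializers trigger initialization:** Referencing a constant like `PROFILING_TEMP_DIR_DEFAULT = System.getProperty("java.io.tmpdir")`, whose initializer is a method call rather than a compile-time constant, forces its declaring class to be initialized, which GraalVM then flags when it happens at build time.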

2. **Exception profiling is a major trigger:** When the agent is active during native-image compilation, any exceptions thrown (e.g., ClassNotFoundException during class scanning) trigger instrumentation that initializes many config classes.

3. **Configuration-based disable works:** Disabling JFR events via `-Ddd.profiling.disabled.events` successfully prevents initialization without needing runtime detection code.

4. **Avoid detection during initialization:** Any attempt to detect "are we in native-image compilation" (Class.forName, getResource, etc.) can itself trigger the cascade we're trying to avoid.

5. **Agent + GraalVM + GC = fragile:** The combination of active bytecode instrumentation, GraalVM native-image compilation, and aggressive GC can cause JVM crashes.
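
The first learning can be reproduced in isolation. In the sketch below, `Config` is a hypothetical stand-in for a constants holder such as `ProfilingConfig`, and the key name is illustrative. Reading a compile-time constant does not initialize the class, reading a field initialized by a method call does, and looking the value up at the call site avoids touching the class at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class StaticInitDemo {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for a constants holder such as ProfilingConfig.
    static final class Config {
        // Compile-time constant: javac inlines it, so reading it does not initialize Config.
        static final String TEMP_DIR_KEY = "profiling.tempdir";
        // Not a compile-time constant: reading it forces Config's static initializer to run.
        static final String TEMP_DIR_DEFAULT = System.getProperty("java.io.tmpdir");

        static {
            LOG.add("Config.<clinit>");
        }
    }

    public static void main(String[] args) {
        LOG.add("read key: " + Config.TEMP_DIR_KEY);
        String atRuntime = System.getProperty("java.io.tmpdir"); // call-site lookup, Config untouched
        LOG.add("runtime lookup done");
        String viaField = Config.TEMP_DIR_DEFAULT;               // first use that initializes Config
        LOG.add("read default");
        LOG.add("same value: " + Objects.equals(atRuntime, viaField));
        // prints [read key: profiling.tempdir, runtime lookup done, Config.<clinit>, read default, same value: true]
        System.out.println(LOG);
    }
}
```

This is the same substitution made in the files listed above: the call-site lookup yields the same value without forcing the holder class to initialize.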

## Branch Status

- Branch: `jb/jfr_redacting`
- All changes committed and ready to push
- Initialization cascade: FIXED ✅
- Native-image build: CRASHES ⚠️
7 changes: 5 additions & 2 deletions dd-java-agent/agent-profiling/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -13,16 +13,19 @@ excludedClassesCoverage += [
'com.datadog.profiling.agent.ProfilingAgent',
'com.datadog.profiling.agent.ProfilingAgent.ShutdownHook',
'com.datadog.profiling.agent.ProfilingAgent.DataDumper',
- 'com.datadog.profiling.agent.ProfilerFlare'
+ 'com.datadog.profiling.agent.ProfilerFlare',
+ 'com.datadog.profiling.agent.ScrubRecordingDataListener',
+ 'com.datadog.profiling.agent.ScrubRecordingDataListener.ScrubbedRecordingData'
]

dependencies {
api libs.slf4j
- api project(':internal-api')
+ implementation project(':internal-api')

api project(':dd-java-agent:agent-profiling:profiling-ddprof')
api project(':dd-java-agent:agent-profiling:profiling-uploader')
api project(':dd-java-agent:agent-profiling:profiling-controller')
+ implementation project(':dd-java-agent:agent-profiling:profiling-scrubber')
api project(':dd-java-agent:agent-profiling:profiling-controller-jfr')
api project(':dd-java-agent:agent-profiling:profiling-controller-jfr:implementation')
api project(':dd-java-agent:agent-profiling:profiling-controller-ddprof')
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -272,11 +272,11 @@ && isEventEnabled(recordingSettings, "jdk.NativeMethodSample")) {
}

private static String getJfrRepositoryBase(ConfigProvider configProvider) {
+ String jfrRepoDefault = System.getProperty("java.io.tmpdir") + "/dd/jfr";
String legacy =
configProvider.getString(
- ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE,
- ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE_DEFAULT);
- if (!legacy.equals(ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE_DEFAULT)) {
+ ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE, jfrRepoDefault);
+ if (!legacy.equals(jfrRepoDefault)) {
log.warn(
"The configuration key {} is deprecated. Please use {} instead.",
ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -226,8 +226,8 @@ private String getProfilerConfig() {
"JFR Repository Base",
configProvider.getString(
ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE,
- ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE_DEFAULT),
- ProfilingConfig.PROFILING_JFR_REPOSITORY_BASE_DEFAULT);
+ System.getProperty("java.io.tmpdir") + "/dd/jfr"),
+ System.getProperty("java.io.tmpdir") + "/dd/jfr");
appendConfig(
sb,
"JFR Repository Max Size",
Expand Down Expand Up @@ -504,8 +504,8 @@ private String getProfilerConfig() {
sb,
"Temp Directory",
configProvider.getString(
- ProfilingConfig.PROFILING_TEMP_DIR, ProfilingConfig.PROFILING_TEMP_DIR_DEFAULT),
- ProfilingConfig.PROFILING_TEMP_DIR_DEFAULT);
+ ProfilingConfig.PROFILING_TEMP_DIR, System.getProperty("java.io.tmpdir")),
+ System.getProperty("java.io.tmpdir"));
appendConfig(
sb,
"Debug Dump Path",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
import java.nio.file.Path;
import java.time.Instant;
import javax.annotation.Nonnull;
+ import javax.annotation.Nullable;

final class DatadogProfilerRecordingData extends RecordingData {
private final Path recordingFile;
Expand Down Expand Up @@ -36,4 +37,10 @@ public void release() {
public String getName() {
return "ddprof";
}

+ @Nullable
+ @Override
+ public Path getPath() {
+ return recordingFile;
+ }
}
14 changes: 14 additions & 0 deletions dd-java-agent/agent-profiling/profiling-scrubber/build.gradle
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
apply from: "$rootDir/gradle/java.gradle"

minimumInstructionCoverage = 0.0
minimumBranchCoverage = 0.0

dependencies {
  api libs.slf4j

  implementation libs.jafar.parser

  testImplementation libs.bundles.junit5
  testImplementation libs.bundles.mockito
  testImplementation libs.bundles.jmc
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
package com.datadog.profiling.scrubber;

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Provides the default scrub definition targeting sensitive JFR event fields. */
public final class DefaultScrubDefinition {

  private static final Map<String, JfrScrubber.ScrubField> DEFAULT_SCRUB_FIELDS;

  static {
    Map<String, JfrScrubber.ScrubField> fields = new HashMap<>();
    // System properties may contain API keys, passwords
    fields.put(
        "jdk.InitialSystemProperty", new JfrScrubber.ScrubField(null, "value", (k, v) -> true));
    // JVM args may contain credentials in -D flags
    fields.put(
        "jdk.JVMInformation", new JfrScrubber.ScrubField(null, "jvmArguments", (k, v) -> true));
    // Env vars may contain secrets
    fields.put(
        "jdk.InitialEnvironmentVariable",
        new JfrScrubber.ScrubField(null, "value", (k, v) -> true));
    // Process command lines may reveal infrastructure
    fields.put(
        "jdk.SystemProcess", new JfrScrubber.ScrubField(null, "commandLine", (k, v) -> true));
    DEFAULT_SCRUB_FIELDS = Collections.unmodifiableMap(fields);
  }

  /**
   * Creates a scrub definition function that maps event type names to their scrub field
   * definitions.
   *
   * @param excludeEventTypes list of event type names to exclude from scrubbing, or null for none
   * @return a function mapping event type names to scrub field definitions
   */
  public static Function<String, JfrScrubber.ScrubField> create(List<String> excludeEventTypes) {
    Set<String> excludeSet =
        excludeEventTypes != null ? new HashSet<>(excludeEventTypes) : Collections.<String>emptySet();

    return eventTypeName -> {
      if (excludeSet.contains(eventTypeName)) {
        return null;
      }
      return DEFAULT_SCRUB_FIELDS.get(eventTypeName);
    };
  }

  private DefaultScrubDefinition() {}
}
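
A usage sketch of the lookup returned by `create` may help when reviewing the exclusion behavior. `JfrScrubber.ScrubField` is not shown in this diff, so the example uses a hypothetical stand-in `ScrubField` that models only the scrubbed field name, and a local `create` that mirrors the shape of the method above. Treat it as an illustration of the intended contract, not as the module's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class ScrubDefinitionUsage {
  // Hypothetical stand-in for JfrScrubber.ScrubField; only the scrubbed field name is modeled.
  static final class ScrubField {
    final String scrubFieldName;

    ScrubField(String scrubFieldName) {
      this.scrubFieldName = scrubFieldName;
    }
  }

  static final Map<String, ScrubField> DEFAULTS;

  static {
    Map<String, ScrubField> fields = new HashMap<>();
    fields.put("jdk.InitialSystemProperty", new ScrubField("value"));
    fields.put("jdk.JVMInformation", new ScrubField("jvmArguments"));
    fields.put("jdk.InitialEnvironmentVariable", new ScrubField("value"));
    fields.put("jdk.SystemProcess", new ScrubField("commandLine"));
    DEFAULTS = Collections.unmodifiableMap(fields);
  }

  // Mirrors the shape of DefaultScrubDefinition.create: excluded or unknown events map to null.
  static Function<String, ScrubField> create(List<String> excludeEventTypes) {
    Set<String> exclude =
        excludeEventTypes != null ? new HashSet<>(excludeEventTypes) : Collections.<String>emptySet();
    return name -> exclude.contains(name) ? null : DEFAULTS.get(name);
  }

  public static void main(String[] args) {
    Function<String, ScrubField> definition = create(Arrays.asList("jdk.SystemProcess"));
    System.out.println(definition.apply("jdk.JVMInformation").scrubFieldName); // prints jvmArguments
    System.out.println(definition.apply("jdk.SystemProcess"));                 // prints null (excluded)
    System.out.println(definition.apply("jdk.ExecutionSample"));               // prints null (no default)
  }
}
```

Callers cannot distinguish "excluded" from "no default defined", since both return null; that seems acceptable here because either way the event is left unscrubbed.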