
Conversation

@luarss
Contributor

@luarss luarss commented Jan 11, 2026

  • Introduced `TensorBoardLogger` class for logging metrics during sweeps (a rough illustrative sketch follows below).
  • Updated `sweep` function to integrate TensorBoard logging.
  • Enhanced `consumer` function to log metrics after each parameter run.
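
For reference, here is a minimal illustrative sketch of the idea described above. This is not the code added by this PR: the class name comes from the bullets, but the torch.utils.tensorboard backend, method names, and arguments are assumptions.

# Illustrative sketch only -- not the actual PR implementation.
from torch.utils.tensorboard import SummaryWriter

class TensorBoardLogger:
    """Log one scalar series per metric for every sweep run."""

    def __init__(self, log_dir: str):
        self.writer = SummaryWriter(log_dir=log_dir)

    def log_run(self, metrics: dict, step: int) -> None:
        # One scalar per metric, indexed by the parameter-run counter,
        # so each sweep point shows up as a step in TensorBoard.
        for name, value in metrics.items():
            self.writer.add_scalar(name, value, step)
        self.writer.flush()

    def close(self) -> None:
        self.writer.close()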

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss luarss added the autotuner Flow autotuner label Jan 11, 2026
Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss luarss requested a review from vvbandeira January 11, 2026 17:21
@luarss
Contributor Author

luarss commented Jan 11, 2026

@jeffng-or Back-ported the feature, could you please check out this branch and let me know if it works?

@jeffng-or
Contributor

@jeffng-or Back-ported the feature, could you please check out this branch and let me know if it works?

Great, thanks! I will check it out and let you know how it goes.

@jeffng-or
Contributor

It looks like the code is trying to write the SDC file into tools/AutoTuner/src/constraint.sdc, which isn't writable and also not in a trial-specific directory:

(consumer pid=509) [INFO TUN-0007] Scheduling run for parameter {'_SDC_CLK_PERIOD': 250}.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 676, in <module>
    main()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 672, in main
    sweep()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 605, in sweep
    ray.get(workers)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2771, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(PermissionError): ray::consumer() (pid=509, ip=172.17.0.2)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 678, in consumer
    metric_file, _ = ray.get(
ray.exceptions.RayTaskError(PermissionError): ray::openroad_distributed() (pid=499, ip=172.17.0.2)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 646, in openroad_distributed
    config = parse_config(
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 257, in parse_config
    write_sdc(sdc, path, sdc_original, constraints_sdc)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 116, in write_sdc
    with open(file_name, "w") as file:
PermissionError: [Errno 13] Permission denied: '/OpenROAD-flow-scripts/tools/AutoTuner/src/constraint.sdc'

I'm running within a docker container where I've mounted the tools/AutoTuner/src/autotuner directory, but not tools/AutoTuner/src. So, the src directory is not writable.

Here's the script that I use to start the container:

#!/bin/bash

#
# Method to use docker CLI to determine if we're using docker or podman
#
# Sets container_engine global variable with either "docker" or "podman"
#
get_container_engine () {
    local DOCKER_VERSION_STRING=$(docker --version 2> /dev/null)

    if [[ "$DOCKER_VERSION_STRING" == *"Docker"* ]]; then
        container_engine="docker"
    elif [[ "$DOCKER_VERSION_STRING" == *"podman"* ]]; then
        container_engine="podman"
    else
        echo "Unable to determine container engine using docker CLI"
        exit 1
    fi
}

if [ $# -lt 1 ]; then
    echo "Usage: run_at_docker.sh <port_num>"
    exit 1
fi

port_num=$1
get_container_engine

if [[ $container_engine == "podman" ]]; then
    user_args="--privileged --userns=keep-id"
else
    user_args="-u $(id -u ${USER}):$(id -g ${USER})"
fi

host_dir=$(pwd)
docker run --privileged --rm -it -p $port_num:$port_num \
       $user_args \
       -v $host_dir:/OpenROAD-flow-scripts/flow:Z \
       -v $host_dir/../tools/AutoTuner/src/autotuner:/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner:Z \
       -v /workspace/rapidus/current/rapidus:/rapidus:Z \
       -v /platforms/Rapidus/2HP:/platforms/Rapidus/2HP:Z \
       autotuner:1.0 bash

Here's the Dockerfile that I used to build the autotuner:1.0 container:

# syntax=docker/dockerfile:1
#
# Installs ORFS from docker image 
#

FROM openroad/orfs-verific:v3.0-4385-g4ae3d761e

# install AT required packages
RUN pip3 install -U -r /OpenROAD-flow-scripts/tools/AutoTuner/requirements.txt
RUN pip3 install torchvision

# ORFS installation dir
WORKDIR /OpenROAD-flow-scripts/tools/AutoTuner/src

To build the docker image:

docker build -t autotuner:1.0 -f Dockerfile .

To start the container:

./run_at_docker.sh 6008

Within the container:

python3 -m autotuner.distributed --design gcd --platform rapidus2hp --config /OpenROAD-flow-scripts/flow/designs/rapidus2hp/gcd/autotuner.json --experiment sweep --jobs 20 sweep

Member

@vvbandeira vvbandeira left a comment


@luarss
Please address Jeff's concerns and request a new review when he is satisfied.

@jeffng-or
Contributor

So, here are some differences that I see between tune and sweep:

  • tune calls openroad_distributed from a trial specific directory (e.g. /tmp/ray/session_2026-01-12_22-41-04_998954_29112/artifacts/2026-01-12_22-41-07/tune-tune/working_dirs/variant-AutoTunerBase-30ce30d8-ray)
  • sweep calls openroad_distributed from the os.getcwd()
  • In my case, I'm calling the AT from /OpenROAD-flow-scripts/tools/AutoTuner/src, which is located in the docker image filesystem and isn't writable
  • I can work around this by changing my container mount point to mount tools/AutoTuner/src instead of tools/AutoTuner/src/autotuner

Maybe we should be writing the SDC file under the experiment directory, which would be under flow/logs? At least we'd know that the directory is writable.
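
A hedged sketch of that direction is below, assuming the SDC is resolved into a per-trial directory under the experiment area; the helper name and signature are hypothetical, not the current utils.py API.

# Sketch only: write the SDC into a known-writable, trial-specific directory
# instead of tools/AutoTuner/src. Helper name and arguments are hypothetical.
import os

def trial_sdc_path(experiment_dir: str, variant: str, sdc_name: str = "constraint.sdc") -> str:
    trial_dir = os.path.join(experiment_dir, variant)
    os.makedirs(trial_dir, exist_ok=True)
    return os.path.join(trial_dir, sdc_name)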

After I make the change, the AT starts running trials. As it's running, I'm noticing the following:

  • When I "grep -w core_clock" in logs/rapidus2hp/gcd/sweep-sweep/*/OpenROAD-flow-scripts/tools/AutoTuner/src/constraint.sdc/metrics.json, every file reports a clock frequency of 290, so I'm not sure the SDC file is being uniquely created for each trial. This is further reinforced when I compare the metrics.json files for two runs, which are virtually identical (see the quick check sketched after this list).
  • I have --jobs set to 20, but it doesn't look like 20 jobs are run in parallel. The job has been running for an hour, but no data has been written to logs/rapidus2hp/gcd/sweep-sweep since the first five minutes of the run.
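
A quick check along those lines is sketched below. The recursive glob mirrors the directory layout described above, and the key filter is an assumption about how clock-related values are named in metrics.json.

# Diagnostic sketch: print any metric whose key mentions a clock, per trial,
# to confirm each sweep run really received its own _SDC_CLK_PERIOD.
import glob
import json

def clock_metrics(node, prefix=""):
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + key
            if isinstance(value, dict):
                found.update(clock_metrics(value, path + "."))
            elif "clk" in key.lower() or "clock" in key.lower():
                found[path] = value
    return found

pattern = "logs/rapidus2hp/gcd/sweep-sweep/**/metrics.json"
for metrics_file in sorted(glob.glob(pattern, recursive=True)):
    with open(metrics_file) as f:
        print(metrics_file, clock_metrics(json.load(f)))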

@jeffng-or
Contributor

The job ran overnight without completing, so there's something off. Please use the following flow for testing:

  • docker load -i /home/jeffng/Jan2026Demo/v3.0-4385-g4ae3d761e.tar
  • docker build -t autotuner:1.0 -f Dockerfile . (Use Dockerfile posted above)
  • git checkout 4ae3d76 (in your ORFS workspace)
  • Replace designs/rapidus2hp/gcd/autotuner.json with the content below
  • Execute run_at_docker.sh 6007 (the script above - note that you'll have to change the /workspace/rapidus/current/rapidus path to /platforms or wherever your rapidus workspace is)
  • export PLATFORM_HOME=/rapidus (in docker container)
  • Execute the python3 call above

autotuner.json

{
    "_SDC_FILE_PATH": "constraint.sdc",
    "_SDC_CLK_PERIOD": {
        "type": "int",
        "minmax": [
            180,
            300
        ],
        "step": 10
    }
}

Once it works for you, I can try again.

…ing sdc files to correct dir

Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
@jeffng-or
Contributor

Great, thanks! I'm able to run through a sweep with rapidus2hp gcd and view the data in TensorBoard.

The "score" should roughly match the "metric" in tune mode. Check out the metric name and calculation in the evaluate() method (basically, the metric should be 100 * effective_clk_period plus some other terms).
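
For reference, a hedged sketch of that calculation is below. The authoritative formula lives in evaluate() in the AutoTuner source; the metric key names and the penalty term here are assumptions.

ERROR_METRIC = 9e99  # sentinel used for failed runs (see later comments)

def sweep_score(metrics: dict) -> float:
    """Approximate the tune-mode metric: ~100x the effective clock period
    plus a penalty for violations; the exact weights live in evaluate()."""
    if "ERR" in metrics.values() or "N/A" in metrics.values():
        return ERROR_METRIC
    effective_clk_period = metrics["clk_period"] - metrics["worst_slack"]
    return 100 * effective_clk_period + metrics["num_drc"]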

Other than that, it's good to go. I'll update my chart generation code to key off the sweep results.

@jeffng-or
Contributor

Frequency sweep is working fine. I've ported my chart generator to use the sweep directory organization. So, make the update to the metrics calculation and I think we can call that done.

One quirk that I found when trying to run the physical sweep is that string choices aren't supported in sweep mode. My autotuner_phys.json looks like:

{
    "_SDC_FILE_PATH": "constraint.sdc",
    "_SDC_CLK_PERIOD": {
        "type": "float",
        "minmax": [
            670,
            670
        ],
        "step": 0
    },
    "CORE_UTILIZATION": {
        "type": "int",
        "minmax": [
            40,
            80
        ],
        "step": 1
    },
    "PLACE_SITE": {
        "type": "string",
        "values": [
            "SC6T",
            "SC8T"
        ]
    }
}

But, when I run it:

/OpenROAD-flow-scripts/flow/designs/rapidus2hp/cva6/constraint.sdc
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 677, in <module>
    main()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 618, in main
    config_dict, SDC_ORIGINAL, FR_ORIGINAL = read_config(
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 614, in read_config
    config[key] = read_sweep(value)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 458, in read_sweep
    return [*this["minmax"], this["step"]]
KeyError: 'minmax'

likely because the PLACE_SITE doesn't have a minmax. How do we enable this in sweep mode?

@vvbandeira
Member

likely because the PLACE_SITE doesn't have a minmax. How do we enable this in sweep mode?

The expected config here would be to use "choice" and not "string", which I don't think the AT sweep supports in this version. The easy solution is to run AT twice, since there are only two possible values for that variable. If you would like to expand to N possibilities, we should be able to add support in another PR.
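
If that support is added later, one possible shape for it is sketched below; this is hypothetical, based on the read_sweep() shown in the traceback, and the handling of a "values" list is an assumption.

# Hypothetical extension of read_sweep() to accept enumerated choices such as
# PLACE_SITE, alongside the existing minmax/step handling.
def read_sweep(this: dict):
    if "values" in this:
        # Explicit list of choices -- sweep every entry as-is.
        return list(this["values"])
    return [*this["minmax"], this["step"]]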

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@jeffng-or
Contributor

jeffng-or commented Jan 16, 2026

likely because the PLACE_SITE doesn't have a minmax. How do we enable this in sweep mode?

The expected config here would be to use "choice" and not "string", which I don't think the AT sweep supports in this version. The easy solution is to run AT twice, since there are only two possible values for that variable. If you would like to expand to N possibilities, we should be able to add support in another PR.

OK, that makes sense. I'd prefer not to have to run AT twice, so let me file another GH issue for choice/string support. That way we can enable this PR, which works.

Filed: #3809

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss
Contributor Author

luarss commented Jan 16, 2026

@jeffng-or Fixed the scoring, could you please check? We should be using the same scoring module for both tune/sweep now.

@jeffng-or
Contributor

@jeffng-or Fixed the scoring, could you please check? We should be using the same scoring module for both tune/sweep now.

Something is off:
[screenshot]

@jeffng-or Fixed the scoring, could you please check? We should be using the same scoring module for both tune/sweep now.

Yeah, it looks like it works. For consistency, can we change the name of "score" to "metric" to match the tune mode? Also, I noticed that if the score/metric is 9e99, the resulting score in TensorBoard is shown as 0.

@luarss
Contributor Author

luarss commented Jan 16, 2026

Sure, can change the key. Should I show it as 9e99 or 0?

@jeffng-or
Contributor

Sure, can change the key. Should I show it as 9e99 or 0?

9e99 is probably best. Users can filter it out if they want.
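
A minimal sketch of keeping that sentinel visible is below; it is illustrative only (the actual change is in the commit that follows), and the log_result helper and writer wiring are assumptions rather than the PR's code.

ERROR_METRIC = 9e99

def log_result(writer, score: float, step: int) -> None:
    # Use the "metric" tag (matching tune mode) and log the value as-is, so an
    # error run keeps the 9e99 sentinel instead of being rewritten to 0.
    writer.add_scalar("metric", float(score), step)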

* metrics: show error scores as 9e99

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss
Contributor Author

luarss commented Jan 22, 2026

@jeffng-or Could you please retry with the latest diffs?

@jeffng-or
Contributor

The metric name change looks fine. I've been running with this and the string choice code for the past couple of days without a problem. Looks good to go. Thanks!

@maliberty maliberty requested a review from vvbandeira January 22, 2026 16:57
@vvbandeira vvbandeira merged commit 169b196 into The-OpenROAD-Project:master Jan 22, 2026
8 checks passed