
Conversation

@luarss
Contributor

@luarss luarss commented Jan 11, 2026

  • Introduced `TensorBoardLogger` class for logging metrics during sweeps (a rough illustrative sketch follows below).
  • Updated `sweep` function to integrate TensorBoard logging.
  • Enhanced `consumer` function to log metrics after each parameter run.
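
For reference, here is a minimal illustrative sketch of the idea described above. This is not the code added by this PR: the class name comes from the bullets, but the torch.utils.tensorboard backend, method names, and arguments are assumptions.

# Illustrative sketch only -- not the actual PR implementation.
from torch.utils.tensorboard import SummaryWriter

class TensorBoardLogger:
    """Log one scalar series per metric for every sweep run."""

    def __init__(self, log_dir: str):
        self.writer = SummaryWriter(log_dir=log_dir)

    def log_run(self, metrics: dict, step: int) -> None:
        # One scalar per metric, indexed by the parameter-run counter,
        # so each sweep point shows up as a step in TensorBoard.
        for name, value in metrics.items():
            self.writer.add_scalar(name, value, step)
        self.writer.flush()

    def close(self) -> None:
        self.writer.close()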

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss luarss added the autotuner Flow autotuner label Jan 11, 2026
Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss luarss requested a review from vvbandeira January 11, 2026 17:21
@luarss
Contributor Author

luarss commented Jan 11, 2026

@jeffng-or Back-ported the feature, could you please check out this branch and let me know if it works?

@jeffng-or
Contributor

@jeffng-or Back-ported the feature, could you please check out this branch and let me know if it works?

Great, thanks! I will check it out and let you know how it goes.

@jeffng-or
Contributor

It looks like the code is trying to write the SDC file into tools/AutoTuner/src/constraint.sdc, which isn't writable and also not in a trial-specific directory:

(consumer pid=509) [INFO TUN-0007] Scheduling run for parameter {'_SDC_CLK_PERIOD': 250}.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 676, in <module>
    main()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 672, in main
    sweep()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 605, in sweep
    ray.get(workers)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2771, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(PermissionError): ray::consumer() (pid=509, ip=172.17.0.2)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 678, in consumer
    metric_file, _ = ray.get(
ray.exceptions.RayTaskError(PermissionError): ray::openroad_distributed() (pid=499, ip=172.17.0.2)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 646, in openroad_distributed
    config = parse_config(
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 257, in parse_config
    write_sdc(sdc, path, sdc_original, constraints_sdc)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 116, in write_sdc
    with open(file_name, "w") as file:
PermissionError: [Errno 13] Permission denied: '/OpenROAD-flow-scripts/tools/AutoTuner/src/constraint.sdc'

I'm running within a docker container where I've mounted the tools/AutoTuner/src/autotuner directory, but not tools/AutoTuner/src. So, the src directory is not writable.

Here's the script that I use to start the container:

#!/bin/bash

#
# Method to use docker CLI to determine if we're using docker or podman
#
# Sets container_engine global variable with either "docker" or "podman"
#
get_container_engine () {
    local DOCKER_VERSION_STRING=$(docker --version 2> /dev/null)

    if [[ "$DOCKER_VERSION_STRING" == *"Docker"* ]]; then
        container_engine="docker"
    elif [[ "$DOCKER_VERSION_STRING" == *"podman"* ]]; then
        container_engine="podman"
    else
        echo "Unable to determine container engine using docker CLI"
        exit 1
    fi
}

if [ $# -lt 1 ]; then
    echo "Usage: run_at_docker.sh <port_num>"
    exit 1
fi

port_num=$1
get_container_engine

if [[ $container_engine == "podman" ]]; then
    user_args="--privileged --userns=keep-id"
else
    user_args="-u $(id -u ${USER}):$(id -g ${USER})"
fi

host_dir=$(pwd)
docker run --privileged --rm -it -p $port_num:$port_num \
       $user_args \
       -v $host_dir:/OpenROAD-flow-scripts/flow:Z \
       -v $host_dir/../tools/AutoTuner/src/autotuner:/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner:Z \
       -v /workspace/rapidus/current/rapidus:/rapidus:Z \
       -v /platforms/Rapidus/2HP:/platforms/Rapidus/2HP:Z \
       autotuner:1.0 bash

Here's the Dockerfile that I used to build the autotuner:1.0 container:

# syntax=docker/dockerfile:1
#
# Installs ORFS from docker image 
#

FROM openroad/orfs-verific:v3.0-4385-g4ae3d761e

# install AT required packages
RUN pip3 install -U -r /OpenROAD-flow-scripts/tools/AutoTuner/requirements.txt
RUN pip3 install torchvision

# ORFS installation dir
WORKDIR /OpenROAD-flow-scripts/tools/AutoTuner/src

To build the docker image:

docker build -t autotuner:1.0 -f Dockerfile .

To start the container:

./run_at_docker.sh 6008

Within the container:

python3 -m autotuner.distributed --design gcd --platform rapidus2hp --config /OpenROAD-flow-scripts/flow/designs/rapidus2hp/gcd/autotuner.json --experiment sweep --jobs 20 sweep

Member

@vvbandeira vvbandeira left a comment


@luarss
Please address Jeff's concerns and request a new review when he is satisfied.

@jeffng-or
Contributor

So, here are some differences that I see between tune and sweep:

  • tune calls openroad_distributed from a trial specific directory (e.g. /tmp/ray/session_2026-01-12_22-41-04_998954_29112/artifacts/2026-01-12_22-41-07/tune-tune/working_dirs/variant-AutoTunerBase-30ce30d8-ray)
  • sweep calls openroad_distributed from the os.getcwd()
  • In my case, I'm calling the AT from /OpenROAD-flow-scripts/tools/AutoTuner/src, which is located in the docker image filesystem and isn't writable
  • I can work around this by changing my container mount point to mount tools/AutoTuner/src instead of tools/AutoTuner/src/autotuner

Maybe we should be writing the SDC file under the experiment directory, which would be under flow/logs? At least we'd know that the directory is writable.
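
A hedged sketch of that direction is below, assuming the SDC is resolved into a per-trial directory under the experiment area; the helper name and signature are hypothetical, not the current utils.py API.

# Sketch only: write the SDC into a known-writable, trial-specific directory
# instead of tools/AutoTuner/src. Helper name and arguments are hypothetical.
import os

def trial_sdc_path(experiment_dir: str, variant: str, sdc_name: str = "constraint.sdc") -> str:
    trial_dir = os.path.join(experiment_dir, variant)
    os.makedirs(trial_dir, exist_ok=True)
    return os.path.join(trial_dir, sdc_name)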

After I make the change, the AT starts running trials. As it's running, I'm noticing the following:

  • When I "grep -w core_clock" in logs/rapidus2hp/gcd/sweep-sweep/*/OpenROAD-flow-scripts/tools/AutoTuner/src/constraint.sdc/metrics.json, every file reports a clock frequency of 290, so I'm not sure the SDC file is being uniquely created for each trial. This is further reinforced when I compare the metrics.json files for two runs, which are virtually identical (see the quick check sketched after this list).
  • I have --jobs set to 20, but it doesn't look like 20 jobs are run in parallel. The job has been running for an hour, but no data has been written to logs/rapidus2hp/gcd/sweep-sweep since the first five minutes of the run.
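
A quick check along those lines is sketched below. The recursive glob mirrors the directory layout described above, and the key filter is an assumption about how clock-related values are named in metrics.json.

# Diagnostic sketch: print any metric whose key mentions a clock, per trial,
# to confirm each sweep run really received its own _SDC_CLK_PERIOD.
import glob
import json

def clock_metrics(node, prefix=""):
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + key
            if isinstance(value, dict):
                found.update(clock_metrics(value, path + "."))
            elif "clk" in key.lower() or "clock" in key.lower():
                found[path] = value
    return found

pattern = "logs/rapidus2hp/gcd/sweep-sweep/**/metrics.json"
for metrics_file in sorted(glob.glob(pattern, recursive=True)):
    with open(metrics_file) as f:
        print(metrics_file, clock_metrics(json.load(f)))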

@jeffng-or
Contributor

The job ran overnight without completing, so there's something off. Please use the following flow for testing:

  • docker load -i /home/jeffng/Jan2026Demo/v3.0-4385-g4ae3d761e.tar
  • docker build -t autotuner:1.0 -f Dockerfile . (Use Dockerfile posted above)
  • git checkout 4ae3d76 (in your ORFS workspace)
  • Replace designs/rapidus2hp/gcd/autotuner.json with the content below
  • Execute run_at_docker.sh 6007 (the script above - note that you'll have to change the /workspace/rapidus/current/rapidus path to /platforms or wherever your rapidus workspace is)
  • export PLATFORM_HOME=/rapidus (in docker container)
  • Execute the python3 call above

autotuner.json

{
    "_SDC_FILE_PATH": "constraint.sdc",
    "_SDC_CLK_PERIOD": {
        "type": "int",
        "minmax": [
            180,
            300
        ],
        "step": 10
    }
}

Once it works for you, I can try again.

…ing sdc files to correct dir

Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
@jeffng-or
Contributor

Great, thanks! I'm able to run through a sweep with rapidus2hp gcd and view the data in TensorBoard.

The "score" should roughly match the "metric" in tune mode. Check out the metric name and calculation in the evaluate() method (basically, the metric should be 100 * effective_clk_period plus some other terms).
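
For reference, a hedged sketch of that calculation is below. The authoritative formula lives in evaluate() in the AutoTuner source; the metric key names and the penalty term here are assumptions.

ERROR_METRIC = 9e99  # sentinel used for failed runs (see later comments)

def sweep_score(metrics: dict) -> float:
    """Approximate the tune-mode metric: ~100x the effective clock period
    plus a penalty for violations; the exact weights live in evaluate()."""
    if "ERR" in metrics.values() or "N/A" in metrics.values():
        return ERROR_METRIC
    effective_clk_period = metrics["clk_period"] - metrics["worst_slack"]
    return 100 * effective_clk_period + metrics["num_drc"]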

Other than that, it's good to go. I'll update my chart generation code to key off the sweep results.

@jeffng-or
Contributor

Frequency sweep is working fine. I've ported my chart generator to use the sweep directory organization. So, make the update to the metrics calculation and I think we can call that done.

One quirk that I found when trying to run the physical sweep is that string choices aren't supported in sweep mode. My autotuner_phys.json looks like:

{
    "_SDC_FILE_PATH": "constraint.sdc",
    "_SDC_CLK_PERIOD": {
        "type": "float",
        "minmax": [
            670,
            670
        ],
        "step": 0
    },
    "CORE_UTILIZATION": {
        "type": "int",
        "minmax": [
            40,
            80
        ],
        "step": 1
    },
    "PLACE_SITE": {
        "type": "string",
        "values": [
            "SC6T",
            "SC8T"
        ]
    }
}

But, when I run it:

/OpenROAD-flow-scripts/flow/designs/rapidus2hp/cva6/constraint.sdc
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 677, in <module>
    main()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 618, in main
    config_dict, SDC_ORIGINAL, FR_ORIGINAL = read_config(
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 614, in read_config
    config[key] = read_sweep(value)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 458, in read_sweep
    return [*this["minmax"], this["step"]]
KeyError: 'minmax'

likely because the PLACE_SITE doesn't have a minmax. How do we enable this in sweep mode?

@vvbandeira
Member

likely because the PLACE_SITE doesn't have a minmax. How do we enable this in sweep mode?

The expected config here would be to use "choice" and not "string", which I don't think the AT sweep supports in this version. The easy solution is to run AT twice, since there are only two possible values for that variable. If you would like to expand to N possibilities, we should be able to add support in another PR.
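
If that support is added later, one possible shape for it is sketched below; this is hypothetical, based on the read_sweep() shown in the traceback, and the handling of a "values" list is an assumption.

# Hypothetical extension of read_sweep() to accept enumerated choices such as
# PLACE_SITE, alongside the existing minmax/step handling.
def read_sweep(this: dict):
    if "values" in this:
        # Explicit list of choices -- sweep every entry as-is.
        return list(this["values"])
    return [*this["minmax"], this["step"]]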

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@jeffng-or
Contributor

jeffng-or commented Jan 16, 2026

likely because the PLACE_SITE doesn't have a minmax. How do we enable this in sweep mode?

The expected config here would be to use "choice" and not "string", which I don't think the AT sweep supports in this version. The easy solution is to run AT twice, since there are only two possible values for that variable. If you would like to expand to N possibilities, we should be able to add support in another PR.

OK, that makes sense. I'd prefer not to have to run AT twice, so let me file another GH issue for choice/string support. That way we can enable this PR, which works.

Filed: #3809

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss
Contributor Author

luarss commented Jan 16, 2026

@jeffng-or Fixed the scoring, could you please check? We should be using the same scoring module for both tune/sweep now.

@jeffng-or
Contributor

@jeffng-or Fixed the scoring, could you please check? We should be using the same scoring module for both tune/sweep now.

Something is off:
[screenshot]

@jeffng-or Fixed the scoring, could you please check? We should be using the same scoring module for both tune/sweep now.

Yeah, it looks like it works. For consistency, can we change the name of "score" to "metric" to match the tune mode? Also, I noticed that if the score/metric is 9e99, the resulting score in TensorBoard is shown as 0.

@luarss
Contributor Author

luarss commented Jan 16, 2026

Sure, can change the key. Should I show it as 9e99 or 0?

@jeffng-or
Contributor

Sure, can change the key. Should I show it as 9e99 or 0?

9e99 is probably best. Users can filter it out if they want.
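
A minimal sketch of keeping that sentinel visible is below; it is illustrative only (the actual change is in the commit that follows), and the log_result helper and writer wiring are assumptions rather than the PR's code.

ERROR_METRIC = 9e99

def log_result(writer, score: float, step: int) -> None:
    # Use the "metric" tag (matching tune mode) and log the value as-is, so an
    # error run keeps the 9e99 sentinel instead of being rewritten to 0.
    writer.add_scalar("metric", float(score), step)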

* metrics: show error scores as 9e99

Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss
Contributor Author

luarss commented Jan 22, 2026

@jeffng-or Could you please retry with the latest diffs?

@jeffng-or
Contributor

The metric name change looks fine. I've been running with this and the string choice code for the past couple of days without a problem. Looks good to go. Thanks!

@maliberty maliberty requested a review from vvbandeira January 22, 2026 16:57
@vvbandeira vvbandeira merged commit 169b196 into The-OpenROAD-Project:master Jan 22, 2026
8 checks passed