Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Allda. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
A new init container is added to the workspace deployment in case the user chooses to restore the workspace from a backup. By setting the workspace attribute "controller.devfile.io/restore-workspace", the controller sets up a new init container instead of cloning data from the git repository. By default, an automated path to the restore image is derived from cluster settings. However, the user can override that value using another attribute, "controller.devfile.io/restore-source-image". The restore container runs a workspace-recovery.sh script that pulls an image using oras and extracts its files to the /projects directory. Signed-off-by: Ales Raszka <araszka@redhat.com>
New tests verify that the workspace is created from a backup. They check whether the deployment is ready and whether it contains the new restore init container with the proper configuration. There are two tests: one focused on the common PVC and the other on per-workspace storage. Signed-off-by: Ales Raszka <araszka@redhat.com>
The condition for whether a workspace should be restored from a backup was in the restore module itself, which made the code harder to read. Now the condition is checked in the controller itself, and the restore container is only added when enabled. This commit also addresses a few minor items from the code review comments:
- License header
- Attribute validation
- A test for disabled workspace recovery
- Typos
Signed-off-by: Ales Raszka <araszka@redhat.com>
A new config section is added to control the restore container. Default values are set for the new init container, and the user can change them in the config. The config uses the same logic as the project-clone container config. Signed-off-by: Ales Raszka <araszka@redhat.com>
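As a rough illustration of the kind of override this enables, a DevWorkspaceOperatorConfig fragment might look like the sketch below. The `restoreContainer` key and its fields are hypothetical, named only by analogy with the project-clone container config; the `backupCronJob` fields are the ones that appear elsewhere in this thread.

```yaml
# Hypothetical DWOC fragment -- "restoreContainer" and its fields are
# illustrative assumptions, not the operator's actual schema.
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: openshift-operators
config:
  workspace:
    backupCronJob:
      enable: true
      schedule: '*/30 * * * *'
      registry:
        path: registry.example.com/backups   # assumption: example registry
        authSecret: my-registry-auth         # assumption: example secret name
    # Hypothetical override section, by analogy with projectCloneConfig:
    restoreContainer:
      image: registry.example.com/restore-tool:latest
      imagePullPolicy: IfNotPresent
```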
Force-pushed from 1b95d94 to 5480c1c
@Allda : I'm facing a strange issue while testing this functionality on a CRC cluster. I've tried both the amd64 and arm64 variants but hit the same issue. I used samples/plain-workspace.yaml for testing. Everything goes fine until step 4, but when I create the restore backup manifest I can see that the devworkspace resource is created but there is no corresponding pod for it:

oc create -f restore-dw.yaml
devworkspace.workspace.devfile.io/plain-devworkspace created
oc get dw
NAME DEVWORKSPACE ID PHASE INFO
plain-devworkspace workspace612b8ddca9ff45d5 Running Workspace is running
oc get pods
No resources found in rokumar-dev namespace.

I had just modified the name from the restore manifest you shared:

kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  labels:
    controller.devfile.io/creator: ""
  name: plain-devworkspace
spec:
  started: true
  routingClass: 'basic'
  template:
    attributes:
      controller.devfile.io/storage-type: common
      controller.devfile.io/restore-workspace: 'true'

Could you please check if I'm missing something?
I am not sure why it doesn't work on your system. I tried the workspace you mentioned, and the backup and other pods were created successfully. Are there any logs you can share, or can you check whether the workspace has any pods at the very start?
Signed-off-by: Ales Raszka <araszka@redhat.com>
I think I found the cause of #1572 (comment). It seems unrelated to this PR. I will need to gather more evidence and create a fix.
pkg/library/env/workspaceenv.go
Outdated
Name:  devfileConstants.ProjectsRootEnvVar,
Value: constants.DefaultProjectsSourcesRoot,
})
if workspace.Config.Workspace.BackupCronJob.OrasConfig != nil {
To me it seems like a potential nil pointer issue when workspace restore is enabled but the backup configuration is not set in the DWOC.
Suggested change:

if workspace.Config.Workspace.BackupCronJob != nil &&
	workspace.Config.Workspace.BackupCronJob.OrasConfig != nil {
pkg/secrets/backup.go
Outdated
err = c.Delete(ctx, existingNamespaceSecret)
if err != nil {
	return nil, err
}
}
namespaceSecret = &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      constants.DevWorkspaceBackupAuthSecretName,
		Namespace: workspace.Namespace,
		Labels: map[string]string{
			constants.DevWorkspaceIDLabel:          workspace.Status.DevWorkspaceId,
			constants.DevWorkspaceWatchSecretLabel: "true",
		},
	},
	Data: sourceSecret.Data,
	Type: sourceSecret.Type,
}
if err := controllerutil.SetControllerReference(workspace, namespaceSecret, scheme); err != nil {
	return nil, err
}
err = c.Create(ctx, namespaceSecret)
When multiple workspaces start simultaneously, they race to copy the same secret, causing failures:

err = c.Delete(ctx, existingNamespaceSecret) // race window opens
// time gap
err = c.Create(ctx, namespaceSecret) // race window closes

I think you can update instead of delete and create. Does this make sense?
I replaced the function with SyncObjectWithCluster, which performs object sync in a standardized way and only syncs when the secret has changed.
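The difference between delete/re-create and sync can be sketched with a toy in-memory "cluster". This is illustrative only; SyncObjectWithCluster itself operates on real Kubernetes objects through the controller-runtime client. The idea is that an upsert never leaves a window where the object is absent.

```go
package main

import "fmt"

// fakeCluster stands in for the API server: secret name -> secret data.
type fakeCluster map[string]string

// syncSecret creates the secret if absent and updates it only when the
// data differs, avoiding the delete/re-create race window where the
// secret briefly does not exist.
func syncSecret(c fakeCluster, name, data string) string {
	existing, ok := c[name]
	switch {
	case !ok:
		c[name] = data
		return "created"
	case existing != data:
		c[name] = data
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	c := fakeCluster{}
	fmt.Println(syncSecret(c, "backup-auth", "token-a")) // created
	fmt.Println(syncSecret(c, "backup-auth", "token-a")) // unchanged
	fmt.Println(syncSecret(c, "backup-auth", "token-b")) // updated
}
```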
pkg/library/restore/restore.go
Outdated
	MountPath: constants.DefaultProjectsSourcesRoot,
	},
}
registryAuthSecret, err := secrets.HandleRegistryAuthSecret(ctx, k8sClient, workspace.DevWorkspace, workspace.Config, "", scheme, log)
Could you please clarify why you passed an empty string for operatorNamespace?
I noticed that the backup implementation retrieves the operator namespace and passes it to the same function.
Sorry if this has already been discussed. If so, please point that out to me.
@akurinnoy I created a similar discussion here: #1572 (comment)
I addressed this based on the suggestion.
Signed-off-by: Ales Raszka <araszka@redhat.com>
Signed-off-by: Ales Raszka <araszka@redhat.com>
In case the backup config is not present, the value might be nil and cause a failure. The new condition handles it. Signed-off-by: Ales Raszka <araszka@redhat.com>
The delay between checking the empty dir and copying the backup content might cause an issue. This fix moves the check right before the content is copied, which minimizes the delay. Signed-off-by: Ales Raszka <araszka@redhat.com>
The previous solution always deleted the secret if it existed and re-created it, which can lead to potential issues. The new code uses SyncObjectWithCluster, which is used across the whole codebase and minimizes the risk of issues. Signed-off-by: Ales Raszka <araszka@redhat.com>
Force-pushed from 50d4dd0 to 27ee785
@dkwon17 I addressed all the code review comments. Is this PR good to be merged?
pkg/constants/constants.go
Outdated
// Role kinds
Role = "Role"
// ClusterRole kind
ClusterRole = "ClusterRole"
This seems like a very generic name and can be mistaken for an API concept or an exported constant from Kubernetes (which doesn't exist).
I suggest renaming it to something like rbacRoleKind / rbacClusterRoleKind, or documenting that this is a literal Kind string used for comparison.
Suggested change:

// Role kind
rbacRoleKind = "Role"
// ClusterRole kind
rbacClusterRoleKind = "ClusterRole"
func HandleRegistryAuthSecret(ctx context.Context, c client.Client, workspace *dw.DevWorkspace,
	dwOperatorConfig *controllerv1alpha1.OperatorConfiguration, operatorConfigNamespace string, scheme *runtime.Scheme, log logr.Logger,
) (*corev1.Secret, error) {
	secretName := dwOperatorConfig.Workspace.BackupCronJob.Registry.AuthSecret
Potential nil pointer dereference:
Suggested change:

if dwOperatorConfig.Workspace == nil ||
	dwOperatorConfig.Workspace.BackupCronJob == nil ||
	dwOperatorConfig.Workspace.BackupCronJob.Registry == nil {
	return nil, fmt.Errorf("backup/restore configuration not properly set in DevWorkspaceOperatorConfig")
}
secretName := dwOperatorConfig.Workspace.BackupCronJob.Registry.AuthSecret
// Construct the desired secret state
desiredSecret := &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      constants.DevWorkspaceBackupAuthSecretName,
		Namespace: workspace.Namespace,
		Labels: map[string]string{
			constants.DevWorkspaceIDLabel:          workspace.Status.DevWorkspaceId,
			constants.DevWorkspaceWatchSecretLabel: "true",
		},
	},
	Data: sourceSecret.Data,
	Type: sourceSecret.Type,
}
There seems to be another race condition in secret copying: if multiple workspaces are restoring simultaneously in the same namespace, they will race to create/update the same secret name. Does this make sense?
Suggested change (make the secret name per-workspace):

// Construct the desired secret state
desiredSecret := &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      constants.DevWorkspaceBackupAuthSecretName + "-" + workspace.Status.DevWorkspaceId,
		Namespace: workspace.Namespace,
		Labels: map[string]string{
			constants.DevWorkspaceIDLabel:          workspace.Status.DevWorkspaceId,
			constants.DevWorkspaceWatchSecretLabel: "true",
		},
	},
	Data: sourceSecret.Data,
	Type: sourceSecret.Type,
}
I don't think we need an auth secret for each workspace, especially since the secret has the same data. How about just removing the constants.DevWorkspaceIDLabel: workspace.Status.DevWorkspaceId label from the secret?
@Allda I suggest removing constants.DevWorkspaceIDLabel :
Suggested change (drop the per-workspace ID label):

// Construct the desired secret state
desiredSecret := &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      constants.DevWorkspaceBackupAuthSecretName,
		Namespace: workspace.Namespace,
		Labels: map[string]string{
			constants.DevWorkspaceWatchSecretLabel: "true",
		},
	},
	Data: sourceSecret.Data,
	Type: sourceSecret.Type,
}
Now that I'm testing with the updated changes, I'm seeing an issue after creating the restored workspace manifest. The restore pod correctly starts; however, the DevWorkspace is not able to come out of
}
if workspace.Config.Workspace.ProjectCloneConfig.ImagePullPolicy != "" {
	projectCloneOptions.PullPolicy = config.Workspace.ProjectCloneConfig.ImagePullPolicy
if restore.IsWorkspaceRestoreRequested(&workspace.Spec.Template) {
I see that every reconcile checks IsWorkspaceRestoreRequested(); I think this would keep adding the restore init container, and the workspace would never reach the Running phase.
The restore attribute should be automatically removed after successful completion.
I think this would keep adding restore init container, and the workspace would never reach Running phase.
In this situation, the restore container is basically an alternative to the project-clone container, which (before this PR) was also added on every reconciliation, so I don't think this is a problem.
Did you face specific issues during testing?
While I was testing yesterday, I kept bumping into #1572 (comment).
The DevWorkspace was able to reach the Running state once I patched it to remove the restore attribute.
I'll check again whether it is a problem with my setup.
@rohanKanojia could you please share the DevWorkspace yaml that you used for the restore workspace?
@dkwon17 : I'm trying to reproduce it via a script (restore-external-registry-test.sh):

#!/usr/bin/env bash
set -euo pipefail
source ./utils.sh
NAMESPACE="openshift-operators"
RESTORE_ATTRIBUTE="controller.devfile.io/restore-workspace"
WORKSPACE_NAME_PREFIX="${1:-test-devworkspace}"
BACKUP_WORKSPACE_NAME="${WORKSPACE_NAME_PREFIX}-should-get-backup"
# 1️⃣ Delete the workspace if it exists
if kubectl get devworkspace "$BACKUP_WORKSPACE_NAME" -n "$NAMESPACE" >/dev/null 2>&1; then
echo "Deleting existing workspace: $BACKUP_WORKSPACE_NAME"
kubectl delete devworkspace "$BACKUP_WORKSPACE_NAME" -n "$NAMESPACE" --wait
fi
# 2️⃣ Apply the restore workspace manifest
cat <<EOF | kubectl apply -f -
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspace
metadata:
name: $BACKUP_WORKSPACE_NAME
namespace: $NAMESPACE
spec:
started: true
template:
attributes:
$RESTORE_ATTRIBUTE: 'true'
projects:
- name: web-nodejs-sample
git:
remotes:
origin: "https://github.com/che-samples/web-nodejs-sample.git"
components:
- name: dev
container:
image: quay.io/devfile/universal-developer-image:latest
memoryLimit: 512Mi
memoryRequest: 256Mi
cpuRequest: 1000m
commands:
- id: say-hello
exec:
component: dev
commandLine: echo "Hello from \$(pwd)"
workingDir: \${PROJECT_SOURCE}/app
contributions:
- name: che-code
uri: https://eclipse-che.github.io/che-plugin-registry/main/v3/plugins/che-incubator/che-code/latest/devfile.yaml
components:
- name: che-code-runtime-description
container:
env:
- name: CODE_HOST
value: 0.0.0.0
EOF
echo "Workspace $BACKUP_WORKSPACE_NAME created with restore attribute"
# 3️⃣ Wait until the workspace is ready
echo "Waiting for workspace to start..."
kubectl wait devworkspace "$BACKUP_WORKSPACE_NAME" -n "$NAMESPACE" --for=condition=Ready --timeout=120s
# Wait up to 120s for pod to appear
echo "Waiting for workspace pod to be created..."
for i in {1..24}; do
POD_NAME=$(kubectl get pods -n "$NAMESPACE" \
-l "controller.devfile.io/devworkspace_name=$BACKUP_WORKSPACE_NAME" \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
if [[ -n "$POD_NAME" ]]; then
break
fi
sleep 5
done
if [[ -z "$POD_NAME" ]]; then
echo "❌ Workspace pod was not created in time"
exit 1
fi
echo "✅ Workspace pod created: $POD_NAME"
# 4️⃣ Print controller logs to verify restore logic execution
echo ""
echo "=========================================="
echo "Controller logs (restore logic execution):"
echo "=========================================="
CONTROLLER_POD=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/name=devworkspace-controller -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
if [[ -n "$CONTROLLER_POD" ]]; then
kubectl logs -n "$NAMESPACE" "$CONTROLLER_POD" --tail=100 | grep -i "restore\|$BACKUP_WORKSPACE_NAME" || echo "No restore-related logs found in recent entries"
else
echo "⚠️ Controller pod not found, skipping controller logs"
fi
# 5️⃣ Print workspace pod logs
echo ""
echo "=========================================="
echo "Workspace pod logs:"
echo "=========================================="
kubectl logs -n "$NAMESPACE" "$POD_NAME" --all-containers=true --tail=50 2>/dev/null || echo "⚠️ Could not fetch pod logs (pod may still be initializing)"
# 6️⃣ Verify restored file exists (example: README.md from backup)
# Adjust this path based on what your backup contains
echo ""
echo "=========================================="
echo "Verifying restored file:"
echo "=========================================="
# First, list the projects directory to see what was restored
echo "Contents of /projects directory:"
kubectl exec -n "$NAMESPACE" "$POD_NAME" -- ls -la /projects/ || true
# Find the project clone directory (it may have a random suffix)
PROJECT_DIR=$(kubectl exec -n "$NAMESPACE" "$POD_NAME" -- sh -c 'find /projects -maxdepth 1 -type d -name "project-clone-*" | head -n 1' 2>/dev/null || echo "")
if [[ -z "$PROJECT_DIR" ]]; then
echo "⚠️ No project-clone-* directory found, checking for direct web-nodejs-sample directory..."
PROJECT_DIR="/projects"
fi
echo "Project directory: $PROJECT_DIR"
# Check for the restored file
RESTORED_FILE="$PROJECT_DIR/web-nodejs-sample/README.md"
echo "Checking if restored file exists: $RESTORED_FILE"
if kubectl exec -n "$NAMESPACE" "$POD_NAME" -- test -f "$RESTORED_FILE"; then
echo "✅ Restored file exists!"
echo ""
echo "Content preview (first 5 lines):"
kubectl exec -n "$NAMESPACE" "$POD_NAME" -- head -n 5 "$RESTORED_FILE"
echo ""
# Verify the specific modification from backup-external-registry-test.sh is present
echo "Verifying backup modification is present..."
if kubectl exec -n "$NAMESPACE" "$POD_NAME" -- grep -q "## Modified via backup test" "$RESTORED_FILE"; then
echo "✅ Backup modification found in restored file!"
echo ""
echo "Last 3 lines of restored file:"
kubectl exec -n "$NAMESPACE" "$POD_NAME" -- tail -n 3 "$RESTORED_FILE"
else
echo "❌ Backup modification NOT found in restored file"
echo ""
echo "Full file content:"
kubectl exec -n "$NAMESPACE" "$POD_NAME" -- cat "$RESTORED_FILE"
exit 1
fi
else
echo "❌ Restored file missing: $RESTORED_FILE"
echo ""
echo "Directory structure:"
kubectl exec -n "$NAMESPACE" "$POD_NAME" -- find /projects -type f -name "README.md" || true
exit 1
fi
echo "✅ Restore test passed!"
@rohanKanojia I can reproduce the issue, I am investigating
Signed-off-by: Ales Raszka <araszka@redhat.com>
Thank you @Allda, after these suggestions, I believe we are good to merge
While testing the latest changes, I'm facing an issue with the backup process in the OpenShift internal registry. It seems to be an authentication-related problem. The same backup OCP flow test script runs successfully on the main branch without any issues. I'm sharing logs. I'm testing it via this script:

#!/usr/bin/env bash
set -euo pipefail
source ./utils.sh
# -------------------------
# Defaults
# -------------------------
WORKSPACE_NAME_PREFIX="${1:-test-devworkspace}"
WORKSPACE_STOPPED="${WORKSPACE_NAME_PREFIX}-should-get-backup"
WORKSPACE_RUNNING="${WORKSPACE_NAME_PREFIX}-no-backup"
MANIFEST_URL="${2:-https://raw.githubusercontent.com/devfile/devworkspace-operator/refs/heads/main/samples/code-latest.yaml}"
DWO_CONFIG_NAME="devworkspace-operator-config"
DWO_NS="openshift-operators"
kubectl config set-context --current --namespace="$DWO_NS"
# -------------------------
# Get OpenShift internal registry route
# -------------------------
echo "🔍 Getting OpenShift internal registry route..."
REGISTRY_SERVICE="default-route-openshift-image-registry.apps-crc.testing"
log_success "Registry route: $REGISTRY_SERVICE"
echo "Will stop workspace to allow backup : $WORKSPACE_STOPPED"
echo "Will keep running workspace to avoid backup : $WORKSPACE_RUNNING"
echo
# -------------------------
# Create or Patch DevWorkspaceOperatorConfig
# -------------------------
echo "⚙️ Enabling backup CronJob with OpenShift internal registry..."
if kubectl get devworkspaceoperatorconfig "$DWO_CONFIG_NAME" -n "$DWO_NS" >/dev/null 2>&1; then
# Config exists, patch it
echo "DevWorkspaceOperatorConfig exists, patching..."
kubectl patch devworkspaceoperatorconfig "$DWO_CONFIG_NAME" -n "$DWO_NS" --type merge -p "
config:
workspace:
backupCronJob:
oras:
extraArgs: '--insecure'
enable: true
schedule: '*/1 * * * *'
registry:
path: ${REGISTRY_SERVICE}
authSecret: ""
"
else
# Config doesn't exist, create it
echo "DevWorkspaceOperatorConfig not found, creating..."
cat <<EOF | kubectl apply -f -
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
name: $DWO_CONFIG_NAME
namespace: $DWO_NS
config:
workspace:
backupCronJob:
oras:
extraArgs: '--insecure'
enable: true
schedule: '*/1 * * * *'
registry:
path: ${REGISTRY_SERVICE}
authSecret: ""
EOF
fi
log_success "DevWorkspaceOperatorConfig configured for backup"
# -------------------------
# Create both DevWorkspaces
# -------------------------
echo "🚀 Creating DevWorkspaces..."
deploy_devworkspace "$WORKSPACE_STOPPED" "$MANIFEST_URL"
deploy_devworkspace "$WORKSPACE_RUNNING" "$MANIFEST_URL"
log_success "Both workspaces are running"
sleep 5
# -------------------------
# Modify file in stopped workspace
# -------------------------
echo "📝 Modifying README.md in $WORKSPACE_STOPPED..."
POD_STOPPED=$(kubectl get pod -n "$DWO_NS" \
-l controller.devfile.io/devworkspace_name="$WORKSPACE_STOPPED" \
-o jsonpath='{.items[0].metadata.name}')
sleep 5
kubectl exec "$POD_STOPPED" -n "$DWO_NS" -- \
bash -c 'echo "## Modified via backup test" >> /projects/web-nodejs-sample/README.md'
log_success "File modified"
# -------------------------
# Stop ONLY one workspace
# -------------------------
echo "🛑 Stopping workspace: $WORKSPACE_STOPPED"
kubectl patch dw "$WORKSPACE_STOPPED" -n "$DWO_NS" \
--type merge -p '{"spec":{"started":false}}'
log_success "Workspace stopped"
# =====================================================
# Monitor for backup Jobs
# =====================================================
MONITOR_TIME=600
INTERVAL=5
echo
echo "👀 Monitoring for backup Jobs for ${MONITOR_TIME}s..."
FOUND_STOPPED_JOB=""
FOUND_RUNNING_JOB=""
end=$((SECONDS + MONITOR_TIME))
while [[ $SECONDS -lt $end ]]; do
WORKSPACE_ID_STOPPED=$(kubectl get dw "$WORKSPACE_STOPPED" -n "$DWO_NS" -o jsonpath='{.status.devworkspaceId}')
WORKSPACE_ID_RUNNING=$(kubectl get dw "$WORKSPACE_RUNNING" -n "$DWO_NS" -o jsonpath='{.status.devworkspaceId}')
FOUND_STOPPED_JOB=$(kubectl get jobs -n "$DWO_NS" \
-l "controller.devfile.io/backup-job=true,controller.devfile.io/devworkspace_id=$WORKSPACE_ID_STOPPED" \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
FOUND_RUNNING_JOB=$(kubectl get jobs -n "$DWO_NS" \
-l "controller.devfile.io/backup-job=true,controller.devfile.io/devworkspace_id=$WORKSPACE_ID_RUNNING" \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
if [[ -n "$FOUND_RUNNING_JOB" ]]; then
echo "❌ Backup Job created for RUNNING workspace: $FOUND_RUNNING_JOB"
exit 1
fi
if [[ -n "$FOUND_STOPPED_JOB" ]]; then
log_success "Backup Job detected for STOPPED workspace: $FOUND_STOPPED_JOB"
break
fi
sleep "$INTERVAL"
done
if [[ -z "$FOUND_STOPPED_JOB" ]]; then
echo "❌ Backup Job not created for stopped workspace"
exit 1
fi
# Delete Running DevWorkspace to avoid Multi-Attach Error
kubectl delete dw "$WORKSPACE_RUNNING" -n "$DWO_NS" --ignore-not-found
# -------------------------
# Wait for stopped workspace backup Job completion
# -------------------------
echo "⏳ Waiting for backup Job to complete..."
TIMEOUT=300
ELAPSED=0
CHECK_INTERVAL=5
while [[ $ELAPSED -lt $TIMEOUT ]]; do
# Get pod associated with the job
JOB_POD_NAME=$(kubectl get pods -n "$DWO_NS" \
-l "job-name=$FOUND_STOPPED_JOB" \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
if [[ -n "$JOB_POD_NAME" ]]; then
# Check pod status
POD_PHASE=$(kubectl get pod "$JOB_POD_NAME" -n "$DWO_NS" \
-o jsonpath='{.status.phase}' 2>/dev/null || true)
# Check if any container is in error state
CONTAINER_STATE=$(kubectl get pod "$JOB_POD_NAME" -n "$DWO_NS" \
-o jsonpath='{.status.containerStatuses[0].state}' 2>/dev/null || true)
if [[ "$POD_PHASE" == "Failed" ]] || echo "$CONTAINER_STATE" | grep -q "waiting.*Error\|terminated.*Error"; then
echo "❌ Backup Job pod is in Error state. Printing logs..."
kubectl logs "$JOB_POD_NAME" -n "$DWO_NS" --all-containers=true
exit 1
fi
fi
# Check if job completed successfully
JOB_STATUS=$(kubectl get job "$FOUND_STOPPED_JOB" -n "$DWO_NS" \
-o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' 2>/dev/null || true)
if [[ "$JOB_STATUS" == "True" ]]; then
log_success "Backup Job completed for stopped workspace"
break
fi
sleep "$CHECK_INTERVAL"
ELAPSED=$((ELAPSED + CHECK_INTERVAL))
done
if [[ $ELAPSED -ge $TIMEOUT ]]; then
echo "❌ Backup Job did not complete in time. Printing logs..."
JOB_POD_NAME=$(kubectl get pods -n "$DWO_NS" \
-l "job-name=$FOUND_STOPPED_JOB" \
-o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
if [[ -n "$JOB_POD_NAME" ]]; then
kubectl logs "$JOB_POD_NAME" -n "$DWO_NS" --all-containers=true
fi
exit 1
fi
# -------------------------
# Verify backup artifact using ImageStream
# -------------------------
echo
echo "📦 Verifying backup artifact (ImageStream) for $WORKSPACE_STOPPED..."
if kubectl get imagestream "$WORKSPACE_STOPPED" -n "$DWO_NS" >/dev/null 2>&1; then
log_success "ImageStream exists for stopped workspace"
# Show ImageStream details
echo "📋 ImageStream details:"
kubectl get imagestream "$WORKSPACE_STOPPED" -n "$DWO_NS" -o jsonpath='{.status.dockerImageRepository}'
echo
else
echo "❌ ImageStream missing for stopped workspace"
exit 1
fi
# -------------------------
# Verify NO ImageStream for running workspace
# -------------------------
echo
echo "📦 Verifying NO backup artifact for $WORKSPACE_RUNNING..."
if kubectl get imagestream "$WORKSPACE_RUNNING" -n "$DWO_NS" >/dev/null 2>&1; then
echo "❌ ImageStream exists for running workspace"
exit 1
else
log_success "No ImageStream for running workspace"
fi
echo
echo "🎉 Backup validation successful"
log_success "Backup created ONLY for stopped workspace"
log_success "No backup for running workspace"
# -------------------------
# Cleanup logic
# -------------------------
cleanup() {
echo "🗑️ Deleting DevWorkspaces..."
kubectl delete dw "$WORKSPACE_STOPPED" "$WORKSPACE_RUNNING" -n "$DWO_NS" --ignore-not-found
log_success "Cleanup complete"
}
trap cleanup EXIT
Signed-off-by: Ales Raszka <araszka@redhat.com>
I somehow missed those. It is fixed now.
@rohanKanojia I noticed it's because it seems the role name got accidentally changed from And see if that fixes it? |
@dkwon17 : Thanks a lot for your investigation. I can confirm that with the fix, the OpenShift backup seems to be working. All backup test scenarios are passing.
I think the PR should be good to merge once this change #1572 (comment) is committed.
What does this PR do?
Add init container for workspace restoration
A new init container is added to the workspace deployment in case the user chooses to restore the workspace from a backup.
By setting the workspace attribute "controller.devfile.io/restore-workspace", the controller sets up a new init container instead of cloning data from the git repository.
By default, an automated path to the restore image is derived from cluster settings. However, the user can override that value using another attribute, "controller.devfile.io/restore-source-image".
The restore container runs a workspace-recovery.sh script that pulls an image using oras and extracts its files to the /projects directory.
What issues does this PR fix or reference?
#1525
Is it tested? How?
No automated tests are available in the first phase. I will add tests once I get the first approval that the concept is ok.
How to test:
kubectl delete devworkspace restore-workspace-2 (controller.devfile.io/restore-workspace)

PR Checklist

/test v8-devworkspace-operator-e2e, v8-che-happy-path (to trigger)
- v8-devworkspace-operator-e2e: DevWorkspace e2e test
- v8-che-happy-path: Happy path for verification of integration with Che

What's missing: