fix: health check retry loop instead of fixed 35s sleep
All checks were successful
NALU Deployment Pipeline / Run Tests (push) Successful in 1m29s
NALU Deployment Pipeline / PR Validation (push) Has been skipped
NALU Deployment Pipeline / Build and Push Image (push) Successful in 1m16s
NALU Deployment Pipeline / Deploy naluai.dev (push) Successful in 47s
NALU Deployment Pipeline / Cleanup Old Resources (push) Successful in 12s

Retry up to 12x with 10s intervals (2 min total).
Also reuse SSH setup in health check step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ricardo Carneiro 2026-05-15 22:18:00 -03:00
parent 971c390ea3
commit 72ad41c529
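The retry policy in the commit message (up to 12 attempts, 10 s apart) can also be expressed as a small reusable shell function. This is a sketch under stated assumptions, not code from the repository: the retry name, its argument order, and the use of SWARM_MANAGER as a plain shell variable in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper; not part of the original workflow.
# Runs "$@" up to MAX_ATTEMPTS times, sleeping INTERVAL seconds before
# each attempt (same order as the commit's loop: sleep first, then probe).
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local max_attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$max_attempts" ]; do
    sleep "$interval"
    echo "Attempt $i/$max_attempts..."
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Example mirroring the commit's 12 x 10 s policy (assumes SWARM_MANAGER
# is exported to the step's environment):
#   retry 12 10 ssh -o StrictHostKeyChecking=no "ubuntu@${SWARM_MANAGER}" \
#     'curl -sf http://localhost:8084/health'
```

Keeping the attempt count and interval as arguments lets other steps reuse the same loop without copying it.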


@@ -250,10 +250,25 @@ jobs:
           SSHEOF
 
       - name: Health check
+        env:
+          SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}
         run: |
-          sleep 35
-          ssh -o StrictHostKeyChecking=no ubuntu@${{ env.SWARM_MANAGER }} \
-            'curl -sf http://localhost:8084/health && echo "✅ nalu healthy" || (echo "❌ health check failed"; docker service ps nalu_app; exit 1)'
+          mkdir -p ~/.ssh
+          echo "${SSH_PRIVATE_KEY}" | base64 -d > ~/.ssh/id_rsa
+          chmod 600 ~/.ssh/id_rsa
+          ssh-keyscan -H ${{ env.SWARM_MANAGER }} >> ~/.ssh/known_hosts 2>/dev/null
+          # retry for up to 2 minutes
+          for i in $(seq 1 12); do
+            sleep 10
+            echo "Attempt $i/12..."
+            if ssh -o StrictHostKeyChecking=no ubuntu@${{ env.SWARM_MANAGER }} 'curl -sf http://localhost:8084/health'; then
+              echo "✅ nalu healthy"
+              exit 0
+            fi
+          done
+          echo "❌ health check failed after 2 minutes"
+          ssh -o StrictHostKeyChecking=no ubuntu@${{ env.SWARM_MANAGER }} 'docker service ps nalu_app'
+          exit 1
 
   # ─── Cleanup ──────────────────────────────────────────────────────────────
   cleanup:
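Because the loop counts sleeps, "2 min total" covers only the 12 × 10 s of sleeping. Each SSH connection and curl call adds to the real wait, and the ssh calls set no ConnectTimeout. A possible variant bounds the whole wait by wall-clock time instead. It assumes date +%s and GNU timeout are available on the runner. wait_until_healthy is a hypothetical name, and this is not what the commit implements.

```shell
#!/usr/bin/env bash
# Hypothetical variant (an assumption, not the commit's code): probe first,
# then keep retrying every INTERVAL_S seconds until DEADLINE_S seconds of
# wall-clock time have elapsed, so slow probes cannot stretch the budget.
wait_until_healthy() {
  local deadline_s="$1" interval_s="$2"
  shift 2
  local start now attempt=1
  start=$(date +%s)
  while :; do
    if "$@"; then
      echo "✅ healthy after $attempt attempt(s)"
      return 0
    fi
    now=$(date +%s)
    # Give up if the next sleep would carry us past the deadline.
    if [ $((now - start + interval_s)) -gt "$deadline_s" ]; then
      echo "❌ not healthy within ${deadline_s}s"
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval_s"
  done
}

# Example (assumes SWARM_MANAGER is exported to the step's environment);
# timeout, ConnectTimeout and --max-time cap each individual probe:
#   wait_until_healthy 120 10 \
#     timeout 15 ssh -o ConnectTimeout=5 "ubuntu@${SWARM_MANAGER}" \
#     'curl -sf --max-time 5 http://localhost:8084/health' \
#     || { ssh "ubuntu@${SWARM_MANAGER}" 'docker service ps nalu_app'; exit 1; }
```

Probing before the first sleep also means a service that is already up passes immediately instead of after 10 s.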