Troubleshooting

Common issues and their solutions when running Ops Atlas.

Cannot Connect to Docker Host

Symptom: The environment shows as offline and its containers are not discovered.

Solutions:

  1. Verify SSH credentials — Ensure the host, port, username, and private key (or password) are correct in the environment configuration.

  2. Check Docker is listening — The Docker daemon must be accessible. Verify it is running on the remote host:

    ```bash
    ssh user@host "docker ps"
    ```

  3. Check the Docker port — If using TCP, confirm port 2375 (or 2376 for TLS) is open:

    ```bash
    # Plain TCP on 2375. For TLS on 2376, use https:// and pass the client certificate and key.
    curl http://host:2375/version
    ```

  4. Firewall rules — Ensure the firewall on the Docker host allows inbound connections from the Ops Atlas backend on the SSH and Docker ports.

TIP

The SSH user must be a member of the docker group on the remote host, or have sudo access to run Docker commands.
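
The port and firewall checks above can be scripted. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the machine that runs the Ops Atlas backend. The function names `port_open` and `check_ports` and the host name in the usage line are illustrative and not part of Ops Atlas.

```bash
# Succeed if a TCP connection to host:port opens within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print one line per port and return non-zero if any port is unreachable.
check_ports() {
  local host=$1 port status=0
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
      status=1
    fi
  done
  return "$status"
}

# Usage (hypothetical host; 22 for SSH, 2375 for the Docker TCP port):
#   check_ports docker-host.example.com 22 2375
```

Running the check from the backend host rather than from a workstation matters, because firewall rules are usually source-specific.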

Backend Won't Start

Symptom: Backend container exits immediately or shows connection errors in logs.

Solutions:

  1. PostgreSQL not ready — Ensure the postgres container is healthy before the backend starts:

    ```bash
    docker compose logs postgres
    ```

  2. Database URL incorrect — Verify POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_PORT in your .env file match the PostgreSQL container configuration.

  3. Missing JWT_SECRET — The backend will not start without a JWT_SECRET value. Generate one:

    ```bash
    openssl rand -base64 32
    ```

  4. Port conflict — Check that port 8090 (or your configured BACKEND_PORT) is not already in use:

    ```bash
    lsof -i :8090
    ```

DANGER

Never leave JWT_SECRET or ENCRYPTION_KEY empty. If either is empty, the application will either fail to start or produce cryptographic errors at runtime.
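
A quick pre-flight check of the .env file can catch the missing-variable cases listed above before the backend is started. This is a sketch under the assumption that the file uses plain `KEY=value` lines; the function name `check_env_file` is made up for illustration and is not an Ops Atlas command.

```bash
# Print each required variable that is missing or empty in an env file.
# Returns non-zero if at least one variable is missing or empty.
check_env_file() {
  local file=$1 var value missing=0
  shift
  for var in "$@"; do
    # Take the last uncommented assignment and strip quotes and whitespace,
    # so that KEY="" and KEY=   both count as empty.
    value=$(grep -E "^[[:space:]]*${var}=" "$file" | tail -n 1 | cut -d= -f2- | tr -d "\"' \t\r")
    if [ -z "$value" ]; then
      echo "missing or empty: $var"
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the variables named in this section:
#   check_env_file .env POSTGRES_DB POSTGRES_USER POSTGRES_PASSWORD \
#     POSTGRES_PORT JWT_SECRET ENCRYPTION_KEY
```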

Frontend Shows Blank Page

Symptom: Browser loads but the page is empty or shows a white screen.

Solutions:

  1. API URL misconfigured — Open the browser developer console (F12) and check for failed network requests. The frontend must be able to reach the backend API.

  2. CORS not configured — If the frontend and backend are on different origins, ensure CORS_ALLOWED_ORIGINS includes the frontend URL:

    ```
    CORS_ALLOWED_ORIGINS=https://ops.example.com
    ```

  3. Reverse proxy not forwarding — If behind Nginx or Traefik, verify the proxy passes requests to the correct backend port and preserves headers.

  4. Browser cache — Hard-refresh (Ctrl+Shift+R) or clear the browser cache after an upgrade.
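
The CORS case can also be checked from the command line instead of the browser console. The sketch below assumes the backend answers a request carrying an `Origin` header with an `Access-Control-Allow-Origin` header when that origin is allowed, which is standard CORS behaviour but has not been verified against Ops Atlas. The helper name `cors_allows` is hypothetical.

```bash
# Read HTTP response headers on stdin and succeed if they allow the origin.
cors_allows() {
  local origin=$1 allowed
  allowed=$(tr -d '\r' \
    | grep -i '^access-control-allow-origin:' \
    | head -n 1 \
    | cut -d: -f2- \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  # Either an exact match or the wildcard counts as allowed.
  [ "$allowed" = "$origin" ] || [ "$allowed" = "*" ]
}

# Usage: dump only the response headers (-D -) and discard the body.
#   curl -s -o /dev/null -D - -H "Origin: https://ops.example.com" \
#     http://localhost:8090/ | cors_allows https://ops.example.com \
#     && echo "origin allowed" || echo "origin NOT allowed"
```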

Deployment Fails

Symptom: Deployment starts but ends with an error in the SSE log stream.

Solutions:

  1. SSH user lacks Docker permissions — The SSH user must be in the docker group:

    ```bash
    sudo usermod -aG docker <username>
    ```

  2. Compose file missing — Verify the Docker Compose file exists at the expected path on the remote host.

  3. Health check timeout — If the deployment waits for a health check that never passes, increase the timeout or verify the container's health endpoint is reachable.

  4. Image pull failure — Check that the Docker registry credentials are correct and the image tag exists.

WARNING

Group membership is evaluated at login, so after adding a user to the docker group the change takes effect only in new SSH sessions. You may need to disconnect and reconnect the environment in Ops Atlas.
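
To confirm that a fresh SSH session actually sees the docker group, the output of `id -nG` can be inspected. This is a small sketch; `has_group` is an illustrative helper name, and the user and host in the usage line are placeholders.

```bash
# Succeed if the wanted group appears in a space-separated group list,
# such as the output of `id -nG`.
has_group() {
  local wanted=$1 list=$2
  case " $list " in
    *" $wanted "*) return 0 ;;
    *)             return 1 ;;
  esac
}

# Usage: each ssh invocation is a new login, so it reflects current groups.
#   has_group docker "$(ssh deploy@docker-host.example.com id -nG)" \
#     && echo "docker group active" || echo "docker group missing"
```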

License Not Activating

Symptom: License key is entered but the edition does not change.

Solutions:

  1. Edition mismatch — Ensure LICENSE_EDITION in your .env matches the key you purchased (e.g., PRO or ENTERPRISE).

  2. Network connectivity — If license validation requires an external check, ensure the backend can reach the validation server.

  3. Expired license — Check the validUntil date on your license. Expired keys are rejected.

  4. Restart after .env change — If you changed LICENSE_EDITION in the .env file, restart the backend:

    ```bash
    docker compose restart backend
    ```

Container Discovery Missing Services

Symptom: Some containers on a Docker host are not shown in the dashboard.

Solutions:

  1. Excluded patterns — Check the discovery configuration for exclude patterns. Containers matching these patterns are intentionally hidden.

  2. Container not running — By default, only running containers are discovered. Stopped containers may be filtered out depending on settings.

  3. Refresh needed — Force a refresh via the API:

    ```bash
    curl -X POST http://localhost:8090/api/refresh \
      -H "Authorization: Bearer <token>"
    ```
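
To see which containers fall under the "not running" case, it can help to list them directly on the Docker host. The sketch below filters the output of `docker ps -a` and relies on Docker reporting running containers with a status that begins with `Up`. The function name `not_running` is illustrative.

```bash
# Read "name<TAB>status" lines on stdin and print the names of containers
# whose status does not start with "Up" (exited, created, restarting, ...).
not_running() {
  awk -F'\t' '$2 !~ /^Up/ { print $1 }'
}

# Usage on the Docker host:
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | not_running
```

Any name printed here is a candidate for a container that discovery skips because it is not running.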

Redis / Eureka Connection Errors

Symptom: Redis or Eureka pages show connection refused or timeout errors.

Solutions:

  1. Per-environment configuration — Redis and Eureka connections are configured per environment. Verify the host and port in the environment settings.

  2. Network reachability — Ensure the backend container can reach the Redis/Eureka host. From inside the backend container:

    ```bash
    docker compose exec backend curl telnet://redis-host:6379
    ```

  3. Authentication — If Redis requires a password, ensure it is set in the environment configuration.
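
For Redis, a protocol-level probe can tell an unreachable host apart from a missing password. The following is a rough sketch, assuming bash and `timeout` are available where it runs and that the server speaks the plain (non-TLS) Redis protocol. The function names are hypothetical; the `+PONG` and `-NOAUTH` replies are standard Redis responses to an inline `PING`.

```bash
# Map the first line of a Redis reply to a short diagnosis.
classify_redis_reply() {
  case $1 in
    +PONG*)   echo "ok" ;;
    -NOAUTH*) echo "auth-required" ;;
    "")       echo "no-reply" ;;
    *)        echo "unexpected: $1" ;;
  esac
}

# Open a TCP connection, send an inline PING, and classify the answer.
redis_probe() {
  local host=$1 port=$2 reply
  reply=$(timeout 5 bash -c '
    exec 3<>"/dev/tcp/$0/$1" || exit 1
    printf "PING\r\n" >&3
    IFS= read -r -t 3 line <&3
    printf "%s" "$line" | tr -d "\r"
  ' "$host" "$port" 2>/dev/null) || { echo "unreachable"; return 1; }
  classify_redis_reply "$reply"
}

# Usage (placeholder host name, default Redis port):
#   redis_probe redis-host 6379
```

A result of `unreachable` points at the network checks, while `auth-required` points at the password in the environment configuration.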

High Memory Usage

Symptom: Backend container consumes excessive memory or is OOM-killed.

Solutions:

  1. Adjust JVM heap — Set JAVA_OPTS in your .env to control memory allocation:

    ```
    JAVA_OPTS=-Xms256m -Xmx512m
    ```

  2. Monitoring refresh interval — Frequent polling of container stats across many environments can consume significant memory. Increase the refresh interval in Settings.

  3. Container limits — Set memory limits in docker-compose.yml:

    ```yaml
    services:
      backend:
        deploy:
          resources:
            limits:
              memory: 768M
    ```

TIP

For environments with more than 50 containers, consider increasing the JVM heap to -Xmx1g and setting a Docker memory limit of at least 1.5 GB.
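
The two heap and limit pairs on this page (512m with 768M, and 1g with 1.5 GB) both put the container limit at 1.5 times the heap. The sketch below applies that ratio to an arbitrary `-Xmx` value. The 1.5 factor is an assumption inferred from those two examples, not a documented Ops Atlas rule, and the function names are illustrative.

```bash
# Extract the -Xmx value (for example 512m) from a JAVA_OPTS string.
xmx_from_opts() {
  printf '%s\n' "$1" | sed -n 's/.*-Xmx\([0-9][0-9]*[mMgG]\).*/\1/p'
}

# Convert a JVM size with an m or g suffix to MiB.
jvm_size_to_mib() {
  local size=$1 number
  number=${size%[mMgG]}
  case $number in
    ''|*[!0-9]*) echo "unsupported size: $size" >&2; return 1 ;;
  esac
  case $size in
    *[gG]) echo $(( number * 1024 )) ;;
    *[mM]) echo "$number" ;;
    *)     echo "unsupported size: $size" >&2; return 1 ;;
  esac
}

# Suggest a container memory limit in MiB as 1.5x the heap (assumed ratio).
suggest_limit_mib() {
  local heap_mib
  heap_mib=$(jvm_size_to_mib "$1") || return 1
  echo $(( heap_mib * 3 / 2 ))
}

# Usage:
#   suggest_limit_mib "$(xmx_from_opts "$JAVA_OPTS")"
```

The headroom above the heap covers JVM memory that `-Xmx` does not bound, such as metaspace and thread stacks, so the right factor depends on the workload.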

Released under the MIT License.