# Exaplay Service Analysis

## Question

> Why do we still need the exaplay service? Only licensing and autostart?

**Short answer: No.** Licensing and autostart are only two of the service's responsibilities. The *primary* reason the service must remain a separate process is the **watchdog / crash-recovery loop**.

***

> **Follow-up: Would the auto-heal functionality also belong there?**

**Yes — partially, and it already does.** Auto-heal is a *shared* feature split across the engine and the service by architectural necessity. See the [Auto-heal architecture](#auto-heal-architecture) section below.

***

## Responsibilities of `exaplay3-service.exe`

### 1. Watchdog — crash detection and automatic restart *(core reason)*

The service runs a dedicated thread (`WD_Start`) that polls `exaplay3.exe` every second via `OpenProcess` / `GetExitCodeProcess`.

* If the process exits **gracefully** (the `exaplay3_end_ok_*` marker file is present in the work folder), the service logs the normal shutdown and does **not** restart.
* If the process exits **without** that marker (crash or kill), `on_WD_End()` fires:
  * Reads `config/config_system.json` to check `watchdog-restart`.
  * Implements **crash-loop detection**: counts crashes within 60-second windows; suppresses auto-restart after 3 rapid crashes to prevent an infinite loop.
  * Restarts `exaplay3.exe` with the `-autoheal-restore` flag after the first crash, so the engine can recover its last saved playback state, or with `-fresh` after the second crash.
* If the process **hangs** (still running but its `autoheal_health.json` heartbeat is not updated within `autoheal.hang-timeout` seconds, default 30 s), `on_WD_Tick()` terminates it with `TerminateProcess` so the crash-recovery path takes over on the next tick.
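The crash path in the list above can be sketched as a small decision function. This is a minimal illustration, not the service's actual code: `CrashLoopTracker`, `decideRestart`, and `restartFlag` are hypothetical names, the 60-second window is assumed to be a sliding window (the text does not say whether it is sliding or fixed), and the crash history is kept in memory for simplicity.

```cpp
#include <chrono>
#include <deque>
#include <string>

// Hypothetical names; only the flags and thresholds come from the description above.
enum class RestartAction { None, Restore, Fresh };

constexpr std::chrono::seconds kCrashWindow{60};  // "60-second windows"

class CrashLoopTracker {
public:
    using Clock = std::chrono::steady_clock;

    // Record a crash at `now` and return how many crashes lie inside the window.
    int recordCrash(Clock::time_point now) {
        crashes_.push_back(now);
        // Forget crashes that are older than the window (sliding window: an assumption).
        while (!crashes_.empty() && now - crashes_.front() > kCrashWindow) {
            crashes_.pop_front();
        }
        return static_cast<int>(crashes_.size());
    }

private:
    std::deque<Clock::time_point> crashes_;
};

// Map the crash count to a restart strategy.
RestartAction decideRestart(bool watchdogRestartEnabled, int crashesInWindow) {
    if (!watchdogRestartEnabled) return RestartAction::None;   // `watchdog-restart` is off
    if (crashesInWindow <= 1) return RestartAction::Restore;   // 1st crash
    if (crashesInWindow == 2) return RestartAction::Fresh;     // 2nd crash
    return RestartAction::None;                                // 3 or more: crash loop
}

// Command-line flag passed to exaplay3.exe for a given action (empty = no restart).
std::string restartFlag(RestartAction action) {
    switch (action) {
        case RestartAction::Restore: return "-autoheal-restore";
        case RestartAction::Fresh:   return "-fresh";
        default:                     return "";
    }
}
```

Separating the counting from the decision keeps the decision a pure function, which makes the three-strikes behaviour easy to unit-test without spawning a process.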

**This functionality cannot live inside `exaplay3.exe` itself** — if the engine process dies or hangs, the in-process code dies with it. An external watchdog is the only reliable way to detect and recover from crashes and hangs.

### 2. Autostart — launching Exaplay at Windows startup

`Service::create()` checks `config_startup.json` for `show-mode == true`. If it is set, the service launches `exaplay3.exe` as soon as the service itself starts. Because the installer can register the service to start with Windows, this gives Exaplay an autostart at boot.

This covers the autostart role named in the question. A Windows Task Scheduler entry could do the same job, but the service approach keeps all startup logic in one place.

### 3. Licensing — VIOSO / Soraco QLM validation

The service's `main.cpp` instantiates `vioso::License` and calls `Validate()`. The tray menu exposes a **Licensing** item that calls `LaunchLicenseWizard()`.

Licensing could theoretically move into `exaplay3.exe` (and `#ifdef USE_LICENSE` in `exaplay_backend/main.cpp` already shows the scaffolding for this), but until that migration is complete the service is the host for QLM validation.

### 4. Service HTTP server — remote management API

`start_comm()` binds an HTTP server on `port + 1` (for example, 8124 when Exaplay is on 8123) that exposes the following endpoints:

| Endpoint         | Purpose                                                                                      |
| ---------------- | -------------------------------------------------------------------------------------------- |
| `GET /` (static) | Serves the service web UI from `html/exaplay_service/`                                       |
| `WS /info`       | Real-time JSON status: `status_service`, `status_exaplay`, `status_vnc`                      |
| `POST /wdog`     | Remote commands: `exaplay:start`, `exaplay:restart`, `exaplay:stop`, `vnc:start`, `vnc:stop` |
| `GET /net/info`  | Lists reachable Exaplay and service addresses                                                |

This API enables remote monitoring and control without opening a full UI session.
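As an illustration of the `POST /wdog` command set in the table, the sketch below validates a `target:action` string and derives the service port. It assumes the command arrives as a plain string in exactly the form listed; the actual request encoding is not described here, and `WdogCommand`, `parseWdogCommand`, and `servicePort` are hypothetical names.

```cpp
#include <string>

// Hypothetical representation of one remote command, e.g. "exaplay:restart".
struct WdogCommand {
    std::string target;  // "exaplay" or "vnc"
    std::string action;  // "start", "restart", or "stop"
};

// Parse "target:action" and accept only the combinations listed in the table.
// Returns false (leaving `out` untouched) for anything else.
bool parseWdogCommand(const std::string& text, WdogCommand& out) {
    const std::string::size_type colon = text.find(':');
    if (colon == std::string::npos) return false;

    const std::string target = text.substr(0, colon);
    const std::string action = text.substr(colon + 1);

    const bool exaplayOk = target == "exaplay" &&
        (action == "start" || action == "restart" || action == "stop");
    const bool vncOk = target == "vnc" &&
        (action == "start" || action == "stop");
    if (!exaplayOk && !vncOk) return false;

    out.target = target;
    out.action = action;
    return true;
}

// The service HTTP server listens one port above the Exaplay port.
constexpr int servicePort(int exaplayPort) { return exaplayPort + 1; }
```

Rejecting unknown combinations up front (for instance `vnc:restart`, which is not in the table) keeps the dispatch code from having to handle invalid input.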

### 5. Scheduled reboot

`on_WD_Tick()` reads `reboot` / `reboot-time` from the system config once per second and calls `ExitWindowsEx(EWX_REBOOT)` at the configured time (e.g., `03:00` for a nightly reboot). This is useful for 24/7 installations that require a daily reset.
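Because the check runs once per second, a naive "current time equals `reboot-time`" comparison would match on every tick within that minute. The sketch below shows one way to parse an `HH:MM` value and fire at most once per day. It is an assumption-laden illustration: `TimeOfDay`, `parseRebootTime`, and `RebootScheduler` are hypothetical names, the `HH:MM` format is inferred from the `03:00` example, and the caller is expected to supply the local time and day.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper type for a wall-clock time without a date.
struct TimeOfDay {
    int hour;
    int minute;
};

// Parse "HH:MM" (for example "03:00"). Returns false on malformed or out-of-range input.
bool parseRebootTime(const std::string& text, TimeOfDay& out) {
    int h = 0, m = 0;
    char extra = 0;
    // Exactly two fields must match; a third match means trailing garbage.
    if (std::sscanf(text.c_str(), "%d:%d%c", &h, &m, &extra) != 2) return false;
    if (h < 0 || h > 23 || m < 0 || m > 59) return false;
    out.hour = h;
    out.minute = m;
    return true;
}

class RebootScheduler {
public:
    // Called once per tick. Returns true at most once per day, on the first
    // tick that falls inside the configured minute.
    bool shouldReboot(bool enabled, const TimeOfDay& configured,
                      const TimeOfDay& now, int dayOfYear) {
        if (!enabled) return false;
        if (now.hour != configured.hour || now.minute != configured.minute) return false;
        if (dayOfYear == lastFiredDay_) return false;  // already fired today
        lastFiredDay_ = dayOfYear;
        return true;  // the caller would now invoke ExitWindowsEx(EWX_REBOOT)
    }

private:
    int lastFiredDay_ = -1;
};
```

Note that the once-per-day guard in this sketch lives in memory, so it resets when the machine comes back up; a machine that reboots in under a minute would need a persistent guard to avoid a second reboot.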

### 6. Tray icon

The service provides its own system-tray icon (separate from `exaplay3.exe`'s tray icon) with quick access to launch, restart, or stop Exaplay, open the service UI, and access licensing.

***

## Auto-heal architecture

Auto-heal is split between the two processes by design.

```
exaplay3.exe (engine)             exaplay3-service.exe (service)
─────────────────────             ──────────────────────────────
autoheal_SaveState()              on_WD_End()
  writes autoheal_state.json  ←─  reads autoheal_crash_counter.json
  every 2 s (configurable)        decides restart strategy:
  (needs live composition            -autoheal-restore  (1st crash)
   objects — must be in engine)      -fresh             (2nd crash)
                                     no restart         (≥3 crashes)

autoheal_WriteHealthCheck()       on_WD_Tick()
  writes autoheal_health.json  ←─ reads autoheal_health.json
  every 5 s (configurable)        if heartbeat age > hang-timeout
  (needs frame-count, display      → TerminateProcess + on_WD_End()
   list — must be in engine)       (hang detection)

autoheal_RestoreState()           (triggers restore indirectly)
  restores playback when           service restarts engine with
  -autoheal-restore flag           -autoheal-restore flag on crash
  (needs live project objects)
```
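The hang-detection row of the diagram boils down to comparing the heartbeat's age with `autoheal.hang-timeout`. A minimal sketch, assuming the heartbeat timestamp has already been read from `autoheal_health.json` and converted to seconds since the epoch (the file's actual layout is not described here, and `EngineHealth` and `classifyHeartbeat` are hypothetical names):

```cpp
#include <cstdint>

enum class EngineHealth { Healthy, Hung };

// Classify the engine from the age of its last heartbeat.
// All times are in whole seconds; the epoch-seconds representation is an assumption.
EngineHealth classifyHeartbeat(std::int64_t heartbeatEpochSec,
                               std::int64_t nowEpochSec,
                               int hangTimeoutSec) {
    const std::int64_t age = nowEpochSec - heartbeatEpochSec;
    // A heartbeat that appears to come from the future (for example after a
    // clock adjustment) is treated as fresh rather than as a hang.
    if (age <= 0) return EngineHealth::Healthy;
    return age > hangTimeoutSec ? EngineHealth::Hung : EngineHealth::Healthy;
}
```

With the defaults (heartbeat every 5 s, timeout 30 s), the engine would have to miss several consecutive heartbeats before this check reports `Hung`, which leaves headroom for short stalls.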

### What must stay in the engine

| Function                                     | Reason                                                             |
| -------------------------------------------- | ------------------------------------------------------------------ |
| `autoheal_SaveState()`                       | Reads live composition objects, cue indices, audio volumes         |
| `autoheal_RestoreState()`                    | Calls `comp->settime()`, `comp->play()` — requires running project |
| `autoheal_WriteHealthCheck()`                | Reads `m_GL_FrameCount`, display list — engine internals           |
| `autoheal_CheckDisplays/Media/Performance()` | Access render pipeline, project model                              |

### What belongs in the service (and is now there)

| Function             | Location       | Notes                                               |
| -------------------- | -------------- | --------------------------------------------------- |
| Crash detection      | `on_WD_Tick()` | Process exit check via `GetExitCodeProcess`         |
| **Hang detection**   | `on_WD_Tick()` | Heartbeat age check → `TerminateProcess` *(new)*    |
| Crash-loop detection | `on_WD_End()`  | Reads/writes `autoheal_crash_counter.json`          |
| Restart decision     | `on_WD_End()`  | Decides `-autoheal-restore` / `-fresh` / no-restart |

### Configuration

| Key                        | Default | Owner            | Description                                                             |
| -------------------------- | ------- | ---------------- | ----------------------------------------------------------------------- |
| `autoheal.enabled`         | `true`  | engine + service | Master switch                                                           |
| `autoheal.save-interval`   | `"2"`   | engine           | How often to save state (seconds)                                       |
| `autoheal.health-interval` | `"5"`   | engine           | How often to write heartbeat (seconds)                                  |
| `autoheal.hang-timeout`    | `30`    | **service**      | Seconds without a heartbeat before treating as a hang (min 10, max 300) |
| `autoheal.max-state-age`   | `3600`  | engine           | Maximum age of state file accepted for restore                          |
| `watchdog-restart`         | `false` | service          | Enable auto-restart on crash                                            |
| `watchdog-restart-wait`    | `"0"`   | service          | Seconds to wait before restart                                          |
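The defaults and the `hang-timeout` bounds from the table can be captured in a small struct. This is a sketch only: `AutohealConfig` and `effectiveHangTimeout` are hypothetical names, values that the table shows as strings are assumed to be parsed into integers, and out-of-range timeouts are assumed to be clamped (the table gives the bounds but does not say whether invalid values are clamped or rejected).

```cpp
#include <algorithm>

// Hypothetical in-memory view of the keys in the table, with their defaults.
struct AutohealConfig {
    bool enabled = true;             // autoheal.enabled
    int  saveIntervalSec = 2;        // autoheal.save-interval
    int  healthIntervalSec = 5;      // autoheal.health-interval
    int  hangTimeoutSec = 30;        // autoheal.hang-timeout
    int  maxStateAge = 3600;         // autoheal.max-state-age (unit not stated in the table)
    bool watchdogRestart = false;    // watchdog-restart
    int  watchdogRestartWaitSec = 0; // watchdog-restart-wait
};

// Keep the hang timeout inside the documented range of 10 to 300 seconds.
int effectiveHangTimeout(int configuredSec) {
    return std::max(10, std::min(configuredSec, 300));
}
```

Note that with these defaults `watchdog-restart` is off, so crash detection alone would not restart the engine until that key is enabled.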

***

## Summary

| Responsibility                 | Still needed?     | Notes                                             |
| ------------------------------ | ----------------- | ------------------------------------------------- |
| **Watchdog / crash recovery**  | ✅ Yes — essential | Cannot be inside the engine process               |
| **Hang detection (autoheal)**  | ✅ Yes — essential | Service reads heartbeat; terminates frozen engine |
| **Autostart (show-mode)**      | ✅ Yes             | Could be a Task Scheduler task instead            |
| **Licensing (QLM)**            | ✅ Yes (currently) | Could move to `exaplay3.exe` via `USE_LICENSE`    |
| **Remote management HTTP API** | ✅ Yes             | Used by service web UI and third-party tools      |
| **Scheduled reboot**           | ✅ Yes             | Required for 24/7 installations                   |
| **Tray icon**                  | ✅ Yes             | Operator shortcut for service-level actions       |

The service cannot be reduced to only licensing and autostart without losing crash recovery, hang detection, remote management, and scheduled reboot — all of which are required for unattended production use.

***

## Potential future simplification

If the intent is to eventually remove the service, the following steps would be required:

1. **Move watchdog + hang detection** to a lightweight external process (or a minimal Windows service registered with `sc.exe`) that monitors `exaplay3.exe` and reads its heartbeat.
2. **Move licensing** into `exaplay3.exe` (the `USE_LICENSE` ifdef already exists).
3. **Replace autostart** with a Windows Task Scheduler entry or startup folder shortcut.
4. **Remove or merge the service HTTP API** into the main engine's HTTP server.

Until those steps are taken the service remains necessary for production deployments.

