Notion Uptime Monitor

Monitors the availability of self-hosted services on smallmountains.de and publishes live status to a Notion dashboard. A Python script runs every 5 minutes via cron, checks each service, stores history locally in SQLite, and writes the current status + rolling uptime percentages to Notion.

Notion Dashboard: https://app.notion.com/p/38210a5f51bd807bae1edb699d9591e8


Table of Contents

  1. User Guide
  2. Architecture
  3. Services
  4. Technical Reference
  5. Deployment (NAS / Docker)
  6. Local Development
  7. Adding or Removing Services
  8. Troubleshooting

User Guide

Reading the Dashboard

The Notion page has two sections:

Last Updated callout — shows the timestamp of the most recent check run. If this is stale by more than 10 minutes, the monitoring script may have stopped.

Service Status table — one row per service with these columns:

Column | What it means
Service | Name of the monitored service
Status | Online (green), Offline (red), or Unknown (gray, no data yet)
Response Time (ms) | How long the last successful HTTP request took. Empty for UDP services (Factorio).
Last Checked | Exact timestamp of the last check for that service
Uptime 24h % | Percentage of checks in the last 24 hours where the service was reachable
Uptime 7d % | Same, over the last 7 days
Uptime 30d % | Same, over the last 30 days

Uptime percentages are blank for the first few checks — they fill in as history accumulates.

What counts as "Online"?

  • HTTP services: The service returns an HTTP response with status code below 500 (2xx, 3xx, 4xx all count as online — a 4xx means the server is up but rejected the request, which is fine for monitoring purposes). A network error or 5xx response counts as offline.
  • Factorio (UDP): A probe packet is sent to localhost:34197. If the OS returns an immediate port-unreachable error, Factorio is offline. If the probe times out (the server is running but ignores unknown packets), the host is checked via ICMP ping as a tiebreaker.

Typical causes of false "Offline" readings

  • A service restarted mid-check (transient)
  • SSL certificate expired on a service
  • The NAS itself was rebooting (all services offline simultaneously)
  • Gitea is intentionally down for maintenance

Architecture

┌─────────────────────────────────────────┐
│  Ugreen NAS (Docker)                    │
│                                         │
│  ┌─────────────┐   every 5 min (cron)   │
│  │ monitor.py  │──────────────────────┐ │
│  └──────┬──────┘                      │ │
│         │ checks                      │ │
│  ┌──────▼──────────────────────────┐  │ │
│  │ Services                        │  │ │
│  │  • HTTP GET → status code       │  │ │
│  │  • UDP probe → port reachable?  │  │ │
│  └──────┬──────────────────────────┘  │ │
│         │ results                     │ │
│  ┌──────▼──────┐                      │ │
│  │  SQLite DB  │ uptime_history.db    │ │
│  │  (35d ring) │                      │ │
│  └──────┬──────┘                      │ │
│         │ computes uptime %           │ │
└─────────┼─────────────────────────────┘ │
          │                               │
          │ Notion API (HTTPS)            │
          ▼                               │
┌─────────────────────────────────────────┤
│  Notion — Services database             │
│  (one row per service, live data)       │
└─────────────────────────────────────────┘

Data flow per run:

  1. Script starts, opens/creates uptime_history.db
  2. Prunes records older than 35 days
  3. For each service: runs the appropriate check, records result in SQLite
  4. Queries SQLite to compute 24h / 7d / 30d uptime percentages from stored history
  5. Updates the corresponding row in the Notion database via the Notion API
  6. Scans the Uptime Tracker page blocks to find the "Last Updated" callout and refreshes its timestamp
  7. Exits — next run is triggered by cron

Services

Services are configured entirely in the Notion database — the script reads them fresh on every run. No code changes are needed to add, rename, or reconfigure a service.

Each row in the Services database has a Check Type column (HTTP or UDP) that controls how the service is checked:

Service | Check Type | URL / Endpoint
Plex | HTTP | https://plex.smallmountains.de
Gitea | HTTP | https://git.smallmountains.de
Audiobookshelf | HTTP | https://audiobooks.smallmountains.de
Sounds | HTTP | https://sounds.smallmountains.de
Cloud (NAS) | HTTP | https://cloud.smallmountains.de
Kitchenowl | HTTP | https://home.smallmountains.de
Factorio | UDP | localhost:34197 (see note below)

Factorio UDP note: External UDP checks are unreliable for Factorio because:

  • The server ignores unrecognized UDP packets (no standard "ping" response)
  • ICMP is blocked on the host (common for game servers)

The URL is set to localhost:34197 so the check runs locally on the NAS, where the OS returns an immediate port-unreachable error when the container is down. To use a different host, just edit the URL field of the Factorio row in Notion.


Technical Reference

Files

.
├── services_uptime_monitor.py   # Main monitoring script
├── requirements.txt             # Python dependencies
├── uptime_history.db            # SQLite database (created on first run)
└── README.md                    # This file

Configuration

Three constants are hardcoded at the top of services_uptime_monitor.py:

Variable | Value | Description
NOTION_TOKEN | secret_b7Pi… | Notion integration API token for "PyBot"
NOTION_DATA_SOURCE_ID | 22174dd2… | Collection ID of the Services database; the script queries this on every run to get the current service list
UPTIME_PAGE_ID | 38210a5f… | Notion page ID of the Uptime Tracker

All service configuration (name, URL, check type) lives in the Notion database and is read dynamically — see Adding or Removing Services.

Dependencies

notion-client>=2.2.1   # Official Notion Python SDK
requests>=2.31.0       # HTTP checks

SQLite Schema

The uptime_history.db file contains one table:

CREATE TABLE checks (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    service_name     TEXT    NOT NULL,
    checked_at       TEXT    NOT NULL,   -- ISO-8601 UTC datetime
    is_online        INTEGER NOT NULL,   -- 1 = online, 0 = offline
    response_time_ms REAL                -- NULL for UDP checks
);

CREATE INDEX idx_service_time ON checks (service_name, checked_at);

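Records older than 35 days are pruned on each run. At 5-minute intervals across 7 services, the table holds about 70,560 rows at steady state (288 checks per day × 35 days × 7 services), and the steady-state size is roughly 2 MB.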
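
The pruning step could look roughly like the sketch below. It is not the script's actual code: the helper names open_db and prune_old_checks and the RETENTION_DAYS constant are hypothetical, and it assumes every checked_at value is written in the same ISO-8601 UTC format so that plain string comparison orders the rows correctly.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 35  # hypothetical constant matching the 35-day window above

SCHEMA = """
CREATE TABLE IF NOT EXISTS checks (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    service_name     TEXT    NOT NULL,
    checked_at       TEXT    NOT NULL,
    is_online        INTEGER NOT NULL,
    response_time_ms REAL
);
CREATE INDEX IF NOT EXISTS idx_service_time ON checks (service_name, checked_at);
"""


def open_db(path):
    """Open (or create) the history database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def prune_old_checks(conn, now=None):
    """Delete rows older than RETENTION_DAYS and return how many were removed."""
    now = now or datetime.now(timezone.utc)
    # ISO-8601 strings with an identical UTC offset sort chronologically,
    # so a text comparison is enough here (assumption: uniform format).
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()
    cursor = conn.execute("DELETE FROM checks WHERE checked_at < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```

Passing now explicitly keeps the function easy to exercise against an in-memory database (open_db(":memory:")) without waiting for real time to pass.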

Notion Integration

The script uses the Notion API v1 via notion-client. It accesses two resources:

Services database (ID: 774cb57bfa2c43058d400ed8ce3165d5)
One row per service. The script calls pages.update() with the page ID of each row to set:

{
    "Status":             {"select": {"name": "Online" | "Offline"}},
    "Last Checked":       {"date": {"start": "<ISO-8601 UTC>"}},
    "Response Time (ms)": {"number": <float | null>},
    "Uptime 24h %":       {"number": <0.0–1.0 | null>},   # Notion stores % as fraction
    "Uptime 7d %":        {"number": <0.0–1.0 | null>},
    "Uptime 30d %":       {"number": <0.0–1.0 | null>},
}
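
As an illustration of the payload shape above, the properties dictionary could be assembled by a small pure function like the one below. The function name build_row_properties and the "24h" / "7d" / "30d" keys of the uptimes argument are hypothetical and not taken from the script.

```python
from datetime import datetime, timezone


def build_row_properties(is_online, response_time_ms, uptimes, checked_at=None):
    """Build the Notion properties payload for one service row.

    uptimes maps the (hypothetical) keys "24h", "7d" and "30d" to a fraction
    between 0.0 and 1.0, or omits a key when there is no data for that window.
    """
    checked_at = checked_at or datetime.now(timezone.utc)
    return {
        "Status": {"select": {"name": "Online" if is_online else "Offline"}},
        "Last Checked": {"date": {"start": checked_at.isoformat()}},
        "Response Time (ms)": {"number": response_time_ms},  # None clears the cell
        "Uptime 24h %": {"number": uptimes.get("24h")},
        "Uptime 7d %": {"number": uptimes.get("7d")},
        "Uptime 30d %": {"number": uptimes.get("30d")},
    }
```

The resulting dictionary would then be passed as the properties argument of pages.update() together with the row's page ID.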

Uptime Tracker page (ID: 38210a5f51bd807bae1edb699d9591e8)
The script scans top-level blocks, finds the one containing "Last Updated", and calls blocks.update() to replace its rich text with the current timestamp.
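
The block scan could be sketched as below. It works on plain block dictionaries in the shape the Notion API returns (a type field, plus a same-named object holding a rich_text list); the function names are hypothetical, and the real script may match the text differently.

```python
def block_plain_text(block):
    """Concatenate the plain text of a block's rich_text, or return ''."""
    payload = block.get(block.get("type", ""), {})
    if not isinstance(payload, dict):
        return ""
    return "".join(part.get("plain_text", "") for part in payload.get("rich_text", []))


def find_last_updated_block(blocks, marker="Last Updated"):
    """Return the ID of the first block whose text contains marker, else None."""
    for block in blocks:
        if marker in block_plain_text(block):
            return block["id"]
    return None
```

In the script, the blocks list would presumably come from listing the children of the Uptime Tracker page, and the returned ID would be the target of blocks.update(). Blocks without rich text (dividers, child databases) are skipped because their plain text is empty.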

Check Logic

HTTP check (check_http)

GET <url> with 10s timeout, follow redirects
→ status < 500   : online  (returns True, response_time_ms)
→ status >= 500  : offline (returns False, response_time_ms)
→ exception      : offline (returns False, None)
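
A minimal sketch of this rule follows. To stay dependency-free it uses the standard library's urllib rather than the requests package the script depends on, so it illustrates the classification logic and is not the script's check_http; the helper is_online_status is a hypothetical name.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_online_status(status_code):
    """Apply the rule above: any HTTP status below 500 counts as online."""
    return status_code < 500


def check_http(url, timeout=10.0):
    """Return (is_online, response_time_ms); the time is None on exceptions."""
    start = time.monotonic()
    try:
        # urlopen follows redirects by default.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx, but the server did answer, so keep the code.
        status = exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, TLS error, or malformed URL.
        return False, None
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return is_online_status(status), elapsed_ms
```

HTTPError has to be caught before URLError because it is a subclass of it; otherwise a 404 would be reported as a network failure instead of "server up".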

UDP check (check_udp)

UDP socket → connect to host:port → send 4-byte probe
→ response received        : online
→ ConnectionRefusedError   : offline (ICMP port-unreachable)
→ socket.timeout           : fallback to ICMP ping
    → ping succeeds        : online  (host up, Factorio ignoring probe)
    → ping fails           : offline
→ socket.gaierror / OSError: offline (DNS failure or network error)
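
The decision tree above could be sketched as follows. The 3-second probe timeout, the four zero bytes used as the probe, and the ping_host helper are assumptions made for illustration; the script's own values may differ. The ping function is injectable so the fallback can be tested without sending real ICMP traffic.

```python
import socket
import subprocess


def ping_host(host, timeout_s=3):
    """Send one ICMP echo via the system ping command; True if it succeeds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_udp(host, port, timeout=3.0, ping=ping_host):
    """Return True if the UDP service looks online, False otherwise."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.connect((host, port))      # lets the OS report ICMP errors
            sock.send(b"\x00\x00\x00\x00")  # 4-byte probe (arbitrary content)
            sock.recv(1024)
            return True                     # any reply means something is listening
    except ConnectionRefusedError:
        return False                        # ICMP port-unreachable
    except socket.timeout:
        return ping(host)                   # silent port: fall back to ICMP ping
    except OSError:
        return False                        # DNS failure (gaierror) or network error
```

The except clauses are ordered from most to least specific because ConnectionRefusedError, socket.timeout, and socket.gaierror are all subclasses of OSError.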

Uptime calculation

uptime(hours) = (checks where is_online=1 in last <hours>h)
              / (total checks in last <hours>h)

Returns None (shown as blank in Notion) when the window contains no checks.
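
One way to express this calculation against the checks table is sketched below. The function name uptime_fraction is hypothetical, and the query assumes checked_at values are uniformly formatted ISO-8601 UTC strings so they can be compared as text.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def uptime_fraction(conn, service_name, hours, now=None):
    """Return the online fraction (0.0 to 1.0) for the window, or None if empty."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(hours=hours)).isoformat()
    total, online = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(is_online), 0) FROM checks "
        "WHERE service_name = ? AND checked_at >= ?",
        (service_name, cutoff),
    ).fetchone()
    if total == 0:
        return None  # no history yet for this window
    return online / total
```

The three dashboard columns would correspond to calls with hours set to 24, 24 * 7, and 24 * 30. The result is a fraction rather than a percentage because Notion stores percent-formatted numbers that way.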


Deployment (NAS / Docker)

The script is designed to run inside a Docker container on the Ugreen NAS. The NAS has a watched folder — any .py files placed there are executed by a pre-existing Python runner container.

Steps

  1. Copy files to the watched folder:

    services_uptime_monitor.py
    requirements.txt
    
  2. Install dependencies (once, inside the container):

    pip3 install -r requirements.txt
    
  3. Set up cron to run every 5 minutes:

    */5 * * * * /usr/local/bin/python3 /path/to/services_uptime_monitor.py >> /path/to/monitor.log 2>&1
    

    Adjust the interpreter path if needed; run which python3 to find it.

  4. Verify Factorio access: By default the script checks localhost:34197. If the Factorio container is on a different Docker network, set the URL field of the Factorio row in Notion to the container's hostname or IP, followed by the port (host:port format).

  5. Check the first run:

    python3 services_uptime_monitor.py
    

    Expected output:

    [2026-06-17 21:37 UTC] Running uptime checks...
     Plex                 ONLINE   183ms
     Gitea                ONLINE   145ms
     ...
     Factorio             ONLINE   —
    Done.
    

Persistence

The SQLite database (uptime_history.db) must survive container restarts to preserve uptime history. Mount the folder containing the script as a persistent volume, or move DB_PATH to a dedicated data directory:

DB_PATH = Path("/data/uptime_history.db")  # example override

Local Development

To test the script on a Mac:

# Install dependencies
pip3 install -r requirements.txt

# Run once
python3 services_uptime_monitor.py

Expected on Mac: All HTTP services show their real status. Factorio shows Offline, which is correct because nothing is listening on localhost:34197 locally. To test against the real server, temporarily set the URL field of the Factorio row in Notion to game.smallmountains.de:34197 (note: the check will likely still time out due to ICMP/firewall restrictions on the server side, and the NAS reads the same row on its next run, so revert the change afterwards).


Adding or Removing Services

Adding an HTTP service

  1. Open the Services database in Notion
  2. Add a new row and fill in:
    • Service — display name
    • URL — the full HTTPS URL to check (e.g. https://my-service.smallmountains.de)
    • Check Type — select HTTP
  3. The next script run picks it up automatically — no code changes needed

Adding a UDP service

Same as above, but:

  • URL — set to host:port format (e.g. localhost:1234 for a co-located service)
  • Check Type — select UDP

See the Factorio note for why localhost is recommended for services running on the same machine.
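
For illustration, a host:port value from the URL field could be split and validated as sketched below. The function name parse_udp_endpoint is hypothetical, and the script may parse the field differently.

```python
def parse_udp_endpoint(value):
    """Split a 'host:port' string into (host, port), validating the port range."""
    host, separator, port_text = value.strip().rpartition(":")
    if not separator or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    port = int(port_text)  # raises ValueError for non-numeric ports
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

Splitting on the last colon keeps hostnames intact; bare IPv6 addresses are not handled by this sketch.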

Removing a service

Delete or archive the row in the Notion database. The script will no longer check it from the next run. The SQLite history for that service name is retained but no longer updated — delete it manually with:

DELETE FROM checks WHERE service_name = 'ServiceName';

Troubleshooting

"Could not find page" error in Notion
The "PyBot" integration has lost access to the page. Go to the Uptime Tracker page in Notion → ... menu → Connections → re-add PyBot.

Uptime % is blank after many runs
The window has no data yet (fresh install) or all checks in the window failed to write to SQLite. Check that uptime_history.db is writable by the script's user.

"Last Updated" timestamp is not refreshing
The callout block on the Notion page was deleted or its text no longer contains "Last Updated". Re-add a callout to the page containing that phrase — the script finds it by scanning block content, not by a hardcoded block ID.

Factorio shows Offline on the NAS
Check that the Factorio container and the monitoring container share a Docker network, or that localhost resolves to the NAS host (host network mode). If not, set the URL field of the Factorio row in Notion to the Factorio container's hostname or internal IP, followed by the port.

All services Offline simultaneously
Usually means a network outage, NAS reboot, or the script lost its internet route. Check monitor.log for exception messages rather than OFFLINE status lines.