Plan: Browser Session Cookie Persistence

Generated: 2026-05-29
Estimated Complexity: Medium

Overview

Remote browser profiles already live under browser_profiles_dir (/app/data/browser-profiles by default), so Chromium profile files are expected to survive container restarts when /app/data is mounted. The remaining cause of lost logins is mainly session-only cookies, which Chromium discards when Playwright closes a persistent context normally.

Implement a backend-only cookie persistence layer in BrowserSessionService:

  • Save all cookies from active persistent browser contexts into a JSON file under the profile directory.
  • Restore those cookies immediately after launch_persistent_context(...) and before page.goto(...).
  • Keep session-only cookies as session cookies when restoring, instead of rewriting them as long-lived cookies by default.
  • Exclude ephemeral auth-capture sessions so temporary login extraction profiles keep their current lifecycle.

Prerequisites

  • Current Playwright dependency: backend/requirements.txt pins playwright==1.52.0.
  • Playwright BrowserContext.add_cookies() accepts cookies with name, value, and either url or domain + path; it also supports expires, httpOnly, secure, sameSite, and partitionKey.
  • No frontend changes are required.

Key Design Decisions

  • Storage location: browser-profiles/{profile_key}/session-cookies.json.
  • Scope: persistent remote-browser sessions only. Do not persist cookies for auth-capture-* profiles.
  • Save trigger: after user interaction events, with debounce; before context.close() as a final save.
  • Restore trigger: after launch_persistent_context(...), before the first navigation.
  • Session cookie behavior: if expires is missing, None, or negative (Playwright reports session cookies with expires set to -1), omit expires when restoring. The restored cookie then stays a session cookie inside the new context, while the JSON backup lets it survive service restarts.
  • Security: treat the JSON file as sensitive. Keep it under the already-private profile directory and write it with owner-only permissions (for example mode 0600) where practical.

Sprint 1: Cookie Persistence Helpers

Goal: Add isolated helper methods without changing session behavior yet.

Demo/Validation:

  • Unit tests can save, read, normalize, and restore cookie data using fake contexts and a temp profile directory.
  • Invalid or empty JSON files do not break browser startup.

Task 1.1: Add cookie path helper

  • Location: backend/app/services/browser_session_service.py
  • Description: Add _cookies_path(profile_key: str) -> Path, returning self._profile_dir(profile_key) / "session-cookies.json".
  • Dependencies: None
  • Acceptance Criteria:
    • Path is profile-local.
    • Existing clear_profile() deletes the cookie JSON automatically because it removes the full profile directory.
  • Validation:
    • Unit test with a temp browser_profiles_dir confirms path location.
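
The path helper described in this task could look roughly like the sketch below. CookiePathSketch is a hypothetical stand-in for BrowserSessionService, and the body of _profile_dir is an assumption about how the existing helper resolves a profile directory; only the file name session-cookies.json comes from the plan.

```python
from pathlib import Path

COOKIE_FILE_NAME = "session-cookies.json"


class CookiePathSketch:
    """Hypothetical stand-in for BrowserSessionService; only path logic is shown."""

    def __init__(self, browser_profiles_dir: Path) -> None:
        self.browser_profiles_dir = Path(browser_profiles_dir)

    def _profile_dir(self, profile_key: str) -> Path:
        # Assumption: the real service already exposes an equivalent helper.
        return self.browser_profiles_dir / profile_key

    def _cookies_path(self, profile_key: str) -> Path:
        # Profile-local, so deleting the profile directory also deletes the backup.
        return self._profile_dir(profile_key) / COOKIE_FILE_NAME
```

Because the file sits inside the profile directory, no separate cleanup path is needed when a profile is cleared.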

Task 1.2: Add cookie normalization helpers

  • Location: backend/app/services/browser_session_service.py
  • Description: Add helpers:
    • _normalize_cookie_for_save(cookie: dict[str, Any]) -> dict[str, Any] | None
    • _normalize_cookie_for_restore(cookie: dict[str, Any], now: float) -> dict[str, Any] | None
  • Dependencies: Task 1.1
  • Acceptance Criteria:
    • Preserve supported Playwright fields: name, value, domain, path, expires, httpOnly, secure, sameSite, partitionKey.
    • Drop unsupported or unserializable fields.
    • Skip cookies missing name or value.
    • For restore, skip expired cookies when expires > 0 and expires <= now.
    • For restore, omit expires when it is missing, None, or negative.
    • Ensure every restored cookie has either domain + path or url.
  • Validation:
    • Unit tests for persistent cookies, session cookies, expired cookies, and partitioned cookies.
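
A minimal sketch of the two normalization helpers, written as module-level functions for brevity. The handling of a url field and the choice to pass an expires value of exactly 0 through unchanged are assumptions of this sketch; the acceptance criteria above do not cover either case explicitly.

```python
from typing import Any, Optional

SUPPORTED_FIELDS = (
    "name", "value", "domain", "path", "expires",
    "httpOnly", "secure", "sameSite", "partitionKey",
)


def normalize_cookie_for_save(cookie: dict[str, Any]) -> Optional[dict[str, Any]]:
    # A cookie without a name or value cannot be restored, so skip it.
    if not cookie.get("name") or cookie.get("value") is None:
        return None
    # Keep only supported fields whose values are plain JSON scalars.
    return {
        key: cookie[key]
        for key in SUPPORTED_FIELDS
        if isinstance(cookie.get(key), (str, int, float, bool))
    }


def normalize_cookie_for_restore(
    cookie: dict[str, Any], now: float
) -> Optional[dict[str, Any]]:
    restored = normalize_cookie_for_save(cookie)
    if restored is None:
        return None
    expires = restored.get("expires")
    if not isinstance(expires, (int, float)) or expires < 0:
        # Missing, None, or negative: restore as a session cookie.
        restored.pop("expires", None)
    elif 0 < expires <= now:
        # Already expired, so there is nothing useful to restore.
        return None
    # Assumption: expires == 0 is left as-is; the plan does not define it.
    if restored.get("domain") and restored.get("path"):
        return restored
    url = cookie.get("url")
    if isinstance(url, str) and url:
        # Fall back to url when domain + path are incomplete.
        restored.pop("domain", None)
        restored.pop("path", None)
        restored["url"] = url
        return restored
    return None
```

Passing now as a parameter keeps the expiry check deterministic in unit tests.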

Task 1.3: Add atomic JSON read/write helpers

  • Location: backend/app/services/browser_session_service.py
  • Description: Add:
    • _read_saved_cookies(profile_key: str) -> list[dict[str, Any]]
    • _write_saved_cookies(profile_key: str, cookies: list[dict[str, Any]]) -> None
  • Dependencies: Tasks 1.1 and 1.2
  • Acceptance Criteria:
    • JSON schema includes version, profile_key, saved_at, and cookies.
    • Write is atomic via session-cookies.json.tmp then replace(...).
    • Malformed JSON logs a warning and returns an empty cookie list.
    • Empty cookie list writes a valid file rather than failing.
  • Validation:
    • Unit tests for normal write/read, corrupted JSON, and atomic replacement.
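
The read/write pair might be sketched as follows. For brevity these hypothetical functions take the target path directly instead of a profile_key, and SCHEMA_VERSION = 1 is an assumed starting value. The write goes to a .tmp sibling first and is then moved into place with os.replace, so a reader should never see a half-written file.

```python
import json
import logging
import os
import time
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)
SCHEMA_VERSION = 1  # assumption: first version of the file schema


def write_saved_cookies(
    path: Path, profile_key: str, cookies: list[dict[str, Any]]
) -> None:
    payload = {
        "version": SCHEMA_VERSION,
        "profile_key": profile_key,
        "saved_at": time.time(),
        "cookies": cookies,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_name(path.name + ".tmp")
    # Create the temp file with owner-only permissions where the OS honours them.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    # Move the finished file over the old one in a single step.
    os.replace(tmp_path, path)


def read_saved_cookies(path: Path) -> list[dict[str, Any]]:
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return []  # no backup yet; not worth a warning
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass.
        logger.warning("Ignoring unreadable cookie file %s: %s", path, exc)
        return []
    cookies = payload.get("cookies") if isinstance(payload, dict) else None
    if not isinstance(cookies, list):
        logger.warning("Ignoring cookie file with unexpected shape: %s", path)
        return []
    return [cookie for cookie in cookies if isinstance(cookie, dict)]
```

Returning an empty list on any read problem means a damaged backup degrades to "no saved cookies" rather than a failed browser start.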

Sprint 2: Restore Cookies On Session Create

Goal: Restore saved cookies before the first page load.

Demo/Validation:

  • A fake Playwright context receives add_cookies(...) before page.goto(...).
  • Startup continues even if restore fails.

Task 2.1: Add restore method

  • Location: backend/app/services/browser_session_service.py
  • Description: Add _restore_cookies(session_or_context, profile_key: str) -> None, using context.add_cookies(cookies) when saved cookies exist.
  • Dependencies: Sprint 1
  • Acceptance Criteria:
    • No-op for missing JSON or empty cookie list.
    • Logs count of restored cookies at info or debug level without logging cookie values.
    • Catches Playwright restore errors and logs them without blocking session creation.
  • Validation:
    • Unit test fake context records restored cookies.
    • Unit test invalid cookie list does not raise.
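
One way the restore step and its fake-based test could be shaped is sketched below. restore_cookies_sketch and FakeContext are hypothetical names; the only Playwright call assumed is BrowserContext.add_cookies(cookies). The sketch logs the exception type rather than its message, on the assumption that error text might echo cookie contents.

```python
import asyncio
import logging
from typing import Any

logger = logging.getLogger(__name__)


async def restore_cookies_sketch(context: Any, cookies: list[dict[str, Any]]) -> int:
    """Return the number of cookies handed to the context (0 on no-op or failure)."""
    if not cookies:
        return 0
    try:
        await context.add_cookies(cookies)
    except Exception as exc:
        # Never block session creation, and never log cookie values.
        logger.warning(
            "Cookie restore failed for %d cookies (%s)",
            len(cookies), type(exc).__name__,
        )
        return 0
    logger.info("Restored %d cookies", len(cookies))
    return len(cookies)


class FakeContext:
    """Test double that records what add_cookies received."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.added: list[dict[str, Any]] = []

    async def add_cookies(self, cookies: list[dict[str, Any]]) -> None:
        if self.fail:
            raise ValueError("simulated Playwright rejection")
        self.added.extend(cookies)
```

A test can then drive the coroutine with asyncio.run(...) and inspect FakeContext.added, without needing browser binaries.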

Task 2.2: Wire restore into create()

  • Location: backend/app/services/browser_session_service.py
  • Description: Call restore after launch_persistent_context(...) and before page.goto(...).
  • Dependencies: Task 2.1
  • Acceptance Criteria:
    • Applies only to normal persistent sessions.
    • Existing health-check path for already-open sessions is unchanged.
    • page.goto(...) sees restored cookies on first request.
  • Validation:
    • Unit test call order with fakes.
    • Manual test: log into a remote page, restart backend, reopen page, verify logged-in state when server-side session is still valid.
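
The call-order check could be written along these lines. Everything here is a hypothetical fake: create_sketch only mirrors the ordering that create() needs (launch, then add_cookies, then goto), and obtaining the page through new_page() is an assumption about how the real method gets its page.

```python
import asyncio
from typing import Any, Awaitable, Callable


class RecordingPage:
    def __init__(self, calls: list[str]) -> None:
        self.calls = calls

    async def goto(self, url: str) -> None:
        self.calls.append("goto")


class RecordingContext:
    def __init__(self, calls: list[str]) -> None:
        self.calls = calls

    async def add_cookies(self, cookies: list[dict[str, Any]]) -> None:
        self.calls.append("add_cookies")

    async def new_page(self) -> RecordingPage:
        return RecordingPage(self.calls)


async def create_sketch(
    launch: Callable[[], Awaitable[RecordingContext]],
    saved_cookies: list[dict[str, Any]],
    url: str,
) -> None:
    context = await launch()
    if saved_cookies:
        try:
            # Restore before any navigation so the first request carries cookies.
            await context.add_cookies(saved_cookies)
        except Exception:
            pass  # a failed restore must not block startup
    page = await context.new_page()
    await page.goto(url)


def run_order_demo() -> list[str]:
    calls: list[str] = []

    async def launch() -> RecordingContext:
        calls.append("launch_persistent_context")
        return RecordingContext(calls)

    cookies = [{"name": "sid", "value": "x", "domain": "example.com", "path": "/"}]
    asyncio.run(create_sketch(launch, cookies, "https://example.com/"))
    return calls
```

Asserting on the recorded list keeps the ordering guarantee explicit if create() is refactored later.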

Task 2.3: Do not restore for create_ephemeral()

  • Location: backend/app/services/browser_session_service.py
  • Description: Leave auth-capture sessions isolated.
  • Dependencies: Task 2.1
  • Acceptance Criteria:
    • No cookie JSON is read for auth-capture-*.
    • Existing auth-capture cleanup behavior remains unchanged.
  • Validation:
    • Unit test or code assertion via fake profile key.
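
The auth-capture exclusion can be reduced to a single predicate shared by the save and restore paths. The function name below is hypothetical, and the prefix is taken from the auth-capture-* pattern used in this plan.

```python
AUTH_CAPTURE_PREFIX = "auth-capture-"


def should_persist_cookies(profile_key: str) -> bool:
    # Ephemeral auth-capture profiles keep their current lifecycle untouched.
    return not profile_key.startswith(AUTH_CAPTURE_PREFIX)
```

Keeping the rule in one place avoids the save path and the restore path drifting apart.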

Sprint 3: Save Cookies During Activity And Close

Goal: Keep the JSON cache fresh while users interact and before Playwright closes the context.

Demo/Validation:

  • User interactions cause cookie JSON to appear/update.
  • Closing a session saves cookies before context.close().

Task 3.1: Add save method

  • Location: backend/app/services/browser_session_service.py
  • Description: Add _save_cookies(session: BrowserSession, *, force: bool = False) -> None.
  • Dependencies: Sprint 1
  • Acceptance Criteria:
    • Calls await session.context.cookies().
    • Normalizes cookies and writes JSON.
    • Skips auth-capture-* sessions.
    • Does not log cookie values.
    • Handles closed contexts or Playwright errors without raising during cleanup.
  • Validation:
    • Unit test fake context cookies are written.
    • Unit test auth-capture profile is skipped.
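
A rough sketch of the save path, assuming the JSON writer is injected as a callable so the example stays self-contained. save_cookies_sketch and FakeCookieContext are hypothetical; the only Playwright call assumed is BrowserContext.cookies(). The force flag and debounce are left out here.

```python
import asyncio
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)
AUTH_CAPTURE_PREFIX = "auth-capture-"


async def save_cookies_sketch(
    context: Any,
    profile_key: str,
    write: Callable[[str, list[dict[str, Any]]], None],
) -> bool:
    """Return True when a cookie snapshot was written."""
    if profile_key.startswith(AUTH_CAPTURE_PREFIX):
        return False
    try:
        cookies = await context.cookies()
        usable = [c for c in cookies if c.get("name") and c.get("value") is not None]
        write(profile_key, usable)
    except Exception as exc:
        # Closed contexts and write errors must not break cleanup.
        logger.warning("Cookie save skipped for %s (%s)", profile_key, type(exc).__name__)
        return False
    # Log the count only, never names or values.
    logger.debug("Saved %d cookies for %s", len(usable), profile_key)
    return True


class FakeCookieContext:
    def __init__(self, cookies: list[dict[str, Any]], closed: bool = False) -> None:
        self._cookies = cookies
        self.closed = closed

    async def cookies(self) -> list[dict[str, Any]]:
        if self.closed:
            raise RuntimeError("context closed")
        return list(self._cookies)
```

Returning a boolean rather than raising lets callers in close() and event() ignore failures without extra try blocks.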

Task 3.2: Debounce saves after browser events

  • Location: backend/app/services/browser_session_service.py
  • Description: After supported event() actions complete, call _save_cookies(session) with a debounce interval, for example 5 seconds.
  • Dependencies: Task 3.1
  • Acceptance Criteria:
    • Reuses existing BrowserSession.last_saved_state_at.
    • Saves after meaningful actions including click, type, key, reload, back, forward, resize, and scroll.
    • Does not save on every screenshot request.
    • Does not delay event responses noticeably: saving inline is acceptable while cookie reads are fast; if they prove slow, move the save to a background task guarded by the session lock.
  • Validation:
    • Unit test repeated events inside debounce produce one write.
    • Unit test event after debounce writes again.
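
The debounce decision itself can be a small pure function, sketched below. DebounceState is a hypothetical stand-in for BrowserSession, and storing last_saved_state_at as float seconds is an assumption about the existing field's type; the 5-second default follows the example interval in this task.

```python
from dataclasses import dataclass


@dataclass
class DebounceState:
    # Assumption: the real BrowserSession field holds float seconds.
    last_saved_state_at: float = 0.0


def should_save_now(
    state: DebounceState, *, now: float, interval: float = 5.0, force: bool = False
) -> bool:
    """Return True and record the time if a save is due."""
    if force or now - state.last_saved_state_at >= interval:
        state.last_saved_state_at = now
        return True
    return False
```

For example, with a fresh state, calls at now=100.0, 102.0, and 105.0 return True, False, and True, so repeated events inside the window produce a single write.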

Task 3.3: Force save before close and shutdown

  • Location: backend/app/services/browser_session_service.py
  • Description: In close(), call _save_cookies(session, force=True) before context.close().
  • Dependencies: Task 3.1
  • Acceptance Criteria:
    • Save happens before CDP detach/close when possible.
    • shutdown() benefits automatically because it calls close() for each session.
    • Close still proceeds even if cookie save fails.
  • Validation:
    • Unit test fake session records save before context close.
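
The ordering requirement (save first, close regardless) maps naturally onto try/finally. The sketch below uses hypothetical fakes and takes the save step as an injected coroutine function instead of calling _save_cookies directly.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


class FakeClosableContext:
    def __init__(self, calls: list[str]) -> None:
        self.calls = calls

    async def close(self) -> None:
        self.calls.append("context.close")


async def close_sketch(
    context: FakeClosableContext, save_cookies: Callable[[], Awaitable[None]]
) -> None:
    try:
        # Final save while the context can still report its cookies.
        await save_cookies()
    except Exception as exc:
        logger.warning("Final cookie save failed (%s)", type(exc).__name__)
    finally:
        # Close always runs, even when the save raised.
        await context.close()


def run_close_demo(fail_save: bool) -> list[str]:
    calls: list[str] = []

    async def save() -> None:
        if fail_save:
            raise RuntimeError("simulated save failure")
        calls.append("save")

    asyncio.run(close_sketch(FakeClosableContext(calls), save))
    return calls
```

run_close_demo(False) records the save before the close, and run_close_demo(True) shows that a failing save still leads to context.close().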

Sprint 4: Tests And Operational Verification

Goal: Prove the feature works without depending entirely on live websites.

Demo/Validation:

  • Unit tests pass.
  • Manual Docker restart test demonstrates retained login for a site whose server-side session remains valid.

Task 4.1: Add unit tests

  • Location: backend/test_browser_session_service.py
  • Description: Extend current fake-based tests for cookie persistence helper behavior.
  • Dependencies: Sprints 1-3
  • Acceptance Criteria:
    • Tests cover save, restore, malformed JSON, expired cookie skip, session cookie restore, auth-capture skip, save-before-context-close order, and event debounce.
  • Validation:
    • Run pytest backend/test_browser_session_service.py.

Task 4.2: Add manual verification checklist

  • Location: browser-session-cookie-persistence-plan.md or a future PR description
  • Description: Document how to verify in Docker.
  • Dependencies: Implementation complete
  • Acceptance Criteria:
    • Start Docker deployment with /app/data mounted.
    • Open remote browser page and log in.
    • Perform an event after login to trigger cookie save.
    • Confirm session-cookies.json exists under the matching profile directory.
    • Restart backend/container.
    • Reopen remote browser page and verify login state.
    • Clear profile and confirm both Chromium profile and cookie JSON are removed.
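
For the "confirm session-cookies.json exists" step, a small inspection helper can report what was saved without printing any secret values. This is a hypothetical convenience script, assuming the file schema with a top-level cookies list described in Task 1.3.

```python
import json
from pathlib import Path
from typing import Any


def summarize_cookie_file(path: Path) -> dict[str, Any]:
    """Summarize a cookie backup without exposing cookie values."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    cookies = payload.get("cookies", [])
    session_only = [
        c for c in cookies
        if not isinstance(c.get("expires"), (int, float)) or c["expires"] < 0
    ]
    return {
        "total": len(cookies),
        "session_only": len(session_only),
        "domains": sorted({c.get("domain", "") for c in cookies}),
    }
```

Running it against the profile directory before and after the restart gives a quick way to see whether session-only cookies were captured.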

Testing Strategy

  • Unit tests: Primary coverage using fake context/page/session objects. This avoids requiring real browser binaries in normal test runs.
  • Manual integration test: Required for confidence because real sites differ in cookie, localStorage, and server-side session behavior.
  • Regression checks:
    • pytest backend/test_browser_session_service.py
    • Existing backend tests if time allows: pytest backend

Potential Risks & Gotchas

  • Some sites use server-side session expiry or revocation; restoring cookies cannot bypass that.
  • Some sites bind sessions to IP, user agent, device fingerprint, or TLS/browser state.
  • Some login state is stored in localStorage, sessionStorage, IndexedDB, or service-worker cache. Persistent context already helps with some of this, but this plan only adds explicit cookie backup.
  • sessionStorage is tab-lifetime state and is not covered here. If a target site depends on it heavily, a later phase can add origin-scoped storage backup.
  • Cookie JSON contains authentication secrets. It must not be committed, logged, or exposed via APIs.
  • For CHIPS/partitioned cookies, preserve partitionKey when Playwright returns it.
  • Do not rewrite all session cookies to long-lived cookies by default; that changes browser semantics and may create security surprises.

Rollback Plan

  • Remove the new helper methods and calls from BrowserSessionService.create(), event(), and close().
  • Delete session-cookies.json files from affected profile directories if needed.
  • Existing Chromium persistent profile behavior will continue to work as before.

Open Decisions

  • Whether to add a config flag such as browser_cookie_persistence_enabled: bool = True. Default can be enabled because this directly addresses the production issue.
  • Whether to also save localStorage through Playwright storage_state() in a later phase. Not required for the first implementation.
  • Whether cookie JSON should be encrypted at rest. For the current Docker single-host deployment, profile-directory isolation is probably sufficient; encryption can be added if this becomes multi-tenant or shared-host.