# Plan: Browser Session Cookie Persistence

**Generated**: 2026-05-29
**Estimated Complexity**: Medium

## Overview

Remote browser profiles already live under `browser_profiles_dir` (`/app/data/browser-profiles` by default), so Chromium profile files are expected to survive container restarts when `/app/data` is mounted. The remaining login loss case is mainly session-only cookies that Chromium removes when Playwright closes a persistent context normally.

Implement a backend-only cookie persistence layer in `BrowserSessionService`:

- Save all cookies from active persistent browser contexts into a JSON file under the profile directory.
- Restore those cookies immediately after `launch_persistent_context(...)` and before `page.goto(...)`.
- Keep session-only cookies as session cookies when restoring, instead of rewriting them as long-lived cookies by default.
- Exclude ephemeral auth-capture sessions so temporary login extraction profiles keep their current lifecycle.

## Prerequisites

- Current Playwright dependency: `backend/requirements.txt` pins `playwright==1.52.0`.
- Playwright `BrowserContext.add_cookies()` accepts cookies with `name`, `value`, and either `url` or `domain` + `path`; it also supports `expires`, `httpOnly`, `secure`, `sameSite`, and `partitionKey`.
- No frontend changes are required.

## Key Design Decisions

- **Storage location**: `browser-profiles/{profile_key}/session-cookies.json`.
- **Scope**: persistent remote-browser sessions only. Do not persist cookies for `auth-capture-*` profiles.
- **Save trigger**: after user interaction events, with debounce; before `context.close()` as a final save.
- **Restore trigger**: after `launch_persistent_context(...)`, before the first navigation.
- **Session cookie behavior**: if `expires` is missing, `None`, or negative, omit `expires` when restoring. This preserves session-cookie behavior inside the new context while still allowing our JSON backup to survive service restarts.
- **Security**: treat the JSON file as sensitive. Keep it under the already-private profile directory and write it with owner-only read/write permissions where practical.

## Sprint 1: Cookie Persistence Helpers

**Goal**: Add isolated helper methods without changing session behavior yet.

**Demo/Validation**:

- Unit tests can save, read, normalize, and restore cookie data using fake contexts and a temp profile directory.
- Invalid or empty JSON files do not break browser startup.

### Task 1.1: Add cookie file path helper

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: Add `_cookies_path(profile_key: str) -> Path`, returning `self._profile_dir(profile_key) / "session-cookies.json"`.
- **Dependencies**: None
- **Acceptance Criteria**:
  - Path is profile-local.
  - Existing `clear_profile()` deletes the cookie JSON automatically because it removes the full profile directory.
- **Validation**:
  - Unit test with a temp `browser_profiles_dir` confirms path location.

### Task 1.2: Add cookie serialization helpers

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: Add helpers:
  - `_normalize_cookie_for_save(cookie: dict[str, Any]) -> dict[str, Any] | None`
  - `_normalize_cookie_for_restore(cookie: dict[str, Any], now: float) -> dict[str, Any] | None`
- **Dependencies**: Task 1.1
- **Acceptance Criteria**:
  - Preserve supported Playwright fields: `name`, `value`, `domain`, `path`, `expires`, `httpOnly`, `secure`, `sameSite`, `partitionKey`.
  - Drop unsupported or unserializable fields.
  - Skip cookies missing `name` or `value`.
  - For restore, skip expired cookies when `expires > 0 and expires <= now`.
  - For restore, omit `expires` when it is missing, `None`, or negative.
  - Ensure every restored cookie has either `domain` + `path` or `url`.
- **Validation**:
  - Unit tests for persistent cookies, session cookies, expired cookies, and partitioned cookies.
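
The acceptance criteria for Task 1.2 could be met by something like the sketch below. It is not the final implementation. It assumes the helpers are written as free functions (the plan makes them private methods on `BrowserSessionService`), that only JSON scalar values are worth keeping, and that an `expires` of exactly `0`, which the plan does not cover, is treated like a session cookie. The `url` fallback is also an assumption, since cookies returned by `context.cookies()` normally carry `domain` and `path` already.

```python
from __future__ import annotations

from typing import Any

# Fields the plan lists as supported by Playwright's add_cookies().
_COOKIE_FIELDS = (
    "name", "value", "domain", "path", "expires",
    "httpOnly", "secure", "sameSite", "partitionKey",
)
# Assumption: only plain JSON scalars are kept; anything else is dropped.
_JSON_SCALARS = (str, int, float, bool)


def normalize_cookie_for_save(cookie: dict[str, Any]) -> dict[str, Any] | None:
    """Keep supported, serializable fields; return None for unusable cookies."""
    if not cookie.get("name") or cookie.get("value") is None:
        return None
    return {
        key: cookie[key]
        for key in _COOKIE_FIELDS
        if isinstance(cookie.get(key), _JSON_SCALARS)
    }


def normalize_cookie_for_restore(
    cookie: dict[str, Any], now: float
) -> dict[str, Any] | None:
    """Prepare a saved cookie for add_cookies(); return None to skip it."""
    restored = normalize_cookie_for_save(cookie)
    if restored is None:
        return None

    expires = restored.pop("expires", None)
    if expires is not None and expires > 0:
        if expires <= now:
            return None  # already expired, do not restore
        restored["expires"] = expires
    # Otherwise (missing, None, negative, or 0 by assumption) `expires` stays
    # omitted, so the cookie is restored as a session cookie.

    if restored.get("domain") and restored.get("path"):
        return restored
    url = cookie.get("url")
    if isinstance(url, str) and url:
        # Fall back to `url` and drop any partial domain/path pair.
        restored.pop("domain", None)
        restored.pop("path", None)
        restored["url"] = url
        return restored
    return None  # neither domain + path nor url: cannot be restored
```

Passing `now` in explicitly, as the plan's signature already does, keeps the expiry check deterministic in unit tests: a test can pin `now` to a fixed value instead of patching the clock.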
### Task 1.3: Add atomic JSON read/write helpers

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: Add:
  - `_read_saved_cookies(profile_key: str) -> list[dict[str, Any]]`
  - `_write_saved_cookies(profile_key: str, cookies: list[dict[str, Any]]) -> None`
- **Dependencies**: Tasks 1.1 and 1.2
- **Acceptance Criteria**:
  - JSON schema includes `version`, `profile_key`, `saved_at`, and `cookies`.
  - Write is atomic via `session-cookies.json.tmp` then `replace(...)`.
  - Malformed JSON logs a warning and returns an empty cookie list.
  - Empty cookie list writes a valid file rather than failing.
- **Validation**:
  - Unit tests for normal write/read, corrupted JSON, and atomic replacement.

## Sprint 2: Restore Cookies On Session Create

**Goal**: Restore saved cookies before the first page load.

**Demo/Validation**:

- A fake Playwright context receives `add_cookies(...)` before `page.goto(...)`.
- Startup continues even if restore fails.

### Task 2.1: Add restore method

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: Add `_restore_cookies(session_or_context, profile_key: str) -> None`, using `context.add_cookies(cookies)` when saved cookies exist.
- **Dependencies**: Sprint 1
- **Acceptance Criteria**:
  - No-op for missing JSON or empty cookie list.
  - Logs count of restored cookies at `info` or `debug` level without logging cookie values.
  - Catches Playwright restore errors and logs them without blocking session creation.
- **Validation**:
  - Unit test fake context records restored cookies.
  - Unit test invalid cookie list does not raise.

### Task 2.2: Wire restore into `create()`

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: Call restore after `launch_persistent_context(...)` and before `page.goto(...)`.
- **Dependencies**: Task 2.1
- **Acceptance Criteria**:
  - Applies only to normal persistent sessions.
  - Existing health-check path for already-open sessions is unchanged.
  - `page.goto(...)` sees restored cookies on first request.
- **Validation**:
  - Unit test call order with fakes.
  - Manual test: log into a remote page, restart backend, reopen page, verify logged-in state when server-side session is still valid.

### Task 2.3: Do not restore for `create_ephemeral()`

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: Leave auth-capture sessions isolated.
- **Dependencies**: Task 2.1
- **Acceptance Criteria**:
  - No cookie JSON is read for `auth-capture-*`.
  - Existing auth-capture cleanup behavior remains unchanged.
- **Validation**:
  - Unit test or code assertion via fake profile key.

## Sprint 3: Save Cookies During Activity And Close

**Goal**: Keep the JSON cache fresh while users interact and before Playwright closes the context.

**Demo/Validation**:

- User interactions cause cookie JSON to appear/update.
- Closing a session saves cookies before `context.close()`.

### Task 3.1: Add save method

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: Add `_save_cookies(session: BrowserSession, *, force: bool = False) -> None`.
- **Dependencies**: Sprint 1
- **Acceptance Criteria**:
  - Calls `await session.context.cookies()`.
  - Normalizes cookies and writes JSON.
  - Skips `auth-capture-*` sessions.
  - Does not log cookie values.
  - Handles closed contexts or Playwright errors without raising during cleanup.
- **Validation**:
  - Unit test fake context cookies are written.
  - Unit test auth-capture profile is skipped.

### Task 3.2: Debounce saves after browser events

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: After supported `event()` actions complete, call `_save_cookies(session)` with a debounce interval, for example 5 seconds.
- **Dependencies**: Task 3.1
- **Acceptance Criteria**:
  - Reuses existing `BrowserSession.last_saved_state_at`.
  - Saves after meaningful actions including click, type, key, reload, back, forward, resize, and scroll.
  - Does not save on every screenshot request.
  - Does not block event responses for long; if cookie reads are fast, inline is acceptable. If they prove slow, use a background task guarded by session lock.
- **Validation**:
  - Unit test repeated events inside debounce produce one write.
  - Unit test event after debounce writes again.

### Task 3.3: Force save before close and shutdown

- **Location**: `backend/app/services/browser_session_service.py`
- **Description**: In `close()`, call `_save_cookies(session, force=True)` before `context.close()`.
- **Dependencies**: Task 3.1
- **Acceptance Criteria**:
  - Save happens before CDP detach/close when possible.
  - `shutdown()` benefits automatically because it calls `close()` for each session.
  - Close still proceeds even if cookie save fails.
- **Validation**:
  - Unit test fake session records save before context close.

## Sprint 4: Tests And Operational Verification

**Goal**: Prove the feature works without depending entirely on live websites.

**Demo/Validation**:

- Unit tests pass.
- Manual Docker restart test demonstrates retained login for a site whose server-side session remains valid.

### Task 4.1: Add unit tests

- **Location**: `backend/test_browser_session_service.py`
- **Description**: Extend current fake-based tests for cookie persistence helper behavior.
- **Dependencies**: Sprints 1-3
- **Acceptance Criteria**:
  - Tests cover save, restore, malformed JSON, expired cookie skip, session cookie restore, auth-capture skip, save-before-context-close order, and event debounce.
- **Validation**:
  - Run `pytest backend/test_browser_session_service.py`.

### Task 4.2: Add manual verification checklist

- **Location**: `browser-session-cookie-persistence-plan.md` or a future PR description
- **Description**: Document how to verify in Docker.
- **Dependencies**: Implementation complete
- **Acceptance Criteria**:
  - Start Docker deployment with `/app/data` mounted.
  - Open remote browser page and log in.
  - Perform an event after login to trigger cookie save.
  - Confirm `session-cookies.json` exists under the matching profile directory.
  - Restart backend/container.
  - Reopen remote browser page and verify login state.
  - Clear profile and confirm both Chromium profile and cookie JSON are removed.

## Testing Strategy

- **Unit tests**: Primary coverage using fake context/page/session objects. This avoids requiring real browser binaries in normal test runs.
- **Manual integration test**: Required for confidence because real sites differ in cookie, localStorage, and server-side session behavior.
- **Regression checks**:
  - `pytest backend/test_browser_session_service.py`
  - Existing backend tests if time allows: `pytest backend`

## Potential Risks & Gotchas

- Some sites use server-side session expiry or revocation; restoring cookies cannot bypass that.
- Some sites bind sessions to IP, user agent, device fingerprint, or TLS/browser state.
- Some login state is stored in `localStorage`, `sessionStorage`, IndexedDB, or service-worker cache. Persistent context already helps with some of this, but this plan only adds explicit cookie backup.
- `sessionStorage` is tab-lifetime state and is not covered here. If a target site depends on it heavily, a later phase can add origin-scoped storage backup.
- Cookie JSON contains authentication secrets. It must not be committed, logged, or exposed via APIs.
- For CHIPS/partitioned cookies, preserve `partitionKey` when Playwright returns it.
- Do not rewrite all session cookies to long-lived cookies by default; that changes browser semantics and may create security surprises.

## Rollback Plan

- Remove the new helper methods and calls from `BrowserSessionService.create()`, `event()`, and `close()`.
- Delete `session-cookies.json` files from affected profile directories if needed.
- Existing Chromium persistent profile behavior will continue to work as before.
## Open Decisions

- Whether to add a config flag such as `browser_cookie_persistence_enabled: bool = True`. Default can be enabled because this directly addresses the production issue.
- Whether to also save `localStorage` through Playwright `storage_state()` in a later phase. Not required for the first implementation.
- Whether cookie JSON should be encrypted at rest. For the current Docker single-host deployment, profile-directory isolation is probably sufficient; encryption can be added if this becomes multi-tenant or shared-host.
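
Until the encryption question is settled, the file helpers from Task 1.3 carry the security and durability requirements. A rough sketch of how they might look follows. It makes several assumptions: the functions are free functions that take the already-resolved cookie file path (the plan has private methods taking `profile_key`), `SCHEMA_VERSION = 1` is a placeholder value, and creating the temp file with mode `0o600` is one possible reading of the owner-only permission goal (the mode is largely ignored on Windows).

```python
from __future__ import annotations

import json
import logging
import os
import time
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)

SCHEMA_VERSION = 1  # hypothetical value; the plan only names the field


def write_saved_cookies(
    path: Path, profile_key: str, cookies: list[dict[str, Any]]
) -> None:
    """Write the cookie file atomically: temp file first, then replace."""
    payload = {
        "version": SCHEMA_VERSION,
        "profile_key": profile_key,
        "saved_at": time.time(),
        "cookies": cookies,  # an empty list still yields a valid file
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_name(path.name + ".tmp")
    # Assumption: create the temp file readable and writable by the owner only.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    # Readers see either the old file or the new one, never a partial write.
    os.replace(tmp_path, path)


def read_saved_cookies(path: Path) -> list[dict[str, Any]]:
    """Return saved cookies, or an empty list if the file is missing or bad."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return []
    except (OSError, ValueError) as exc:
        # Log the error type only; never log file contents or cookie values.
        logger.warning(
            "Ignoring unreadable cookie file %s (%s)", path, type(exc).__name__
        )
        return []
    cookies = data.get("cookies") if isinstance(data, dict) else None
    if not isinstance(cookies, list):
        return []
    return [cookie for cookie in cookies if isinstance(cookie, dict)]
```

Writing to `session-cookies.json.tmp` and then calling `os.replace` means a crash in the middle of a save leaves the previous cookie file intact, which matters because the forced save in `close()` may run during shutdown.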