Plan: Browser Session Cookie Persistence
Generated: 2026-05-29
Estimated Complexity: Medium
Overview
Remote browser profiles already live under browser_profiles_dir (/app/data/browser-profiles by default), so Chromium profile files are expected to survive container restarts when /app/data is mounted. The remaining login-loss case is mainly session-only cookies, which Chromium discards when Playwright closes a persistent context normally.
Implement a backend-only cookie persistence layer in BrowserSessionService:
- Save all cookies from active persistent browser contexts into a JSON file under the profile directory.
- Restore those cookies immediately after launch_persistent_context(...) and before page.goto(...).
- Keep session-only cookies as session cookies when restoring, instead of rewriting them as long-lived cookies by default.
- Exclude ephemeral auth-capture sessions so temporary login extraction profiles keep their current lifecycle.
Prerequisites
- Current Playwright dependency: backend/requirements.txt pins playwright==1.52.0.
- Playwright BrowserContext.add_cookies() accepts cookies with name, value, and either url or domain + path; it also supports expires, httpOnly, secure, sameSite, and partitionKey.
- No frontend changes are required.
Key Design Decisions
- Storage location: browser-profiles/{profile_key}/session-cookies.json.
- Scope: persistent remote-browser sessions only. Do not persist cookies for auth-capture-* profiles.
- Save trigger: after user interaction events, with debounce; before context.close() as a final save.
- Restore trigger: after launch_persistent_context(...), before the first navigation.
- Session cookie behavior: if expires is missing, None, or negative, omit expires when restoring. This preserves session-cookie behavior inside the new context while still allowing our JSON backup to survive service restarts.
- Security: treat the JSON file as sensitive. Keep it under the already-private profile directory and write it with owner-readable permissions where practical.
Sprint 1: Cookie Persistence Helpers
Goal: Add isolated helper methods without changing session behavior yet.
Demo/Validation:
- Unit tests can save, read, normalize, and restore cookie data using fake contexts and a temp profile directory.
- Invalid or empty JSON files do not break browser startup.
Task 1.1: Add cookie file path helper
- Location: backend/app/services/browser_session_service.py
- Description: Add _cookies_path(profile_key: str) -> Path, returning self._profile_dir(profile_key) / "session-cookies.json".
- Dependencies: None
- Acceptance Criteria:
  - Path is profile-local.
  - Existing clear_profile() deletes the cookie JSON automatically because it removes the full profile directory.
- Validation:
  - Unit test with a temp browser_profiles_dir confirms path location.
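
A minimal sketch of the path helper from Task 1.1. BrowserSessionServiceSketch and its constructor are illustrative stand-ins, not the real service; the sketch assumes _profile_dir(profile_key) simply joins the profile key onto browser_profiles_dir.

```python
from pathlib import Path


class BrowserSessionServiceSketch:
    """Illustrative stand-in for BrowserSessionService (hypothetical)."""

    def __init__(self, browser_profiles_dir: Path) -> None:
        self._profiles_dir = Path(browser_profiles_dir)

    def _profile_dir(self, profile_key: str) -> Path:
        # Assumption: the real service already exposes an equivalent helper.
        return self._profiles_dir / profile_key

    def _cookies_path(self, profile_key: str) -> Path:
        # Keeping the backup inside the profile directory means that
        # removing the profile directory also removes the cookie JSON.
        return self._profile_dir(profile_key) / "session-cookies.json"
```

Because the file sits inside the profile directory, no separate cleanup path is needed for clear_profile().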
Task 1.2: Add cookie serialization helpers
- Location: backend/app/services/browser_session_service.py
- Description: Add helpers:
  - _normalize_cookie_for_save(cookie: dict[str, Any]) -> dict[str, Any] | None
  - _normalize_cookie_for_restore(cookie: dict[str, Any], now: float) -> dict[str, Any] | None
- Dependencies: Task 1.1
- Acceptance Criteria:
  - Preserve supported Playwright fields: name, value, domain, path, expires, httpOnly, secure, sameSite, partitionKey.
  - Drop unsupported or unserializable fields.
  - Skip cookies missing name or value.
  - For restore, skip expired cookies when expires > 0 and expires <= now.
  - For restore, omit expires when it is missing, None, or negative.
  - Ensure every restored cookie has either domain + path or url.
- Validation:
  - Unit tests for persistent cookies, session cookies, expired cookies, and partitioned cookies.
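
The normalization rules above could be sketched roughly as follows. This is one possible reading of the acceptance criteria, written as module-level functions for brevity. Treating only str/int/float/bool values as serializable and preferring domain + path over url are assumptions of the sketch, not decisions made in this plan.

```python
from __future__ import annotations

from typing import Any

SUPPORTED_FIELDS = (
    "name", "value", "domain", "path", "expires",
    "httpOnly", "secure", "sameSite", "partitionKey",
)
JSON_SCALARS = (str, int, float, bool)  # assumption: scalar fields only


def normalize_cookie_for_save(cookie: dict[str, Any]) -> dict[str, Any] | None:
    # Skip cookies that lack a usable name or value.
    if not isinstance(cookie.get("name"), str) or not isinstance(cookie.get("value"), str):
        return None
    # Keep supported fields only; drop anything that is not a JSON scalar.
    return {
        key: cookie[key]
        for key in SUPPORTED_FIELDS
        if isinstance(cookie.get(key), JSON_SCALARS)
    }


def normalize_cookie_for_restore(cookie: dict[str, Any], now: float) -> dict[str, Any] | None:
    restored = normalize_cookie_for_save(cookie)
    if restored is None:
        return None
    expires = restored.pop("expires", None)
    if isinstance(expires, (int, float)) and expires > 0:
        if expires <= now:
            return None  # already expired, do not restore
        restored["expires"] = expires
    # Missing, None, or negative expires stays omitted: a session cookie.
    if "domain" in restored and "path" in restored:
        return restored
    url = cookie.get("url")
    if isinstance(url, str) and url:
        # Fall back to url and drop any partial domain/path information.
        restored.pop("domain", None)
        restored.pop("path", None)
        restored["url"] = url
        return restored
    return None  # neither domain + path nor url: cannot be restored
```

Passing now as an argument keeps the expiry check deterministic in unit tests.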
Task 1.3: Add atomic JSON read/write helpers
- Location: backend/app/services/browser_session_service.py
- Description: Add:
  - _read_saved_cookies(profile_key: str) -> list[dict[str, Any]]
  - _write_saved_cookies(profile_key: str, cookies: list[dict[str, Any]]) -> None
- Dependencies: Tasks 1.1 and 1.2
- Acceptance Criteria:
  - JSON schema includes version, profile_key, saved_at, and cookies.
  - Write is atomic via session-cookies.json.tmp then replace(...).
  - Malformed JSON logs a warning and returns an empty cookie list.
  - Empty cookie list writes a valid file rather than failing.
- Validation:
  - Unit tests for normal write/read, corrupted JSON, and atomic replacement.
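
One way the atomic read/write pair might look, shown as standalone functions that take the target path directly rather than a profile key. SCHEMA_VERSION = 1, the epoch-seconds saved_at value, and the 0o600 file mode are assumptions made for illustration.

```python
from __future__ import annotations

import json
import logging
import os
import time
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)
SCHEMA_VERSION = 1  # hypothetical starting version


def write_saved_cookies(path: Path, profile_key: str, cookies: list[dict[str, Any]]) -> None:
    payload = {
        "version": SCHEMA_VERSION,
        "profile_key": profile_key,
        "saved_at": time.time(),  # assumption: epoch seconds
        "cookies": cookies,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    # Request owner-only permissions when the temp file is created.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    # Swap the finished file into place in one step, so readers never
    # observe a half-written file.
    os.replace(tmp, path)


def read_saved_cookies(path: Path) -> list[dict[str, Any]]:
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return []  # nothing saved yet: not an error
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass.
        logger.warning("Ignoring unreadable cookie backup %s: %s", path, exc)
        return []
    cookies = payload.get("cookies") if isinstance(payload, dict) else None
    if not isinstance(cookies, list):
        logger.warning("Ignoring cookie backup with unexpected shape: %s", path)
        return []
    return [c for c in cookies if isinstance(c, dict)]
```

The file mode is only applied when the temp file is newly created, which matches the "where practical" wording in the security decision.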
Sprint 2: Restore Cookies On Session Create
Goal: Restore saved cookies before the first page load.
Demo/Validation:
- A fake Playwright context receives add_cookies(...) before page.goto(...).
- Startup continues even if restore fails.
Task 2.1: Add restore method
- Location: backend/app/services/browser_session_service.py
- Description: Add _restore_cookies(session_or_context, profile_key: str) -> None, using context.add_cookies(cookies) when saved cookies exist.
- Dependencies: Sprint 1
- Acceptance Criteria:
  - No-op for missing JSON or empty cookie list.
  - Logs count of restored cookies at info or debug level without logging cookie values.
  - Catches Playwright restore errors and logs them without blocking session creation.
- Validation:
  - Unit test fake context records restored cookies.
  - Unit test invalid cookie list does not raise.
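
A rough sketch of the restore step and the kind of fake context the unit tests could use. Here restore_cookies takes the already-loaded cookie list and returns a count, which differs from the planned _restore_cookies signature; FakeContext is a hypothetical test double, not a Playwright class.

```python
import asyncio
import logging
from typing import Any

logger = logging.getLogger(__name__)


async def restore_cookies(context: Any, cookies: list) -> int:
    """Add saved cookies to the context; return how many were restored."""
    if not cookies:
        return 0  # missing file or empty list: nothing to do
    try:
        await context.add_cookies(cookies)
    except Exception as exc:
        # Log only the error type so cookie values never reach the logs.
        logger.warning("Cookie restore failed (%s); continuing", type(exc).__name__)
        return 0
    logger.info("Restored %d cookies", len(cookies))
    return len(cookies)


class FakeContext:
    """Hypothetical test double that records add_cookies calls."""

    def __init__(self, fail: bool = False) -> None:
        self.added: list = []
        self.fail = fail

    async def add_cookies(self, cookies: list) -> None:
        if self.fail:
            raise RuntimeError("invalid cookie")
        self.added.extend(cookies)
```

Swallowing the exception here is what lets session creation continue when a saved cookie is rejected.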
Task 2.2: Wire restore into create()
- Location: backend/app/services/browser_session_service.py
- Description: Call restore after launch_persistent_context(...) and before page.goto(...).
- Dependencies: Task 2.1
- Acceptance Criteria:
  - Applies only to normal persistent sessions.
  - Existing health-check path for already-open sessions is unchanged.
  - page.goto(...) sees restored cookies on first request.
- Validation:
  - Unit test call order with fakes.
  - Manual test: log into a remote page, restart backend, reopen page, verify logged-in state when server-side session is still valid.
Task 2.3: Do not restore for create_ephemeral()
- Location: backend/app/services/browser_session_service.py
- Description: Leave auth-capture sessions isolated.
- Dependencies: Task 2.1
- Acceptance Criteria:
  - No cookie JSON is read for auth-capture-*.
  - Existing auth-capture cleanup behavior remains unchanged.
- Validation:
  - Unit test or code assertion via fake profile key.
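
The call-order checks for Tasks 2.2 and 2.3 could be expressed with fakes along these lines. open_session is a hypothetical, heavily simplified stand-in for create(): it only records the order of calls and does not launch a browser.

```python
import asyncio

EPHEMERAL_PREFIX = "auth-capture-"


class FakePage:
    def __init__(self, calls: list) -> None:
        self.calls = calls

    async def goto(self, url: str) -> None:
        self.calls.append("goto")


class FakeContext:
    def __init__(self, calls: list) -> None:
        self.calls = calls

    async def add_cookies(self, cookies: list) -> None:
        self.calls.append("add_cookies")

    async def new_page(self) -> FakePage:
        return FakePage(self.calls)


async def open_session(profile_key: str, url: str, saved_cookies: list) -> list:
    """Simplified stand-in for create(); returns the recorded call order."""
    calls = ["launch_persistent_context"]
    context = FakeContext(calls)
    # Restore only for normal persistent profiles, never for auth-capture.
    if saved_cookies and not profile_key.startswith(EPHEMERAL_PREFIX):
        await context.add_cookies(saved_cookies)
    page = await context.new_page()
    await page.goto(url)  # first navigation happens after the restore
    return calls
```

A test can then assert the exact sequence for both profile types.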
Sprint 3: Save Cookies During Activity And Close
Goal: Keep the JSON cache fresh while users interact and before Playwright closes the context.
Demo/Validation:
- User interactions cause cookie JSON to appear/update.
- Closing a session saves cookies before context.close().
Task 3.1: Add save method
- Location: backend/app/services/browser_session_service.py
- Description: Add _save_cookies(session: BrowserSession, *, force: bool = False) -> None.
- Dependencies: Sprint 1
- Acceptance Criteria:
  - Calls await session.context.cookies().
  - Normalizes cookies and writes JSON.
  - Skips auth-capture-* sessions.
  - Does not log cookie values.
  - Handles closed contexts or Playwright errors without raising during cleanup.
- Validation:
  - Unit test fake context cookies are written.
  - Unit test auth-capture profile is skipped.
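
A possible shape for the save path, simplified to take the context, profile key, and a write callable instead of a BrowserSession. The write parameter and FakeContext are hypothetical, and the inline filter is a placeholder for the fuller normalization in Task 1.2.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)
EPHEMERAL_PREFIX = "auth-capture-"


async def save_cookies(
    context: Any,
    profile_key: str,
    write: Callable[[str, list[dict[str, Any]]], None],
) -> bool:
    """Snapshot the context's cookies; return True if they were written."""
    if profile_key.startswith(EPHEMERAL_PREFIX):
        return False  # auth-capture profiles keep their current lifecycle
    try:
        cookies = await context.cookies()
        # Placeholder normalization: keep only cookies with name and value.
        usable = [c for c in cookies if "name" in c and "value" in c]
        write(profile_key, usable)
    except Exception as exc:
        # Closed contexts and I/O errors must not break cleanup paths.
        logger.warning("Cookie save skipped for %s: %s", profile_key, type(exc).__name__)
        return False
    logger.debug("Saved %d cookies for %s", len(usable), profile_key)
    return True


class FakeContext:
    """Hypothetical test double for a Playwright browser context."""

    def __init__(self, cookies: list, closed: bool = False) -> None:
        self._cookies = cookies
        self.closed = closed

    async def cookies(self) -> list:
        if self.closed:
            raise RuntimeError("context closed")
        return list(self._cookies)
```

Only counts and error type names are logged, so cookie values stay out of the logs.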
Task 3.2: Debounce saves after browser events
- Location: backend/app/services/browser_session_service.py
- Description: After supported event() actions complete, call _save_cookies(session) with a debounce interval, for example 5 seconds.
- Dependencies: Task 3.1
- Acceptance Criteria:
  - Reuses existing BrowserSession.last_saved_state_at.
  - Saves after meaningful actions including click, type, key, reload, back, forward, resize, and scroll.
  - Does not save on every screenshot request.
  - Does not block event responses for long; if cookie reads are fast, inline is acceptable. If they prove slow, use a background task guarded by session lock.
- Validation:
  - Unit test repeated events inside debounce produce one write.
  - Unit test event after debounce writes again.
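
The debounce check itself can be very small. In the sketch below, SessionState stands in for BrowserSession, and the None default plus the use of time.monotonic() are assumptions; the real last_saved_state_at field may use a different clock or initial value.

```python
from __future__ import annotations

import time
from dataclasses import dataclass

SAVE_DEBOUNCE_SECONDS = 5.0  # example interval from the task description


@dataclass
class SessionState:
    """Stand-in for BrowserSession; only the debounce field is modeled."""

    last_saved_state_at: float | None = None  # None means "never saved"


def should_save(session: SessionState, *, force: bool = False, now: float | None = None) -> bool:
    """Return True (and record the time) when a save is due."""
    now = time.monotonic() if now is None else now
    last = session.last_saved_state_at
    if force or last is None or now - last >= SAVE_DEBOUNCE_SECONDS:
        session.last_saved_state_at = now
        return True
    return False
```

Accepting now as an optional argument lets tests step through time without sleeping.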
Task 3.3: Force save before close and shutdown
- Location: backend/app/services/browser_session_service.py
- Description: In close(), call _save_cookies(session, force=True) before context.close().
- Dependencies: Task 3.1
- Acceptance Criteria:
  - Save happens before CDP detach/close when possible.
  - shutdown() benefits automatically because it calls close() for each session.
  - Close still proceeds even if cookie save fails.
- Validation:
  - Unit test fake session records save before context close.
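
The ordering requirement in Task 3.3 might be tested with a sketch like this. close_session and FakeContext are hypothetical; the real close() also handles CDP detach and other cleanup that is not shown.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)


async def close_session(context: Any, save: Callable[[], Awaitable[None]]) -> None:
    """Attempt a final cookie save, then close the context regardless."""
    try:
        await save()
    except Exception as exc:
        # A failed save must never prevent the context from closing.
        logger.warning("Final cookie save failed: %s", type(exc).__name__)
    await context.close()


class FakeContext:
    """Hypothetical test double that records when close() is called."""

    def __init__(self, calls: list) -> None:
        self.calls = calls

    async def close(self) -> None:
        self.calls.append("context.close")
```

Recording both steps in one shared list makes the save-then-close order easy to assert.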
Sprint 4: Tests And Operational Verification
Goal: Prove the feature works without depending entirely on live websites.
Demo/Validation:
- Unit tests pass.
- Manual Docker restart test demonstrates retained login for a site whose server-side session remains valid.
Task 4.1: Add unit tests
- Location: backend/test_browser_session_service.py
- Description: Extend current fake-based tests for cookie persistence helper behavior.
- Dependencies: Sprints 1-3
- Acceptance Criteria:
  - Tests cover save, restore, malformed JSON, expired cookie skip, session cookie restore, auth-capture skip, save-before-context-close order, and event debounce.
- Validation:
  - Run pytest backend/test_browser_session_service.py.
Task 4.2: Add manual verification checklist
- Location: browser-session-cookie-persistence-plan.md or a future PR description
- Description: Document how to verify in Docker.
- Dependencies: Implementation complete
- Acceptance Criteria:
  - Start Docker deployment with /app/data mounted.
  - Open remote browser page and log in.
  - Perform an event after login to trigger cookie save.
  - Confirm session-cookies.json exists under the matching profile directory.
  - Restart backend/container.
  - Reopen remote browser page and verify login state.
  - Clear profile and confirm both Chromium profile and cookie JSON are removed.
Testing Strategy
- Unit tests: Primary coverage using fake context/page/session objects. This avoids requiring real browser binaries in normal test runs.
- Manual integration test: Required for confidence because real sites differ in cookie, localStorage, and server-side session behavior.
- Regression checks:
  - pytest backend/test_browser_session_service.py
  - Existing backend tests if time allows: pytest backend
Potential Risks & Gotchas
- Some sites use server-side session expiry or revocation; restoring cookies cannot bypass that.
- Some sites bind sessions to IP, user agent, device fingerprint, or TLS/browser state.
- Some login state is stored in localStorage, sessionStorage, IndexedDB, or service-worker cache. Persistent context already helps with some of this, but this plan only adds explicit cookie backup.
- sessionStorage is tab-lifetime state and is not covered here. If a target site depends on it heavily, a later phase can add origin-scoped storage backup.
- Cookie JSON contains authentication secrets. It must not be committed, logged, or exposed via APIs.
- For CHIPS/partitioned cookies, preserve partitionKey when Playwright returns it.
- Do not rewrite all session cookies to long-lived cookies by default; that changes browser semantics and may create security surprises.
Rollback Plan
- Remove the new helper methods and calls from BrowserSessionService.create(), event(), and close().
- Delete session-cookies.json files from affected profile directories if needed.
- Existing Chromium persistent profile behavior will continue to work as before.
Open Decisions
- Whether to add a config flag such as browser_cookie_persistence_enabled: bool = True. Default can be enabled because this directly addresses the production issue.
- Whether to also save localStorage through Playwright storage_state() in a later phase. Not required for the first implementation.
- Whether cookie JSON should be encrypted at rest. For the current Docker single-host deployment, profile-directory isolation is probably sufficient; encryption can be added if this becomes multi-tenant or shared-host.