SandTools/docs/SCRAPE_RUNBOOK.md
DownloadPizza 3df0797acc scrape tooling: live capture triage + master-server WS decoder + PlayFab REST scraper
Built and unit-tested ahead of a live playtest window:
- reverse/capture_hosts.py: pcap -> DNS/SNI/endpoints in order; extracts PlayFab TitleId,
  flags hologryph master-server region + config CDN.
- reverse/ws_scrape.py: TCP reassembly + RFC-6455 framing for the cleartext
  ws://<region>.hologryph.com/gameclient/ stream; decodes JSON/BSON/MessagePack; auto-labels
  ServerDto, CompartmentDefinitionDto, ResearchNodeJsonDto, OperationResult, etc. No MITM needed.
- reverse/playfab_scrape.py: LoginWithSteam (or captured EntityToken) -> Catalog/SearchItems
  (+ Inventory/TitleData); prices resolved to item names. Read-only.
- docs/SCRAPE_RUNBOOK.md: turnkey steps for when servers are online.
2026-06-12 10:06:48 +02:00


Live-scrape runbook — when a playtest is online

Everything below is read-only and runs outside the game process (no BattlEye interaction). Tooling is built and unit-tested; the only thing that needs a live backend is the data itself.

0. Capture (once servers are up)

  1. ipconfig /flushdns (so hostnames show as clean DNS queries, incl. the PlayFab TitleId).
  2. Start a packet capture on the game NIC (Wireshark, or pktmon/dumpcap). Save as .pcapng.
    • Master-server traffic is cleartext ws:// on port 80 — Wireshark reads it directly, no MITM/cert needed.
    • PlayFab is HTTPS/443 — to read its bodies you need your MITM (cert already installed) on 443, or use the REST scraper (step 3) instead.
  3. Launch SAND, click past the "no servers"/welcome dialog and let it log in, then open the screens whose data you want (walker editor → compartment defs; research tree; store → prices). Keep the capture running throughout, then stop it.

1. Triage the capture → get the TitleId + confirm the master server

venv/bin/python reverse/capture_hosts.py <capture.pcapng>

Prints DNS/SNI/endpoints in order and a BACKENDS DETECTED block:

  • PlayFab host=<id>.playfabapi.com ** TitleId = <ID> ** ← the one constant the REST scraper needs
  • Master server host=<region>.hologryph.com (ws://80 cleartext)
  • Config CDN host=sandconfigstorage…
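The TitleId is just the leftmost label of the PlayFab hostname, so it can also be pulled from any DNS or SNI name by hand. A minimal sketch of that step follows; `title_id_from_host` is a hypothetical helper, not the code in capture_hosts.py, and upper-casing the result is an assumption about how the ID is usually written.

```python
import re

# Hostname shape <id>.playfabapi.com is taken from the runbook above.
# Hypothetical helper for illustration; not the implementation in capture_hosts.py.
_PLAYFAB_HOST = re.compile(r"^([0-9a-z]+)\.playfabapi\.com\.?$", re.IGNORECASE)


def title_id_from_host(host: str):
    """Return the TitleId embedded in a PlayFab hostname, or None if it is not one."""
    match = _PLAYFAB_HOST.match(host.strip())
    # Upper-casing is an assumption: DNS names are case-insensitive.
    return match.group(1).upper() if match else None
```

Any hostname that does not end in playfabapi.com (for example the master-server host) yields None, so the helper can be run over every name seen in the capture.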

2. Master server (compartments + research tree + server list) — cleartext, no auth replay

venv/bin/python reverse/ws_scrape.py <capture.pcapng> --out extracted/master_ws.json

Reassembles the port-80 WebSocket stream to *.hologryph.com/gameclient/, parses RFC-6455 frames, and decodes each message, trying JSON → BSON → MessagePack in that order (the game's IDataSerializer is most likely JSON). Messages are auto-tagged when their shape matches a known DTO: ServerDto, RegionInfo, CompartmentDefinitionDto (HP/Weight/Properties/prices), ResearchNodeJsonDto (connections via RequiredNodesIds/DependentNodesIds, costs via ResearchPrice), ItemDto/ShopItemDto/PriceDto, OperationResult, IClientEvent. If no WS stream is found, the capture probably did not span the master-server connection, so re-capture through the login; if the stream is on a different port or host, try --port/--host.
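For reference, the RFC-6455 framing step can be sketched in a few lines. This is an illustrative stand-alone parser, not the code in ws_scrape.py; `parse_ws_frame` is a hypothetical name, and it assumes the TCP stream has already been reassembled into one byte buffer per direction.

```python
import struct


def parse_ws_frame(buf: bytes, offset: int = 0):
    """Parse one RFC 6455 frame starting at offset.

    Returns (fin, opcode, payload, next_offset), or None if the buffer
    does not yet hold a complete frame.
    """
    if len(buf) - offset < 2:
        return None
    b0, b1 = buf[offset], buf[offset + 1]
    fin = bool(b0 & 0x80)          # final fragment flag
    opcode = b0 & 0x0F             # 1 = text, 2 = binary, 8 = close, ...
    masked = bool(b1 & 0x80)       # client-to-server frames are masked
    length = b1 & 0x7F
    pos = offset + 2
    if length == 126:              # 16-bit extended length
        if len(buf) - pos < 2:
            return None
        (length,) = struct.unpack_from("!H", buf, pos)
        pos += 2
    elif length == 127:            # 64-bit extended length
        if len(buf) - pos < 8:
            return None
        (length,) = struct.unpack_from("!Q", buf, pos)
        pos += 8
    mask = b""
    if masked:
        if len(buf) - pos < 4:
            return None
        mask = buf[pos:pos + 4]
        pos += 4
    if len(buf) - pos < length:
        return None
    payload = buf[pos:pos + length]
    if masked:
        # Unmask by XOR-ing each byte with the 4-byte key, cycling.
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload, pos + length
```

Calling it in a loop with the returned `next_offset` walks every frame in the buffer; fragmented messages (fin false, followed by opcode 0 continuation frames) still need to be joined by the caller.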

On the first run, inspect one frame by eye to confirm the encoding (JSON vs BSON). The decoder already handles both; this is just a sanity check.
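The same sanity check can be done programmatically on a single payload. The sketch below is a heuristic under stated assumptions, not the decoder's actual logic: `sniff_encoding` is a hypothetical helper, it recognises JSON by attempting a parse and BSON by its leading little-endian int32 length plus trailing NUL byte, and it does not attempt to recognise MessagePack.

```python
import json
import struct


def sniff_encoding(payload: bytes) -> str:
    """Guess whether a WebSocket message payload is JSON or BSON.

    Returns "json", "bson", or "unknown" (MessagePack is not sniffed here).
    """
    stripped = payload.lstrip()
    if stripped[:1] in (b"{", b"["):
        try:
            json.loads(stripped.decode("utf-8"))
            return "json"
        except (UnicodeDecodeError, ValueError):
            # A BSON document of 123 or 91 bytes also starts with "{" or "[",
            # so fall through to the BSON check instead of giving up.
            pass
    # A BSON document starts with its own total length as a little-endian
    # int32 and ends with a single 0x00 byte.
    if len(payload) >= 5 and payload[-1] == 0:
        (declared,) = struct.unpack_from("<i", payload, 0)
        if declared == len(payload):
            return "bson"
    return "unknown"
```

An "unknown" result is the cue to look at the raw bytes in Wireshark before trusting the auto-labelled output.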

3. PlayFab prices / catalog / inventory

Either read them from the MITM'd 443 capture, or pull them directly (cleaner, and returns the full catalog rather than only what the client requests):

# with a Steam auth ticket (captured, or minted via Steamworks GetAuthSessionTicket):
venv/bin/python reverse/playfab_scrape.py --title-id <ID> --steam-ticket <hex> --catalog --inventory

# or skip login with an EntityToken lifted from your MITM capture:
venv/bin/python reverse/playfab_scrape.py --title-id <ID> --entity-token <tok> --catalog

--catalog → extracted/playfab_catalog.json: every item with PriceOptions (→ currency-item + amount, names resolved via extracted/item_names.json) and DisplayProperties (check here for any catalog-authored base stats). --inventory → wallet + items + transaction history. --titledata → Client/GetTitleData config blobs. Read-only endpoints only; no write/purchase calls.
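To illustrate the price resolution described above, here is a minimal sketch. It assumes each catalog item carries prices as PriceOptions.Prices[].Amounts[] entries with ItemId and Amount fields, and that extracted/item_names.json is a flat mapping from item ID to display name; both shapes are assumptions to verify against real output, and `resolve_prices` is a hypothetical helper rather than the scraper's own function.

```python
def resolve_prices(item: dict, names: dict) -> list:
    """Return [(currency_name, amount), ...] for one catalog item.

    names is assumed to map ItemId -> display name; unknown IDs are
    returned unchanged so nothing is silently dropped.
    """
    resolved = []
    price_options = item.get("PriceOptions") or {}
    for price in price_options.get("Prices") or []:
        for amount in price.get("Amounts") or []:
            item_id = amount.get("ItemId")
            resolved.append((names.get(item_id, item_id), amount.get("Amount")))
    return resolved
```

Items with no PriceOptions at all simply produce an empty list, which makes free or unpurchasable entries easy to spot.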

Notes / unknowns to confirm live

  • WS payload encoding (JSON vs BSON): decoder handles both; confirm on first capture.
  • Steam ticket reuse: tickets are short-lived/single-use — if --steam-ticket fails, lift an EntityToken from the MITM capture and use --entity-token instead.
  • Damage: still server-computed; check DisplayProperties (catalog) and CompartmentDefinitionDto.Properties (master server) for any base values, but do not assume they are present.
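For the EntityToken fallback mentioned in the notes, the read-only catalog call can be sketched as a plain HTTPS POST. This is an illustration only, not the code in playfab_scrape.py: `build_search_items_request` is a hypothetical helper, and the page size of 50 and the ContinuationToken paging field are assumptions to check against the PlayFab Economy API reference. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_search_items_request(title_id, entity_token, count=50, continuation_token=None):
    """Build (but do not send) a read-only Catalog/SearchItems request."""
    body = {"Count": count}
    if continuation_token:
        # Assumed paging field: pass the token from the previous response.
        body["ContinuationToken"] = continuation_token
    return urllib.request.Request(
        url=f"https://{title_id}.playfabapi.com/Catalog/SearchItems",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Token lifted from the MITM capture, as described in the notes.
            "X-EntityToken": entity_token,
        },
        method="POST",
    )
```

Keeping construction separate from sending makes it easy to print and review the exact URL and body before anything touches the live backend.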