# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What this repo is
Reverse-engineering / data-mining toolkit for the game **SAND** (Hologryph, Unity 6000.0.40f1,
IL2CPP). It extracts server-authoritative and static game data from three sources, and reads/writes
the game's local walker save files. There is no application to build — everything is standalone
Python scripts run against game files, network captures, or live servers.
The four data sources, and which tools own them:
- **Unity asset bundles** (static config: items, recipes, loot, islands) → `bundle/`
- **Master server** `wss://<region>.hologryph.com/gameclient/` (economy, walker blueprints, research) → `reverse/master_scrape.py`
- **PlayFab** (Azure; **auth-only** for this title — Economy/catalog disabled) → `reverse/playfab_scrape.py`
- **`.wbt` walker save files** (local, on disk) → `walker/`
## Working rules (from operator memory — follow these)
- **Data only, never heuristics.** Do not invent rules or fill gaps with plausible assumptions.
Derive every value from game files, decompiled code, or captured payloads — or ask. (An invented
"guns point outward" rule and a guessed rotation→facing mapping both produced wrong results.)
- **Report what the data shows, not inferences as fact.** Don't jump to conclusions.
- **No polling wait-loops.** Use background tasks + wakeup notifications; don't `sleep`-poll for completion.
- **Don't hammer the live server.** It is a real playtest backend. Warn the operator *before* any
action that makes repeated/abnormal connections. BattlEye is active in the game — all scraping is
done **outside** the game process (replayed protocol / captures / REST), never via injection.
- **A `/connect` scrape kicks the live player** (single session per account, newest wins — verified
2026-06-16, see `docs/MASTER_SERVER.md`). Don't open `/connect` while the operator is in-game.
## Environment & how to run
- Use the project venv: **`venv/bin/python <script>`** (has `UnityPy`, `bson`/pymongo, `scapy`, `Pillow`, `websockets`).
- Symlinks (git-ignored, machine-specific — repoint if the machine changes):
- `bundles/` → game `StreamingAssets/aa/StandaloneWindows64/` (35 bundles, ~6.8 GB)
- `Walkers/` → `…/LocalLow/Hologryph/SAND/Data/Walkers/` (live `.wbt` saves)
- Game install referenced by `bundle/` tools: `/mnt/d/SteamLibrary/steamapps/common/Sand Playtest`
(`GameAssembly.dll` + `Sand_Data/il2cpp_data/Metadata/global-metadata.dat`).
- IL2CPP source of truth: `il2cpp/dump.cs` (Il2CppDumper output — signatures/RVAs only, no method
bodies). `il2cpp/`, `ghidra/`, `snapshots/`, `bundles`, `Walkers`, `reverse/.secrets/` are
git-ignored (large/regenerable/secret). Live PlayFab token: `reverse/.secrets/playfab_token.json`.
- **Game runtime data dir** (`%USERPROFILE%\AppData\LocalLow\Hologryph\SAND`, here
`/mnt/c/Users/DownloadPizza/AppData/LocalLow/Hologryph/SAND/`) holds:
- **`Player.log`** — Unity log; check it to see what the client did (walker file reads:
`[FS_STANDALONE] ReadAllFilesAsync … Path: …/Data/Walkers`; master-server handshake:
`[MasterServer] … Login / Connection failed to /connect`). `Player-prev.log` = previous run.
- **`Data/Walkers/*.wbt`** — the live walker saves (the `Walkers/` symlink points here).
## The `.wbt` walker save format (current focus)
Envelope (RE'd from `XorCryptography.Encrypt`, verified byte-exact on all 5 local walkers):
```
save: Newtonsoft-BSON -> XOR encrypt -> gzip
load: gunzip -> XOR decrypt -> Newtonsoft-BSON parse
```
- XOR key (current build): `70 DD 1F 2A 0B 4A` (6 bytes), applied **per 0xA000-byte chunk** with the
key index reset to 0 at each chunk boundary: `decoded[i] = raw[i] XOR KEY[(i % 0xA000) % 6]`.
If a game update changes the key, recover it with no RE via `walker/recover_key.py`.
- `pymongo`'s `bson.encode` reproduces Newtonsoft.Bson byte-for-byte, so decode→encode is identity.
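The envelope and the per-chunk key reset described above can be sketched as follows. This is a minimal illustration, not the code in `walker/sand.py`: the function names are made up here, and the BSON step is left out so the block needs only the standard library. Because `0xA000` is not a multiple of 6, restarting the key at each chunk boundary produces a different keystream than a plain `i % 6` would.

```python
import gzip

KEY = bytes.fromhex("70DD1F2A0B4A")  # current-build key from the notes above
CHUNK = 0xA000                       # key index restarts at every chunk boundary


def xor_chunks(data: bytes, key: bytes = KEY, chunk: int = CHUNK) -> bytes:
    """XOR with the repeating key, restarting the key at each chunk.

    XOR is its own inverse, so the same function encrypts and decrypts.
    """
    return bytes(b ^ key[(i % chunk) % len(key)] for i, b in enumerate(data))


def unwrap(file_bytes: bytes) -> bytes:
    """load: gunzip -> XOR decrypt. Returns the raw BSON document bytes."""
    return xor_chunks(gzip.decompress(file_bytes))


def wrap(bson_bytes: bytes) -> bytes:
    """save: XOR encrypt -> gzip. Takes an already-encoded BSON document."""
    return gzip.compress(xor_chunks(bson_bytes))
```

The bytes returned by `unwrap` would then go to the BSON parser from the venv's `pymongo`. One assumption to keep in mind: Python's gzip stream may differ from the one the game writes, so only the decompressed payload should be expected to round-trip exactly.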
### Walker naming convention
A walker's display name is **two indices**, not a string: top-level `firstNameIndex` + `secondNameIndex`
(BSON int32, **0–31 each**). Resolve them via `name_index.json` → `first_name` / `second_name` tables
(e.g. `(16,5)` = "Veteran Veteran", second `6` = "…Beast"). The top-level `name` field is **null/unused**
— the shown name comes from the indices. "name1" = `firstNameIndex`, "name2" = `secondNameIndex`. Set
both with `build_wbt.py rename <wbt> <first> <second> -o out`, or edit the index directly when copying.
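As a rough sketch of the index lookup described above: the code below assumes `name_index.json` is a JSON object holding two arrays, `first_name` and `second_name`, and that the display name is the two entries joined by a space. Both points are assumptions drawn from the description here, not from the file itself, and the helper names are hypothetical.

```python
import json


def load_name_tables(path: str) -> dict:
    """Read the name tables (assumed layout: {"first_name": [...], "second_name": [...]})."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def resolve_walker_name(tables: dict, first_index: int, second_index: int) -> str:
    """Map (firstNameIndex, secondNameIndex) to the displayed name."""
    for label, index in (("firstNameIndex", first_index),
                         ("secondNameIndex", second_index)):
        if not 0 <= index <= 31:
            raise ValueError(f"{label} out of range 0-31: {index}")
    first = tables["first_name"][first_index]
    second = tables["second_name"][second_index]
    return f"{first} {second}"  # space join is an assumption
```

With placeholder tables such as `{"first_name": [f"First{i}" for i in range(32)], "second_name": [f"Second{i}" for i in range(32)]}`, the call `resolve_walker_name(tables, 16, 5)` picks entry 16 from the first table and entry 5 from the second.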
### The 5 hashes (a `.wbt` is a serialized `WalkerBlueprintDto`)
All = `MD5(UTF8(JsonConvert.SerializeObject(obj))).hexUPPER` — Newtonsoft compact JSON: no whitespace,
PascalCase, **member declaration order** (do NOT sort keys), nulls included, **enums as NAME strings**.
| Hash | Scope | Offline-computable? |
|------|-------|---------------------|
| `CompartmentsHash` | top-level: `MD5(JSON(Compartments list))` | **YES** → `walker/walker_hashes.py` |
| `ConnectionsHash` | top-level: `MD5(JSON(Connections list))` | **YES** → `walker/walker_hashes.py` |
| `CompartmentHash` | per-part: placement from CompartmentsDatabase | **YES** — constant per EpbId+placement |
| `DefinitionsHash` | top-level: `MD5(JSON(Compartments→CompartmentDefinitionDto))` | **NO** — server-sourced |
| `DefinitionHash` | per-part: `MD5(JSON(CompartmentDefinitionDto))` | **NO** — server-sourced |
The two **Definition** hashes hash the rich server-side `CompartmentDefinitionDto`, which is **not**
in the blueprint and **not** equal to the server's `GetCompartmentDefinitions` pricing DTO. They can
only be **harvested** (every part placed in-game writes them into the save) — see
`extracted/definition_hashes_known.json` (~18/126 parts) and `walker/harvest_hashes.py`. When editing
offline, `build_wbt.py pack` recomputes the two Compartment* hashes and **copies/reuses** the
Definition* hashes from the source — correct as long as the *set of part definitions* is unchanged;
it raises if you add a part whose `DefinitionHash` has never been harvested.
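The hash recipe stated at the top of this section (MD5 over UTF-8 compact JSON, upper-case hex) can be approximated in Python as shown below. This is a sketch under assumptions, not `walker/walker_hashes.py`: the function name and the sample field names are hypothetical, and Python's `json` module is not guaranteed to match Newtonsoft's number formatting or string escaping in every case. Key order is whatever order the dict was built in, so the caller has to supply members in declaration order.

```python
import hashlib
import json


def compact_json_md5(obj) -> str:
    """MD5 of compact JSON (no whitespace, insertion-ordered keys), as upper-case hex."""
    text = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


# Hypothetical record: nulls are kept and the enum is a NAME string, not an integer.
sample = [{"SlotType": "DOOR", "EpbId": None}]
digest = compact_json_md5(sample)

# The same members in a different order serialize differently and hash differently,
# which is why sorting keys breaks the result.
reordered = [{"EpbId": None, "SlotType": "DOOR"}]
assert compact_json_md5(reordered) != digest
```

Any output from a sketch like this should be checked against a hash the game itself wrote before it is trusted.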
**Hash lifecycle (verified live 2026-06-16 — see `docs/TRAMPLER.md`):** the client **recomputes all 5
hashes on save** from its own database (same build ⇒ byte-identical hashes; no per-walker secret —
plain unsalted MD5, they're integrity/version-staleness markers, **not** security). Wiping any/all
hashes is harmless: a walker with blank hashes still **loads, lists, and opens in the editor**, and
**one in-editor save regenerates everything** (and mints a new file UUID + `UniqueId`). The `VERSION`
flag in `Player.log`'s `CheckValidBlueprint: ERRORS {0}, VERSION:{1}` (=`WalkerBlueprintContainer.ValidateVersion`,
which recomputes against the *current* DB/definitions) tracks only the **structural**
hashes (Compartments/Connections); Definition hashes don't affect client validation. Server-side upsert
validation is untested (the master-server `/connect` blocker is cleared as of 2026-06-16 — server back
up — but the live re-test has not been run yet).
Enum tables (from `dump.cs`): `ConnectionSlotType` 0 DOOR,1 HATCH,2 STRUCTURE,3 BALCONY,4 DECK ·
`ConnectionState` 0 DEFAULT,1 DOOR,2 OPEN · `ConnectionsCount` 0 FULL,1 PARTIAL,2 ERROR. Note the
master-server **WS form** serializes these as integers and omits null `EpbId`; the storage/hash form
uses name strings and includes `EpbId:null` — convert before hashing (`reverse/walkerdto_to_blueprint.py`).
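The WS-form to storage-form conversion can be pictured with the sketch below. The enum members are the ones listed above; everything else is an assumption for illustration: the helper name, the field names `SlotType` and `State`, and the field order are hypothetical, and the real logic lives in `reverse/walkerdto_to_blueprint.py`. The caller passes the full field order explicitly so that a restored `EpbId: null` lands at its declared position rather than at the end.

```python
from enum import IntEnum


class ConnectionSlotType(IntEnum):
    DOOR = 0
    HATCH = 1
    STRUCTURE = 2
    BALCONY = 3
    DECK = 4


class ConnectionState(IntEnum):
    DEFAULT = 0
    DOOR = 1
    OPEN = 2


class ConnectionsCount(IntEnum):
    FULL = 0
    PARTIAL = 1
    ERROR = 2


def to_storage_form(ws_obj: dict, field_order: list, enum_fields: dict) -> dict:
    """Rebuild one record in declared order: enum ints -> names, missing fields -> None."""
    out = {}
    for field in field_order:
        value = ws_obj.get(field)  # omitted in the WS form becomes an explicit null
        enum_cls = enum_fields.get(field)
        if enum_cls is not None and value is not None:
            value = enum_cls(value).name
        out[field] = value
    return out


# Hypothetical record and field order, for illustration only.
converted = to_storage_form(
    {"SlotType": 1, "State": 2},
    field_order=["EpbId", "SlotType", "State"],
    enum_fields={"SlotType": ConnectionSlotType, "State": ConnectionState},
)
```

Here `converted` carries `EpbId` as `None` in first position, with `SlotType` and `State` as the strings `"HATCH"` and `"OPEN"`.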
## Tools (all scripts, with subcommands)
### `walker/` — `.wbt` save files (offline edit + hashes)
- **`sand.py`** — low-level toolkit. Subcommands: `decode <wbt> [-o]` · `snap (--all | files…)` ·
`diff <before> <after> [--no-filter]` · `check <wbt> [--no-filter]` · `watch <wbt> [--interval]`.
- **`build_wbt.py`** — high-level edit/build. Subcommands: `repack <wbt>` (identity sanity) ·
`rename <wbt> <first> <second> [-o]` (name indices 0–31) · `pack <wbt> -o out [--no-strict]`
(recompute hashes, write fresh) · `get-icon <wbt> [-o png]` · `set-icon <wbt> <png> [-o]`.
- **`harvest_hashes.py`** — scan saves+snapshots, merge `EpbId→{DefinitionHash,CompartmentHash}` into
the known-hashes table. Usage: `harvest_hashes.py [extra_dir …]`.
- **`recover_key.py`** — recover the XOR key from known-plaintext (the icon background pixel) after a
game update; no RE needed. Usage: `recover_key.py <wbt> …`.
- **`walker_hashes.py`** — reproduce `CompartmentsHash`/`ConnectionsHash` offline (the verified module).
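For context on the known-plaintext idea behind `recover_key.py`: when some decrypted bytes at a known offset are known in advance, each key byte falls out as `cipher XOR plain`. The sketch below shows the general technique only. How the real script locates the icon background pixel is not reproduced here, and the function names, offsets, and sample bytes are synthetic.

```python
def xor_chunks(data: bytes, key: bytes, chunk: int = 0xA000) -> bytes:
    """Repeating-key XOR with the key index reset at every chunk boundary."""
    return bytes(b ^ key[(i % chunk) % len(key)] for i, b in enumerate(data))


def recover_key(cipher: bytes, known_plain: bytes, offset: int,
                key_len: int = 6, chunk: int = 0xA000) -> bytes:
    """Recover the key from plaintext known to sit at `offset` in the gunzipped stream."""
    key = [None] * key_len
    for j, plain_byte in enumerate(known_plain):
        i = offset + j
        slot = (i % chunk) % key_len
        candidate = cipher[i] ^ plain_byte
        if key[slot] is not None and key[slot] != candidate:
            raise ValueError(f"inconsistent key byte at slot {slot}")
        key[slot] = candidate
    if None in key:
        raise ValueError("known plaintext does not cover every key position")
    return bytes(key)


# Synthetic demonstration: encrypt a buffer, then recover the key from a known run.
demo_key = bytes.fromhex("70DD1F2A0B4A")
known = b"\xAB\xCD" * 6
plain = bytes(50) + known + bytes(50)
recovered = recover_key(xor_chunks(plain, demo_key), known, offset=50)
```

The consistency check is what makes the result trustworthy: a wrong offset or a wrong plaintext guess usually yields conflicting values for the same key slot once the known run wraps around the key.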
### `bundle/` — Unity asset-bundle extraction (static data)
All use UnityPy with an IL2CPP TypeTreeGenerator (`GameAssembly.dll` + `global-metadata.dat`).
- **`unitybundle.py`** — minimal UnityFS extractor (LZ4/LZ4HC + uncompressed). `unitybundle.py [needle]`.
- **`odin_read.py`** — Sirenix **Odin** Binary (SerializedFormat=0) reader; used to decode
`SerializedBytes` blobs. `odin_read.py <file> [out]`.
- **`extract_data.py`** — generic MonoBehaviour extractor via typetrees → JSON in `extracted/`.
- **`extract_loot.py`** — loot/drop tables (Odin) → `extracted/loot_tables.json`.
- **`extract_production_lines.py`** — world conveyor single-recipe production lines → `extracted/production_lines.json`.
- **`extract_conveyor_placements.py`** — map islands→conveyors → `extracted/conveyor_placements.json`.
- **`extract_island_names.py`** — prefab→in-game Toponym (via `LandmarkBehaviour`) → `extracted/island_names.json`.
- **`extract_i2.py`** — I2 Localization English term table (manual parse) → i2 terms JSON.
- **`workbench_bundles.py`** — workbench EntityBlueprint → referenced `CraftingRecipeBundle`s.
- **`discord_recipes.py`** — emit Discord monospace recipe tables (workbench + production lines).
- **`component_census.py`** — tally ECS `$type` components across all 1446 EntityBlueprints. `component_census.py [filter]`.
- **`dump_blueprint.py`** — fully decode named EntityBlueprint(s): components + scalar fields. `dump_blueprint.py <base> …`.
- **`dump_loot_bytes.py`** / **`loot_probe.py`** — raw Odin byte dump / locate loot configs (analysis helpers).
### `reverse/` — network scraping + IL2CPP RE
- **`master_scrape.py`** — **the working master-server client** (2026-06-15 build). Two-socket
ClientMessage handshake: `/login` (no header) → `/connect` (`Authorization: <server ticket>`).
Flags: `--region {ger,eus,…}` `--go` (arms network access) `--data` `--user` `--client-version` `--insecure`
`--selftest`. Does nothing over the network without `--go`. See `docs/MASTER_SERVER.md` for the full
`ClientAction` enum / `OperationResult<T>` envelope.
- **`playfab_scrape.py`** — PlayFab REST (read-only), runs outside the game. Required `--title-id`;
auth via `--steam-ticket` or `--entity-token`; modes `--catalog` `--inventory` `--titledata`.
(Note: catalog/economy is disabled for this title — PlayFab is effectively auth-only.)
- **`capture_hosts.py`** — triage a pcap: DNS/SNI/endpoints, prints the PlayFab TitleId + master region. `capture_hosts.py <pcap>`.
- **`noise_filter.py`** — baseline-subtract a "SAND-off" pcap from a session pcap to isolate game traffic. `noise_filter.py <baseline> [session]`.
- **`ws_scrape.py`** — decode master-server WS frames from a pcap (older cleartext-era decoder; tries JSON/BSON/MessagePack). `ws_scrape.py <pcap> [--port --host --out]`.
- **`trampler_hashes.py`** — generate the blueprint hashes from scratch (Definition hash provisional). Self-test: run directly.
- **`walkerdto_to_blueprint.py`** — convert master-server `WalkerDto` (e.g. `GetExpedition.Trampler`) → loadable `WalkerBlueprintDto` + recompute hashes. Self-verifies via round-trip.
- **`render_trampler.py`** — render a multi-floor PNG map of a trampler (footprints, doors/hatches, guns) → `extracted/host_trampler_*.png`.
- **`il2cpp_re.py`** — IL2CPP helpers: VA↔file-offset, method index from `dump.cs`, xref finder, body disasm + float-constant extraction.
- **`resolve_decomp.py`** — annotate `ghidra/decomp.c` with symbol names + string literals. `resolve_decomp.py [substr]`.
- **`ghidra_decomp_targets.py`** / **`find_damage_writes.py`** — Ghidra headless decompile-target script / scan decomp for damage-write fingerprint.
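As background for the VA↔file-offset helper in `il2cpp_re.py`, the standard PE mapping from an RVA (as listed in `dump.cs`) to a file offset goes through the section table. The sketch below is a generic illustration rather than that script's code: the `Section` type, the function names, and the section values are hypothetical and are not taken from `GameAssembly.dll`.

```python
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA where the section starts in memory
    virtual_size: int     # size of the section in memory
    raw_offset: int       # PointerToRawData: where the section starts in the file
    raw_size: int         # SizeOfRawData: bytes actually present in the file


def rva_to_offset(rva: int, sections: list) -> int:
    """Translate an RVA to a file offset using the containing section."""
    for section in sections:
        delta = rva - section.virtual_address
        if 0 <= delta < section.virtual_size:
            if delta >= section.raw_size:
                raise ValueError(f"RVA {rva:#x} is zero-filled, not backed by file data")
            return section.raw_offset + delta
    raise ValueError(f"RVA {rva:#x} is not inside any section")


def offset_to_rva(offset: int, sections: list) -> int:
    """Inverse mapping: file offset back to an RVA."""
    for section in sections:
        delta = offset - section.raw_offset
        if 0 <= delta < section.raw_size:
            return section.virtual_address + delta
    raise ValueError(f"offset {offset:#x} is not inside any section")


# Illustrative section table; real values come from the PE header.
sections = [Section(".text", 0x1000, 0x5000, 0x400, 0x5000)]
file_offset = rva_to_offset(0x1234, sections)
```

A full virtual address is the RVA plus the image base, so a VA taken from a disassembler has to have the image base subtracted before it is passed to `rva_to_offset`.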
### `wikigen/` — generate MediaWiki pages from `extracted/`
`make_items_wiki.py` · `make_crafting_wiki.py` · `make_loot_wiki.py` (→ `wiki/*.mediawiki`) ·
`render_wiki.py` (wikitext → standalone HTML in `wiki_site/`, git-ignored).
## Reference docs (`docs/`)
- **`MASTER_SERVER.md`** — master-server WebSocket protocol & scrape (transport, two-socket handshake, ClientAction enum, OperationResult).
- **`BACKEND_PLAYFAB.md`** — PlayFab is auth-only; read the corrections block at top.
- **`TRAMPLER.md`** — walker blueprint structure, the hashes, footprints, rendering.
- **`TASK.md`** — `.wbt` format cracked (BSON-verified) summary.
- **`PRODUCTION_LINES.md`**, **`SALES_VALUE.md`**, **`WEAPON_DAMAGE.md`** — static-data location maps (track across updates).
- **`SCRAPE_RUNBOOK.md`** — read-only live-scrape steps for when a playtest is online.
- **`GHIDRA.md`** — headless Ghidra on `GameAssembly.dll`: **inject Il2CppDumper symbols, don't full-analyze** (`ghidra/scripts/apply_il2cpp_symbols.py`); targeted decompile/disasm; the `_JAVA_OPTIONS` heap gotcha. **The named DB is already built at `ghidra/project/SAND`** (564k methods, git-ignored/local) — decompile any function on demand via `-process … -postScript decomp_targets.py`.
- **`BUNDLES.md`** (repo root) — inventory of the 35 asset bundles.
Operator memory lives in `~/.claude/projects/-home-downloadpizza-sand-tools/memory/` (loaded each session).