docs: add CLAUDE.md (tools, subcommands, architecture, .wbt format/hashes)
143
CLAUDE.md
Normal file
@@ -0,0 +1,143 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What this repo is

Reverse-engineering / data-mining toolkit for the game **SAND** (Hologryph, Unity 6000.0.40f1,
IL2CPP). It extracts server-authoritative and static game data from three sources, and reads/writes
the game's local walker save files. There is no application to build — everything is standalone
Python scripts run against game files, network captures, or live servers.

The three data sources plus the local save files, and which tools own them:

- **Unity asset bundles** (static config: items, recipes, loot, islands) → `bundle/`
- **Master server** `wss://<region>.hologryph.com/gameclient/` (economy, walker blueprints, research) → `reverse/master_scrape.py`
- **PlayFab** (Azure; **auth-only** for this title — Economy/catalog disabled) → `reverse/playfab_scrape.py`
- **`.wbt` walker save files** (local, on disk) → `walker/`

## Working rules (from operator memory — follow these)

- **Data only, never heuristics.** Do not invent rules or fill gaps with plausible assumptions.
  Derive every value from game files, decompiled code, or captured payloads — or ask. (An invented
  "guns point outward" rule and a guessed rotation→facing mapping both produced wrong results.)
- **Report what the data shows, not inferences as fact.** Don't jump to conclusions.
- **No polling wait-loops.** Use background tasks + wakeup notifications; don't `sleep`-poll for completion.
- **Don't hammer the live server.** It is a real playtest backend. Warn the operator *before* any
  action that makes repeated/abnormal connections. BattlEye is active in the game — all scraping is
  done **outside** the game process (replayed protocol / captures / REST), never via injection.

## Environment & how to run

- Use the project venv: **`venv/bin/python <script>`** (has `UnityPy`, `bson`/pymongo, `scapy`, `Pillow`, `websockets`).
- Symlinks (git-ignored, machine-specific — repoint if the machine changes):
  - `bundles/` → game `StreamingAssets/aa/StandaloneWindows64/` (35 bundles, ~6.8 GB)
  - `Walkers/` → `…/LocalLow/Hologryph/SAND/Data/Walkers/` (live `.wbt` saves)
- Game install referenced by `bundle/` tools: `/mnt/d/SteamLibrary/steamapps/common/Sand Playtest`
  (`GameAssembly.dll` + `Sand_Data/il2cpp_data/Metadata/global-metadata.dat`).
- IL2CPP source of truth: `il2cpp/dump.cs` (Il2CppDumper output — signatures/RVAs only, no method
  bodies). `il2cpp/`, `ghidra/`, `snapshots/`, `bundles`, `Walkers`, `reverse/.secrets/` are
  git-ignored (large/regenerable/secret). Live PlayFab token: `reverse/.secrets/playfab_token.json`.

## The `.wbt` walker save format (current focus)

Envelope (RE'd from `XorCryptography.Encrypt`, verified byte-exact on all 5 local walkers):

```
save: Newtonsoft-BSON -> XOR encrypt -> gzip
load: gunzip -> XOR decrypt -> Newtonsoft-BSON parse
```

- XOR key (current build): `70 DD 1F 2A 0B 4A` (6 bytes), applied **per 0xA000-byte chunk** with the
  key index reset to 0 at each chunk boundary: `decoded[i] = raw[i] XOR KEY[(i % 0xA000) % 6]`.
  If a game update changes the key, recover it with no RE via `walker/recover_key.py`.
- `pymongo`'s `bson.encode` reproduces Newtonsoft.Bson byte-for-byte, so decode→encode is identity.

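The envelope above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code in `walker/`: the function names `xor_chunks`, `unwrap`, and `wrap` are made up here, and the BSON step is left out so the block has no third-party dependency.

```python
import gzip

XOR_KEY = bytes.fromhex("70DD1F2A0B4A")  # current-build key quoted above
CHUNK = 0xA000                           # key index resets at each chunk boundary


def xor_chunks(data: bytes, key: bytes = XOR_KEY, chunk: int = CHUNK) -> bytes:
    """Apply decoded[i] = raw[i] XOR KEY[(i % chunk) % len(key)].

    XOR is its own inverse, so the same function encrypts and decrypts.
    """
    klen = len(key)
    return bytes(b ^ key[(i % chunk) % klen] for i, b in enumerate(data))


def unwrap(wbt_bytes: bytes) -> bytes:
    """Load path: gunzip -> XOR decrypt. Returns the raw BSON document bytes."""
    return xor_chunks(gzip.decompress(wbt_bytes))


def wrap(bson_bytes: bytes) -> bytes:
    """Save path: XOR encrypt -> gzip."""
    return gzip.compress(xor_chunks(bson_bytes))
```

With the project venv, the result of `unwrap(...)` can presumably be handed to `bson.decode` from pymongo to get the document. Note that the gzip container written by `gzip.compress` is not guaranteed to match the game's output byte-for-byte (compression level and header fields may differ); only the decompressed payload round-trips exactly in this sketch.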
### The 5 hashes (a `.wbt` is a serialized `WalkerBlueprintDto`)

All = `MD5(UTF8(JsonConvert.SerializeObject(obj))).hexUPPER` — Newtonsoft compact JSON: no whitespace,
PascalCase, **member declaration order** (do NOT sort keys), nulls included, **enums as NAME strings**.

| Hash | Scope | Offline-computable? |
|------|-------|---------------------|
| `CompartmentsHash` | top-level: `MD5(JSON(Compartments list))` | **YES** — `walker/walker_hashes.py` |
| `ConnectionsHash` | top-level: `MD5(JSON(Connections list))` | **YES** — `walker/walker_hashes.py` |
| `CompartmentHash` | per-part: placement from CompartmentsDatabase | **YES** — constant per EpbId+placement |
| `DefinitionsHash` | top-level: `MD5(JSON(Compartments→CompartmentDefinitionDto))` | **NO** — server-sourced |
| `DefinitionHash` | per-part: `MD5(JSON(CompartmentDefinitionDto))` | **NO** — server-sourced |

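As a rough illustration of the hash recipe above, the sketch below approximates Newtonsoft's compact output with Python's `json` module. It is not `walker/walker_hashes.py`; the helper names are hypothetical, and it assumes the input is already in storage form (PascalCase keys in declaration order, enums as name strings, nulls present). Number formatting can differ between Newtonsoft and Python for some floats, so treat any mismatch on float-bearing objects as a limitation of the sketch.

```python
import hashlib
import json


def compact_json(obj) -> str:
    """Compact JSON: no whitespace, keys kept in insertion order (never sorted)."""
    # ensure_ascii=False keeps non-ASCII characters unescaped, which is assumed
    # here to match the serializer's default behaviour.
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def blueprint_md5(obj) -> str:
    """MD5 over the UTF-8 bytes of the compact JSON, as uppercase hex."""
    return hashlib.md5(compact_json(obj).encode("utf-8")).hexdigest().upper()
```

Because Python dicts preserve insertion order, building each object in member declaration order is enough to keep the key order stable; passing `sort_keys=True` would break the hash.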
The two **Definition** hashes hash the rich server-side `CompartmentDefinitionDto`, which is **not**
in the blueprint and **not** equal to the server's `GetCompartmentDefinitions` pricing DTO. They can
only be **harvested** (every part placed in-game writes them into the save) — see
`extracted/definition_hashes_known.json` (~18/126 parts) and `walker/harvest_hashes.py`. When editing
offline, `build_wbt.py pack` recomputes the two Compartment* hashes and **copies/reuses** the
Definition* hashes from the source — correct as long as the *set of part definitions* is unchanged;
it raises if you add a part whose `DefinitionHash` has never been harvested.

Enum tables (from `dump.cs`): `ConnectionSlotType` 0 DOOR,1 HATCH,2 STRUCTURE,3 BALCONY,4 DECK ·
`ConnectionState` 0 DEFAULT,1 DOOR,2 OPEN · `ConnectionsCount` 0 FULL,1 PARTIAL,2 ERROR. Note the
master-server **WS form** serializes these as integers and omits null `EpbId`; the storage/hash form
uses name strings and includes `EpbId:null` — convert before hashing (`reverse/walkerdto_to_blueprint.py`).

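The WS-form to storage-form conversion described above might look like the following sketch. The enum tables are the ones listed from `dump.cs`; everything else is an assumption for illustration: the field names in `FIELD_ORDER` and `ENUM_FIELDS` are hypothetical placeholders, and the real member names and declaration order must be taken from `dump.cs` (the actual converter is `reverse/walkerdto_to_blueprint.py`).

```python
CONNECTION_SLOT_TYPE = ("DOOR", "HATCH", "STRUCTURE", "BALCONY", "DECK")
CONNECTION_STATE = ("DEFAULT", "DOOR", "OPEN")
CONNECTIONS_COUNT = ("FULL", "PARTIAL", "ERROR")

# Hypothetical member list, in declaration order. Replace with the real one.
FIELD_ORDER = ("EpbId", "SlotType", "State")
# Hypothetical mapping of member name -> enum table.
ENUM_FIELDS = {"SlotType": CONNECTION_SLOT_TYPE, "State": CONNECTION_STATE}


def to_storage_form(ws_obj: dict) -> dict:
    """Rebuild one object in declaration order for hashing.

    Integer enum values become name strings; members the WS form omitted
    (such as a null EpbId) are emitted explicitly as None.
    """
    out = {}
    for name in FIELD_ORDER:
        value = ws_obj.get(name)  # missing member -> None -> JSON null
        table = ENUM_FIELDS.get(name)
        if table is not None and isinstance(value, int) and not isinstance(value, bool):
            value = table[value]
        out[name] = value
    return out
```

Driving the output from an explicit ordered member list, rather than copying the incoming dict, keeps the key order independent of whatever order the server happened to send.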
## Tools (all scripts, with subcommands)

### `walker/` — `.wbt` save files (offline edit + hashes)
- **`sand.py`** — low-level toolkit. Subcommands: `decode <wbt> [-o]` · `snap (--all | files…)` ·
  `diff <before> <after> [--no-filter]` · `check <wbt> [--no-filter]` · `watch <wbt> [--interval]`.
- **`build_wbt.py`** — high-level edit/build. Subcommands: `repack <wbt>` (identity sanity) ·
  `rename <wbt> <first> <second> [-o]` (name indices 0–31) · `pack <wbt> -o out [--no-strict]`
  (recompute hashes, write fresh) · `get-icon <wbt> [-o png]` · `set-icon <wbt> <png> [-o]`.
- **`harvest_hashes.py`** — scan saves+snapshots, merge `EpbId→{DefinitionHash,CompartmentHash}` into
  the known-hashes table. Usage: `harvest_hashes.py [extra_dir …]`.
- **`recover_key.py`** — recover the XOR key from known-plaintext (the icon background pixel) after a
  game update; no RE needed. Usage: `recover_key.py <wbt> …`.
- **`walker_hashes.py`** — reproduce `CompartmentsHash`/`ConnectionsHash` offline (the verified module).

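The known-plaintext idea behind `recover_key.py` can be illustrated generically. The sketch below is not that script: it assumes you already know some plaintext bytes and their offset in the gunzipped (still XOR-encrypted) stream, and it assumes the key length and chunk size are unchanged from the current build. How the real tool locates the icon background pixel is not shown here.

```python
def recover_key(cipher: bytes, known: bytes, offset: int,
                key_len: int = 6, chunk: int = 0xA000) -> bytes:
    """Recover a repeating XOR key from known plaintext at a known offset.

    cipher : gunzipped, still-encrypted bytes
    known  : plaintext bytes believed to sit at cipher[offset:offset+len(known)]
    """
    key = [None] * key_len
    for j, plain_byte in enumerate(known):
        i = offset + j
        slot = (i % chunk) % key_len       # same indexing as the decode formula
        candidate = cipher[i] ^ plain_byte
        if key[slot] is not None and key[slot] != candidate:
            raise ValueError("known plaintext is inconsistent with a repeating key")
        key[slot] = candidate
    if None in key:
        raise ValueError("known plaintext does not cover every key byte")
    return bytes(key)
```

A run of at least `key_len` known bytes inside one chunk fills every key slot; longer runs double as a consistency check, since each extra byte must agree with the slot it maps to.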
### `bundle/` — Unity asset-bundle extraction (static data)
All use UnityPy with an IL2CPP TypeTreeGenerator (`GameAssembly.dll` + `global-metadata.dat`).
- **`unitybundle.py`** — minimal UnityFS extractor (LZ4/LZ4HC + uncompressed). `unitybundle.py [needle]`.
- **`odin_read.py`** — Sirenix **Odin** Binary (SerializedFormat=0) reader; used to decode
  `SerializedBytes` blobs. `odin_read.py <file> [out]`.
- **`extract_data.py`** — generic MonoBehaviour extractor via typetrees → JSON in `extracted/`.
- **`extract_loot.py`** — loot/drop tables (Odin) → `extracted/loot_tables.json`.
- **`extract_production_lines.py`** — world conveyor single-recipe production lines → `extracted/production_lines.json`.
- **`extract_conveyor_placements.py`** — map islands→conveyors → `extracted/conveyor_placements.json`.
- **`extract_island_names.py`** — prefab→in-game Toponym (via `LandmarkBehaviour`) → `extracted/island_names.json`.
- **`extract_i2.py`** — I2 Localization English term table (manual parse) → i2 terms JSON.
- **`workbench_bundles.py`** — workbench EntityBlueprint → referenced `CraftingRecipeBundle`s.
- **`discord_recipes.py`** — emit Discord monospace recipe tables (workbench + production lines).
- **`component_census.py`** — tally ECS `$type` components across all 1446 EntityBlueprints. `component_census.py [filter]`.
- **`dump_blueprint.py`** — fully decode named EntityBlueprint(s): components + scalar fields. `dump_blueprint.py <base> …`.
- **`dump_loot_bytes.py`** / **`loot_probe.py`** — raw Odin byte dump / locate loot configs (analysis helpers).

### `reverse/` — network scraping + IL2CPP RE
- **`master_scrape.py`** — **the working master-server client** (2026-06-15 build). Two-socket
  ClientMessage handshake: `/login` (no header) → `/connect` (`Authorization: <server ticket>`).
  Flags: `--region {ger,eus,…}` `--go` (arms network access) `--data` `--user` `--client-version` `--insecure`
  `--selftest`. Does nothing over the network without `--go`. See `docs/MASTER_SERVER.md` for the full
  `ClientAction` enum / `OperationResult<T>` envelope.
- **`playfab_scrape.py`** — PlayFab REST (read-only), runs outside the game. Required `--title-id`;
  auth via `--steam-ticket` or `--entity-token`; modes `--catalog` `--inventory` `--titledata`.
  (Note: catalog/economy is disabled for this title — PlayFab is effectively auth-only.)
- **`capture_hosts.py`** — triage a pcap: DNS/SNI/endpoints, prints the PlayFab TitleId + master region. `capture_hosts.py <pcap>`.
- **`noise_filter.py`** — baseline-subtract a "SAND-off" pcap from a session pcap to isolate game traffic. `noise_filter.py <baseline> [session]`.
- **`ws_scrape.py`** — decode master-server WS frames from a pcap (older cleartext-era decoder; tries JSON/BSON/MessagePack). `ws_scrape.py <pcap> [--port --host --out]`.
- **`trampler_hashes.py`** — generate the blueprint hashes from scratch (Definition hash provisional). Self-test: run directly.
- **`walkerdto_to_blueprint.py`** — convert master-server `WalkerDto` (e.g. `GetExpedition.Trampler`) → loadable `WalkerBlueprintDto` + recompute hashes. Self-verifies via round-trip.
- **`render_trampler.py`** — render a multi-floor PNG map of a trampler (footprints, doors/hatches, guns) → `extracted/host_trampler_*.png`.
- **`il2cpp_re.py`** — IL2CPP helpers: VA↔file-offset, method index from `dump.cs`, xref finder, body disasm + float-constant extraction.
- **`resolve_decomp.py`** — annotate `ghidra/decomp.c` with symbol names + string literals. `resolve_decomp.py [substr]`.
- **`ghidra_decomp_targets.py`** / **`find_damage_writes.py`** — Ghidra headless decompile-target script / scan decomp for damage-write fingerprint.

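For readers unfamiliar with the VA↔file-offset step that `il2cpp_re.py` is described as providing, the standard PE section arithmetic is sketched below. This is a generic illustration, not the script's code: the `Section` type and function names are hypothetical, and the section values in any call are whatever the PE header of `GameAssembly.dll` reports.

```python
from typing import NamedTuple, Sequence


class Section(NamedTuple):
    virtual_address: int  # RVA of the section start
    virtual_size: int     # size when mapped in memory
    raw_pointer: int      # file offset of the section's data
    raw_size: int         # number of bytes actually stored in the file


def va_to_file_offset(va: int, image_base: int, sections: Sequence[Section]) -> int:
    """Map a virtual address to an offset in the on-disk file."""
    rva = va - image_base
    for s in sections:
        delta = rva - s.virtual_address
        if 0 <= delta < s.virtual_size:
            if delta >= s.raw_size:
                raise ValueError("address is in zero-filled memory with no file backing")
            return s.raw_pointer + delta
    raise ValueError("address is not inside any section")


def file_offset_to_va(offset: int, image_base: int, sections: Sequence[Section]) -> int:
    """Map an on-disk file offset back to a virtual address."""
    for s in sections:
        delta = offset - s.raw_pointer
        if 0 <= delta < s.raw_size:
            return image_base + s.virtual_address + delta
    raise ValueError("offset is not inside any section's raw data")
```

The two functions are inverses for any address that is backed by file data, which gives a cheap sanity check when cross-referencing addresses against the binary.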
### `wikigen/` — generate MediaWiki pages from `extracted/`
`make_items_wiki.py` · `make_crafting_wiki.py` · `make_loot_wiki.py` (→ `wiki/*.mediawiki`) ·
`render_wiki.py` (wikitext → standalone HTML in `wiki_site/`, git-ignored).

## Reference docs (`docs/`)

- **`MASTER_SERVER.md`** — master-server WebSocket protocol & scrape (transport, two-socket handshake, ClientAction enum, OperationResult).
- **`BACKEND_PLAYFAB.md`** — PlayFab is auth-only; read the corrections block at top.
- **`TRAMPLER.md`** — walker blueprint structure, the hashes, footprints, rendering.
- **`TASK.md`** — `.wbt` format cracked (BSON-verified) summary.
- **`PRODUCTION_LINES.md`**, **`SALES_VALUE.md`**, **`WEAPON_DAMAGE.md`** — static-data location maps (track across updates).
- **`SCRAPE_RUNBOOK.md`** — read-only live-scrape steps for when a playtest is online.
- **`BUNDLES.md`** (repo root) — inventory of the 35 asset bundles.

Operator memory lives in `~/.claude/projects/-home-downloadpizza-sand-tools/memory/` (loaded each session).