ghidra: symbol-inject workflow (Il2CppDumper script.json) instead of full auto-analysis
- Full auto-analysis of the 137MB IL2CPP GameAssembly.dll is the wrong default: Decompiler Parameter ID is ~single-threaded, ran 5h+ with no checkpoint/ETA, and headless saves only at the end. It rediscovers what Il2CppDumper already knows.
- Add ghidra/scripts/apply_il2cpp_symbols.py: headless-adapted port of yoten/ghidra.py (askFile -> script arg) that imports the dumper's script.json symbol table (function boundaries + names + string/metadata labels) onto a -noanalysis import. Names-only/light path; struct+signature path documented.
- docs/GHIDRA.md: full workflow, address convention (base.add(Address), no -0x1000), the _JAVA_OPTIONS=-Xmx4g heap-cap gotcha, targeted decomp/disasm commands.

@@ -25,6 +25,8 @@ The four data sources, and which tools own them:
- **Don't hammer the live server.** It is a real playtest backend. Warn the operator *before* any
  action that makes repeated/abnormal connections. BattlEye is active in the game: all scraping is
  done **outside** the game process (replayed protocol / captures / REST), never via injection.
- **A `/connect` scrape kicks the live player** (single session per account, newest wins; verified
  2026-06-16, see `docs/MASTER_SERVER.md`). Don't open `/connect` while the operator is in-game.

## Environment & how to run

@@ -155,6 +157,7 @@ All use UnityPy with an IL2CPP TypeTreeGenerator (`GameAssembly.dll` + `global-m
- **`TASK.md`**: `.wbt` format cracked (BSON-verified) summary.
- **`PRODUCTION_LINES.md`**, **`SALES_VALUE.md`**, **`WEAPON_DAMAGE.md`**: static-data location maps (track across updates).
- **`SCRAPE_RUNBOOK.md`**: read-only live-scrape steps for when a playtest is online.
- **`GHIDRA.md`**: headless Ghidra on `GameAssembly.dll`; **inject Il2CppDumper symbols, don't full-analyze** (`ghidra/scripts/apply_il2cpp_symbols.py`); targeted decompile/disasm; the `_JAVA_OPTIONS` heap gotcha.
- **`BUNDLES.md`** (repo root): inventory of the 35 asset bundles.

Operator memory lives in `~/.claude/projects/-home-downloadpizza-sand-tools/memory/` (loaded each session).
|
docs/GHIDRA.md (new file, 86 lines)
@@ -0,0 +1,86 @@
|
# Ghidra headless on SAND's `GameAssembly.dll` (IL2CPP)

How to get a workable Ghidra database for the client, and the **big lesson**: for an IL2CPP binary
you **inject the symbol table from Il2CppDumper**; you do *not* sit through full auto-analysis.
|
## Inputs (all already on disk)

- Binary: `/mnt/d/SteamLibrary/steamapps/common/Sand Playtest/GameAssembly.dll` (~137 MB).
- **Il2CppDumper ("yoten")**: `/mnt/c/Users/downloadpizza/Downloads/yoten/`, which produces, for the
  current build:
  - `script.json` (~254 MB), the **mapping**: `ScriptMethod[]` (Address+Name+Signature),
    `ScriptString[]`, `ScriptMetadata[]`, `Addresses[]` (all function starts).
  - `il2cpp.h` (~124 MB): every struct/type.
  - `dump.cs` (~79 MB): human-readable signatures/RVAs (mirrored to `il2cpp/dump.cs`).
  - Ready-made apply scripts: `ghidra.py` (names only), `ghidra_with_struct.py` (names+types+sigs),
    plus IDA variants. **These use an interactive `askFile()` dialog, so they are not headless-safe as shipped.**
- Ghidra 11.1.2 install: `ghidra/ghidra_install/` (`support/analyzeHeadless`). Java 17.
|
## The lesson: symbol-inject, not full analysis

Full auto-analysis of a 137 MB IL2CPP binary is the **wrong default**:

- The **Decompiler Parameter ID** analyzer is essentially single-threaded and runs over hundreds of
  thousands of functions. Observed: **5h21m wall / ~5h40m CPU, pegged at ~105% (one core), with the
  log silent for 4.5h and no checkpoint**. Headless saves the project **only at the very end**, so a
  crash/OOM mid-run loses everything, and no progress %/ETA is emitted.
- It largely **rediscovers** what Il2CppDumper already knows exactly (function boundaries, names,
  signatures). For our targeted-decompile workflow, that is wasted time.
|
Instead: import with `-noanalysis` and apply the dumper's symbol table. You get a named,
function-bounded DB in well under an hour. On-demand decompilation (`decomp_targets.py`) does its own
per-function local analysis, so the global analyzers aren't needed for reading code.
|
### Headless-adapted applier: `ghidra/scripts/apply_il2cpp_symbols.py`

Adapted from `yoten/ghidra.py`, with `askFile()` replaced by a script argument (or a default path).
Light path: it creates functions from `Addresses[]`, names them from `ScriptMethod[]`, and labels
string literals and metadata. **No `il2cpp.h` import, no signatures** (those need the type archive; see below).
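
Conceptually, the light path turns `script.json` entries into address/name records. The pure-Python sketch below mirrors that data flow outside Ghidra, so the input can be sanity-checked without a project; it is an illustration, not the contents of `apply_il2cpp_symbols.py`. It assumes the top-level keys are literally `Addresses`, `ScriptMethod`, `ScriptString` and `ScriptMetadata`, with `Address`/`Name` on methods and metadata and `Address`/`Value` on strings; `Symbol`, `plan_symbols` and `load_script_json` are hypothetical names.

```python
import json
from typing import Iterator, NamedTuple

IMAGE_BASE = 0x180000000  # GameAssembly.dll image base (see "Address convention")


class Symbol(NamedTuple):
    va: int    # image base + script.json "Address"
    kind: str  # "function", "method", "string" or "metadata"
    name: str  # empty for bare function starts


def plan_symbols(data: dict, image_base: int = IMAGE_BASE) -> Iterator[Symbol]:
    """Yield the records a light-path applier works from (hypothetical sketch).

    Key and field names follow the assumed Il2CppDumper script.json layout.
    """
    # Function starts: one function is created at each of these.
    for addr in data.get("Addresses", []):
        yield Symbol(image_base + addr, "function", "")
    # Method names: ScriptMethod[] entries carry Address + Name (+ Signature, unused here).
    for m in data.get("ScriptMethod", []):
        yield Symbol(image_base + m["Address"], "method", m["Name"])
    # String literals: this sketch just records the literal text as the name.
    for s in data.get("ScriptString", []):
        yield Symbol(image_base + s["Address"], "string", s["Value"])
    # Metadata labels.
    for md in data.get("ScriptMetadata", []):
        yield Symbol(image_base + md["Address"], "metadata", md["Name"])


def load_script_json(path: str) -> dict:
    """Load script.json; at ~254 MB this holds the whole parsed file in memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `collections.Counter(s.kind for s in plan_symbols(load_script_json(path)))` gives per-kind counts as a quick size check before a headless run. Inside Ghidra, the same addresses would be built with `currentProgram.getImageBase().add(...)` rather than integer addition.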
|
```bash
cd /home/downloadpizza/sand_tools
# fresh project, import without analysis, inject symbols, save (background + log):
rm -rf ghidra/project; mkdir -p ghidra/project
_JAVA_OPTIONS= nohup ghidra/ghidra_install/support/analyzeHeadless ghidra/project SAND \
  -import "/mnt/d/SteamLibrary/steamapps/common/Sand Playtest/GameAssembly.dll" \
  -noanalysis -overwrite \
  -scriptPath ghidra/scripts -postScript apply_il2cpp_symbols.py \
  > ghidra/headless_symbols.log 2>&1 &
# optional: to use a different script.json, pass its path as the script argument,
# i.e. `-postScript apply_il2cpp_symbols.py /path/to/script.json`.
```
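
If the import is driven from Python instead of the shell, the same invocation can be sketched with `subprocess`. This assumes the repo and Ghidra paths shown above; `headless_env`, `import_command` and `launch` are hypothetical helpers. Removing `_JAVA_OPTIONS` from the child environment serves the same purpose as the empty `_JAVA_OPTIONS=` prefix (keeping the `-Xmx4g` override from capping the heap), and `start_new_session=True` only approximates `nohup`.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Mapping, Optional

REPO = Path("/home/downloadpizza/sand_tools")
HEADLESS = REPO / "ghidra/ghidra_install/support/analyzeHeadless"
PROJECT_DIR = REPO / "ghidra/project"
DLL = "/mnt/d/SteamLibrary/steamapps/common/Sand Playtest/GameAssembly.dll"


def headless_env(base: Mapping[str, str]) -> dict:
    """Copy of `base` without _JAVA_OPTIONS, so its -Xmx cannot override MAXMEM."""
    env = dict(base)
    env.pop("_JAVA_OPTIONS", None)
    return env


def import_command(script_json: Optional[str] = None) -> List[str]:
    """argv for: import without analysis, then run the symbol applier as a post-script."""
    cmd = [
        str(HEADLESS), str(PROJECT_DIR), "SAND",
        "-import", DLL,
        "-noanalysis", "-overwrite",
        "-scriptPath", str(REPO / "ghidra/scripts"),
        "-postScript", "apply_il2cpp_symbols.py",
    ]
    if script_json is not None:
        cmd.append(script_json)  # passed to the post-script as its argument
    return cmd


def launch(log_path: Path, script_json: Optional[str] = None) -> subprocess.Popen:
    """Start the import in the background, stdout+stderr into log_path (like `> log 2>&1 &`)."""
    PROJECT_DIR.mkdir(parents=True, exist_ok=True)  # note: does NOT wipe an existing project
    with open(log_path, "wb") as log:  # the child keeps its own copy of the descriptor
        return subprocess.Popen(
            import_command(script_json),
            cwd=REPO,
            env=headless_env(os.environ),
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # own session: terminal hangups don't reach it (nohup-like)
        )
```

Unlike the shell version, this sketch leaves an existing `ghidra/project` in place; delete it first if a truly fresh project is wanted.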
|
### After the DB exists: targeted decompile / disasm (fast, no re-analysis)

Put `rva<TAB>name` lines in `ghidra/targets.txt`, then `-process` the saved program:

```bash
_JAVA_OPTIONS= ghidra/ghidra_install/support/analyzeHeadless ghidra/project SAND \
  -process GameAssembly.dll -noanalysis \
  -scriptPath ghidra/scripts -postScript decomp_targets.py \
  > ghidra/headless.log 2>&1
# -> ghidra/decomp.c (or -postScript disasm_targets.py -> ghidra/disasm.txt)
```

`decomp_targets.py` and `disasm_targets.py` already call `disassemble()` + `createFunction()` for each
target, so they work even on a bare `-noanalysis` import; with symbols injected, they also resolve names/xrefs.
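
Since `targets.txt` is plain `rva<TAB>name` lines, generating it from a list is easy to script. A minimal writer/reader pair, assuming RVAs are written as `0x`-prefixed hex; the exact number format `decomp_targets.py` parses (hex vs. decimal, and whether it wants the plain `script.json` offset or the shifted `methods.tsv` value) is an assumption here, so check the script first. `write_targets` and `read_targets` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

Target = Tuple[int, str]  # (rva, name)


def write_targets(path: Path, targets: Iterable[Target]) -> None:
    """Write one `rva<TAB>name` line per target, RVA as 0x-prefixed lowercase hex."""
    lines = []
    for rva, name in targets:
        if "\t" in name or "\n" in name:
            raise ValueError(f"name would break the line format: {name!r}")
        lines.append(f"0x{rva:x}\t{name}\n")
    path.write_text("".join(lines), encoding="utf-8")


def read_targets(path: Path) -> List[Target]:
    """Parse `rva<TAB>name` lines back into (rva, name) pairs, skipping blank lines."""
    targets = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        rva, _, name = line.partition("\t")
        targets.append((int(rva, 0), name))  # base 0 honours the 0x prefix
    return targets
```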
|
## Typed decompiles (optional, heavy)

For params shown as real types (`WalkerBlueprintDto *` …), use the `ghidra_with_struct.py` path: it
imports `il2cpp.h` (124 MB) into Ghidra's `DataTypeManager` via the C parser **first**, then applies
`ScriptMethod` signatures. The header parse is the slow, memory-hungry step (the usual OOM culprit).
This is usually unnecessary: `il2cpp/dump.cs` already has every signature for reference. Only do it if
you specifically need typed struct fields in the decompiler.
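
Looking a signature up in `il2cpp/dump.cs` by RVA is often all that is needed. A minimal sketch, assuming the usual Il2CppDumper layout where each method declaration is preceded by a comment of the form `// RVA: 0x… Offset: 0x… VA: 0x…`; entries whose RVA is not a hex number (such as `-1`) are skipped. `index_dump_cs` is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, Optional

# Matches comment lines like "// RVA: 0x<rva> Offset: 0x<file offset> VA: 0x<va>".
RVA_COMMENT = re.compile(r"^\s*//\s*RVA:\s*(\S+)")


def index_dump_cs(lines: Iterable[str]) -> Dict[int, str]:
    """Map each method RVA to the declaration line that follows its '// RVA:' comment."""
    index: Dict[int, str] = {}
    pending: Optional[int] = None  # RVA still waiting for its declaration
    for line in lines:
        m = RVA_COMMENT.match(line)
        if m:
            token = m.group(1)
            # Anything that is not 0x-hex (e.g. -1) resets the state instead.
            pending = int(token, 16) if token.lower().startswith("0x") else None
            continue
        text = line.strip()
        # Skip blanks, other comments and attribute lines such as "[CompilerGenerated]".
        if pending is None or not text or text.startswith(("//", "[")):
            continue
        index[pending] = text
        pending = None
    return index
```

Passing a file handle, e.g. `index_dump_cs(open("il2cpp/dump.cs", encoding="utf-8"))`, streams the ~79 MB file line by line. Assuming the dump.cs RVAs use the same base-relative convention as `script.json`, adding `0x180000000` to a key gives the Ghidra address.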
|
## Address convention (verified)

The Il2CppDumper `script.json` `Address` is the **offset from the image base** and is used directly in Ghidra:
`baseAddress.add(Address)` (image base `0x180000000`). **No `-0x1000`.** (Note: the local
`ghidra/methods.tsv` index used by `reverse/resolve_decomp.py` stores `rva = scriptAddress - 0x1000`
for its own bookkeeping. That is a different thing; don't conflate the two.)
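
The two conventions written out as arithmetic. The constants are the ones stated above; the helper names are hypothetical.

```python
IMAGE_BASE = 0x180000000  # GameAssembly.dll image base in Ghidra
TSV_SHIFT = 0x1000        # methods.tsv bookkeeping offset, NOT a Ghidra convention


def script_to_va(script_address: int) -> int:
    """script.json Address -> Ghidra address (baseAddress.add(Address))."""
    return IMAGE_BASE + script_address


def va_to_script(va: int) -> int:
    """Ghidra address -> script.json Address."""
    return va - IMAGE_BASE


def script_to_tsv_rva(script_address: int) -> int:
    """script.json Address -> the `rva` column stored in ghidra/methods.tsv."""
    return script_address - TSV_SHIFT


def tsv_rva_to_va(tsv_rva: int) -> int:
    """methods.tsv `rva` -> Ghidra address: undo the 0x1000 shift first."""
    return script_to_va(tsv_rva + TSV_SHIFT)
```

Mixing the two up moves a lookup by exactly 0x1000 bytes (one 4 KiB page) away from the intended code.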
|
## Memory / gotchas

- `analyzeHeadless` has `MAXMEM=8G` (already bumped). **But the shell exports `_JAVA_OPTIONS=-Xmx4g`**,
  which silently caps the heap at 4 GB and causes swap thrash; always prefix runs with
  `_JAVA_OPTIONS=` to clear it. Machine has ~11 GiB RAM.
- The run is detached via `nohup` (survives the session); it is **not** in tmux/screen. Watch with
  `tail -f ghidra/headless_symbols.log`; the run is done once the log shows `REPORT: Save succeeded`.
- `ghidra/` is git-ignored (install + project + dumps, all large/regenerable).
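
Instead of watching `tail -f`, a script can poll the log for that completion line. A minimal sketch; `save_succeeded` and `wait_for_save` are hypothetical helpers, and since they only know the success marker quoted above, a crashed run shows up as a timeout rather than an error.

```python
import time
from pathlib import Path

SAVE_MARKER = "REPORT: Save succeeded"


def save_succeeded(log_path: Path) -> bool:
    """True once the headless log contains the save-success line."""
    try:
        text = log_path.read_text(encoding="utf-8", errors="replace")
    except FileNotFoundError:
        return False
    return SAVE_MARKER in text


def wait_for_save(log_path: Path, timeout_s: float, poll_s: float = 30.0) -> bool:
    """Poll until the save marker appears (True) or timeout_s elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if save_succeeded(log_path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_s, remaining))
```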
|
## Tooling map (`reverse/`, `ghidra/scripts/`)

- `ghidra/scripts/apply_il2cpp_symbols.py`: headless symbol injector (this doc).
- `ghidra/scripts/decomp_targets.py`: decompile `targets.txt` → `ghidra/decomp.c`.
- `ghidra/scripts/disasm_targets.py`: disassemble `targets.txt` → `ghidra/disasm.txt` (fast, no analysis).
- `reverse/il2cpp_re.py`: VA↔file-offset, method index from `dump.cs`, xrefs, body disasm + float consts.
- `reverse/resolve_decomp.py`: annotate `ghidra/decomp.c` with symbol names + string literals.
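
For reference, the standard PE way to map VA↔file offset goes through the section table: find the section whose virtual range contains the RVA, then rebase onto its `PointerToRawData`. A stdlib-only sketch of that technique (not the actual code of `reverse/il2cpp_re.py`); `Section` and the helper functions are hypothetical, and the image base is the one stated under "Address convention".

```python
import struct
from typing import List, NamedTuple

IMAGE_BASE = 0x180000000  # as stated under "Address convention"


class Section(NamedTuple):
    name: str
    vaddr: int     # VirtualAddress (RVA of the section start)
    vsize: int     # VirtualSize
    raw_ptr: int   # PointerToRawData (file offset)
    raw_size: int  # SizeOfRawData


def read_sections(header: bytes) -> List[Section]:
    """Parse the PE section table from the first bytes of the file (4 KiB is typically enough)."""
    if header[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", header, 0x3C)          # e_lfanew
    if header[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_off + 4                                            # COFF file header
    (n_sections,) = struct.unpack_from("<H", header, coff + 2)  # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", header, coff + 16)   # SizeOfOptionalHeader
    table = coff + 20 + opt_size                                 # section headers follow
    sections = []
    for i in range(n_sections):
        # 40-byte section header: Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData, ...
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<8sIIII", header, table + 40 * i)
        sections.append(Section(name.rstrip(b"\0").decode("ascii", "replace"),
                                vaddr, vsize, raw_ptr, raw_size))
    return sections


def rva_to_offset(sections: List[Section], rva: int) -> int:
    for s in sections:
        if s.vaddr <= rva < s.vaddr + max(s.vsize, s.raw_size):
            delta = rva - s.vaddr
            if delta >= s.raw_size:
                raise ValueError(f"RVA {rva:#x} is in {s.name} but has no bytes in the file")
            return s.raw_ptr + delta
    raise ValueError(f"RVA {rva:#x} is not inside any section")


def offset_to_rva(sections: List[Section], offset: int) -> int:
    for s in sections:
        if s.raw_ptr <= offset < s.raw_ptr + s.raw_size:
            return s.vaddr + (offset - s.raw_ptr)
    raise ValueError(f"file offset {offset:#x} is not inside any section")


def va_to_offset(sections: List[Section], va: int, image_base: int = IMAGE_BASE) -> int:
    return rva_to_offset(sections, va - image_base)
```

Typical use: `with open(dll_path, "rb") as f: secs = read_sections(f.read(4096))`, then `va_to_offset(secs, va)`.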