SandTools/docs/GHIDRA.md
DownloadPizza e390461a53 ghidra: symbol-inject workflow (Il2CppDumper script.json) instead of full auto-analysis
- Full auto-analysis of the 137MB IL2CPP GameAssembly.dll is the wrong default:
  Decompiler Parameter ID is ~single-threaded, ran 5h+ with no checkpoint/ETA
  and saves only at the end. It rediscovers what Il2CppDumper already knows.
- Add ghidra/scripts/apply_il2cpp_symbols.py: headless-adapted port of
  yoten/ghidra.py (askFile -> script arg) that imports the dumper's script.json
  symbol table (function boundaries + names + string/metadata labels) onto a
  -noanalysis import. Names-only/light path; struct+signature path documented.
- docs/GHIDRA.md: full workflow, address convention (base.add(Address), no -0x1000),
  the _JAVA_OPTIONS=-Xmx4g heap-cap gotcha, targeted decomp/disasm commands.
2026-06-16 15:51:00 +02:00


Ghidra headless on SAND's GameAssembly.dll (IL2CPP)

How to get a workable Ghidra database for the client, and the big lesson: for an IL2CPP binary you inject the symbol table from Il2CppDumper — you do not sit through full auto-analysis.

Inputs (all already on disk)

  • Binary: /mnt/d/SteamLibrary/steamapps/common/Sand Playtest/GameAssembly.dll (~137 MB).
  • Il2CppDumper ("yoten"): /mnt/c/Users/downloadpizza/Downloads/yoten/ — produces, for the current build:
    • script.json (~254 MB) — the mapping: ScriptMethod[] (Address+Name+Signature), ScriptString[], ScriptMetadata[], Addresses[] (all function starts).
    • il2cpp.h (~124 MB) — every struct/type.
    • dump.cs (~79 MB) — human-readable signatures/RVAs (mirrored to il2cpp/dump.cs).
    • Ready-made apply scripts: ghidra.py (names only), ghidra_with_struct.py (names+types+sigs), plus IDA variants. These use an interactive askFile() dialog → not headless-safe as shipped.
  • Ghidra 11.1.2 install: ghidra/ghidra_install/ (support/analyzeHeadless). Java 17.
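To sanity-check a dump before feeding it to Ghidra, it can help to count the entries in the four top-level arrays of script.json listed above. The following is a minimal sketch, not part of the repo: the helper names are hypothetical, UTF-8 encoding is assumed, and the sample values are made up. Only the key names (ScriptMethod, ScriptString, ScriptMetadata, Addresses) come from this document.

```python
import json


def load_script_json(path):
    """Load script.json in one go (assumes UTF-8).

    The real file is ~254 MB, so expect the parsed objects to take
    noticeably more RAM than the file size.
    """
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_script_json(data):
    """Return the number of entries in each top-level array."""
    keys = ("ScriptMethod", "ScriptString", "ScriptMetadata", "Addresses")
    return {key: len(data.get(key, [])) for key in keys}


# Tiny hand-made sample standing in for the real file (values are made up).
sample = {
    "ScriptMethod": [
        {"Address": 0x1000, "Name": "Example$$Method", "Signature": "void f();"},
    ],
    "ScriptString": [],
    "ScriptMetadata": [],
    "Addresses": [0x1000, 0x1040],
}

print(summarize_script_json(sample))
```

On the real dump, `summarize_script_json(load_script_json(path))` gives a quick idea of how many functions the applier will create.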

The lesson: symbol-inject, not full analysis

Full auto-analysis of a 137 MB IL2CPP binary is the wrong default:

  • The Decompiler Parameter ID analyzer is essentially single-threaded and runs over hundreds of thousands of functions. Observed: 5h21m wall / ~5h40m CPU, pegged at ~105% (one core), with the log silent for 4.5h and no checkpoint — headless saves the project only at the very end, so a crash/OOM mid-run loses everything. No progress %/ETA is emitted.
  • It largely rediscovers what Il2CppDumper already knows exactly (function boundaries, names, signatures). For our targeted-decompile workflow that's wasted time.

Instead: import with -noanalysis and inject the dumper's symbol table. You get a named, function-bounded DB in well under an hour. On-demand decompilation (decomp_targets.py) does its own per-function local analysis, so the global analyzers aren't needed for reading code.

Headless-adapted applier — ghidra/scripts/apply_il2cpp_symbols.py

Adapted from yoten/ghidra.py: replaced askFile() with a script-arg / default path. Light path — creates functions from Addresses[], names them from ScriptMethod[], labels string literals and metadata. No il2cpp.h import, no signatures (those need the type archive; see below).
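The askFile() replacement boils down to "first script argument if present, otherwise a default path". A minimal sketch of that selection logic follows, assuming the argument list is what Ghidra's getScriptArgs() returns inside a script. The function name and the default path are hypothetical placeholders, and the real apply_il2cpp_symbols.py may do this differently.

```python
# Hypothetical placeholder; the real script's default path is not shown here.
DEFAULT_SCRIPT_JSON = "/path/to/script.json"


def resolve_script_json(script_args, default=DEFAULT_SCRIPT_JSON):
    """Return the first non-empty script argument, else the default path."""
    for arg in script_args or []:
        if arg and arg.strip():
            return arg.strip()
    return default


# Inside a Ghidra script the list would come from getScriptArgs();
# plain lists stand in for it here.
print(resolve_script_json([]))
print(resolve_script_json(["/tmp/other/script.json"]))
```

Because no dialog is involved, the same script runs unchanged under analyzeHeadless.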

cd /home/downloadpizza/sand_tools
# fresh project, import without analysis, inject symbols, save  (background + log):
rm -rf ghidra/project; mkdir -p ghidra/project
_JAVA_OPTIONS= nohup ghidra/ghidra_install/support/analyzeHeadless ghidra/project SAND \
  -import "/mnt/d/SteamLibrary/steamapps/common/Sand Playtest/GameAssembly.dll" \
  -noanalysis -overwrite \
  -scriptPath ghidra/scripts -postScript apply_il2cpp_symbols.py \
  > ghidra/headless_symbols.log 2>&1 &
# optional: pass a different script.json path as the postScript arg.

After the DB exists: targeted decompile / disasm (instant, no re-analysis)

Put rva<TAB>name lines in ghidra/targets.txt, then -process the saved program:

_JAVA_OPTIONS= ghidra/ghidra_install/support/analyzeHeadless ghidra/project SAND \
  -process GameAssembly.dll -noanalysis \
  -scriptPath ghidra/scripts -postScript decomp_targets.py \
  > ghidra/headless.log 2>&1
# -> ghidra/decomp.c     (or disasm_targets.py -> ghidra/disasm.txt)

decomp_targets.py/disasm_targets.py already disassemble()+createFunction() per target, so they work even on a bare -noanalysis import; with symbols injected they also resolve names/xrefs.
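The rva<TAB>name format of ghidra/targets.txt can be written and read back as sketched below. This is an illustration only: it assumes the RVA column is hexadecimal and that blank lines and '#' comments may be skipped, neither of which is confirmed by the scripts themselves. The function names and the sample method names are made up.

```python
def format_targets(targets):
    """Render (rva, name) pairs as rva<TAB>name lines (hex RVA assumed)."""
    return "".join("0x%X\t%s\n" % (rva, name) for rva, name in targets)


def parse_targets(text):
    """Parse rva<TAB>name lines back into (rva, name) pairs.

    Skipping blank lines and '#' comments is an assumption of this sketch.
    """
    parsed = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rva_text, name = line.split("\t", 1)
        parsed.append((int(rva_text, 16), name.strip()))
    return parsed


# Made-up targets for illustration.
targets = [(0x1234560, "Example$$Update"), (0xABCDEF, "Example$$Start")]
text = format_targets(targets)
print(text, end="")
```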

Typed decompiles (optional, heavy)

For params shown as real types (WalkerBlueprintDto * …) use the ghidra_with_struct.py path: it imports il2cpp.h (124 MB) into Ghidra's DataTypeManager via the C parser first, then applies ScriptMethod signatures. The header parse is the slow / memory-hungry step (the usual OOM culprit). Usually unnecessary — il2cpp/dump.cs already has every signature for reference. Only do it if you specifically need typed struct fields in the decompiler.

Address convention (verified)

An Address in Il2CppDumper's script.json is the offset from the Ghidra image base, used as is: baseAddress.add(Address), with image base 0x180000000. Do not subtract 0x1000. (Note: the local ghidra/methods.tsv index used by reverse/resolve_decomp.py stores rva = scriptAddress - 0x1000 for its own bookkeeping. That is a separate convention; don't conflate the two.)
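As a worked example of the two conventions, here is the arithmetic in plain Python. The sample Address is made up and the function names are hypothetical. The image base and the 0x1000 offset are the values stated in this section.

```python
IMAGE_BASE = 0x180000000  # image base stated above


def ghidra_address(script_address):
    """Ghidra address for a script.json Address: image base plus Address."""
    return IMAGE_BASE + script_address


def methods_tsv_rva(script_address):
    """rva as stored in ghidra/methods.tsv: script Address minus 0x1000."""
    return script_address - 0x1000


addr = 0x1234560  # made-up script.json Address
print(hex(ghidra_address(addr)))   # 0x181234560
print(hex(methods_tsv_rva(addr)))  # 0x1233560
```

The two results differ by design: the first is where the function lives in the Ghidra database, the second is only meaningful to the local methods.tsv tooling.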

Memory / gotchas

  • analyzeHeadless has MAXMEM=8G (already bumped). But the shell exports _JAVA_OPTIONS=-Xmx4g, which silently caps the heap at 4 GB and causes swap thrash — always prefix runs with _JAVA_OPTIONS= to clear it. Machine has ~11 GiB RAM.
  • The run is detached via nohup (survives the session); it is not in tmux/screen. Watch with tail -f ghidra/headless_symbols.log. The run is done when the log shows REPORT: Save succeeded.
  • ghidra/ is git-ignored (install + project + dumps, all large/regenerable).
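If the headless run is ever launched from a Python wrapper rather than the shell, the two gotchas above (the inherited _JAVA_OPTIONS and the completion marker in the log) could be handled as in this sketch. Both helper names are hypothetical. Removing the variable outright is a swapped-in variant of the `_JAVA_OPTIONS=` prefix used in this document, which sets it to an empty string instead.

```python
import os


def env_without_java_options(base_env=None):
    """Return a copy of the environment with _JAVA_OPTIONS removed.

    Assumption: unsetting the variable has the same effect on the JVM
    as the empty `_JAVA_OPTIONS=` prefix used in the shell commands.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.pop("_JAVA_OPTIONS", None)
    return env


def save_succeeded(log_text):
    """True once the completion marker named above appears in the log."""
    return "REPORT: Save succeeded" in log_text


# Stand-in environment and log text for illustration.
demo_env = {"PATH": "/usr/bin", "_JAVA_OPTIONS": "-Xmx4g"}
print(env_without_java_options(demo_env))
print(save_succeeded("INFO  REPORT: Save succeeded for file"))
```

The cleaned dictionary can be passed as the `env` argument of `subprocess.Popen` when starting analyzeHeadless.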

Tooling map (reverse/, ghidra/scripts/)

  • ghidra/scripts/apply_il2cpp_symbols.py — headless symbol injector (this doc).
  • ghidra/scripts/decomp_targets.py — decompile targets.txt → ghidra/decomp.c.
  • ghidra/scripts/disasm_targets.py — disassemble targets.txt → ghidra/disasm.txt (fast, no analysis).
  • reverse/il2cpp_re.py — VA↔file-offset, method index from dump.cs, xrefs, body disasm + float consts.
  • reverse/resolve_decomp.py — annotate ghidra/decomp.c with symbol names + string literals.