- Full auto-analysis of the 137MB IL2CPP GameAssembly.dll is the wrong default: Decompiler Parameter ID is ~single-threaded, ran 5h+ with no checkpoint/ETA and saves only at the end. It rediscovers what Il2CppDumper already knows.
- Add ghidra/scripts/apply_il2cpp_symbols.py: headless-adapted port of yoten/ghidra.py (askFile -> script arg) that imports the dumper's script.json symbol table (function boundaries + names + string/metadata labels) onto a -noanalysis import. Names-only/light path; struct+signature path documented.
- docs/GHIDRA.md: full workflow, address convention (base.add(Address), no -0x1000), the _JAVA_OPTIONS=-Xmx4g heap-cap gotcha, targeted decomp/disasm commands.
Ghidra headless on SAND's GameAssembly.dll (IL2CPP)
How to get a workable Ghidra database for the client, and the big lesson: for an IL2CPP binary you inject the symbol table from Il2CppDumper — you do not sit through full auto-analysis.
Inputs (all already on disk)
- Binary: /mnt/d/SteamLibrary/steamapps/common/Sand Playtest/GameAssembly.dll (~137 MB).
- Il2CppDumper ("yoten"): /mnt/c/Users/downloadpizza/Downloads/yoten/, which produces, for the current build:
  - script.json (~254 MB): the mapping, i.e. ScriptMethod[] (Address + Name + Signature), ScriptString[], ScriptMetadata[], Addresses[] (all function starts).
  - il2cpp.h (~124 MB): every struct/type.
  - dump.cs (~79 MB): human-readable signatures/RVAs (mirrored to il2cpp/dump.cs).
  - Ready-made apply scripts: ghidra.py (names only), ghidra_with_struct.py (names + types + sigs), plus IDA variants. These use an interactive askFile() dialog, so they are not headless-safe as shipped.
- Ghidra 11.1.2 install: ghidra/ghidra_install/ (support/analyzeHeadless). Java 17.
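To get a feel for what script.json carries before wiring it into Ghidra, a small sketch like the following can count the entries per section. The section names are the ones listed above; the sample data and the method name `Example$$Update` are invented for illustration and are not taken from the real dump.

```python
import json

# Top-level sections of script.json as listed in the Inputs above.
SECTIONS = ("ScriptMethod", "ScriptString", "ScriptMetadata", "Addresses")


def summarize(script):
    """Return {section name: number of entries} for the known sections."""
    return {name: len(script.get(name, [])) for name in SECTIONS}


# Tiny hand-made stand-in for the real file (all values are invented).
SAMPLE = json.loads("""
{
  "ScriptMethod": [
    {"Address": 4096, "Name": "Example$$Update",
     "Signature": "void Example__Update (void);"}
  ],
  "ScriptString": [],
  "ScriptMetadata": [],
  "Addresses": [4096, 8192]
}
""")

print(summarize(SAMPLE))
```

For the real file, replace `SAMPLE` with `json.load(open(path))`; at ~254 MB on disk the parsed structure will take noticeably more memory than the file itself.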
The lesson: symbol-inject, not full analysis
Full auto-analysis of a 137 MB IL2CPP binary is the wrong default:
- The Decompiler Parameter ID analyzer is essentially single-threaded and runs over hundreds of thousands of functions. Observed: 5h21m wall / ~5h40m CPU, pegged at ~105% (one core), with the log silent for 4.5h and no checkpoint — headless saves the project only at the very end, so a crash/OOM mid-run loses everything. No progress %/ETA is emitted.
- It largely rediscovers what Il2CppDumper already knows exactly (function boundaries, names, signatures). For our targeted-decompile workflow that's wasted time.
Instead: import with -noanalysis and inject the dumper's symbol table. You get a named,
function-bounded DB in well under an hour. On-demand decompilation (decomp_targets.py) does its own
per-function local analysis, so the global analyzers aren't needed for reading code.
Headless-adapted applier — ghidra/scripts/apply_il2cpp_symbols.py
Adapted from yoten/ghidra.py: replaced askFile() with a script-arg / default path. Light path —
creates functions from Addresses[], names them from ScriptMethod[], labels string literals and
metadata. No il2cpp.h import, no signatures (those need the type archive; see below).
cd /home/downloadpizza/sand_tools
# fresh project, import without analysis, inject symbols, save (background + log):
rm -rf ghidra/project; mkdir -p ghidra/project
_JAVA_OPTIONS= nohup ghidra/ghidra_install/support/analyzeHeadless ghidra/project SAND \
-import "/mnt/d/SteamLibrary/steamapps/common/Sand Playtest/GameAssembly.dll" \
-noanalysis -overwrite \
-scriptPath ghidra/scripts -postScript apply_il2cpp_symbols.py \
> ghidra/headless_symbols.log 2>&1 &
# optional: pass a different script.json path as the postScript arg.
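The two ideas behind the applier (script arg instead of askFile(), and the light path over Addresses[] and ScriptMethod[]) can be sketched in plain Python outside Ghidra. This is not the contents of apply_il2cpp_symbols.py: the helper names, the default path, and the sample data are assumptions for illustration. Inside a Ghidra script the argument list would come from getScriptArgs(), and the planned starts and names would be fed to the Ghidra API.

```python
# Assumed default location; the real script's default may differ.
DEFAULT_SCRIPT_JSON = "/mnt/c/Users/downloadpizza/Downloads/yoten/script.json"


def pick_script_json(script_args, default=DEFAULT_SCRIPT_JSON):
    """Headless replacement for askFile(): first non-empty arg, else default."""
    for arg in script_args:
        if arg:
            return arg
    return default


def plan_light_path(script):
    """Split dumper data into function starts and an address -> name map."""
    starts = sorted(set(script.get("Addresses", [])))
    names = {m["Address"]: m["Name"] for m in script.get("ScriptMethod", [])}
    return starts, names


# Invented sample data, shaped like the sections described above.
sample = {
    "Addresses": [8192, 4096, 4096],
    "ScriptMethod": [{"Address": 4096, "Name": "Example$$Update",
                      "Signature": "void Example__Update (void);"}],
}
starts, names = plan_light_path(sample)
print(pick_script_json([]), starts, names)
```

Deduplicating and sorting the starts is a choice made for this sketch, so that functions would be created in address order and each address is visited once.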
After the DB exists: targeted decompile / disasm (instant, no re-analysis)
Put rva<TAB>name lines in ghidra/targets.txt, then -process the saved program:
_JAVA_OPTIONS= ghidra/ghidra_install/support/analyzeHeadless ghidra/project SAND \
-process GameAssembly.dll -noanalysis \
-scriptPath ghidra/scripts -postScript decomp_targets.py \
> ghidra/headless.log 2>&1
# -> ghidra/decomp.c (or disasm_targets.py -> ghidra/disasm.txt)
decomp_targets.py/disasm_targets.py already disassemble()+createFunction() per target, so they
work even on a bare -noanalysis import; with symbols injected they also resolve names/xrefs.
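The rva<TAB>name format of ghidra/targets.txt is simple enough to parse in a few lines. The sketch below assumes the rva column is hexadecimal (with or without a 0x prefix) and that blank lines and lines starting with # are skipped; neither detail is confirmed by the scripts themselves, and the target names are made up.

```python
def parse_targets(text):
    """Parse 'rva<TAB>name' lines into a list of (rva, name) tuples.

    Assumptions (not confirmed by decomp_targets.py): the rva is hex,
    blank lines and '#' comment lines are ignored.
    """
    targets = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rva_text, _, name = line.partition("\t")
        targets.append((int(rva_text, 16), name))
    return targets


example = "0x1A2B30\tExample$$Update\n\n# comment\n4d5e60\tExample$$Start\n"
for rva, name in parse_targets(example):
    print(hex(rva), name)
```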
Typed decompiles (optional, heavy)
For params shown as real types (WalkerBlueprintDto * …) use the ghidra_with_struct.py path: it
imports il2cpp.h (124 MB) into Ghidra's DataTypeManager via the C parser first, then applies
ScriptMethod signatures. The header parse is the slow / memory-hungry step (the usual OOM culprit).
Usually unnecessary — il2cpp/dump.cs already has every signature for reference. Only do it if you
specifically need typed struct fields in the decompiler.
Address convention (verified)
Il2CppDumper script.json Address = the Ghidra offset from image base directly:
baseAddress.add(Address) (image base 0x180000000). No -0x1000. (Note: the local
ghidra/methods.tsv index used by reverse/resolve_decomp.py stores rva = scriptAddress - 0x1000
for its own bookkeeping — different thing; don't conflate.)
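The two conventions are easy to mix up, so here they are as arithmetic. The constants come from the paragraph above; the function names and the sample address are illustrative only.

```python
IMAGE_BASE = 0x180000000   # Ghidra image base stated above
TSV_BIAS = 0x1000          # offset used only by ghidra/methods.tsv


def ghidra_va(script_address):
    """script.json Address -> Ghidra address: base + Address, no -0x1000."""
    return IMAGE_BASE + script_address


def tsv_rva(script_address):
    """script.json Address -> rva as stored in ghidra/methods.tsv."""
    return script_address - TSV_BIAS


def script_address_from_tsv(rva):
    """Inverse of tsv_rva: methods.tsv rva -> script.json Address."""
    return rva + TSV_BIAS


addr = 0x123450  # invented script.json Address
print(hex(ghidra_va(addr)), hex(tsv_rva(addr)))
```

Going from a methods.tsv row to a Ghidra address therefore takes both steps: `ghidra_va(script_address_from_tsv(rva))`.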
Memory / gotchas
- analyzeHeadless has MAXMEM=8G (already bumped). But the shell exports _JAVA_OPTIONS=-Xmx4g, which silently caps the heap at 4 GB and causes swap thrash; always prefix runs with _JAVA_OPTIONS= to clear it. Machine has ~11 GiB RAM.
- The run is detached via nohup (survives the session); it is not in tmux/screen. Watch with tail -f ghidra/headless_symbols.log. REPORT: Save succeeded = done.
- ghidra/ is git-ignored (install + project + dumps, all large/regenerable).
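If the headless run is ever launched from Python rather than the shell, the same two gotchas apply: the inherited _JAVA_OPTIONS must not reach the JVM, and completion is detected from the log. A minimal sketch, with hypothetical helper names; it removes the variable instead of emptying it, which likewise keeps the -Xmx4g cap away from the JVM.

```python
import os

# Log marker quoted in the notes above.
DONE_MARKER = "REPORT: Save succeeded"


def headless_env(base_env):
    """Copy an environment mapping without _JAVA_OPTIONS (the 4 GB cap)."""
    env = dict(base_env)
    env.pop("_JAVA_OPTIONS", None)
    return env


def run_finished(log_text):
    """True once the headless log contains the save-succeeded report."""
    return DONE_MARKER in log_text


env = headless_env(os.environ)
print("_JAVA_OPTIONS" in env)
```

The cleaned mapping can be passed as `env=` to `subprocess.Popen` when starting analyzeHeadless, and `run_finished` can be applied to the contents of ghidra/headless_symbols.log.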
Tooling map (reverse/, ghidra/scripts/)
- ghidra/scripts/apply_il2cpp_symbols.py: headless symbol injector (this doc).
- ghidra/scripts/decomp_targets.py: decompile targets.txt → ghidra/decomp.c.
- ghidra/scripts/disasm_targets.py: disassemble targets.txt → ghidra/disasm.txt (fast, no analysis).
- reverse/il2cpp_re.py: VA↔file-offset, method index from dump.cs, xrefs, body disasm + float consts.
- reverse/resolve_decomp.py: annotate ghidra/decomp.c with symbol names + string literals.