yaps is a C++20 research prototype for PUSHAN-style, VPC-sensitive
devirtualization. It loads x86-64 PE binaries, decodes instructions through
Ghidra SLEIGH p-code, recovers a flat CFG keyed by (native_addr, vpc),
simplifies that graph into a p-code CFG, and emits pseudo-C for inspection.
The project is 100% AI-generated under human direction and is still in heavy development. Treat it as a fast-moving research implementation, not a polished product or a claim of universal VM deobfuscation.
PUSHAN is the obvious reference point for this repository: trace-free CFG recovery, VPC-sensitive block identities, constraint-light symbolic emulation, semantic simplification, and eventual C-like output. This repository is not an official PUSHAN implementation. It is an independent rebuild of the parts that can be tested from public descriptions and local experiments.
Original paper: https://arxiv.org/html/2603.18355v1
The current goal is narrower and more explicit than the paper-level claim:
- make PUSHAN-style recovery executable and inspectable;
- keep manual VPC configuration as a first-class workflow;
- recover useful flat CFG and p-code CFG artifacts from virtualized samples;
- expose the limits honestly while improving the implementation.
For a separate technical critique of the paper claims and evaluation framing, see docs/pushan-critique.md.
The current implementation is focused on:
- Windows x86-64 PE inputs;
- Tigress-style virtualized samples as the main public target;
- manually supplied VPC/VIP facts, with optional bytecode-region facts when they are useful;
- Ghidra SLEIGH-based instruction lifting;
- abstract-state propagation over registers and memory;
- VPC-sensitive flat CFG recovery;
- p-code CFG simplification;
- pseudo-C emission from the simplified p-code CFG.
Manual VPC input is not a temporary hack in this project. Automatic VPC discovery is useful future work, but manual facts are the most reliable way to make experiments reproducible and will remain supported.
This repository does not currently claim general VMProtect or Themida devirtualization. It also does not claim high-quality C output yet. The emitted pseudo-C is real output from the current pipeline, and some of it is still noisy enough to show where the decompiler and variable recovery need more work.
The current flagship public sample is samples/tigress/fortigress.c. It is intentionally more involved than a single straight-line hash function:
- four input words;
- nested loops;
- rolling state;
- five-way branch selection;
- extra conditional branches inside the loop body;
- a final hard equality check.
On the current laptop-class Windows test setup, the ifnest recovery run takes roughly 30 seconds. This number is only a working data point, not a formal benchmark; the run has not been tuned for a multi-core benchmark machine.
The intended build environment is Windows with Visual Studio generator support
and clang-cl.
Required local inputs:
- initialized git submodules for third_party/sleigh, third_party/spdlog, third_party/json, third_party/LIEF, and third_party/zlib;
- a local Ghidra checkout configured by YAPS_GHIDRA_ROOT;
- a local Abseil C++ checkout configured by YAPS_ABSEIL_ROOT;
- a local Z3 package configured by YAPS_Z3_ROOT, or unpacked under third_party/z3;
- CMake 3.24 or newer.
Initialize submodules:
```powershell
git submodule update --init --recursive
```

Configure and build the release preset:

```powershell
cmake --preset windows-clangcl-release `
-DYAPS_GHIDRA_ROOT=F:/github/ghidra `
-DYAPS_ABSEIL_ROOT=F:/github/abseil-cpp `
-DYAPS_Z3_ROOT=F:/github/yaps/third_party/z3
cmake --build --preset windows-clangcl-release
```

The default preset paths are local-development defaults. Override them when your Ghidra, Abseil, or Z3 locations differ.
Run tests from the release build tree:
```powershell
ctest --test-dir build/windows-clangcl-release -C RelWithDebInfo --output-on-failure
```

The debug preset is also available:

```powershell
cmake --preset windows-clangcl-debug
cmake --build --preset windows-clangcl-debug
ctest --preset windows-clangcl-debug
```

Binary analysis mode:

```powershell
build/windows-clangcl-release/RelWithDebInfo/yaps_cli.exe `
--binary samples/tigress_ifnest/fortigress_ifnest.exe `
--config samples/tigress_ifnest/fortigress_ifnest_config.json `
--function-rva 0x14bf `
--out-json out/ifnest_stage1.json `
--out-dot out/ifnest_stage1.dot `
--out-pcode-json out/ifnest_stage2.json `
--out-pcode-dot out/ifnest_stage2.dot `
--out-pseudo-c out/ifnest_stage3.c `
--timeout-ms 300000 `
--log-level info
```

Use exactly one of --function-rva or --function-va.
The same command is checked in as
samples/tigress_ifnest/command.ps1.
Useful output options:
- --out-json: flat CFG JSON;
- --out-dot: flat CFG DOT;
- --out-pcode-json: simplified p-code CFG JSON;
- --out-pcode-dot: simplified p-code CFG DOT;
- --out-pseudo-c: pseudo-C emitted from the simplified p-code CFG;
- --transform-dump-dir: write intermediate transform pass dumps;
- --transform-dump-before: include pre-pass dumps;
- --transform-dump-after-all: dump after every pass, not only changed passes.
Pseudo-C can also be emitted from a previously saved p-code CFG:
```powershell
build/windows-clangcl-release/RelWithDebInfo/yaps_cli.exe `
--in-pcode-json out/stage2.json `
--out-pseudo-c out/stage3.c
```

A minimal register-VPC configuration:

```json
{
"target_arch": "x86_64",
"vpc": {
"mode": "manual",
"storage": "register",
"register": "r15",
"width": 8
},
"vip": {
"mode": "same_as_vpc"
},
"entry_context": {
"initial_vpc": "0x0"
}
}
```

The bytecode block is optional. When omitted, yaps uses an empty bytecode region
(base = 0, size = 0). Supplying the region is still useful when VM bytecode
loads should be treated as known constants.
A memory-at-register VPC configuration, similar to the current ifnest run:
```json
{
"target_arch": "x86_64",
"vpc": {
"mode": "manual",
"storage": "memory_at_register",
"base_register": "rbp",
"displacement": "-0x58",
"width": 8
},
"vip": {
"mode": "same_as_vpc"
},
"bytecode": {
"base": "0x140019040",
"size": "0x5bf"
},
"entry_context": {
"initial_vpc": "0x140019040",
"registers": {
"rsp": "0x5ffed8",
"rcx": 5,
"rdx": "0x11000000"
}
}
}
```

Supported VPC storage modes currently include:

- register;
- memory;
- memory_at_register.
VIP configuration can use same_as_vpc, a manual VPC-like location, a simple
value derived from VPC, or a manual expression. See
tests/config_test.cpp for the configuration shapes that
are covered by tests.
entry_context can seed initial VPC, registers, and selected memory locations.
warmup can be used to execute an initial native prefix before regular
VPC-sensitive recovery.
Public sample sources and selected artifacts live under samples/:
- samples/tigress_ifnest/: the current public ifnest case, including a generated binary, config, command line, raw pseudo-C, and a hand-cleaned excerpt;
- samples/tigress/fortigress.c: the current ifnest-style Tigress target;
- samples/tigress/yaps_sample.c: a smaller Tigress sample;
- samples/manual_vm/vm_run_sample.c: a manual VM experiment.
There is also a pure C switch-VM fixture in tests/fixtures/switch_vm_fixture.c, with notes in tests/fixtures/switch_vm_fixture.md.
Non-submodule code in this repository is currently licensed under AGPL-3.0-only. This is a temporary strict license while the project is still incomplete and changing quickly. After the implementation, tests, and public evaluation story become more mature, the project is expected to move to a more permissive license such as Apache-2.0.
Git submodules and other third-party dependencies remain governed by their own licenses.
This is still an active research codebase. The flat CFG recovery path is useful enough to run real experiments, but the project is deliberately conservative about public claims:
- no universal virtualization deobfuscation claim;
- no final-quality decompiler claim;
- no claim that automatic VPC identification is solved;
- no claim that commercial virtualizers are supported broadly.
The near-term work is to keep widening the tested VM shapes, improve p-code CFG simplification, and make the pseudo-C output less noisy without hiding the current artifacts.