garble: consider ways to better obfuscate function bodies

Right now we strip some information away from the compiled code in function bodies, such as position information, variable names, and the names of funcs and types being used. However, the compiled code looks otherwise extremely similar to its non-obfuscated counterpart, especially in its structure.

For example, if I perform two obfuscated builds of the same program with different seeds, all the func/type/var names will be different, but one could deobfuscate a function body and quickly spot the pairs of corresponding obfuscated names in the two builds, as the structure of the function will be very similar. This means that if I manage to figure out what an obfuscated name in one build stands for, I can reuse that knowledge rather easily in the other build.
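To make the concern concrete, here is a minimal sketch of how an attacker might match functions across two builds. It assumes the attacker has already recovered Go-like source for both builds; the two snippets, the obfuscated names in them, and the bodyShape helper are all made up for illustration. The helper records only the tree of node types in a function body and ignores every identifier, operator, and literal value.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// Two hypothetical builds of the same function, obfuscated with different seeds.
const buildA = `package p
func xK3f(a9 []int) int {
	q1 := 0
	for _, z := range a9 {
		if z > 0 {
			q1 += z
		}
	}
	return q1
}`

const buildB = `package p
func Lm0_(Vw []int) int {
	Tt := 0
	for _, hh := range Vw {
		if hh > 0 {
			Tt += hh
		}
	}
	return Tt
}`

// An unrelated function, for contrast.
const unrelated = `package p
func f(x int) int { return x * 2 }`

// bodyShape returns the nested node types of the first function body in src.
// Names, operators, and literal values are deliberately left out.
func bodyShape(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		var sb strings.Builder
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			if n == nil {
				sb.WriteString(")") // Inspect calls f(nil) after a node's children
				return true
			}
			fmt.Fprintf(&sb, "(%T", n)
			return true
		})
		return sb.String(), nil
	}
	return "", fmt.Errorf("no function body found")
}

func main() {
	a, errA := bodyShape(buildA)
	b, errB := bodyShape(buildB)
	c, errC := bodyShape(unrelated)
	if errA != nil || errB != nil || errC != nil {
		fmt.Println("parse error:", errA, errB, errC)
		return
	}
	fmt.Println("build A matches build B:", a == b)   // true
	fmt.Println("build A matches unrelated:", a == c) // false
}
```

Once two bodies share a shape, the identifiers at matching positions can be paired up, which is exactly the knowledge reuse described above.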

One can also imagine deobfuscating the Go code and trying to spot common patterns in “idiomatic” Go code, such as if err != nil { handle(err) }. Being able to quickly spot these patterns, even if the names are obfuscated, could lead to an easier understanding of what the code is doing.
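As a rough illustration of how cheap this kind of pattern matching is, the sketch below counts if statements of the form "if <ident> != nil" in a snippet whose names are all obfuscated. The snippet, its names, and the countNilChecks helper are hypothetical; a real analysis would likely also look at the branch body and at the call that produced the checked value.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Hypothetical obfuscated code; the error check is still easy to recognize.
const obfuscated = `package p

func a1(b2 string) error {
	c3, d4 := e5(b2)
	if d4 != nil {
		return f6(d4)
	}
	g7(c3)
	return nil
}
`

// countNilChecks counts if statements whose condition is "<ident> != nil".
// The parser does not type-check, so undefined names in src are fine.
func countNilChecks(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		stmt, ok := n.(*ast.IfStmt)
		if !ok {
			return true
		}
		cond, ok := stmt.Cond.(*ast.BinaryExpr)
		if !ok || cond.Op != token.NEQ {
			return true
		}
		_, lhsIsIdent := cond.X.(*ast.Ident)
		rhs, rhsIsIdent := cond.Y.(*ast.Ident)
		if lhsIsIdent && rhsIsIdent && rhs.Name == "nil" {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	n, err := countNilChecks(obfuscated)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("nil checks found:", n) // 1
}
```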

We should investigate ways to improve this situation. In general terms, what we want is to deterministically “shuffle” the code around using the seed, akin to what we already do with literal obfuscation or when reordering declarations.
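A minimal sketch of what a deterministic, seed-driven shuffle could look like follows. This is not how garble derives randomness internally; the SHA-256 based seeding, the shuffled helper, and the sample names are assumptions made for the example. The only property it demonstrates is that the same seed always yields the same permutation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// shuffled returns a copy of items, permuted by a PRNG derived from seed.
// The input slice is left untouched.
func shuffled(seed string, items []string) []string {
	sum := sha256.Sum256([]byte(seed))
	src := rand.NewSource(int64(binary.LittleEndian.Uint64(sum[:8])))
	rng := rand.New(src)

	out := append([]string(nil), items...)
	rng.Shuffle(len(out), func(i, j int) {
		out[i], out[j] = out[j], out[i]
	})
	return out
}

func main() {
	// Hypothetical units to reorder, e.g. independent statements or blocks.
	blocks := []string{"initConfig", "parseFlags", "run", "cleanup", "report"}

	fmt.Println(shuffled("seed-one", blocks))
	fmt.Println(shuffled("seed-one", blocks)) // identical to the line above
	fmt.Println(shuffled("seed-two", blocks)) // a different seed may give a different order
}
```

Any real transformation would also need to prove that the reordered pieces are independent of each other, which is the hard part and is not shown here.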

Doing this at the machine code level definitely seems like a bad idea; we’d need to explicitly support each GOARCH target. It would also require being able to modify object files in-place, further increasing the required complexity.

Doing it at the Go syntax level via go/ast is probably the most obvious option we have. We already do something like it when obfuscating literals, and it seems to work well. I think it could become more feasible if we implemented a “reduction” of the AST first, as per https://github.com/burrowers/garble/issues/459.

Doing this at the compiler’s SSA IR level could also be interesting. Advantages:

  • Since SSA is a heavily simplified form of the code, we could apply “rewrites” to the SSA program more easily. The go/ast is significantly more complex than go/ssa, as there are multiple ways to write the same piece of logic.
  • We could perhaps perform heavier obfuscation this way, as SSA is further down the compiler pipeline when compared to the Go syntax.

Disadvantages:

  • The complexity in design and execution; the SSA is not exposed via -toolexec, and is only kept in memory. This would likely mean having to build and use a modified version of cmd/compile.
  • Unlike go/ast and the Go syntax, Go’s SSA representation is internal and may change in backwards-incompatible ways over the course of Go releases.
  • Harder to contribute: while SSA isn’t a particularly hard concept in the field of compilers, the average Go developer will probably not be familiar with it.
  • Unhelpful for source code obfuscation; https://github.com/burrowers/garble/issues/369 wouldn’t benefit from it at all.

Thus, my initial thoughts are that we should aim for obfuscating func bodies via go/ast rather than the compiler’s internal SSA. Happy to hear opinions, counter-points, or other potential ways to solve this.

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 3
  • Comments: 23 (18 by maintainers)

Most upvoted comments

@mvdan The definitive paper for control flow obfuscation is this one: http://ac.inf.elte.hu/Vol_030_2009/003.pdf

There are several basic techniques described in that paper that should be the initial focus for control flow obfuscation: flattening, bogus instructions, and substitution. That’s a good place to start, and some of these have been mentioned already. Paper authors have a bit more info on their git here: https://github.com/obfuscator-llvm/obfuscator

There are many implementations of this paper already, including the original fork of LLVM and this one based on GCC that actually has a better explanation of some of the methods: https://github.com/meme/hellscape

Basic control flow obfuscation will require AST manipulation, or at least different code generation from the AST, as has been mentioned above. Once the basic stuff is implemented, you could start trying to invent something new and additional, but… the basic methods in the paper are a good mixture of simple to implement and hard to reverse.
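For readers unfamiliar with flattening, here is a hand-written sketch of what the transformation does to a small Go function. It is only an illustration of the idea, not output from any of the tools linked above; the state numbers are arbitrary, and a real implementation would presumably derive them from the seed.

```go
package main

import "fmt"

// sumPositive is the original, structured version.
func sumPositive(xs []int) int {
	total := 0
	for _, x := range xs {
		if x > 0 {
			total += x
		}
	}
	return total
}

// sumPositiveFlat is a hand-flattened equivalent. Every basic block becomes
// a case in a single dispatcher loop, and state selects the next block, so
// the original loop and branch structure is no longer visible.
func sumPositiveFlat(xs []int) int {
	total, i := 0, 0
	state := 3 // arbitrary entry state (hypothetical numbering)
	for {
		switch state {
		case 3: // loop condition
			if i < len(xs) {
				state = 7
			} else {
				state = 1
			}
		case 7: // the "x > 0" test
			if xs[i] > 0 {
				state = 5
			} else {
				state = 2
			}
		case 5: // total += x
			total += xs[i]
			state = 2
		case 2: // advance to the next element
			i++
			state = 3
		case 1: // exit block
			return total
		}
	}
}

func main() {
	xs := []int{3, -1, 4, -5, 9}
	fmt.Println(sumPositive(xs))     // 16
	fmt.Println(sumPositiveFlat(xs)) // 16
}
```

Bogus instructions would then add extra cases that are never reached or that do no useful work, and substitution would rewrite the arithmetic inside each case.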

Another project worth mentioning is MovFuscator, which compiles everything to MOV instructions: https://github.com/xoreaxeaxeax/movfuscator

Just wanted to point out that the things being discussed above are largely solved problems with multiple implementations in other compilers… INCLUDING an implementation that works for gccgo already (which means it can’t target Windows tho, so not so useful).

Java obfuscation is a different art form entirely, because it leans heavily on reflection and the differences between the VM instructions and the source code. Some of the control flow stuff might translate, but you’ll probably have better luck looking at the LLVM-opcode level obfuscation and GCC port of it linked above.

“I intend to do a bit of research into that SSA idea”

Blocked by https://github.com/golang/go/issues/48525 at the moment; starting to use the SSA package today would mean breaking the obfuscation of some generic programs.

^ I intend to do a bit of research into that SSA idea, and will likely coordinate with @awgh, @pagran and others as soon as I have any updates. @pagran are you OK with giving me a couple of weeks to look into this? Assuming the experiment succeeds, it will be a promising approach long-term, but also radically different from altering the go/ast directly 😃