Obfuscated Strings Threatening Your Privacy

Reviewed by Greg Wilson / 2022-03-13
Keywords: Security

One of the drawbacks of reading the software engineering research literature is discovering just how scary the world actually is. Take Glanz2020, for example: in it, the authors show that most string obfuscation techniques can be broken easily and automatically, revealing API keys, passwords, and other sensitive information. Findings like this should be part of every programmer's training: everyone who writes code should know that while scrambling strings is easy, the safety it appears to offer is illusory. More importantly, everyone should know why, so that they can push back against colleagues who say, "But this thread on Reddit says…"

Glanz2020 Leonid Glanz, Patrick Müller, Lars Baumgärtner, Michael Reif, Sven Amann, Pauline Anthonysamy, and Mira Mezini: Hidden in plain sight: obfuscated strings threatening your privacy. In Proc. ASIA CCS 2020, doi:10.1145/3320269.3384745.

String obfuscation is an established technique used by proprietary, closed-source applications to protect intellectual property. Furthermore, it is also frequently used to hide spyware or malware in applications. In both cases, the techniques range from bit-manipulation over XOR operations to AES encryption. However, string obfuscation techniques/tools suffer from one shared weakness: They generally have to embed the necessary logic to deobfuscate strings into the app code. In this paper, we show that most of the string obfuscation techniques found in malicious and benign applications for Android can easily be broken in an automated fashion. We developed StringHound, an open-source tool that uses novel techniques that identify obfuscated strings and reconstruct the originals using slicing. We evaluated StringHound on both benign and malicious Android apps. In summary, we deobfuscate almost 30 times more obfuscated strings than other string deobfuscation tools. Additionally, we analyzed 100,000 Google Play Store apps and found multiple obfuscated strings that hide vulnerable cryptographic usages, insecure internet accesses, API keys, hard-coded passwords, and exploitation of privileges without the awareness of the developer. Furthermore, our analysis reveals that not only malware uses string obfuscation but also benign apps make extensive use of string obfuscation.
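
To make the shared weakness concrete, here is a minimal sketch of XOR-based string obfuscation and why it fails. It is not StringHound's slicing technique, only an illustration of the underlying problem: the app must ship its own decoder. The names `KEY`, `OBFUSCATED_API_KEY`, `deobfuscate`, and `recover_strings`, and the sample secret, are all made up for this example.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the repeating key (XOR is its own inverse)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# --- What a hypothetical developer ships inside the app ---
KEY = b"k3y"  # assumed key; it has to live in the binary too
PLAINTEXT = b"example-api-key-123"  # made-up secret, scrambled at build time
OBFUSCATED_API_KEY = xor_bytes(PLAINTEXT, KEY)


def deobfuscate(blob: bytes) -> str:
    """The decoding logic the app needs at run time in order to use the string."""
    return xor_bytes(blob, KEY).decode("utf-8")


# --- What an analyst can do with that same app ---
def recover_strings(blobs, decoder):
    """Run the app's own decoder over every suspicious byte blob found in it."""
    return [decoder(blob) for blob in blobs]


recovered = recover_strings([OBFUSCATED_API_KEY], deobfuscate)
print(recovered)
```

The scrambled bytes no longer match the plaintext, so a naive search of the binary for the secret comes up empty, but anyone who can locate and execute `deobfuscate` gets the original back without ever understanding the scheme. Swapping XOR for AES would not change this, since the key and the decryption call would still be embedded in the app.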