BINARY PROGRAM DEPENDENCE ANALYSIS: TECHNIQUES, CHALLENGES, AND FUTURE DIRECTIONS
Volume 3, Issue 1, Pp 62-83, 2025
DOI: https://doi.org/10.61784/wjit3024
Author(s)
ChunFang Li (1, 2), Yu Wen (1, *), Dan Meng (2, 3)
Affiliation(s)
(1) State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China.
(2) Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China.
(3) School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China.
Corresponding Author
Yu Wen
ABSTRACT
Binary program dependence analysis is pivotal for security applications such as vulnerability detection and malware analysis, yet it faces significant challenges from path explosion, indirect branches, and over-approximation. This survey systematically examines state-of-the-art techniques, including value set analysis (VSA), path-sampling methods (BDA, DueForce), block memory models (BPA, BinPointer), and machine learning approaches (NeuDep), to address three core research questions: (1) how existing methods achieve scalability, (2) what compromises they make for scalability and how those compromises affect precision and soundness, and (3) which alternative strategies could transcend these tradeoffs. We propose a three-dimensional analytical framework (methodological taxonomy, empirical evaluation, and forward-looking synthesis) to categorize 11 representative tools and evaluate their performance on the SPEC CINT2000 benchmark. Key findings reveal that path-sampling methods such as BDA balance soundness and efficiency but struggle with complex control flow, while the machine learning-based NeuDep mitigates false positives through hybrid modeling. Dynamic analysis (DueForce) prioritizes precision but suffers from scalability limitations. Our contributions include a novel taxonomy that exposes precision-soundness-scalability tradeoffs, a refined evaluation methodology that integrates symbolic execution for accuracy validation, and pathways toward next-generation analysis via sparse value-flow analysis. The results underscore the need for context-aware strategies to handle modern software complexity and offer actionable insights for advancing binary analysis in security hardening and vulnerability defense.
KEYWORDS
Dependence analysis; Binary analysis; Static analysis; Path explosion; Abstract interpretation
CITE THIS PAPER
ChunFang Li, Yu Wen, Dan Meng. Binary program dependence analysis: techniques, challenges, and future directions. World Journal of Information Technology. 2025, 3(1): 62-83. DOI: https://doi.org/10.61784/wjit3024.