The accuracy and efficiency of program analyses are hindered by very large control flow graphs (CFG). This paper presents an improved GN (Girvan-Newman) algorithm for CFG division. The node weights are added as parameters to the betweenness calculation to better balance the subgraph sizes with the sizes controlled dynamically to terminate the algorithm at a suitable time to improve the execution efficiency. Then, the binary programs indicated by the CFGs are analyzed using the angr tool. The improved GN algorithm, K-means algorithm, spectral clustering algorithm and naive aggregation algorithm were all tested with the results showing the improved GN algorithm provided the best modularity and subgraph size balance.
摘要 针对控制流图规模过大导致的程序分析准确度和效率不够理想的问题, 该文提出了一种用于控制流图划分的改进GN (Girvan-Newman) 算法, 在边介数计算中加入点权值作为参数, 使划分所得各子图的规模更加平衡; 通过动态控制子图的规模, 在合适的时机提前终止算法执行, 提高执行效率。利用angr工具对二进制程序进行分析所得到的控制流图, 分别采用改进GN算法、 K-means算法、谱聚类算法和朴素凝聚算法进行实验, 比较不同算法对控制流图划分结果中的模块度以及均衡性等指标, 证明改进GN算法具有最佳的划分结果和执行效率。