Software Spots and Fixes Hang Bugs in Seconds, Rather Than Weeks
For Immediate Release
Hang bugs – when software gets stuck, but doesn’t crash – can frustrate both users and programmers, taking weeks for companies to identify and fix. Now researchers from North Carolina State University have developed software that can spot and fix the problems in seconds.
“Many of us have experience with hang bugs – think of a time when you were on a website and the wheel just kept spinning and spinning,” says Helen Gu, co-author of a paper on the work and a professor of computer science at NC State. “Because these bugs don’t crash the program, they’re hard to detect. But they can frustrate or drive away customers and hurt a company’s bottom line.”
With that in mind, Gu and her collaborators developed an automated program, called HangFix, that can detect hang bugs, diagnose the relevant problem, and apply a patch that corrects the root cause of the error. A video of Gu discussing the program is also available.
The researchers tested a prototype of HangFix against 42 real-world hang bugs in 10 commonly used cloud server applications. The bugs were drawn from a database of hang bugs that programmers discovered affecting various websites. HangFix fixed 40 of the bugs in seconds.
“The remaining two bugs were identified and partially fixed, but required additional input from programmers who had relevant domain knowledge of the application,” Gu says.
For comparison, it took weeks or months to detect, diagnose and fix those hang bugs when they were first discovered.
“We’re optimistic that this tool will make hang bugs less common – and websites less frustrating for many users,” Gu says. “We are working to integrate HangFix into InsightFinder.” InsightFinder is the AI-based IT operations and analytics startup founded by Gu.
The paper, “HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems,” is being presented at the ACM Symposium on Cloud Computing (SoCC’20), which is being held online Oct. 19-21. The paper was co-authored by Jingzhu He, a Ph.D. student at NC State who is nearing graduation; Ting Dai, a Ph.D. graduate of NC State who is now at IBM Research; and Guoliang Jin, an assistant professor of computer science at NC State.
The work was done with support from the National Science Foundation under grants 1513942 and 1149445.
HangFix is the latest in a long line of tools Gu’s team has developed to address cloud computing challenges. Her 2011 paper, “CloudScale: Elastic Resource Scaling for Multi-tenant Cloud Systems,” was selected as the winner of the 2020 SoCC 10-Year Award at this year’s conference.
Note to Editors: The study abstract follows.
“HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems”
Authors: Jingzhu He, Xiaohui Gu and Guoliang Jin, North Carolina State University; Ting Dai, IBM Research
Presented: Oct. 19-21, ACM Symposium on Cloud Computing (SoCC’20)
Abstract: Software hang bugs are notoriously difficult to debug, which often cause serious service outages in cloud systems. In this paper, we present HangFix, a software hang bug fixing framework which can automatically fix a hang bug that is triggered and detected in production cloud environments. HangFix first leverages stack trace analysis to localize the hang function and then performs root cause pattern matching to classify hang bugs into different types based on likely root causes. Next, HangFix generates effective code patches based on the identified root cause patterns. We have implemented a prototype of HangFix and evaluated the system on 42 real-world software hang bugs in 10 commonly used cloud server applications. Our results show that HangFix can successfully fix 40 out of 42 hang bugs in seconds.
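For readers who want a concrete picture of the first step named in the abstract, stack trace analysis to localize the hang function, here is a minimal sketch. It is not HangFix's implementation. It assumes a hypothetical heuristic: if a thread is sampled several times while it is stuck, the innermost function that appears in every sample is a likely place where the program is looping. The function name `localize_hang_function` and the sample frame names are invented for illustration.

```python
def localize_hang_function(samples):
    """Return the innermost frame shared by every stack sample.

    Hypothetical heuristic, not HangFix's actual algorithm.
    Each sample is a list of function names ordered outermost-first,
    taken from the same stuck thread at different moments.
    """
    if not samples:
        return None
    common_prefix = []
    # Walk the stacks from the outermost frame inward, stopping at the
    # first depth where the samples disagree (or the shortest one ends).
    for frames in zip(*samples):
        if all(frame == frames[0] for frame in frames):
            common_prefix.append(frames[0])
        else:
            break
    return common_prefix[-1] if common_prefix else None


# Three samples of one stuck thread (made-up frame names). The thread
# keeps returning to copy_stream, which calls different helpers each time.
samples = [
    ["main", "serve", "copy_stream", "read"],
    ["main", "serve", "copy_stream", "skip"],
    ["main", "serve", "copy_stream"],
]
suspect = localize_hang_function(samples)
print(suspect)
```

In this toy case the suspect is `copy_stream`: the callees vary between samples, but the thread never leaves that function. A tool following the approach in the abstract would then have to classify the likely root cause and generate a patch, steps this sketch does not attempt.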