DOI: 10.1145/3540250.3558958

AutoTSG: learning and synthesis for incident troubleshooting

Published: 09 November 2022


Incident management is a key aspect of operating large-scale cloud services. To aid faster and more efficient resolution of incidents, engineering teams document frequent troubleshooting steps in the form of Troubleshooting Guides (TSGs), to be used by on-call engineers (OCEs). However, TSGs are siloed, unstructured, and often incomplete, requiring developers to manually understand and execute the necessary steps. This results in issues such as on-call fatigue, reduced productivity, and human errors. In this work, we conduct a large-scale empirical study of over 4K TSGs mapped to incidents and find that TSGs are widely used and help significantly reduce mitigation efforts. We then analyze feedback on TSGs provided by 400+ OCEs and propose a taxonomy of issues that highlights significant gaps in TSG quality. To alleviate these gaps, we investigate the automation of TSGs and propose AutoTSG, a novel framework that converts TSGs into executable workflows by combining machine learning and program synthesis. Our evaluation of AutoTSG on 50 TSGs shows its effectiveness in both identifying TSG statements (accuracy 0.89) and parsing them for execution (precision 0.94 and recall 0.91). Lastly, we survey ten Microsoft engineers and show the importance of TSG automation and the usefulness of AutoTSG.



Cited By

  • (2024) Leveraging Large Language Models for Efficient Alert Aggregation in AIOPs. Electronics 13:22 (4425). DOI: 10.3390/electronics13224425. Online publication date: 12-Nov-2024
  • (2024) Building AI Agents for Autonomous Clouds: Challenges and Design Principles. Proceedings of the 2024 ACM Symposium on Cloud Computing (99-110). DOI: 10.1145/3698038.3698525. Online publication date: 20-Nov-2024
  • (2024) LLexus: an AI agent system for incident management. ACM SIGOPS Operating Systems Review 58:1 (23-36). DOI: 10.1145/3689051.3689056. Online publication date: 14-Aug-2024



Information & Contributors


Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Cloud Reliability
  2. Incident Management
  3. Meta Learning
  4. Program Synthesis
  5. Troubleshooting


  • Research-article



Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%



