FDiag

A log-analysis toolkit to support failure diagnosis from cluster message logs

View the Project on GitHub diag-toolkits/FDiag

Welcome to the FDiag log-analysis GitHub project page.

FDiag is an open-source toolkit written in C++ with Boost [3] and R [4]. It extracts message templates from Linux syslogs, Rationalized message logs [1], IBM BlueGene logs and Cray-XT logs, and automatically generates reports that administrators of cluster systems, large or small, can use to identify the sources (nodes) and the likely causes (correlated events) of system failures. The FDiag toolkit is the result of an international collaboration between researchers from Singapore, the United States of America and the United Kingdom.
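To give a rough idea of what "extracting message templates" means, here is a minimal C++ sketch. It is not FDiag's actual algorithm: it simply assumes that any whitespace-separated token containing a digit (a node name, a count, an address) is a variable field, and replaces it with the placeholder `<*>`. The function names `is_variable_token`, `to_template` and `count_templates` are hypothetical and exist only for this illustration.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumption for this sketch: a token is "variable" if it contains a digit.
bool is_variable_token(const std::string& tok) {
    for (unsigned char c : tok) {
        if (std::isdigit(c)) return true;
    }
    return false;
}

// Replace every variable token with "<*>" and rejoin tokens with single spaces.
std::string to_template(const std::string& message) {
    std::istringstream in(message);
    std::string tok;
    std::string out;
    while (in >> tok) {
        if (!out.empty()) out += ' ';
        out += is_variable_token(tok) ? "<*>" : tok;
    }
    return out;
}

// Count how many messages map to each template.
std::map<std::string, int> count_templates(const std::vector<std::string>& messages) {
    std::map<std::string, int> counts;
    for (const auto& m : messages) {
        ++counts[to_template(m)];
    }
    return counts;
}
```

With this sketch, the messages "link down on eth0" and "link down on eth1" both collapse to the template "link down on <*>", so their occurrences can be counted together. A real toolkit would need a more careful notion of which fields are variable.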

FDiag has gone through three versions; the current one is version 3 [2]. You may download the source code here or click "Download TAR GZ File". The source code compiles under Linux (Ubuntu, Fedora, CentOS, etc.). An installer script is included in the package to automate installation.

References:
[1] J.L. Hammond, T. Minyard, J.C. Browne, End-to-end framework for fault management for open-source clusters: Ranger, in Proceedings of ACM TeraGrid, no. 9, 2010.
[2] E. Chuah, A. Jhumka, J.C. Browne, B. Barth, S. Narasimhamurthy, Insights into the Diagnosis of System Failures from Cluster Message Logs, in Proceedings of EDCC 2015.
[3] Boost, http://www.boost.org/
[4] R project, http://www.r-project.org/