Computers and computer systems have become a significant part of modern society. Many day-to-day activities are virtually impossible to conduct without the aid of software-controlled computer systems. As more reliance is placed on these software systems, it is essential that they operate reliably. Failure to do so can result in severe monetary loss, property damage, or loss of life.
The NASA Software Assurance Standard, NASA-STD-8739.8, defines software reliability as a discipline of software assurance that:
This section will provide software practitioners with a basic overview of software reliability, as well as tools and resources on software reliability as documented in NASA-STD-8739.8.
Reliability techniques can be divided into two categories: Trending and Predictive.
In practice, trending reliability is more appropriate for software, whereas predictive reliability is more suitable for hardware. Trending reliability can be further classified into four categories: Error Seeding, Failure Rate, Curve Fitting, and Reliability Growth.
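Error seeding is the simplest of these categories to illustrate. In the formulation usually attributed to Mills, S known faults are deliberately inserted into the software; if testing later finds s of them along with n indigenous (unseeded) faults, the total number of indigenous faults is estimated as N ≈ nS/s. The sketch below assumes that seeded and indigenous faults are equally likely to be detected; the function names and numbers are hypothetical.

```python
def estimate_indigenous_faults(seeded_total: int, seeded_found: int,
                               indigenous_found: int) -> float:
    """Error-seeding estimate of the total number of indigenous faults.

    Assumes seeded and indigenous faults are equally likely to be detected,
    so the fraction of seeded faults found approximates the fraction of
    indigenous faults found: N ~= indigenous_found * seeded_total / seeded_found.
    """
    if seeded_total <= 0 or seeded_found <= 0:
        raise ValueError("need at least one fault seeded and found")
    if seeded_found > seeded_total:
        raise ValueError("cannot find more seeded faults than were seeded")
    return indigenous_found * seeded_total / seeded_found


def estimate_remaining_faults(seeded_total: int, seeded_found: int,
                              indigenous_found: int) -> float:
    """Estimated number of indigenous faults still latent after testing."""
    total = estimate_indigenous_faults(seeded_total, seeded_found, indigenous_found)
    return total - indigenous_found


if __name__ == "__main__":
    # Hypothetical test campaign: 20 faults seeded, 15 of them found,
    # plus 30 indigenous (real) faults found.
    print(estimate_indigenous_faults(20, 15, 30))  # 40.0
    print(estimate_remaining_faults(20, 15, 30))   # 10.0
```

The estimate is only as good as the equal-detectability assumption: if seeded faults are easier to find than real ones, N is underestimated.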
Listed below are some static code metrics and dynamic metrics related to software reliability. The static code metrics are divided into three categories, each with its own measurements: Line Count, Complexity and Structure, and Object-Oriented metrics. The dynamic metrics comprise two major measurements: Failure Rate Data and Problem Reports.
For descriptions and recommended thresholds of these metrics, see Software Metrics and Reliability (ISSRE 1998 Best Paper) by Linda Rosenberg, Ted Hammer, and Jack Shaw, IEEE International Symposium on Software Reliability Engineering, 1998.
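McCabe's cyclomatic complexity is a widely used example of a complexity metric; treating it as representative of the Complexity and Structure category here is an assumption. It counts the linearly independent paths through a routine's control-flow graph as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. A minimal sketch over a hypothetical edge list:

```python
from typing import Hashable, Iterable, Tuple


def cyclomatic_complexity(edges: Iterable[Tuple[Hashable, Hashable]],
                          num_components: int = 1) -> int:
    """McCabe cyclomatic complexity V(G) = E - N + 2P of a control-flow graph.

    edges: (source, target) pairs; every node is assumed to appear in some edge.
    num_components: P, the number of connected components (1 for one routine).
    """
    edge_list = list(edges)
    nodes = {node for edge in edge_list for node in edge}
    return len(edge_list) - len(nodes) + 2 * num_components


if __name__ == "__main__":
    # Hypothetical control-flow graph of a routine with a single if/else.
    if_else = [("entry", "then"), ("entry", "else"),
               ("then", "exit"), ("else", "exit")]
    print(cyclomatic_complexity(if_else))  # 2: one decision point
```

Each additional binary decision in a single routine raises V(G) by one, which is why the metric is often used as a proxy for testing effort.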
Mathematically, reliability R(t) is the probability that a system will be successful in the interval from time 0 to time t:

$$R(t) = P(T > t), \quad t \ge 0$$
where T is a random variable denoting the time-to-failure or failure time.
Unreliability F(t), a measure of failure, is defined as the probability that the system will fail by time t:

$$F(t) = P(T \le t), \quad t \ge 0$$
In other words, F(t) is the failure distribution function. The following relationship applies to reliability in general: the reliability R(t) is related to the failure probability F(t) by

$$R(t) = 1 - F(t)$$
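As a numerical illustration of these definitions, R(t) can be estimated from a sample of observed times-to-failure as the fraction that exceed t, and F(t) then follows as 1 - R(t). The exponential form R(t) = exp(-λt) is included only as a common textbook case that assumes a constant failure rate λ; the definitions above do not assume any particular distribution. Function names and sample data are hypothetical.

```python
import math
from typing import Sequence


def empirical_reliability(failure_times: Sequence[float], t: float) -> float:
    """Estimate R(t) = P(T > t) as the fraction of observed times-to-failure > t."""
    if len(failure_times) == 0:
        raise ValueError("need at least one observed time-to-failure")
    survivors = sum(1 for x in failure_times if x > t)
    return survivors / len(failure_times)


def empirical_unreliability(failure_times: Sequence[float], t: float) -> float:
    """Estimate F(t) = P(T <= t) via the complement relation F(t) = 1 - R(t)."""
    return 1.0 - empirical_reliability(failure_times, t)


def exponential_reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t); valid only under a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


if __name__ == "__main__":
    # Hypothetical times-to-failure, in hours, from five test runs.
    times = [12.0, 30.0, 45.0, 80.0, 150.0]
    print(empirical_reliability(times, 40.0))    # 0.6 (3 of 5 runs survive past 40 h)
    print(empirical_unreliability(times, 40.0))  # approximately 0.4
    print(exponential_reliability(0.01, 100.0))  # exp(-1), about 0.368
```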
Q: How does software reliability relate to software quality?
A: Software reliability is a quality characteristic which quantifies the operational profile of a system.
Q: It is said that most of the concepts from hardware reliability are inappropriately applied to software reliability. Is this true?
A: No. Hardware reliability concepts have not been inappropriately applied to software reliability. However, some of the basic statistical concepts in software reliability were deliberately defined in terms of hardware reliability so that systems containing both hardware and software can be assessed.
Q: The software system has been tested and all the bugs correlated to system failure have been fixed. Do we still need to estimate the system's reliability?
A: Yes. The reliability tells you how much additional testing is needed.
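To make this concrete, here is a sketch under one specific model, Musa's basic execution time model (an assumption; other reliability growth models give different answers). Given the initial failure intensity λ0, the total expected failures ν0, the present failure intensity λP, and an objective λF, the model predicts Δμ = (ν0/λ0)(λP - λF) additional failures and Δτ = (ν0/λ0) ln(λP/λF) additional execution time before the objective is reached. The function name and parameter values are hypothetical.

```python
import math


def additional_testing(lambda0: float, nu0: float, present: float,
                       objective: float) -> tuple[float, float]:
    """Additional failures and execution time needed to reach an intensity objective.

    Assumes Musa's basic execution time model, in which failure intensity
    declines linearly with expected failures experienced:
        lambda(mu) = lambda0 * (1 - mu / nu0)
    lambda0:   initial failure intensity (failures per CPU hour)
    nu0:       total failures expected over infinite testing
    present:   current failure intensity lambda_P
    objective: target failure intensity lambda_F (must be below present)
    """
    if not 0 < objective < present <= lambda0:
        raise ValueError("require 0 < objective < present <= lambda0")
    delta_failures = (nu0 / lambda0) * (present - objective)
    delta_time = (nu0 / lambda0) * math.log(present / objective)
    return delta_failures, delta_time


if __name__ == "__main__":
    # Hypothetical parameters: 10 failures/CPU hr initially, 100 total expected
    # failures, currently 2 failures/CPU hr, objective 0.5 failures/CPU hr.
    failures, cpu_hours = additional_testing(10.0, 100.0, 2.0, 0.5)
    print(failures)             # 15.0 expected additional failures
    print(round(cpu_hours, 2))  # 13.86 additional CPU hours
```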
Q: Is it more cost effective to focus resources on best practices than on trying to estimate reliability to improve overall software quality?
A: Best practices are directly correlated to improvement of software quality. The improvement is measured using reliability measures.
Q: What are the differences between software and hardware reliability?
A: Some of the important differences between software and hardware reliability are:
Q: Do the same factors, which affect hardware reliability, also affect software reliability?
A: No. Hardware reliability is mostly affected by physical factors while software reliability is mostly affected by how the software is used.
Q: What are some misconceptions about software reliability?
A: Some of the most common misconceptions about software reliability are listed below:
The NASA IV&V Facility provides an excellent free Metrics Data Program (MDP) repository. The repository contains not only software reliability data but also information on how the data relate to other software metrics and artifacts. Some of the metrics are:
See the NASA IV&V MDP website for a full listing and descriptions of the metrics and datasets.
The Software Life Cycle Empirical/Experience Database (SLED) is hosted by the Data and Analysis Center for Software (DACS). SLED contains empirical software lifecycle data, organized into five datasets:
See the SLED webpage for a full description of the datasets.
The articles and whitepapers are divided into three categories: General Software Reliability Topics, Software Reliability from a Process Approach, and Software Reliability from a Testing Approach. The papers under the general topics provide broad overviews on software reliability. The next two categories provide papers which discuss how software reliability is related to the software development process and testing.
General Software Reliability Topics
Software Reliability from a Process Approach
Software Reliability from a Testing Approach
Listed below are selected books covering various topics in software reliability. The books by Lyu and Musa are written primarily for software practitioners, while the book by Peled covers formal methods and is geared toward researchers at all levels.
Pham's book provides an excellent introduction to software reliability modeling and gives detailed mathematical descriptions of the models it presents.
Lastly, the book by Gritzalis compiles the proceedings of the 3rd International Conference for the European Network of Clubs for the Reliability and Safety of Software-Intensive Systems (ENCRESS), held in 1997.
More than 50 statistical models are currently in use. Listed below are some software reliability models, grouped by category, taken from Software Reliability by Hoang Pham (Springer-Verlag, 2000, ISBN 981-3083-84-0). See this book for full descriptions of these models.
Error Seeding:
Failure Rate:
Curve Fitting:
Reliability Growth:
Non-Homogeneous Poisson Process:
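The Non-Homogeneous Poisson Process category can be illustrated with the Goel-Okumoto model, one of the best-known NHPP models. It models the expected cumulative number of failures by time t as m(t) = a(1 - exp(-bt)), where a is the expected total number of faults and b is the per-fault detection rate. For any NHPP, the probability of no failure in (t, t + x] is R(x | t) = exp(-[m(t + x) - m(t)]). The function names and parameter values below are hypothetical; in practice a and b are estimated from failure data, for example by maximum likelihood.

```python
import math


def go_mean_value(t: float, a: float, b: float) -> float:
    """Expected cumulative failures by time t: m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def go_failure_intensity(t: float, a: float, b: float) -> float:
    """Failure intensity, the derivative of m(t): lambda(t) = a * b * exp(-b * t)."""
    return a * b * math.exp(-b * t)


def go_conditional_reliability(x: float, t: float, a: float, b: float) -> float:
    """P(no failure in (t, t + x]) = exp(-[m(t + x) - m(t)]), valid for any NHPP."""
    return math.exp(-(go_mean_value(t + x, a, b) - go_mean_value(t, a, b)))


if __name__ == "__main__":
    # Hypothetical fitted parameters: a = 100 expected total faults,
    # b = 0.02 per test hour; evaluated after 100 hours of testing.
    a, b, t = 100.0, 0.02, 100.0
    print(round(go_mean_value(t, a, b), 2))                    # 86.47 failures expected so far
    print(round(a - go_mean_value(t, a, b), 2))                # 13.53 expected remaining
    print(round(go_conditional_reliability(1.0, t, a, b), 3))  # about 0.765 for the next hour
```

Because m(t) flattens out as testing continues, R(x | t) for a fixed mission length x increases with t, which is the reliability growth these models are meant to capture.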
Listed below are three software reliability tools, with excerpts from each tool's website, from which the tools can be downloaded free of charge:
SMERFS (Statistical Modeling and Estimation of Reliability Functions for Software) allows the user to perform a complete software reliability analysis. It allows the user to enter either of the two types of model data, modify that data if necessary (including transforming it), perform a preliminary model analysis to help select the candidate models most appropriate for the entered data set, fit the appropriate models, and then determine the adequacy of the fit. SMERFS also allows the user to perform risk analyses with some of these measures to help determine the optimum release time and/or time for reengineering of the software.
CASRE (Computer Aided Software Reliability Estimation) was developed as a software reliability measurement tool that is easier for non-specialists in Software Reliability Engineering to use than many other currently available tools. CASRE incorporates the mathematical modeling capabilities of the public domain tool SMERFS (Statistical Modeling and Estimation of Reliability Functions for Software), and executes using the Microsoft Windows environment.
SARA (Software Assurance Reliability Automation, also known as Non-Parametric Software Reliability) is a comprehensive system which incorporates both reliability growth modeling and design code metrics for analyzing software time between failure data. SARA 1.0 was a research initiative funded by the NASA IV&V Facility via the Software Assurance Technology Center (SATC) located at the Goddard Space Flight Center (GSFC).
Listed below are some organizations that provide software reliability training:
Listed below are related sites on software reliability, assurance, engineering, and research: