I've been wanting to do a post about code metrics for quite a while now - mostly to organize my thoughts on the topic as it is something that I want to introduce at work, but also to get some feedback from other people as to whether and how they are using metrics to assist them in crafting quality code. After reading Jeremy Miller's post on the topic, I thought I might as well take the plunge. I'll start by musing over which metrics I've found useful, then continue by looking at tool support for generating these metrics, and finish off by considering when to use them.
Metrics

I am not going to cover all the different metrics in detail, but will instead highlight what seem to me to be the most useful metrics and refer to articles/links where other people have done an excellent job of covering them in detail. Here are the reference articles that I used:
- Robert C. Martin's article on OO design quality metrics 
- Wikipedia's summary of these package metrics 
- Kirk Knoernschild's excellent introductory article on metrics with sample refactorings included 
- Patrick Smacchia's (developer of NDepend software) excellent coverage on all the types of metrics supported by NDepend 
- Write up on the software metrics supported by the Software Design Metrics software 
Size metrics

Size metrics are consistently good indicators of fault-proneness: large methods/classes/packages contain more faults.
- Source Lines Of Code (SLOC) measures the number of lines of code. To be really useful, comment lines and statements that have been broken into multiple lines need to be factored out. Some people refer to this as logical LOC vs. physical LOC.
"Two significant advantages of logical LOC over physical LOC are:
- Coding style doesn't interfere with logical LOC. For example, the LOC won't change because a method call is spread over several lines due to a high number of arguments.
- Logical LOC is independent of the language. Values obtained from assemblies written in different languages are comparable and can be summed."
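To make the distinction concrete, here is a minimal sketch (in Python purely for illustration - real tools like NDepend work on the compiled IL rather than raw text) of counting physical versus logical lines:

```python
def count_loc(source: str) -> tuple[int, int]:
    """Return (physical, logical) line counts for a source string.

    Physical LOC counts every line; logical LOC skips blanks and
    full-line comments and merges backslash continuations, so a
    statement split across lines still counts once.
    """
    lines = source.splitlines()
    physical = len(lines)
    logical = 0
    continuation = False
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continuation = False
            continue
        if not continuation:
            logical += 1  # first line of a new statement
        continuation = stripped.endswith("\\")
    return physical, logical
```

A five-line snippet containing a comment, a blank line and a statement continued across two lines yields a physical count of 5 but a logical count of only 2 - exactly the style-independence the quote above describes.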
Complexity metrics

There is a direct correlation between complexity and the defect rate of software, so keeping code simple is a solid first step toward lowering the defect rate.
- Cyclomatic Complexity (CC) measures code complexity by counting the number of linearly independent paths through the code. Complex conditionals and boolean operators increase the number of such paths, resulting in a higher CC. Methods with a CC of five or higher are good refactoring candidates to help ensure the code remains easy to understand.
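As a rough sketch of how the count works (in Python over the AST, only an approximation - production tools like NDepend compute this from the control-flow graph of the compiled code), CC is 1 plus the number of decision points:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While,
                             ast.ExceptHandler)):
            decisions += 1  # each branching construct adds an independent path
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' short-circuits, adding len(values) - 1 paths
            decisions += len(node.values) - 1
    return decisions + 1
```

A function with just one loop, one `if` and one `and` already reaches CC 4, which shows how quickly methods cross the suggested threshold of five.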
Coupling/Dependency metrics

Excessive dependencies between packages compromise architecture and design. Complex dependencies inhibit the testability of your system and present numerous other challenges, as discussed in the articles referenced above.
- Afferent Coupling (Ca) measures the number of types outside a package that depend on types within the package (incoming dependencies). High afferent coupling indicates that the concerned packages have many responsibilities. 
- Efferent Coupling (Ce) measures the number of types inside a package that depend on types outside of the package (outgoing dependencies). High efferent coupling indicates that the concerned package depends heavily on other packages.
"...afferent and efferent coupling allows you to more effectively evaluate the cost of change and the likelihood of reuse. For instance, maintaining a module with many incoming dependencies is more costly and risky since there is greater risk of impacting other modules, requiring more thorough integration testing. Conversely, a module with many outgoing dependencies is more difficult to test and reuse since all dependent modules are required ... Concrete modules with high afferent coupling will be difficult to change because of the high number of incoming dependencies. Modules with many abstractions are typically more extensible, so long as the dependencies are on the abstract portion of a module." 
- Instability (I) measures the ratio of efferent coupling (Ce) to total coupling: I = Ce / (Ce + Ca). This metric is an indicator of the package's resilience to change. The range for this metric is 0 to 1, with I=0 indicating a completely stable package and I=1 indicating a completely unstable package.
- Abstractness (A) measures the ratio of the number of internal abstract types (i.e. abstract classes and interfaces) to the number of internal types. The range for this metric is 0 to 1, with A=0 indicating a completely concrete package and A=1 indicating a completely abstract package.
- Distance from main sequence (D) measures the perpendicular normalized distance of a package from the idealized line A + I = 1 (called the main sequence). This metric is an indicator of the package's balance between abstractness and stability. A package squarely on the main sequence is optimally balanced with respect to its abstractness and stability. Ideal packages are either completely abstract and stable (I=0, A=1) or completely concrete and unstable (I=1, A=0). The range for this metric is 0 to 1.
"A value approaching zero indicates a module is abstract in relation to its incoming dependencies. As distance approaches one, a module is either concrete with many incoming dependencies or abstract with many outgoing dependencies. The first case represents a lack of design integrity, while the second is useless design."
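These package-level formulas fit in a few lines. Here is a sketch that follows the definitions above (the class itself is hypothetical - a real tool derives Ca/Ce and the type counts from your assemblies):

```python
from dataclasses import dataclass

@dataclass
class PackageMetrics:
    ca: int              # afferent coupling: incoming dependencies
    ce: int              # efferent coupling: outgoing dependencies
    abstract_types: int  # internal abstract classes and interfaces
    total_types: int     # all internal types

    @property
    def instability(self) -> float:
        """I = Ce / (Ce + Ca); 0 = completely stable, 1 = completely unstable."""
        total = self.ca + self.ce
        return self.ce / total if total else 0.0

    @property
    def abstractness(self) -> float:
        """A = abstract internal types / total internal types."""
        return self.abstract_types / self.total_types if self.total_types else 0.0

    @property
    def distance(self) -> float:
        """Normalized distance |A + I - 1| from the main sequence A + I = 1."""
        return abs(self.abstractness + self.instability - 1)
```

For example, a fully concrete package with three incoming and one outgoing dependency gets I = 0.25, A = 0 and D = 0.75 - far from the main sequence, matching the "concrete with many incoming dependencies" case in the quote above.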
Cohesion metrics"A low cohesive design element has been assigned many unrelated responsibilities. Consequently, the design element is more difficult to understand and therefore also harder to maintain and reuse. Design elements with low cohesion should be considered for refactoring, for instance, by extracting parts of the functionality to separate classes with clearly defined responsibilities." 
- Relational Cohesion (H) measures the average number of internal relationships per type. Let R be the number of type relationships that are internal to this package (i.e. that do not connect to types outside the package). Let N be the number of types within the package. H = (R + 1) / N. The extra 1 in the formula prevents H=0 when N=1. Relational cohesion represents the relationship that this package has to all its types. As classes inside a package should be strongly related, the cohesion should be high. On the other hand, values that are too high may indicate over-coupling. A good range for H is 1.5 to 4.0; packages where H < 1.5 or H > 4.0 might be problematic.
- Lack of Cohesion of Methods (LCOM): the single responsibility principle states that a class should not have more than one reason to change. Such a class is said to be cohesive. A high LCOM value generally pinpoints a poorly cohesive class.
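The relational cohesion formula and its suggested band are simple enough to compute directly - a small sketch (function names are mine, not from any tool):

```python
def relational_cohesion(internal_relationships: int, type_count: int) -> float:
    """H = (R + 1) / N; the +1 keeps H above zero for a single-type package."""
    return (internal_relationships + 1) / type_count

def cohesion_suspect(h: float) -> bool:
    """Flag packages outside the suggested 1.5 - 4.0 band."""
    return h < 1.5 or h > 4.0
```

A package of three types with five internal relationships scores H = 2.0 and passes; a package whose types share no relationships at all falls below 1.5 and gets flagged for refactoring.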
Inheritance metrics

"Deep inheritance structures are hypothesized to be more fault-prone. The information needed to fully understand a class situated deep in the inheritance tree is spread over several ancestor classes, and is thus more difficult to overview. Similar to high export coupling, a modification to a design element with a large number of descendants can have a large effect on the system."
- Depth of Inheritance Tree (DIT) measures the number of base classes for a class or structure. Types where DIT is higher than 6 might be hard to maintain. However, this is not a hard rule, since your classes might sometimes inherit from third-party classes that already have a high depth of inheritance.
Tools

When it comes to tools, the Mercedes-Benz of .NET code metrics tools, from my point of view, has to be NDepend 2.0. NDepend 2 provides more than 60 metrics (including all of the metrics listed above) and integrates into your automated build via support for MSBuild, NAnt and CruiseControl.NET. Browse to here for a sample report and here for a demo on how to integrate it into your build.
There is a visual GUI (VisualNDepend) that allows you to browse your code structure and evaluate the metrics, as well as a console application with which you can generate the metrics. Patrick has also created CQL (Code Query Language), which allows NDepend to treat your code as a database, with CQL being the query language with which you can check assertions against this database. CQL is consequently similar to SQL and supports the typical SELECT TOP FROM WHERE ORDER BY patterns. Here is an example of a CQL query:
WARN IF Count > 0 IN SELECT METHODS WHERE NbILInstructions > 200 ORDER BY NbILInstructions DESC
// METHODS WHERE NbILInstructions > 200 are extremely complex and
// should be split in smaller methods.

How cool is this! To quote:
"CQL constraints are customisable and typically tied with a particular application. For example, they can allow the specification of customized encapsulation constraints, such as, I want to ensure that this layer will never use this other layer or I want to ensure that this class will never be instantiated outside this particular namespace."

VisualNDepend also provides a CQL editor that supports IntelliSense and verbose compile-error descriptions to make writing CQL queries a lot easier. Enough said! Browse to here for a complete overview of the NDepend 2 features.
Other tools that you can have a look at include SourceMonitor, DevMetrics, Software Design Metrics and vil, to name a few. vil does not support .NET 2.0 and does not seem to be under active development. DevMetrics, after being open-sourced, seems to have stagnated, with no visible activity on SourceForge. SourceMonitor is actively under development and supports a variety of programming languages; however, it supports only a small subset of the metrics mentioned, which does not include support for important metrics like efferent and afferent coupling. Software Design Metrics takes a novel approach in that it measures complexity based on the UML models for the software. This has the advantage of being language independent, but you obviously need to have UML models to run the analysis.
When to use

When should one use these metrics? I agree with Jeremy Miller in his post that the metrics should not replace the visual inspection/QA process but should be performed in addition to it. It would be nice to have these metrics at hand to assist in the QA process, though. I also agree with Frank Kelly in his post that a working system with no severity 1/2 errors and happy end users is more important than getting the right balance of Ca/Ce or whatever metric you are interested in.
I think I will stick with an approach of identifying a subset of useful metrics and using them as part of an overall process of regular static code analysis. By regular, I mean it should be part of your continuous build process, to prevent people from committing code to your repository that does not satisfy your constraints. With a tool like NDepend you can create your own custom acceptance criteria by which the build will fail or succeed, and exclude metrics that you feel should not apply to your code base.
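As a sketch of what such a build gate could look like (the metric names and thresholds below are made-up examples, not NDepend's report format - in practice you would express these as CQL WARN IF constraints or parse the tool's output in your build script):

```python
import sys

def check_metrics(measured: dict[str, float],
                  limits: dict[str, float]) -> list[str]:
    """Return a violation message for every metric that exceeds its limit."""
    return [f"{name}={value} exceeds limit {limits[name]}"
            for name, value in measured.items()
            if name in limits and value > limits[name]]

if __name__ == "__main__":
    # Hypothetical numbers a metrics tool might have produced for this build.
    violations = check_metrics({"worst_cc": 22, "max_method_loc": 180},
                               {"worst_cc": 15, "max_method_loc": 200})
    for message in violations:
        print(message)
    sys.exit(1 if violations else 0)  # non-zero exit fails the CI build
```

The non-zero exit code is what makes the continuous build fail, which is the enforcement mechanism that keeps non-conforming code out of the repository.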
As mentioned, the code metrics should form part of a bigger quality process that includes:
- Visual inspection/QA via peer code reviews (as mentioned, having the metrics at hand via a tool like VisualNDepend can greatly assist here)
- Automated code standards check (I prefer FxCop)
- Automated code metric check (NDepend seems like the tool to use here)
- Automated code coverage statistics (I prefer NCover and NCoverExplorer)