User Guide for CCCC
Table of Contents
Introduction |
Report Contents |
Counting Methods |
Command line syntax |
Configuration |
Disclaimers |
Getting CCCC |
Introduction
CCCC is a tool for the analysis of source code in various languages (primarily C++), which generates a report in HTML format on various measurements of the code processed. Although the tool was originally implemented to process C++ and ANSI C, the present version is also able to process Java source files, and earlier versions also supported Ada95. The name CCCC stands for 'C and C++ Code Counter'.
Measurements of source code of this kind are generally referred to as 'software metrics', or more precisely 'software product metrics' (the term 'software metrics' also covers measurements of the software process, which are called 'software process metrics'). There is a reasonable consensus among modern opinion leaders in the software engineering field that measurement of some kind is probably a Good Thing, although there is less consensus on what is worth measuring and what the measurements mean.
CCCC has been developed as freeware, and is released in source code form. Users are encouraged to compile the program themselves, and to modify the source to reflect their preferences and interests.
The simplest way of using CCCC is just to run it with the names of a selection of files on the command line like this:
cccc my_types.h big.h small.h *.cc
Alternatively, for a complex directory hierarchy, the user could enter a command like this:
find . | cccc - (on Unix family platforms)
or
dir /b/s | cccc - (on DOS/Windows family platforms)
CCCC will process each of the files specified on the command line (using standard wildcard processing where appropriate) or, if the '-' option is specified, each of the files named in the standard input stream. For each file named, CCCC examines the extension of the filename, and if the extension is recognized as indicating a supported language, runs the appropriate parser on the file. As each file is parsed, recognition of certain constructs causes records to be written into an internal database. When all files have been processed, a report on the contents of the internal database is generated in HTML format. By default, the main summary HTML report is written to the file cccc.html in a subdirectory called .cccc of the current working directory, along with detailed reports on each module (e.g. C++ or Java class) identified by the analysis run.
In addition to the summary and detailed HTML reports, the run generates corresponding summary and detailed reports in XML format, and creates a further file called cccc.db. cccc.db contains a dump of the program's internal database in a format delimited with the character '@' (chosen because it is one of the few characters which cannot legally appear in C/C++ source code outside comments and string or character literals).
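To illustrate the '@'-delimited layout, the following minimal C++ sketch splits one record into its fields. It assumes one record per line, with every field terminated by '@'. This matches the option records shown in the Configuration section. The function name split_at_record is hypothetical and is not taken from CCCC's source.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not taken from CCCC's source): split one
// '@'-delimited record such as "CCCC_FileExt@.cc@c++.ansi@" into fields.
// Assumption: every field is terminated by '@', so the trailing '@' does
// not produce an extra empty field.
std::vector<std::string> split_at_record(const std::string& record) {
    std::vector<std::string> fields;
    std::istringstream in(record);
    std::string field;
    // std::getline with a delimiter reads up to the next '@' and discards it.
    while (std::getline(in, field, '@')) {
        fields.push_back(field);
    }
    return fields;
}
```

For example, split_at_record("CCCC_FileExt@.cc@c++.ansi@") yields the three fields CCCC_FileExt, .cc and c++.ansi. Empty fields between two '@' characters are preserved.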
Report Contents
The report contains a number of tables identifying the modules in the files submitted and covering:
measures of the procedural volume and complexity of each module and its functions;
measures of the number and type of the relationships each module is a party to either as a client or a supplier;
identification of any parts of the source code submitted which the program failed to parse; and
a summary report over the whole body of code processed of the measures identified above.
Some of the data presented in the report may be displayed in an emphasized form (either with a bold or italic font, or with a red or yellow background). These are items which have been identified as lying outside ranges which have been laid down as desirable for the particular items. A bold font or red background indicates a value which exceeds a threshold defined as being dangerous for that measure, while italic fonts and yellow backgrounds indicate values below the danger threshold but still above a second lower threshold which has been laid down to indicate cause for concern. The two thresholds are configurable by the user of the tool: see the section below on configuring metric treatment for more details.
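The two-threshold scheme can be sketched as a small classification function. This is a minimal C++ sketch, not CCCC's implementation. The enum Emphasis and the function classify are hypothetical, and the sketch assumes that "exceeds" means strictly greater than. Consider the record for MVGf in the option dump shown under Configuration, whose first two numbers are 10 and 30. If those two numbers are the concern and danger thresholds (an assumption), a function with an MVG of 15 would be shown in italics, and one with 31 would be shown in bold.

```cpp
// Hypothetical display treatments mirroring the emphasis described above.
enum class Emphasis {
    Normal,   // no emphasis
    Concern,  // italic font / yellow background
    Danger    // bold font / red background
};

// Classify a metric value against its two configurable thresholds.
// Assumption: "exceeds" means strictly greater than.
Emphasis classify(double value, double lower_threshold, double upper_threshold) {
    if (value > upper_threshold) {
        return Emphasis::Danger;   // above the danger threshold
    }
    if (value > lower_threshold) {
        return Emphasis::Concern;  // between the two thresholds
    }
    return Emphasis::Normal;
}
```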
The report generated by CCCC normally consists of six tables plus a table of contents at the beginning and some informational material about CCCC itself at the end.
Table name | Description
Project Summary | This table presents summary values of various measures over the body of source code submitted.
Procedural Summary | This table presents values of procedural measures summed for each module identified in the code submitted.
Procedural Details | This table presents values of the same procedural measures covered in the procedural summary report, but this time broken down within each module into the contributions of each member function of the module.
Structural Summary | This table presents counts of fan-in and fan-out relationships for each module identified, and a derived metric called the Henry/Kafura/Shepperd measure, which is calculated as the square of the product of the fan-in and fan-out counts.
Structural Details | This table presents lists of the modules contributing to the relationship counts reported in the structural summary.
Rejected Extents | This table presents a list of code regions which the analyser was unable to parse.
Tag | Metric Name | Description
LOC | Lines of Code | This metric counts the lines of non-blank, non-comment source code in a function (LOCf), module (LOCm), or project (LOCp). LOC was one of the earliest metrics to come into use (principally because it is straightforward to measure). It has an obvious relation to the size or complexity of a piece of code, and can be calibrated for use in prediction of maintenance effort, although concern has been expressed that use of this metric as a measure of programmer productivity may tend to encourage verbose programming practices and discourage desirable simplification.
MVG | McCabe's Cyclomatic Complexity | A measure of a body of code based on analysis of the cyclomatic complexity of the directed graph which represents the flow of control within each function. First proposed as a measure of the minimum number of test cases needed to ensure all parts of each function are exercised, it is now widely accepted as a measure for the detection of code which is likely to be error-prone and/or difficult to maintain.
COM | Comment Lines | A crude measure, comparable to LOC, of the extent of commenting within a region of code. Not very meaningful in isolation, but sometimes used in ratio with LOC or MVG to ensure that comments are distributed proportionately to the bulk or complexity of a region of code.
L_C, M_C | LOC/COM, MVG/COM | Ratios of LOC and MVG to COM; see COM above.
FI, FIc, FIv, FO, FOc, FOv | Fan-in, Fan-out | For a given module A, the fan-out is the number of other modules which the module A uses, while the fan-in is the number of other modules which use A. The c and v variants count only concrete or visible relationships respectively.
HKS, HKSv, HKSc | Henry-Kafura/Shepperd measure | This metric is derived by squaring the product of the fan-in and fan-out of each module. The original Henry-Kafura measure, which has been described as a measure of 'information flow complexity', includes a term for the length of the module under consideration. CCCC instead uses the measure as modified by Shepperd, which omits this term on the basis that it debases the measure by combining two attributes which can and should be measured separately.
NOM | Number of modules | Number of modules identified in the project. See the discussion below about what constitutes a module.
WMC | Weighted methods per class | This measure, proposed by Chidamber and Kemerer, is a count of the number of functions defined in a module multiplied by a weighting factor. The only weighting algorithm suggested in the original formulation is a uniform weighting of one unit per function.
REJ | Rejected lines | This is a measure of the number of non-blank, non-comment lines of code which were not successfully analysed by the parser. It is more a validity check on the report generated than a metric of the code submitted. If the amount of code rejected is more than a small fraction (say 10%) of the total code processed, the meaningfulness of the numbers generated by the run must be in doubt.
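The arithmetic behind the derived measures in this table can be sketched in a few lines of C++. The struct ModuleCounts and the function names are hypothetical. The sketch assumes the comment ratios are undefined when a module has no comment lines, and it uses the 10% rejection figure above as the default cut-off.

```cpp
#include <optional>

// Hypothetical per-module counts, as they might be read from the report.
struct ModuleCounts {
    int fan_in;   // FI
    int fan_out;  // FO
    int loc;      // LOC
    int mvg;      // MVG
    int com;      // COM
};

// Henry-Kafura/Shepperd measure: the square of (fan-in * fan-out).
long long hks(const ModuleCounts& m) {
    long long product = static_cast<long long>(m.fan_in) * m.fan_out;
    return product * product;
}

// L_C = LOC/COM; no value when the module has no comment lines.
std::optional<double> loc_per_comment(const ModuleCounts& m) {
    if (m.com == 0) return std::nullopt;
    return static_cast<double>(m.loc) / m.com;
}

// M_C = MVG/COM; no value when the module has no comment lines.
std::optional<double> mvg_per_comment(const ModuleCounts& m) {
    if (m.com == 0) return std::nullopt;
    return static_cast<double>(m.mvg) / m.com;
}

// REJ sanity check: the run is doubtful if rejected lines exceed the given
// fraction (10% by default) of all lines processed.
bool run_is_doubtful(int rejected_loc, int total_loc, double max_fraction = 0.10) {
    if (total_loc <= 0) return false;
    return static_cast<double>(rejected_loc) / total_loc > max_fraction;
}
```

For a module with a fan-in of 3 and a fan-out of 4, hks returns 144. This shows how quickly the measure grows for modules that are heavily coupled in both directions.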
Counting Methods
CCCC implements simple algorithms to calculate each of the measures presented. The algorithms are intended to give a useful approximation to the underlying quantities rather than meticulously exact counts. In general, manual counts based on the same definitions should agree with CCCC's results to within 2-3%. If larger discrepancies are discovered, or if this level of agreement is not considered adequate, users are welcome to modify the source code to achieve closer agreement, or to change the counting behaviour to reflect a desired basis of calculation. The basic definitions of each count are as follows:
Number of Modules (NOM)
CCCC defines modules in terms of a grouping of member functions: C++ classes and namespaces, Java classes and interfaces, and Ada packages are all defined as modules. In addition, functions which are not members of one of these structures are treated as belonging to an anonymous module, which is counted if it has member functions.
Lines of Code (LOC)
This count follows the industry standard of counting non-blank, non-comment lines of source code. Preprocessor lines are treated as blank. Class and function declarations are counted, but declarations of global data are ignored. There may be some double counting of lines in class definitions. This is because the algorithm treats the total over a module as the sum of lines belonging to the module itself and lines belonging to its member functions. The declarations and definitions of member functions in the body of the class definition will therefore contribute to both counts.
Comment Lines (COM)
Any line which contains any part of a comment for the language concerned is treated as a comment line by CCCC. It is allocated to the module and member of the next following 'real' token, which ensures that leading comments are treated as part of the function or class definition which follows them. There is one exception to this rule: the Rational Rose design tool is known to insert directives into source files disguised as C++ comments with the prefix '//##'. These are not counted as comments. All other comment structures are counted, however, even if there is no content within the comment delimiters, or the content is commented-out source code.
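The line-level part of this rule can be sketched as a C++ counter. The function count_comment_lines is hypothetical. It works per file and does not attempt CCCC's allocation of comments to the following token. It assumes that string and character literals never contain comment delimiters, and that a line lying entirely inside a block comment counts as a comment line.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical per-file comment-line counter following the rule above:
// a line counts if any comment text (even empty delimiters) appears on it,
// except Rational Rose "//##" directives. Assumption: string and character
// literals never contain "/*", "*/" or "//".
int count_comment_lines(const std::string& source) {
    std::istringstream in(source);
    std::string line;
    bool in_block_comment = false;  // carried across lines for /* ... */
    int com = 0;
    while (std::getline(in, line)) {
        // A line that starts inside a block comment is part of that comment.
        bool has_comment = in_block_comment;
        for (std::size_t i = 0; i < line.size(); ++i) {
            if (in_block_comment) {
                if (line.compare(i, 2, "*/") == 0) {
                    in_block_comment = false;
                    ++i;  // also skip the '/' of "*/"
                }
            } else if (line.compare(i, 2, "/*") == 0) {
                in_block_comment = true;
                has_comment = true;
                ++i;  // also skip the '*' of "/*"
            } else if (line.compare(i, 2, "//") == 0) {
                // Rose directives ("//##") are not counted as comments.
                if (line.compare(i, 4, "//##") != 0) {
                    has_comment = true;
                }
                break;  // the rest of the line is a comment either way
            }
        }
        if (has_comment) {
            ++com;
        }
    }
    return com;
}
```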
McCabe's Cyclomatic Complexity (MVG)
The formal definition of cyclomatic complexity is that it is the count of linearly independent paths through a flow of control graph derived from a subprogram. A pragmatic approximation can be found by counting language keywords and operators which introduce extra decision outcomes. This approximation is quite accurate in most cases. In the case of C++, the count is incremented for each of the following tokens: 'if', 'while', 'for', 'switch', 'break', '&&', '||'.
Note that the boolean operators introduce extra paths through the code because the second operand may or may not be evaluated, depending on the value of the first operand. Note also that the treatment of switch statements is problematic. It is quite common for multiple 'case' labels to be attached to the same block of code, so counting these might overstate the value. Counting the 'break' tokens instead is better, so long as there are no case labels in the middle of the block of code which the break terminates. The motive for counting the 'switch' token is to provide for the default case, which gives rise to a path whether or not the programmer defines a default label. Counting the break token in this way may distort the count where it is used in other contexts (e.g. to exit from a loop).
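The keyword-counting approximation can be sketched as a simple token scan in C++. The function count_decision_tokens is hypothetical and assumes that comments and literals have already been stripped. It returns only the number of decision tokens; the textbook cyclomatic number of a single function is that count plus one. A naive scan like this would also count the '&&' of a C++11 rvalue reference, which is not a decision.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical token scan approximating MVG as described above: count the
// keywords if, while, for, switch, break and the operators && and ||.
// Assumption: comments and string/character literals have already been
// removed. Caveat: C++11 rvalue references (T&&) would also be counted.
int count_decision_tokens(const std::string& code) {
    static const std::set<std::string> keywords = {
        "if", "while", "for", "switch", "break"};
    int count = 0;
    std::size_t i = 0;
    while (i < code.size()) {
        unsigned char ch = static_cast<unsigned char>(code[i]);
        if (std::isalpha(ch) || ch == '_') {
            // Read a whole identifier so that "format" does not match "for".
            std::size_t start = i;
            while (i < code.size() &&
                   (std::isalnum(static_cast<unsigned char>(code[i])) || code[i] == '_')) {
                ++i;
            }
            if (keywords.count(code.substr(start, i - start)) != 0) {
                ++count;
            }
        } else if (code.compare(i, 2, "&&") == 0 || code.compare(i, 2, "||") == 0) {
            ++count;  // short-circuit operator: the right operand may be skipped
            i += 2;
        } else {
            ++i;
        }
    }
    return count;
}
```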
Weighted methods per class (WMC)
This is a count of the member functions known to exist in a class. Knowledge of the existence of a function is only gained from declarations or definitions directly contained in files processed by CCCC. Files included by the preprocessor are ignored, and CCCC does not at present identify invocations of member or non-member functions within procedural code.
Fan-In, Fan-Out (FI, FIc, FIv, FO, FOc, FOv)
Traditionally, use relationships between modules were identified by counting function invocations or access to module data in procedural code. CCCC identifies relationships only through structures apparent in the definitions of the interfaces of C++ classes or Java classes or interfaces. The specific relationships which can be detected are:
inheritance of a supplier class by a client;
containment of an instance of a supplier class in a client; and
the existence of member functions of the client class which accept or return an instance of the supplier.
While these relationships may seem unrelated to the invocation and module data counts, they are likely to show a strong correlation with them. In an object-oriented environment, it is likely (but not inevitable) that the low-level use relationships of invocation and direct access to data structures require an object of the supplier module's class to be available. This availability can come from instantiation of the supplier class within procedural code, but will often be due to the existence of one of the higher-level relationships described above.
The counts of Fan-In and Fan-Out are regarded as a measure of the structural quality of a program. High values of either, and particularly high values of both within the same module, indicate an increased risk that changes required in one module will require changes in other modules. CCCC defines the relationship counts so that each supplier or client module is counted only once, however many separate ways the relationship is detected. CCCC also filters the relationships identified to distinguish between different kinds of use, which may carry different levels of structural risk. There are two filters: visibility and concreteness.
The visibility filter removes from consideration relationships which are known to be accessible only from the private interface of a module. Relationships which are defined in the visible part of the interface can be exploited by clients of the current module, thus forcing those clients also to be clients of the current module's supplier. Visible relationships also increase the range of operations available on an object, which increases the cognitive complexity of the interface for a programmer who has to use the module.
The concreteness filter removes from consideration relationships which do not create a dependency of the implementation of the client module on the implementation of the supplier class. Dependency-creating relationships increase risk because they cannot form cycles, and thus constrain the creation of other relationships. They also inhibit the ability of modules to be built separately, requiring recompilation of the client module when the supplier changes. The test for this filter in C++ is whether a forward declaration of the supplier class is adequate to allow the client module definition to be compiled. Containment and parameter passing where the supplier type is modified by a referential operator (a pointer or reference) are allowed in this case. Containment or passing by value, and inheritance, are all dependency-creating. In Java, all relationships except inheritance are treated as not creating a dependency.
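The once-per-module counting rule and the two filters can be sketched in C++. The Relationship struct, the Filter enum and the fan_in and fan_out functions are hypothetical. The sketch assumes each detected relationship has already been tagged with whether it passes the visibility and concreteness filters, which in CCCC is the parser's job. Because clients and suppliers are collected into a std::set, a module related to another in several ways is still counted once.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one detected use relationship.
struct Relationship {
    std::string client;
    std::string supplier;
    bool visible;   // passes the visibility filter (not private-only)
    bool concrete;  // passes the concreteness filter (dependency-creating)
};

// Which counts to compute: FI/FO, FIv/FOv or FIc/FOc.
enum class Filter { Overall, VisibleOnly, ConcreteOnly };

bool passes(const Relationship& r, Filter f) {
    if (f == Filter::VisibleOnly) return r.visible;
    if (f == Filter::ConcreteOnly) return r.concrete;
    return true;  // Filter::Overall keeps every relationship
}

// Fan-out: distinct other modules that `module` uses, each counted once.
int fan_out(const std::vector<Relationship>& rels, const std::string& module, Filter f) {
    std::set<std::string> suppliers;
    for (const Relationship& r : rels) {
        if (r.client == module && r.supplier != module && passes(r, f)) {
            suppliers.insert(r.supplier);
        }
    }
    return static_cast<int>(suppliers.size());
}

// Fan-in: distinct other modules that use `module`, each counted once.
int fan_in(const std::vector<Relationship>& rels, const std::string& module, Filter f) {
    std::set<std::string> clients;
    for (const Relationship& r : rels) {
        if (r.supplier == module && r.client != module && passes(r, f)) {
            clients.insert(r.client);
        }
    }
    return static_cast<int>(clients.size());
}
```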
Number of Modules (NOM)
All instances of the following syntactic constructs are treated as modules: C++ classes and namespaces; Java classes and interfaces; Ada packages. In some contexts the analyser detects something which may or may not belong to one of these categories. In such cases, the name is treated as a module name if and only if member functions are identified for it. Functions which do not belong to a module of one of the categories defined above are treated as belonging to a single anonymous module, which is also counted if any members are identified for it.
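The module-counting rule reduces to a simple predicate, sketched here in C++. Each candidate name records whether it came from a definite module construct and how many member functions were found for it. The anonymous module is stored under the empty name. The Candidate struct, the count_modules function and this representation are assumptions for illustration only.

```cpp
#include <map>
#include <string>

// Hypothetical summary of what a parse found for one candidate module name.
struct Candidate {
    bool definite_module;  // a class, namespace, interface or package was seen
    int member_functions;  // number of member functions identified for it
};

// Count modules per the rules above. Uncertain candidates, and the anonymous
// module (stored under the empty name ""), count only if they have members.
int count_modules(const std::map<std::string, Candidate>& candidates) {
    int nom = 0;
    for (const auto& entry : candidates) {
        const Candidate& c = entry.second;
        if (c.definite_module || c.member_functions > 0) {
            ++nom;
        }
    }
    return nom;
}
```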
Command line syntax
The command line flags supported by CCCC are defined in the file ccccmain.cc. A brief usage message can be generated on standard output by entering the command 'cccc --help'. The same message is generated on standard error if an invalid command line is entered. As of version 3.pre57, the text generated by this command is as follows:
Usage:
cccc [options] file1.c ...
Process files listed on command line.
If the filenames include '-', read a list of files from standard input.
This program is work in progress and is not well documented.
Please be prepared to refer to the source code for the meaning of some options.
Options:
--help * generate this help message
--outdir=<dname> * directory for generated files (default=.cccc)
--html_outfile=<fname> * name of primary HTML report generated (default=<outdir>/cccc.html)
--xml_outfile=<fname> * name of primary XML report generated (default=<outdir>/cccc.xml)
--db_infile=<fname> * preload internal database from named file (default=no initial content)
--db_outfile=<fname> * save internal database to named file (default=<outdir>/cccc.db)
--opt_infile=<fname> * load options from named file (default=use compiled-in option values, refer to cccc_opt.cc for option information)
--opt_outfile=<fname> * save options to named file (default=<outdir>/cccc.opt)
--lang=<string> * use language specified for files specified after this option languages supported are c,c++,java (default=use language/extension mapping controlled by options)
--report_mask=<hex> * control report content (refer to ccccmain.cc for mask values)
--debug_mask=<hex> * control debug output content (refer to ccccmain.cc for mask values)
Configuration
Much of the behaviour of CCCC can be controlled by a stream of configuration data. The file cccc_opt.cc contains the default value for this stream, which can be dumped using the --opt_outfile qualifier. The resulting file can then be edited to customize behaviour and loaded into a later run of cccc using the --opt_infile qualifier. As of version 3.pre57, the contents of the file dumped by the command cccc --opt_outfile=cccc.opt are as follows:
CCCC_FileExt@.C@c++.ansi@
CCCC_FileExt@.CC@c++.ansi@
CCCC_FileExt@.CPP@c++.ansi@
CCCC_FileExt@.CXX@c++.ansi@
CCCC_FileExt@.H@c++.ansi@
CCCC_FileExt@.H++@c++.ansi@
CCCC_FileExt@.HH@c++.ansi@
CCCC_FileExt@.HPP@c++.ansi@
CCCC_FileExt@.HXX@c++.ansi@
CCCC_FileExt@.J@java@
CCCC_FileExt@.JAV@java@
CCCC_FileExt@.JAVA@java@
CCCC_FileExt@.c@c.ansi@
CCCC_FileExt@.c++@c++.ansi@
CCCC_FileExt@.cc@c++.ansi@
CCCC_FileExt@.cpp@c++.ansi@
CCCC_FileExt@.cxx@c++.ansi@
CCCC_FileExt@.h@c++.ansi@
CCCC_FileExt@.h++@c++.ansi@
CCCC_FileExt@.hh@c++.ansi@
CCCC_FileExt@.hpp@c++.ansi@
CCCC_FileExt@.hxx@c++.ansi@
CCCC_FileExt@.j@java@
CCCC_FileExt@.jav@java@
CCCC_FileExt@.java@java@
CCCC_MetTmnt@8.3@999999.000000@999999.000000@0@8@3@General format for fixed precision 3 d.p.@
CCCC_MetTmnt@CBO@12.000000@30.000000@0@6@0@Coupling between objects@
CCCC_MetTmnt@COM@999999.000000@999999.000000@0@6@0@Comment lines@
CCCC_MetTmnt@COMper@999999.000000@999999.000000@0@6@3@Comment lines (averaged)@
CCCC_MetTmnt@DIT@3.000000@6.000000@0@6@0@Depth of Inheritance Tree@
CCCC_MetTmnt@FI@12.000000@20.000000@0@6@0@Fan in (overall)@
CCCC_MetTmnt@FIc@6.000000@12.000000@0@6@0@Fan in (concrete uses only)@
CCCC_MetTmnt@FIv@6.000000@12.000000@0@6@0@Fan in (visible uses only)@
CCCC_MetTmnt@FO@12.000000@20.000000@0@6@0@Fan out (overall)@
CCCC_MetTmnt@FOc@6.000000@12.000000@0@6@0@Fan out (concrete uses only)@
CCCC_MetTmnt@FOv@6.000000@12.000000@0@6@0@Fan out (visible uses only)@
CCCC_MetTmnt@IF4@100.000000@1000.000000@0@6@0@Henry-Kafura/Shepperd measure (overall)@
CCCC_MetTmnt@IF4c@30.000000@100.000000@0@6@0@Henry-Kafura/Shepperd measure (concrete)@
CCCC_MetTmnt@IF4v@30.000000@100.000000@0@6@0@Henry-Kafura/Shepperd measure (visible)@
CCCC_MetTmnt@LOCf@30.000000@100.000000@0@6@0@Lines of code/function@
CCCC_MetTmnt@LOCm@500.000000@2000.000000@0@6@0@Lines of code/single module@
CCCC_MetTmnt@LOCp@999999.000000@999999.000000@0@6@0@Lines of code/project@
CCCC_MetTmnt@LOCper@500.000000@2000.000000@0@6@3@Lines of code/average module@
CCCC_MetTmnt@L_C@7.000000@30.000000@20@6@3@LOC/COM Lines of code/comment line@
CCCC_MetTmnt@MVGf@10.000000@30.000000@0@6@0@Cyclomatic complexity/function@
CCCC_MetTmnt@MVGm@200.000000@1000.000000@0@6@0@Cyclomatic complexity/single module@
CCCC_MetTmnt@MVGp@999999.000000@999999.000000@0@6@0@Cyclomatic complexity/project@
CCCC_MetTmnt@MVGper@200.000000@1000.000000@0@6@3@Cyclomatic complexity/average module@
CCCC_MetTmnt@M_C@5.000000@10.000000@5@6@3@MVG/COM McCabe/comment line@
CCCC_MetTmnt@NOC@4.000000@15.000000@0@6@0@Number of children@
CCCC_MetTmnt@WMC1@30.000000@100.000000@0@6@0@Weighting function=1 unit per method@
CCCC_MetTmnt@WMCv@10.000000@30.000000@0@6@0@Weighting function=1 unit per visible method@
CCCC_Dialect@c++.mfc@BEGIN_MESSAGE_MAP@start_skipping@
CCCC_Dialect@c++.mfc@END_MESSAGE_MAP@stop_skipping@
CCCC_Dialect@c++.stl@__STL_BEGIN_NAMESPACE@ignore@
CCCC_Dialect@c++.stl@__STL_END_NAMESPACE@ignore@
Records in the configuration stream of type CCCC_FileExt control the mapping of file extensions to languages.
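The following C++ sketch shows one way such records could drive language selection. The functions load_file_ext_map and language_for are hypothetical. The sketch assumes one record per line, and it matches the extension after the last '.' case-sensitively. Case-sensitive matching is consistent with the defaults above, which map .C to c++.ansi but .c to c.ansi.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split "A@B@C@" into {"A", "B", "C"} (each field terminated by '@').
static std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '@')) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical loader: keep only CCCC_FileExt records (one per line),
// mapping the extension (field 1) to the language name (field 2).
std::map<std::string, std::string> load_file_ext_map(const std::string& options) {
    std::map<std::string, std::string> ext_to_lang;
    std::istringstream in(options);
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> f = split_fields(line);
        if (f.size() >= 3 && f[0] == "CCCC_FileExt") {
            ext_to_lang[f[1]] = f[2];
        }
    }
    return ext_to_lang;
}

// Hypothetical lookup: the extension is everything from the last '.' in
// the final path component, matched case-sensitively. Returns "" when no
// mapping applies, i.e. no parser would be chosen for the file.
std::string language_for(const std::map<std::string, std::string>& ext_to_lang,
                         const std::string& filename) {
    std::size_t slash = filename.find_last_of("/\\");
    std::size_t dot = filename.rfind('.');
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
        return "";
    }
    auto it = ext_to_lang.find(filename.substr(dot));
    return it == ext_to_lang.end() ? std::string() : it->second;
}
```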
Records in the configuration stream of type CCCC_MetTmnt control the treatment of values for each of the metrics defined by CCCC.
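This guide does not spell out the meaning of the numeric fields in a CCCC_MetTmnt record. The following C++ sketch parses one record under an assumed layout:
metric code;
lower (concern) threshold;
upper (danger) threshold;
a third numeric value, guessed here to be a minimum numerator below which ratio metrics are not flagged;
field width;
decimal precision;
description.
The width/precision reading fits the 8.3 record, whose last two numeric fields are 8 and 3 and whose description mentions fixed precision to 3 d.p. The struct MetricTreatment and the function parse_met_tmnt are hypothetical.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of a CCCC_MetTmnt record. The meaning of each
// numeric field is an assumption inferred from the default dump.
struct MetricTreatment {
    std::string code;            // e.g. "MVGf"
    double lower_threshold;      // assumed: "cause for concern" level
    double upper_threshold;      // assumed: "dangerous" level
    double numerator_threshold;  // assumed: minimum numerator for ratio metrics
    int width;                   // assumed: display field width
    int precision;               // assumed: decimal places displayed
    std::string description;
};

// Parse one '@'-delimited CCCC_MetTmnt record; throws on any other record.
MetricTreatment parse_met_tmnt(const std::string& line) {
    std::vector<std::string> f;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '@')) {
        f.push_back(field);
    }
    if (f.size() < 8 || f[0] != "CCCC_MetTmnt") {
        throw std::invalid_argument("not a CCCC_MetTmnt record: " + line);
    }
    return MetricTreatment{f[1], std::stod(f[2]), std::stod(f[3]), std::stod(f[4]),
                           std::stoi(f[5]), std::stoi(f[6]), f[7]};
}
```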
Records in the configuration stream of type CCCC_Dialect control the handling of dialect-specific pseudo keywords by the CCCC parsers.
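This guide does not specify exactly how the dialect actions are applied. The C++ sketch below assumes that 'ignore' drops the pseudo keyword itself. It also assumes that 'start_skipping' and 'stop_skipping' bracket a region of tokens, such as an MFC message map, which is dropped entirely, markers included. The function apply_dialect is hypothetical, and its input is assumed to be a token stream produced by an earlier lexing step.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical filter applying dialect rules (pseudo keyword -> action) of
// the kind held in CCCC_Dialect records. Assumed semantics: "ignore" drops
// the token; "start_skipping" drops it and every token up to and including
// the next token whose action is "stop_skipping".
std::vector<std::string> apply_dialect(const std::vector<std::string>& tokens,
                                       const std::map<std::string, std::string>& rules) {
    std::vector<std::string> kept;
    bool skipping = false;
    for (const std::string& tok : tokens) {
        auto it = rules.find(tok);
        std::string action = (it == rules.end()) ? std::string() : it->second;
        if (skipping) {
            if (action == "stop_skipping") {
                skipping = false;  // the end marker itself is also dropped
            }
            continue;
        }
        if (action == "start_skipping") {
            skipping = true;
            continue;
        }
        if (action == "ignore") {
            continue;
        }
        kept.push_back(tok);
    }
    return kept;
}
```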
Disclaimers
CCCC was produced as an artifact of an academic research project. The primary motivation was to provide a platform for the exploration of issues related to metrics. The program is not now, and will never become, a commercial-standard, supported product. While CCCC attempts to recover from parse failures, some language constructs cause crashes, and others result in code going unanalyzed. If CCCC reports no syntax errors and terminates normally, it is likely that all files have been analyzed. Otherwise, it is strongly recommended that the user make some kind of independent check on the quantity of code ignored, rather than relying on CCCC's own report on 'Rejected Extents'.
There is one further important disclaimer. As noted above, the primary motivation for the development of CCCC was to aid an academic project investigating the use of metrics. Over the five years or so that the project was running, various research activities were conducted, culminating in a practical experiment into the value of metric analysis data in a simulated software engineering task. The aim of this final experiment was to attempt to demonstrate a benefit from the use of such data; its conclusion was that, at best, the presence of a benefit was "not proven". The research project, including the design and outcomes of the final experiment, is described at http://www.fchs.ecu.edu.au/~tlittlef, which includes a link to download a PDF of the final PhD thesis arising from the project.
The CCCC project is now dormant. There are no plans for new releases, either to add new features to the program or to fix existing defects. The current version of the program is released under the GNU General Public License, giving users the right to work on the source code to address any specific issues they have. The project is hosted on sourceforge.net. The primary developer, Tim Littlefair, can be contacted by email and will be happy to provide advice and encouragement; contact details appear on the SourceForge website.
Getting CCCC
The best place to look for information about CCCC is the CCCC home page at http://cccc.sourceforge.net.
CCCC downloads are accessible via the standard SourceForge project hierarchy starting at http://sourceforge.net/projects/cccc. SourceForge also hosts mailing lists where new versions are announced and a bug tracker database for the project.
The CCCC distribution includes a version of the Purdue Compiler Construction Toolset (PCCTS) originally created by Terence Parr and coworkers at Purdue University, later maintained by Tom Moog. Many thanks to Terence, his colleagues and Tom for developing this excellent tool, and for releasing it under terms which make it possible for it to be included in the CCCC distribution.
The Win32 installer package for CCCC is created using version 2.0.18 of the "My Inno Setup Extensions" package by Martijn Laan, based on "Inno Setup" by Jordan Russell. This package can be downloaded from http://www.wintax.nl/isx
The program will also require a C++ compiler to build. Past versions have been buildable with various versions of the GNU C++ compiler and/or Microsoft Visual C++, although the code is intended to be portable to a range of modern C++ compilers (with a bit of work in some cases, as the original code base dates back to times before the ANSI standardisation of the C++ language). The reference build tools for the current version are GCC version 3.3 and the freely distributed Microsoft Visual C++ Toolkit 2003. See http://msdn.microsoft.com/visualc/vctoolkit2003/ for details.