User Guide for CCCC

Introduction

CCCC is a tool for the analysis of source code in various languages (primarily C++), which generates a report in HTML format on various measurements of the code processed. Although the tool was originally implemented to process C++ and ANSI C, the present version is also able to process Java source files, and support has been present in earlier versions for Ada95. The name CCCC stands for 'C and C++ Code Counter'.

Measurements of source code of this kind are generally referred to as 'software metrics', or more precisely 'software product metrics' (as the term 'software metrics` also covers measurements of the software process, which are called 'software process metrics'). There is a reasonable consensus among modern opinion leaders in the software engineering field that measurement of some kind is probably a Good Thing, although there is less consensus on what is worth measuring and what the measurements mean.

CCCC has been developed as freeware, and is released in source code form. Users are encouraged to compile the program themselves, and to modify the source to reflect their preferences and interests.

The simplest way of using CCCC is just to run it with the names of a selection of files on the command line like this:

cccc my_types.h big.h small.h *.cc

Alternatively, the for a complex hierarchy, the user could enter a command like this:

find . | cccc - (on Unix family platforms)

dir /b/s | cccc - (on DOS/Windows family platforms)

CCCC will process each of the files specified on the command line (using standard wildcard processing were appropriate), or, if the '-' option is specified in the standard input stream. For each file, named, CCCC will examine the extension of the filename, and if the extension is recognized as indicating a supported language, the appropriate parser will run on the file. As each file is parsed, recognition of certain constructs will cause records to be written into an internal database. When all files have been processed, a report on the contents of the internal database will be generated in HTML format. By default the main summary HTML report is generated to the file cccc.htm in a subdirectory called .cccc of the the current working directory, with detailed reports on each module (i.e. C++ or Java class) identified by the analysis run.

In addition to the summary and detailed HTML reports, the run will cause generation of corresponding summary and detailed reports in XML format, and a further file called cccc.db to be created. cccc.db will contain a dump of the internal database of the program in a format delimited with the character '@' (chosen because it is one of the few characters which cannot legally appear in C/C++ non-comment source code).

The report contains a number of tables identifying the modules in the files submitted and covering:

measures of the procedural volume and complexity of each module and its functions;
measures of the number and type of the relationships each module is a party to either as a client or a supplier;
identification of any parts of the source code submitted which the program failed to parse; and
a summary report over the whole body of code processed of the measures identified above.

Some of the data presented in the report may be displayed in an emphasized form (either with a bold or italic font, or with a red or yellow background). These are items which have been identified as lying outside ranges which have been laid down as desirable for the particular items. A bold font or red background indicates a value which exceeds a threshold defined as being dangerous for that measure, while italic fonts and yellow backgrounds indicate values below the danger threshold but still above a second lower threshold which has been laid down to indicate cause for concern. The two thresholds are configurable by the user of the tool: see the section below on configuring metric treatment for more details.

Report Contents

The report generated by CCCC normally consists of six tables plus a table of contents at the beginning and some informational material about CCCC itself at the end.

Tables generated

Table name	Description
Project Summary	This table presents summary values of various measures over the body of source code submitted.
Procedural Summary	This table presents values of procedural measures summed for each module identified in the code submitted.
Procedural Details	This table presents values of the same procedural measures covered in the procedural summary report, but this time broken down within each module into the contributions of each member function of the module.
Structural Summary	This table presents counts of fan-in and fan-out relationships to each module identified, and a derived metric called the Henry/Kafura/Shepperd measure, which is calculated as the square of the product of the fan-in and fan-out counts.
Structural Details	This table presents lists of the modules contributing to the relationship counts reported in the structural summary.
Rejected Extents	This table presents a list of code regions which the analyser was unable to parse.

Metrics displayed

Tag	Metric Name	Description
LOC	Lines of Code	This metric counts the lines of non-blank, non-comment source code in a function (LOCf), module (LOCm), or project (LOCp). LOC was one of the earliest metrics to come into use (principally because it is straightforward to measure). It has an obvious relation to the size or complexity of a piece of code, and can be calibrated for use in prediction of maintenance effort, although concern has been expressed that use of this metric as a measure of programmer productivity may tend to encourage verbose programming practises and discourage desirable simplification.
MVG	McCabe's Cyclomatic Complexity	A measure of a body of code based on analysis of the cyclomatic complexity of the directed acyclic graph which represents the flow of control within each function. First proposed as a measure of the minimum number of test cases to ensure all parts of each function are exercised, it is now widely accepted as a measure for the detection of code which is likely to be error-prone and/or difficult to maintain.
COM	Comment Lines	A crude measure comparable to LOC of the extent of commenting within a region of code. Not very meaningful in isolation, but sometimes used in ratio with LOC or MVG to ensure that comments are distributed proportionately to the bulk or complexity of a region of code.
L_C,M_C	LOC/COM, MVG/COM	See above
FO,FOc,FOv FI,FIc,FIc	Fan-out, Fan-in	For a given module A, the fan-out is the number of other modules which the module A uses, while the fan-in is the number of other modules which use A. See the section below on counting methods for a discussion of the distinction between the variants on each of these measures. these figures.
HKS, HKSv, HKSc	Henry-Kafura/Shepperd measure	This metric is derived by squaring the product of the fan-in and fan-out of each module. The original Henry-Kafura measure, which has been described as a measure of 'information flow complexity' includes a term for the length of the module under consideration, but CCCC uses the measure as modified by Shepperd, which omits this term on the basis that it debases the measure by combining two attributes which can and should be separately measured. Corresponding to the variants on the fan-in and fan-out measures described above, similar variants are calculated on this metric.
NOM	Number of modules	Number of modules identified in the project. See discussion below about what constitutes a module.
WMC	Weighted methods per class	This measure, proposed by Chidamber and Kemerer, is a count of the number of functions defined in a module multiplied by a weighting factor. The only weighting algorithm suggested in the original formulation is a uniform weighting of one unit per function.
REJ	Rejected lines	This is a measure of the number of non-blank non-comment lines of code which was not successfully analysed by the parser. This is more of a validity check on the report generated than a metric of the code submitted: if the amount of code rejected was more than a small fraction (say 10%) of the total code processed, the meaningfulness of the numbers generated by the run must be in doubt.

Counting methods

CCCC implements simple algorithms to calculate each of the measures presented. The algorithms are intended to present a useful approximation to the underlying quantities, rather than meticulously exact counting: in general agreement with manual counts based on the same definitions should agree with CCCC to within 2-3%. If larger discrepancies are discovered, or if this level of agreement is not considered adequate, users are welcome to modify the source code to implement closer agreement, or to change the counting behaviour to reflect a desired basis of calculation. The basic definitions of each count are as follows:

Number of Modules (NOM)
CCCC defines modules in terms of a grouping of member functions: C++ classes and namespaces, Java classes and interfaces and Ada packages are all defined as modules. In addition to this, functions which are not members of one of these structures are treated as belonging to an anonymous module, which is counted if it has member functions.
Lines of Code (LOC)
This count follows the industry standard of counting non-blank, non-comment lines of source code. Preprocessor lines are treated as blank. Class and function declarations are counted, but declarations of global data are ignored. There may be some double counting of lines in class definitions as the algorithm treats the total over a module as the sum of lines belonging to the module itself and lines belonging to its member functions (the declarations and definitions of member functions in the body of the class definition will contribute to both counts).
Comment Lines (COM)
Any line which contains any part of a comment for the language concerned is treated as a comment by CCCC, and is allocated to the module and member of the next following 'real' token. This ensures that leading comments are treated as part of the function or class definition which follows them. There is one exception to this rule: the Rational Rose design tool is known to insert directives into source files disguised as C++ comments with the prefix '//##'. These are not counted as comments, however all other comment structures are, even if there is no content within the comment delimiters, or the content is commented-out source code.
McCabe's Cyclomatic Complexity(MVG)
The formal definition of cyclomatic complexity is that it is the count of linearly independent paths through a flow of control graph derived from a subprogram. A pragmatic approximation to this can be found by counting language keywords and operators which introduce extra decision outcomes. This can be shown to be quite accurate in most cases. In the case of C++, the count is incremented for each of the following tokens: 'if','while','for','switch','break','&&','||'

Note that the boolean operations introduce extra paths through the code because the second operand may or may not be evaluated according to the value of the first operand. Note also that the treatment of switch statements is problematic: it is quite common for multiple 'case' labels to be attached to the same block of code, so counting these might overstate the value. Counting the 'break' tokens instead is better so long as their are no case labels in the middle of the block of code which the break terminates. The motive for counting the 'switch' token is to provide for the default case, which gives rise to a path whether or not the programmer defines a default label. Counting the break token in this way may distort the count where it is used in other contexts (i.e. to exit from a block).

Weighted methods per class (WMC)
This is a count of the member functions known to exist in a class. Knowledge of existence of a function is only gained from declarations or definitions directly contained in files processed by CCCC: files included by a preprocessor are ignored, and CCCC does not at present identify invocations of member or non-member functions within procedural code.
Fan-In, Fan-Out (FI,FIc,FIv,FO,FOc,FOv)
Traditionally, use relationships between modules were identified by counting function invocations or access to module data in procedural code. CCCC identifies relationships only through structures apparent in the definitions of the interfaces of C++ classes or Java classes or interfaces. The specific relationships which can be detected are inheritance of a supplier class by a client, containment of an instance of a supplier class in a client, and the existence of member functions of the client class which accept or return an instance of the supplier.

While these relationships may seem unrelated to the invocation and module data counts, they are likely to show a strong corelation because of the fact that in an object oriented environment, it is likely (but not inevitable) that the low-level use relationships of invocation and direct access to data structures require an object of the class of the supplier module to be available. This availability can be through instantiation of an instance of the supplier class within procedural code, but will often be due to the existence of one of the higher level relationships described above.

The counts of Fan-In and Fan-Out are regarded as a measure of the structural quality of a program, with high values of either (and particularly high values of both within the same module) indicating increased risk of changes required in one module requiring changes across other modules. CCCC chooses to define the relationship counts in such a way that each supplier or client module is counted only once, however many separate ways the relationship is detected. CCCC applies filtering to the relationships identified to distinguish between different kinds of uses which may carry with them different levels of structural risk. There are two filters: visibility and concreteness.

The visibility filter removes from consideration relationships which are known to be only accessible from the private interface of a module. Relationships which are defined in the visible part of the interface can be exploited by clients of the current module, thus forcing those clients also to be clients of the current module's supplier. Visible relationships also increase the range of operations available on an object, thus increasing the cognitive complexity of the interface from the point of view of a programmer required to use a module.

The concreteness filter removes from consideration relationships which do not create a dependency of the implementation of the client module on the implementation of the supplier class. Dependency-creating relationships increase risk because they may not be cyclical, and thus inhibit the creation of other relationships. They also inhibit the ability of modules to be built separately, requiring recompilation of the client module when the supplier changes. The test for this filter in C++ is whether a forward declaration of the supplier class is adequate to allow the client module definition to be compiled: containment and parameter passing where the client module is modified by a referential operator are allowed in this case, containment or passing by value or inheritance are all dependency-creating. In Java, relationships except inheritance are treated as non-dependency creating.

Number of Modules (NOM)
All instances of the following syntactic constructs are treated as modules: C++ classes and namespaces; Java classes and interfaces, Ada packages. There are contexts where the analyser detects something which may belong to one of these categories but may not. In this case, the name is treated as a module name if and only if member functions are identified for it. Functions which do not belong to a module of one of the categories defined above are treated as belonging to a single anonymous module: if any members are identified for this module it is also counted.

Command-line syntax

The command line flags supported by CCCC are defined in the file ccccmain.cc. A brief usage message can be generated on standard output by entering the command 'cccc --help'. The same message will be generated on standard error if an invalid command line is entered. As of version 3.pre57 the text generated by this command is as follows:

Usage: 
cccc [options] file1.c ...  
Process files listed on command line.
If the filenames include '-', read a list of files from standard input.
This program is work in progress and is not well documented.
Please be prepared to refer to the source code for the 
meaning of some options.
Options:
--help                   * generate this help message
--outdir=<dname>         * directory for generated files
                           (default=.cccc)
--html_outfile=<fname>   * name of primary HTML report generated 
                           (default=<outdir>/cccc.html)
--xml_outfile=<fname>    * name of primary XML report generated 
                           (default=<outdir>/cccc.xml)
--db_infile=<fname>      * preload internal database from named file
                           (default=no initial content)
--db_outfile=<fname>     * save internal database to named file
                           (default=<outdir>/cccc.db)
--opt_infile=<fname>     * load options from named file
                           (default=use compiled-in option values, 
                           refer to cccc_opt.cc for option information)
--opt_outfile=<fname>    * save options to named file
                           (default=<outdir>/cccc.opt)
--lang=<string>          * use language specified for files specified 
                           after this option 
                           languages supported are c,c++,java
                           (default=use language/extension mapping 
                           controlled by options)
--report_mask=<hex>      * control report content 
                           (refer to ccccmain.cc for mask values) 
--debug_mask=<hex>       * control debug output content 
                           (refer to ccccmain.cc for mask values)

Configuration

Much of the behaviour of CCCC can be controlled by a stream of configuration data. The file cccc_opt.cc contains the default value for this stream, which can be dumped using the --opt_outfile qualifier. The resulting file can then be edited to customize behaviour and loaded into a later run of cccc using the --opt_infile qualifier. As of version 3.pre57, the contents of the file dumped by the command cccc --opt_outfile=cccc.opt are as follows:

CCCC_FileExt@.C@c++.ansi@
CCCC_FileExt@.CC@c++.ansi@
CCCC_FileExt@.CPP@c++.ansi@
CCCC_FileExt@.CXX@c++.ansi@
CCCC_FileExt@.H@c++.ansi@
CCCC_FileExt@.H++@c++.ansi@
CCCC_FileExt@.HH@c++.ansi@
CCCC_FileExt@.HPP@c++.ansi@
CCCC_FileExt@.HXX@c++.ansi@
CCCC_FileExt@.J@java@
CCCC_FileExt@.JAV@java@
CCCC_FileExt@.JAVA@java@
CCCC_FileExt@.c@c.ansi@
CCCC_FileExt@.c++@c++.ansi@
CCCC_FileExt@.cc@c++.ansi@
CCCC_FileExt@.cpp@c++.ansi@
CCCC_FileExt@.cxx@c++.ansi@
CCCC_FileExt@.h@c++.ansi@
CCCC_FileExt@.h++@c++.ansi@
CCCC_FileExt@.hh@c++.ansi@
CCCC_FileExt@.hpp@c++.ansi@
CCCC_FileExt@.hxx@c++.ansi@
CCCC_FileExt@.j@java@
CCCC_FileExt@.jav@java@
CCCC_FileExt@.java@java@
CCCC_MetTmnt@8.3@999999.000000@999999.000000@0@8@3@General format for fixed precision 3 d.p.@
CCCC_MetTmnt@CBO@12.000000@30.000000@0@6@0@Coupling between objects@
CCCC_MetTmnt@COM@999999.000000@999999.000000@0@6@0@Comment lines@
CCCC_MetTmnt@COMper@999999.000000@999999.000000@0@6@3@Comment lines (averaged)@
CCCC_MetTmnt@DIT@3.000000@6.000000@0@6@0@Depth of Inheritance Tree@
CCCC_MetTmnt@FI@12.000000@20.000000@0@6@0@Fan in (overall)@
CCCC_MetTmnt@FIc@6.000000@12.000000@0@6@0@Fan in (concrete uses only)@
CCCC_MetTmnt@FIv@6.000000@12.000000@0@6@0@Fan in (visible uses only)@
CCCC_MetTmnt@FO@12.000000@20.000000@0@6@0@Fan out (overall)@
CCCC_MetTmnt@FOc@6.000000@12.000000@0@6@0@Fan out (concrete uses only)@
CCCC_MetTmnt@FOv@6.000000@12.000000@0@6@0@Fan out (visible uses only)@
CCCC_MetTmnt@IF4@100.000000@1000.000000@0@6@0@Henry-Kafura/Shepperd measure (overall)@
CCCC_MetTmnt@IF4c@30.000000@100.000000@0@6@0@Henry-Kafura/Shepperd measure (concrete)@
CCCC_MetTmnt@IF4v@30.000000@100.000000@0@6@0@Henry-Kafura/Shepperd measure (visible)@
CCCC_MetTmnt@LOCf@30.000000@100.000000@0@6@0@Lines of code/function@
CCCC_MetTmnt@LOCm@500.000000@2000.000000@0@6@0@Lines of code/single module@
CCCC_MetTmnt@LOCp@999999.000000@999999.000000@0@6@0@Lines of code/project@
CCCC_MetTmnt@LOCper@500.000000@2000.000000@0@6@3@Lines of code/average module@
CCCC_MetTmnt@L_C@7.000000@30.000000@20@6@3@LOC/COM Lines of code/comment line@
CCCC_MetTmnt@MVGf@10.000000@30.000000@0@6@0@Cyclomatic complexity/function@
CCCC_MetTmnt@MVGm@200.000000@1000.000000@0@6@0@Cyclomatic complexity/single module@
CCCC_MetTmnt@MVGp@999999.000000@999999.000000@0@6@0@Cyclomatic complexity/project@
CCCC_MetTmnt@MVGper@200.000000@1000.000000@0@6@3@Cyclomatic complexity/average module@
CCCC_MetTmnt@M_C@5.000000@10.000000@5@6@3@MVG/COM McCabe/comment line@
CCCC_MetTmnt@NOC@4.000000@15.000000@0@6@0@Number of children@
CCCC_MetTmnt@WMC1@30.000000@100.000000@0@6@0@Weighting function=1 unit per method@
CCCC_MetTmnt@WMCv@10.000000@30.000000@0@6@0@Weighting function=1 unit per visible method@
CCCC_Dialect@c++.mfc@BEGIN_MESSAGE_MAP@start_skipping@
CCCC_Dialect@c++.mfc@END_MESSAGE_MAP@stop_skipping@
CCCC_Dialect@c++.stl@__STL_BEGIN_NAMESPACE@ignore@
CCCC_Dialect@c++.stl@__STL_END_NAMESPACE@ignore@

Configuration of the extension/language mapping

Records in the configuration stream of type CCCC_FileExt control the mapping of file extensions to languages.

Treatment of metric values

Records in the configuration stream of type CCCC_MetTmnt control the treatment of values for each of the metrics defined by CCCC.

Ignoring compiler-specific keywords

Records in the configuration stream of type CCCC_Dialect control the handling of dialect-specific pseudo keywords by the CCCC parsers.

Disclaimers

CCCC was produced as an artifact of an academic research project. The primary motivation was to provide a platform for the exploration of issues related to metrics. The program is not now, and will never become, a commercial standard supported product. While CCCC attempts to recover from parse failures, there are some language constructs which cause crashes, and others which result in code going unanalyzed. If CCCC does not report syntax errors and terminates normally, it is likely that all files have been analyzed, otherwise it is strongly recommended that the user does some kind of independent check on the quantity of code ignored, rather than relying on CCCC's own report on 'Rejected Extents'.

There is one further important disclaimer. As noted above, the primary motivation for the development of CCCC was to aid an academic project to investigate the use of metrics. Over the five years or so the project was running various research activities were conducted, culminating in a practical experiment into the value of metric analysis data in a simulated software engineering task. The aim of this final experiment was to attempt to demonstrate a benefit from the use of such data, its conclusion was that, at best, the presence of a benefit was "not proven". The research project, including the design and outcomes of the final experiment is described at http://www.fchs.ecu.edu.au/~tlittlef, which includes a link to download a PDF of the final PhD thesis arising from the project.

The CCCC project is now dormant. There are no plans for new releases, either to add new features to the program or to fix existing defects. The current version of the program is released under the GNU Public License, giving users the right to work on the source code to address any specific issues they have. The project is hosted on sourceforge.net, the primary developer, Tim Littlefair, can be contacted by email and will be happy to provide advice and encouragement. Contact details appear on the sourceforge website.

Getting CCCC

The best place to look for information about CCCC is the CCCC home page at http://cccc.sourceforge.net.

CCCC downloads are accessible via the standard SourceForge project hierarchy starting at http://sourceforge.net/projects/cccc. SourceForge also hosts mailing lists where new versions are announced and a bug tracker database for the project.

The CCCC distribution includes a version of the Purdue Compiler Construction Toolset (PCCTS) originally created by Terence Parr and coworkers at Purdue University, later maintained by Tom Moog. Many thanks to Terence, his colleagues and Tom for developing this excellent tool, and for releasing it under terms which make it possible for it to be included in the CCCC distribution.

The Win32 installer package for CCCC is created using version 2.0.18 of the "My Inno Setup Extensions" package by Jordan Russell, based on "Inno Setup" by Martjin Laan. This package can be downloaded from http://www.wintax.nl/isx

The program will also require a C++ compiler to build. Past versions have been buildable with various versions of the GNU C++ compiler and/or Microsoft Visual C++, although the code is intended to be portable to a range of modern C++ compilers (with a bit of work in some cases, as the original code base dates back to times before the ANSI standardisation of the C++ language). The reference build tools for the current version are GCC version 3.3 and the freely distributed Microsoft Visual C++ Toolkit 2003. See http://msdn.microsoft.com/visualc/vctoolkit2003/ for details.

User Guide for CCCC

Table of Contents

Introduction

Report Contents

Counting Methods

Command line syntax

Configuration

Disclaimers

Getting CCCC

Introduction

Report Contents

Tables generated

Metrics displayed

Counting methods

Command-line syntax

Configuration

Configuration of the extension/language mapping

Treatment of metric values

Ignoring compiler-specific keywords

Disclaimers

Getting CCCC