Software Coding Guidelines for C and Assembler

Software Coding Guidelines for C. Rev.2

S&M Process Control Technology Inc.

Introduction.

The coding guidelines presented below are the result of my 25 years of experience, which began in my college years with paper tape and ferrite core memory, and progressed through the era of Fortran, Pascal, PL/1, C and C++.

The single most important thing I have learned in software development is that:

- Writing a clear, easy to understand and maintainable code is as important as making that code perform a specified design objective. I cannot overemphasize the importance of this statement. Cutting corners definitely does not work. Any amount of time saved by the quick & dirty programming style or by employing more or less than the optimum number of programmers will inevitably show up (three-fold) in debugging, in more revisions or in having to scrap & re-write entire modules.

- Most problems occur when the human side is neglected and instead, the priority is given to time schedule or to hardware optimization. Of course, the software has to perform the specified task too; however neglecting the above mentioned principle will always (trust me) result in slower overall project development and higher costs.

- The following guidelines were devised with the "human" principle in mind., to make life easier for software and hardware engineers. Even if these guidelines result in slightly slower execution or slightly higher memory usage, the benefits are worth it. Another point that has to be mentioned is that I am very strongly biased towards the function-oriented programming style, as opposed to the data-oriented style. This approach is especially important for embedded or real-time control systems, as it matches extremely well the event-driven real world processes, such as:

Functions <==> events & processes,

Data <==> physical objects

Unfortunately, more often than not, C (and C++) software courses tend to overemphasize the data (or object) oriented approach. Data (or object) oriented approach does indeed work well in certain applications, e.g. business data bases, pure Windows programming etc., but this is not necessarily the case in the embedded software. These guidelines will help to clarify this. One final note – the quidelines were written specifically for C, not C++. Although in C++ some of the rules would still be perfectly reasonable, some others would have to be added.

1. Global structures and unions.

No direct access to global C structures or unions should normally be permitted, except in some very special cases (e.g. legacy code). If the specific nature of the project leads naturally to some specific structure constructs (e.g. display window, database record, initialization data etc.), the individual fields of a structure should be accessed through special functions or macros, rather than directly, and the structure itself should be declared local to the module rather than global.

The purpose of this rule is threefold:

to make the code easier to read by hiding the complexity of proprietary, hard-to-memorize data structures within functions,
encapsulating the data assignment inside function results in better robustness by allowing data range checking and verification,
makes it much easier to modify, extend or upgrade the data structure without having to comb through the whole code searching for all the structure instances.

Generally, extensive usage of structures should be discouraged, within the bounds of common sense, of course. A structure is often less memory-efficient and slower to access than a simple variable or an array. Too many stuctures and unions (especially if nested) make the program harder to read and make it bug-prone. In some extreme cases (example: Microsoft Foundation Classes) , the programmer has to waste a lot of time searching between various definitions, declarations and the instantiations of the structures/unions or classes.

2. "Dreaded" instruction goto.

Similarily as in case of data where the overuse of the structure or unions leads to less readable data, similarly the overuse of goto leads to less readable program flow. However, contrary to the popular texbook opinion, a complete elimination of goto can be quite counterproductive. There are often situations when a goto instruction to jump out of a midst of a nested while/for loop may actually enhance the readability of the program. Use your common sense and observe the primary objective (see the introduction)!

3. Definitions.

Useage of macros (#define, typedef) and struct/union definition passed acrossed modules through include files (*.h) should be minimized, within the common sense. For large projects, it is usually difficult to remember which include file should be included in wich module, which results in confussion and bugs. Keep it simple. Common constants such as PI, TRUE,FALSE (did anyone ever need to change them?) , common macros or typedefs such as MIN, MAX, uint, uchar etc should simply be written locally in the modules that need them. On the other hand, certain global project-specific definitions, such as DEBUG_SWITCH, REVISION_NUMBER, CPU_TYPE etc. should be included globally. In such a case, it is recommended to have only one global project-specific header file of the same name as the project, e.g.: MYPROJ.H , which should contain all those definitions and macros.

It is worth keeping in mind that a program with too many definitions and macros is harder to debug. For example, watching under debugger a variable disp_type having a value of 12 does not tell us much, especially if we forgot that somewhere in one of many header files we might have had:

#define VGA800 12

The solution is to use strings instead of numerals to store parameters, for example:

char *display_type = "VGA800";

The above is less computer memory efficient, but much more human efficient.

4. Global variables.

Global data should not be recommended, except in some special cases (such as legacy code, or for some singular very large structure or array when conservation of memory or execution time is critical). Data structures that need to be shared among multiple modules should be allocated locally (statically or dynamically) in one module and accessed or manipulated only through function calls.

It is recommended to assign and to read program configuration or setup parameters through function calls (one-liners!) during execution time, rather than accessing them directly as global variables, or as #defines declared at compilation time. In fact there should be no need for global variables if data is accessed through functions and stored locally; these variables can for example, be declared static within a module. Example:

// Module AAAA.C

static uint s_rpm = 0;

void aa_asgn_rpm(uint x) { s_rpm = (x>=300 ? x:0); return; }

uint aa_get_rpm(void) { return(s_rpm); }

The above function-oriented example is slightly less memory efficient than the usual data style, but more human-friendly and much easier to maintain throughout future revisions. For instance, if we ever wanted to upgrade it to float at some stage in future, or add more error checking, we can very easily do it without affecting the existing code at all, such as:

// Module AAAA.C

static float s_rpm = 0.0;

void aa_asgn_rpm(float x) // Note: this function must have a prototype declared

{ // for the arg. type cast to work when called with uint

if(x<300 || x>720) mess("aa_asgn_rpm: out of range");

else s_rpm = x;

return;

}

uint aa_get_rpm(void) {return( (uint)s_rpm);} // called by existing old code

float aa_get_rpmf(void) {return( s_rpm);} // called by new code

The reasons behind the above recommendations are:

makes program easier to understand and less cluttered with cryptic assignment instructions.

better robustness: function arguments checked at run-time,
easier to debug (tracing function calls is easier than tracing data assignments),
program is easier to update and to maintain,
easier to port to a different programming platform,

functional style is better suited to split work across several members of a team.

5. Included header files.

One should try to minimize the number of #include files in any given module. In particular - multilevel nesting of header files should be strongly discouraged! Lint program (see Section 9) is highly recommended for checking whether a particular #include file is really needed. See also section 11 on naming conventions.

6. Prototypes.

All functions of a module must be prototyped, and the prototypes should be placed in a header files with the extension "I" and of the name of the module. For example, prototypes of all functions of module AAAA.C should reside in the header file AAAA.I, and this header file must contain only the prototypes of functions and declarations of global variables (if any) defined by AAAA.C . It is strongly recommended that the prototype file be generated automatically using LINT, example:

lint co-msc options -u -mL AAAA.C -od(AAAA.I) > AAAA.ER

Notes:

(1) it is not advisable to place other definitions in .I files (use .H) , otherwise it would not be possible to automatically LINT-regenerate the prototype files.

(2) If some function aruments are of complex types, then the definition of these types should also be exported alongside the prototypes. Unfortunately, this cannot be automatized easily by LINT. For that reason, it is more convenient to avoid passing structures or pointers to structures as arguments, and use generic pointers (void *) instead.

7. Simple memory saving techniques.

If possible, variables of type float should be used rather than double in order to save memory, except when arithmetic precision is really needed. For the same reason, variables of type uchar should generally be used instead of int or uint as switches, flags or boolean (logical) values. Temporary buffers should be allocated (and freed) dynamically inside the functions, rather than being declared statically, e.g.:

void myfunc(void)

{

char *buf;

buf = (char *)malloc((size_t)4000);

//......

free((void *)buf);

return;

}

or simply:

void myfunc(void)

{

char buf[4000];

//......

}

If big arrays of numbers are needed, one should attempt to formulate an algorithm in such a way as to store data as arrays of int or uchar instead of float, long int or double.

(Note: saving of memory by using bit-fields for flags is usually not worth the effort, unless very big arrays of flags are needed)

Frequently used constant strings should be declared & initialized as static string variables local to a module, rather than by #define . Similarly, frequently used double constants should also be declared and initialized as static variables not as #define macros.

8. Parameters passed into functions.

Most function arguments should be checked at run-time against out-of-range, null-pointer or other obvious conditions. Static data that need to be initialized should be checked inside functions that access them if they had been initialized. It is also a good idea to structure a module in such a way as to have most of the functions taking only one or no arguments; it makes them easier to read and remember. It is also recommended to use string arguments as switches instead of #define constants, examples:

RECOMMENDED (EXCEPT UNDER MEMORY CONSTRAINTS):

aa_set_disp("LCD");

aa_set_amp("gain=HIGH");

aa_set_amp("gain=LOW filter=ON");

aa_set_amp("filter=ON GAIN=low"); // it is nice if function is intelligent enough to

// be able to decode it, regardless of order and case

NOT RECOMMENDED BUT ACCEPTABLE:

#define AMP_GAIN_HIGH 1

#define AMP_GAIN_LOW 0

#define AMP_FILT_ON 1

#define AMP_FILT_OFF 0

aa_set_amp_gain(AMP_GAIN_HIGH);

aa_set_amp_filter(AMP_FILT_ON); // with strings - one function would be enough!

NOT RECOMMENDED, NOT ACCEPTABLE!

#define ON 1 // somewhere else one might require ON defined as 0

#define OFF 0 // it is not obvious that OFF apllies to amp filter

#define HIGH 1

#define LOW 0

struct ampsetup // this data structure should be hidden from view

{ // otherwise any future modification would

int gain; // require changes to all instances in other modules

int filter; // (that's the main problem with data-oriented style!)

} amp;

aa_set_amp(1, 1); // cryptic arguments

aa_set_amp(HIGH, ON); // hard to remember what HIGH and ON are for

amp.gain = HIGH; // harder to modify this code in future revisions,

amp.filter = ON; // must remember the structure definition ,

aa_set_amp( &amp); // difficult to see what goes inside 'amp'

Generally, when making a decision on what kind of arguments a function ought to pass, one should consider the ease of use first (from the human point of view), and the CPU efficiency last. A function that takes initially half-an our longer to code may save days of work later on when it comes to using it repeatedly in other modules, and by other people.

9. LINT

It is mandatory to use C preprocessor program LINT on every module before it is internally released. It is recommended to LINT a module before each recompilation during development, as the bug-catching capabilities of LINT are far superior to most compilers. Modules should be debugged until there is no lint warnings or lint errors reported (only lint infos are allowed).

The following table contains the recommended LINT parameter settings, contained in the file

OPTIONS.LNT:

-e720		test of assignment
-e502 -e713 -e737 -eau		unsigned-signed
-e734		sub-integer loss of info
-e701 -e703		shifting int left is OK
+e718		undeclared functions
-h#2		error message height
-e123		macro arg check
-e534		function call return, no check
+e533		check function return statement
-e565		struct in prototypes not defined
-e569		allows signed/unsigned
-e732		allows signed/unsigned arg.
-e736		info - loss of precision
-e506		const boolean allowed
-e716		while(1)

An example of a batch command to call LINT:

lint co-msc options -u -mL AAAA.C -od(AAAA.I) > AAAA.ER

where

AAAA.C - LINTed module (input)

AAAA.ER - file containing LINT errors (output)

AAAA.I - automatically generated function prototypes and global data definitions (output)

10. Passing data to and from modules (big systems).

A module might communicate with other modules (or with the outside world) by means of function calls, by employing various operating system specific means (such as messages, semaphores, etc.), using other hardware specific communication channels or by text ASCII files. It is not recommended to use binary or record-indexed file formats for small or medium size files, since their speed advantage is minimal (except for very large data bases!), and are much more troublesome in terms of project maintanance.

Names of the communication channels (such as I/O devices, network drivers, files etc) should not be hard-coded but should rather be initialized from a specific configuration file or other channel (I/Odevice, EEPROM etc). For example, in case of PC-based systems, the configuration file should have the name of the main exe program file and extension of INI. Note (1): because of reliability issues, I would not recommend using the Windows registry system to store initialization data (other than the bare basics such as program name, revision and directory path).

(2) The above recommendations apply only to big systems, they are not very relevant in case of small embedded firmware.

11. Naming conventions.

This is quite arbitrary and subject to some aesthetical judgement, to a large extent. For the sake of consistency and appearance it is recommended that some naming standard is established, at least for the duration of a project and within a team. For instance, it does not really matter whether one uses this name style:

float gain = 0.25;

aa_set_amp_gain( gain );

or the following:

float fGain = 0.25;

AASetAmpGain (fGain);

as long as it is used consistently throughout the duration of a project, or better - company-wide.

It is suggested that all functions defined within a module share the same prefix in their names, for example, if the module name is AAAA.C the functions defined inside it should begin with 'aa_'. A header file containing definitions of constants (#define), typedefs and structures/unions which are to be used by other modules in conjuction with the functions from AAAA.C should be named AAAA.H and the (LINT-generated) prototype file should be AAAA.I , example:

#include "AAAA.H" // contains definition of aa_wind

#include "AAAA.I" // contains prototype of aa_disp

main()

{

struct aa_wind w={0,10,"this is defined in AAAA.H"};

aa_disp(&w);

return;

}

Static variables local to the modules should be prefixed with 's_', global variables accessible by all modules (if there are any!) should be prefixed with 'g_'

Simple functions that can only assign values to internal variables should contain 'asgn in their names, simple functions that only test status flags or port bits should contain 'test' in their names.

12. Error messages.

Error messages should appear in both the source code and on an output channel in plain English, NEVER encrypted as bare numbers or symbols, though they may contain embedded number codes and symbols, as well as text.

Error and warning messages should be printed or displayed using two standardized project-dependent routines err and mess placed inside a main program module, example:

void err(char *txt) // display text and quit program

{

printf(txt); // or something equivalent to display

// free all storage, restore video mode etc.

exit(0); // exit program or end task

}

void mess(char *txt) // display text and continue

{

printf(txt); // or something equivalent to display

return;

}

Example of error calls:

if( a<b) then err("8011: Fatal error, a<b");

if(a==0) then mess("8012: Warning, a is not initialized!");

Note that the above guidelines do not prevent implementing a different language output. For example, even though the original source code is hard-coded in English, it is possible to easily implement automatic re-assignment of all these messages to another language using the embedded codes, such as "8011" or "8012" of the above example, as indexes to to a string look-up table. That string table could be compiled with the code, read from EEPROM, read from a file, etc.

When a function needs to return an error code, it is NOT recommended to return it by return statement, instead it is better to pass it as a pointer and use a second function to decode and to display the errors, example:

uint aa_adc_read( uchar adc_channel, uint *perrcode )

{

// ..... (026)

// ..... the actual code to start, to poll and to read a serial A/D converter,

// .....

if( percode==NULL ) return( adc_value ); //error checking disabled

// ..... assign the bits of *perrcode as error flags, for example:

*perrcode=0 ; // 0 = no errors

// bit 0 = argument adc_channel out of range

// bit 1 = serial communication channel not initialized

// bit 2 = serial communication channel error

// bit 3 = A/D not ready, polling timeout,

// bit 4 = A/D value out of range

return( adc_value );

}

void aa_adc_disperr( uint *perrcode )

{

if( (perrcode==NULL) || (*perrcode==0) || (s_display==0) ) return;

if( *perrcode & 0x0001 ) mess("0001: A/D arg. adc_channel out of range");

if( *perrcode & 0x0002 ) mess("0002: A/D serial com channel not init.");

if( *perrcode & 0x0004 ) mess("0004: A/D serial com channel error");

if( *perrcode & 0x0008 ) mess("0008: A/D not ready, polling timeout");

if( *perrcode & 0x0010 ) mess("0010: A/D value out of range");

return;

}

and the actual calls would look like:

uint adc_value, errcode;

uchar adc_chan=1;

adc_value = aa_adc_read( adc_chan, &errcode );

aa_adc_disperr( &errcode );

By providing the library routine to display meaningfull error messages in plain text, we ease the uncertainty of having to rely on a user (=other programmer) to implement this task on every call to aa_adc_read . At the same time it allows the flexibility of NOT checking the error codes in some instances, for example, if the function is called repeatedly inside a loop.

13. Other rules of good coding.

a) At the top of the module, as a minimum - name the exported header files and list all the functions defined there, along with a short (one-line) description, example:

// Module AAAA.C Program EXAMPLE.EXE, Project C-Guide

// Exporting header files: AAAA.H , AAAA.I

// List of functions:

// aa_adc_ena - intialize serial channel for A/D converter

// aa_adc_read - read one value from the A/D converter

// aa_adc_disperr - display error messages after aa_adc_read

// aa_adc_dis - disable the serial channel

// ......

b) At the top section of the module, underneath the list of functions, write the numbered list of modification and bug fixes in chronological order. Optionally, use the consecutive tag numbers (such as 025, 026 in the example below) to tag the actual locations of the code affected by the change. Example:

// Tag# | Date |Initials| Comments

//------|----------|--------|---------------------------------------------

// 001 | 3-May-00 | SB | The code is written, Rev.0.00 alpha

// ...........

// 025 | 2-July-00| SB | Added aa_adc_disperr

// 026 | 3-July-00| SB | Fixed bug in aa_adc_read, released Rev.0.88 beta

c) If a module is badly written, it is very often more economical to scrap it and re-write it, than to try to fix it.

d) Never trust any data used by your function that originates outside of it, check it as much as possible (within reason).

e) Always check input arguments passed into functions for null pointers, check string-length, and verify that numerical values are within permitted ranges.

f) Place checks in the middle of the code to detect abnormal data values. Use Watchdog Timer (if available) to break/reset out of deadlocks. Use external device or built-in power-low (brown-out) reset feature in order to avoid running program with corrupted CPU registers or memory. For industrial applications, use single-chip controllers (SOC, SystemOnChip) rather than multi-chip systems, if possible.

g) Use traps to capture instances of run-time errors, for example load a code number into a top array location for checking against overflow at run time.

h) When using data files, place some key-words or key-numbers in regular intervals in order to be able to check integrity of the file when read. Check if the file exists (when opening).

i) Check static variables at run-time if initialised (using compiler's default initialization).

j) Use as natural notation as possible, for example, I think p[i] looks better than *p+1 , although both do the same. Similarily, it may be more friendly to write p[i], i++; rather than *p++ .

k) If an array pointer is passed into a function, pass the array length as a separate argument as well. Check arrays against out-of-bound conditions. Use string manipulation functions to check if a target string has been allocated (not NULL) and if there is enough storage to copy onto it.

l) Keep the amount of code at less than about 50 lines of C instructions per function, even if it means producing more functions. It is no tragedy if some functions grow bigger, but for the sake of clarity, the majority of the functions should be kept rather short.

m) All or most of input/output operations or hardware accesses should take place through functions acting as gateways. This will allow these accesses to be easily debugged or redirected whenever needed. For example, instead of printing directly to stdio, using printf etc, define and use the following function:

void

pr(char *fmt, float arg1)

{

// s_fileptr ---> screen, printer, file or none

if(s_fileptr != NULL) fprintf(s fileptr, fmt, arg1);

return;

}

n) If an I/O device require initialization, enabling, disabling, opening, selecting or closing a channel, it is recommended to construct individual functions performing the above operations explicitly, rather that built them into one big function, see example (a) above. If there is a need to have one general function to perform I/O operation including all details such as opening, selecting, and closing a channel, it is advisable to first write individual functions (called "primitives"), and then use them to construct the bigger function. Taking that example from (a), we can can now easily build this:

void

aa_adc_rdarray( uchar adc_chan, uint *pval, int nval, uint *perrcode )

{

int i;

if( pval==NULL || nval<=0 ) {mess("aa_adc_rdarray - bad arg."); return;}

aa_adc_ena( perrcode);

aa_adc_disperr( perrcode);

if( *perrcode!=0 ) return;

for( i=0; i<nval && *perrcode==0; i++ )

pval[i] = aa_adc_read( adc_chann , perrcode );

aa_adc_disperr( perrcode);

aa_adc_dis();

return;

}

---------- End Of Document --------