3 GCC distilled
Abstract: The GNU Compiler Collection (GCC) is a collection of compilers for various programming languages. GCC has been developed within the GNU Project. Here we discuss how to employ GCC in order to compile, link, debug, and organize C and C++ programs.
3.1 General information
GCC is free software and the standard compiler for Linux and other UNIX-like operating systems.
3.1.1 Supported languages
The compilation process is divided into two phases. During the first phase (the front end), the compiler analyzes the source code and converts it into an internal representation in the form of an abstract tree that is independent of both the source language and the target processor. During the second phase (the back end), this representation is processed to generate code for a given platform.
At present, front ends are written for various programming languages, and back ends are designed for all basic processors. GCC supports several basic programming languages: C, C++, Objective-C, Java, Fortran, Ada, and Go.
Originally, the abbreviation GCC stood for GNU C Compiler, which emphasized the compilation of C programs.
GCC is written mainly in C. The distribution contains standard libraries for C++, Java, and Ada.
3.1.2 Supported platforms
GCC works on many platforms, where a platform is a combination of a certain processor and operating system.
In fact, GCC is the standard compiler for Linux systems. Several basic platforms are used to test release versions of the software: Debian Linux, Red Hat Linux, and FreeBSD on Intel x86-64. Significant attention is also given to porting GCC to Windows.
3.1.3 Installation on Linux
For most users, the easiest way to install GCC is to install a package built for their operating system. The GCC Project itself does not provide binary files; it distributes only source code.
The good news for Linux users is that all GNU/Linux distributions contain GCC. The user may only need to extend the list of installed components, for instance, by installing the GCC documentation.
The simplest way to install a new program on Ubuntu is to employ the Ubuntu Software Center in the Launcher. One can also use the Software Center to remove programs and search for a program by name or description.
The Synaptic Package Manager provides advanced package management capabilities for Ubuntu. This program combines all the features of the command-line package manager apt with a convenient graphical interface. Employing Synaptic, we can install, remove, configure, and upgrade packages for our system, browse the lists of available and installed packages, manage repositories, and upgrade the operating system.
The Advanced Packaging Tool (APT) can also be used to manage packages on Ubuntu from the command line by means of apt-get. To install a new program xxx, the command apt-get install xxx is used. Downloading, installation, and configuration of programs are performed automatically. If the configuration needs more information, a request to the user is displayed. The command apt-get remove xxx removes the program xxx.
3.1.4 Work with GCC on Windows
The user can employ a precompiled and ready-to-work version of GCC on a Windows operating system provided by the Cygwin Project. Cygwin integrates Windows-based applications, data, and resources with a UNIX-like environment. Cygwin is a powerful and advanced free collection of tools for porting UNIX programs to Windows and for cross-compilation (creating binary code for one platform on another one). Cygwin has a large collection of applications which provide a normal UNIX environment. In particular, besides GCC, Cygwin contains many other GNU development tools to perform basic programming tasks.
On the official website of Cygwin, we can find information about the current status of the project, software updates, and a list of FTP servers from which Cygwin packages can be installed. The setup.exe file is used to install Cygwin. With it, only those components which are really needed can be selected. On the page Select Packages, packages are selected for downloading. In the development tools category Devel, the following packages should be selected: gcc-core (the C compiler), gcc-g++ (the C++ compiler), gdb (the GNU debugger), and make (a tool to work with makefiles). For convenient work, other packages may also be needed.
The Minimalist GNU for Windows (MinGW) Project was separated from Cygwin some years ago. The MinGW Project provides GCC compilers with a minimal set of development tools for Windows applications, whereas Cygwin provides much more complete compatibility with UNIX. The Minimal SYStem (MSYS) component provides a small independent environment with a lightweight UNIX-like shell. MinGW binaries and components are available and free to download.
The main difference between MinGW and Cygwin is that in MinGW executable files are free from any additional dependencies. We also point out that there is a difference in licensing. Cygwin is distributed under the terms of the GPL license, which can impose certain restrictions on the commercial use of developed software. MinGW is more liberal from this point of view and can be more appropriate for commercial use.
3.2 Available documentation
3.2.1 Documentation on the internet
A workable system implies the possibility of obtaining help on various questions. Questions may be different, which means different types of help information and different ways to get it.
For GCC, we can use resources on the internet. We can begin with the pages of the online documentation of GCC. Here we mention, in particular, Using the GNU Compiler Collection.
Among the available online tutorials, An Introduction to GCC: for the GNU Compilers gcc and g++ by Brian Gough should be mentioned as very useful.
3.2.2 Help via --help
Let us consider the basic ways of getting help with GCC and other programs on a Linux system. On Windows systems, access to the same help information is provided by Cygwin. MinGW is not as good in this regard: the user will find little information.
The quick reference for most Linux programs is easy to call up by running a program with the option --help (or -h). For instance, for the C++ compiler, the command is g++ --help.
To view long output page by page, we pipe it to the command more, e.g. g++ --help | more. After the screen is full, the command pauses and displays the message --More-- in the bottom line. Pressing Enter displays the next line of the text; pressing SPACE displays the next full screen.
The command less provides more functionality: navigation through a page using back and forward, scrolling a text with arrows, transition to a specific line of the text, search in the text, etc.
3.2.3 Manual pages
Manual pages for Linux are split into sections which reflect the functionality of programs (see Table 3.1). The man program searches the sections in exactly this order. Manual pages are displayed with the pager less.
The structure of manual pages is well-established. Usually, they contain the following parts: NAME – the name and a brief description; SYNOPSIS – how to use the command; DESCRIPTION – the description of the command functionality in detail; EXAMPLES – examples of use; SEE ALSO – related man pages.
The command apropos searches for keywords in the names and short descriptions of manual pages. The command whatis searches only in the names of man pages and displays a brief description of each page found.
Table 3.1: Sections of manual pages
|Section||Contents|
|1||Executable programs or shell commands|
|8||System administration commands|
3.2.4 Use of info
The advanced help system for Linux programs is known as info pages. In contrast to man pages, info supports hypertext, which makes it possible to easily move around a document.
The list of existing info pages is available by running the info command without arguments.
To get help on a command, info is executed with the name of the command, for example, info gcc. We can move around the document using the arrow keys; page navigation is performed via Page Up and Page Down.
3.2.5 Tools to visualize help information
The above-considered man and info pages are focused on the command-line interface. This may be inconvenient to use.
There are special programs which permit viewing of man pages in graphics mode. Xman should be mentioned among them.
Konqueror – a web-browser, file manager and universal document viewer – provides ample functionality for viewing and navigation through help documentation.
The yelp program demonstrates a similar functionality.
3.2.6 Additional help
Most man pages are located in the directory /usr/share/man. The manpath command returns the full list of search directories for manual pages. The files of the info pages are located in the directory /usr/share/info.
For most programs, additional information is presented in other formats (text, PDF, PostScript, HTML) and located in the directory /usr/share/doc. An example is the directory /usr/share/doc/gcc-4.6, whose contents can be listed with the ls command.
3.3 Compilation workflow
A compiler reads a set of instructions in a high-level programming language implemented as a program or separate program module and translates it into a set of instructions in machine language or a language close to machine language. To illustrate the work of a compiler, let us consider a C++ program.
3.3.1 Up and running with GCC
The main stages of compiling a C program are presented in Figure 3.3.
The preprocessor is run by the compiler before the actual compilation in order to make some changes in the program. The preprocessor modifies the source code: it includes additional files with function declarations, replaces macros (abbreviations for fragments of the source code) with the corresponding macro definitions, etc. Special preprocessor directives of conditional compilation can be applied to include or exclude parts of a program. In GCC, the executable program of the preprocessor is named cpp.
One of the main steps of compilation is the translation of a program or separate program module written in C or C++ into assembly language (as a file with the .s extension), which is a low-level programming language. In GCC, this translation is performed by the program gcc with the -S option.
The translation of the assembly code into machine code is performed by the assembler. The assembler of GCC creates object files with the .o extension.
The final step is linking the object files, some of which may come from object libraries, into an executable file. For this, in GCC, the ld program is used. The complete program (named a.out by default) can be executed on a computer.
3.3.2 Test program
We now illustrate the work of compilers on a test program written in the C++ language.
The program consists of three files: int.cpp, int.h, and test.cpp. The intm function implements the computation of an integral by the rectangle method (the midpoint rule).
In test.cpp, an integral with the exact value I = π is calculated approximately.
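A program with this structure could look as in the sketch below. Only the file names, the function name intm, the macro PI, and the use of sqrt are taken from the text; the signature of intm, the integrand 4*sqrt(1 - x^2) on [0, 1] (whose integral equals π), and the number of subintervals are assumptions made for illustration.

```shell
# int.h: declaration of the integration routine (signature is an assumption).
cat > int.h <<'EOF'
#ifndef INT_H
#define INT_H
// Midpoint (rectangle) rule for f on [a, b] with n subintervals.
double intm(double (*f)(double), double a, double b, int n);
#endif
EOF

# int.cpp: the midpoint rule itself.
cat > int.cpp <<'EOF'
#include "int.h"

double intm(double (*f)(double), double a, double b, int n) {
    double h = (b - a) / n;            // width of one subinterval
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += f(a + (i + 0.5) * h);   // value at the midpoint
    return sum * h;
}
EOF

# test.cpp: integrates 4*sqrt(1 - x^2) over [0, 1] (assumed integrand).
cat > test.cpp <<'EOF'
#include <iostream>
#include <math.h>
#include "int.h"
#define PI 3.141592653589

double f(double x) { return 4.0 * sqrt(1.0 - x * x); }

int main() {
    double approx = intm(f, 0.0, 1.0, 1000);
    std::cout << "approx = " << approx << std::endl;
    std::cout << "error  = " << fabs(approx - PI) << std::endl;
    return 0;
}
EOF

# Compile and link both source files in one step, then run the result.
g++ int.cpp test.cpp -o test
./test
```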
3.3.3 Preprocessing
For the C++ language, we invoke the g++ compiler from GCC. To see the output of the preprocessor, we run g++ with the -E option (or use the cpp command). With this option, the compiler stops after preprocessing. As a result, we obtain the source file with the contents of the header files included.
The iostream header file is included in the code. In turn, iostream comprises several header files associated with input and output operations. Further, the math.h header file, which declares the sqrt function, as well as our header file int.h, are also included in the code. In the main function, the string PI is replaced by 3.141592653589 according to the corresponding #define directive.
3.3.4 Object code generation
To generate code in assembly language, we run g++ with the -S option on the file test.cpp.
As a result, the file test.s is created.
The compiler command with the -c option is used to compile the test.cpp file into object code.
The output is the file test.o.
3.4 Compiling a C/C++ program
3.4.1 Input files for compiling
The compiler analyzes the names of the files passed to it as arguments and determines what actions it has to perform. In particular, the compiler determines the language and includes the corresponding standard libraries. Tables 3.2 and 3.3 present a brief description of input files for C and C++, respectively.
We can explicitly instruct the compiler to treat source files with the suffixes .c and .h as C++ source files. To specify the language, the gcc command is used with the -x LANGUAGE option, where LANGUAGE is, e.g., c, c-header, cpp-output (for C) or c++, c++-header, c++-cpp-output (for C++). The second way is to invoke the g++ command instead of gcc (for C++).
Table 3.2: Input files for C
|Suffix||Description|
|.a||Static object library|
|.c||C source code|
|.i||C source code that should not be preprocessed|
|.so||Shared object library|
Table 3.3: Input files for C++
|Suffix||Description|
|.a||Static object library|
|.C, .cc, .cpp, .c++, .cp, .cxx||C++ source code|
|.h, .hh, .hpp, .h++||Header file|
|.ii||C++ source code that should not be preprocessed|
|.so||Shared object library|
3.4.2 Options for searching for a directory
To get the list of directories searched by gcc for programs and libraries, we employ the command gcc -print-search-dirs. Its output lists the installation directory together with the search paths for programs and for libraries.
The -IDIR option adds the directory DIR to the beginning of the list of directories to be searched for header files. This makes it possible to override, e.g., system header files, since the system directories are scanned afterwards. If we use several -I options, then the specified directories are scanned in left-to-right order.
Similarly, the -LDIR option adds the directory DIR to the list of directories searched for libraries.
3.4.3 Options for controlling a language
A set of options controls the dialects of the languages that the compiler accepts. The option -std=STANDARD selects the version STANDARD of the language standard. For instance, -std=c90 selects the C90 standard for the C language. The GNU dialect of the C90 standard, which includes some features of C99, is selected using -std=gnu90.
For the C++ language, the option -std=c++98 corresponds to the C++98 standard (ISO/IEC 14882); the GNU dialect of this standard is selected by -std=gnu++98.
The -ansi option is equivalent to -std=c90 and -std=c++98 for C and C++, respectively.
3.4.4 Options for warnings
A set of options is available to handle warnings. A warning is a diagnostic message about constructions which are risky or probably contain an error.
The -pedantic option is used to print warnings about standard mismatches. In this case, the compliance with the language standard is checked and constructions forbidden by the standard version are rejected. This option is useful in order to achieve maximum portability.
Code syntax checking is provided by the -fsyntax-only option. The -Wall option enables the most commonly useful warnings. The -Werror option turns all warnings into errors.
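The difference between a warning and an error can be observed on a deliberately sloppy file. This is a minimal sketch; the file warn.cpp and its unused variable are made up for the purpose.

```shell
cat > warn.cpp <<'EOF'
int main() {
    int unused = 0;   // never read: reported by -Wall as an unused variable
    return 0;
}
EOF

# Check the syntax only; no object file is produced.
g++ -fsyntax-only warn.cpp

# -Wall reports the unused variable, but compilation still succeeds.
g++ -Wall -pedantic -c warn.cpp -o warn.o

# With -Werror the same warning becomes an error and no object file is written.
if g++ -Wall -Werror -c warn.cpp -o warn_strict.o; then
    echo "unexpected: compiled without errors"
else
    echo "warning promoted to error"
fi
```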
3.4.5 Options for optimization
The GCC compiler supports various options for optimization. It is necessary to keep in mind that the impact of most of these options on the performance of the resulting program is not clear-cut: in some cases we get a substantial increase in performance, whereas in other cases a significant decrease in performance is observed.
The level of optimization is specified by the option -OLEVEL. Minimal optimization corresponds to LEVEL equal to 1 (the option -O1); the moderate and aggressive levels are selected by 2 and 3, respectively.
We can use the -Os option to optimize for (reduced) code size; it is based on -O2. At our own risk, we can use -Ofast (based on -O3), where the optimization is conducted by disregarding strict standard compliance.
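Since the effect of the optimization levels depends on the program and the platform, it is worth building the same source several ways and comparing the results. The sketch below uses an arbitrary summation loop (loop.cpp is a made-up file); the sizes printed by ls will differ from system to system.

```shell
cat > loop.cpp <<'EOF'
#include <iostream>
int main() {
    double sum = 0.0;
    for (int i = 1; i <= 1000000; ++i)
        sum += 1.0 / i;
    std::cout << sum << std::endl;
    return 0;
}
EOF

# Build the same source at several optimization levels.
for level in 0 1 2 3 s; do
    g++ -O"$level" loop.cpp -o "loop_O$level"
done

# Compare the sizes of the resulting executables (numbers vary by platform).
ls -l loop_O0 loop_O1 loop_O2 loop_O3 loop_Os
```

Running the executables under the time command gives a corresponding comparison of execution speed.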
3.5 Libraries and linking programs
A compiler creates object files which are linked with other object modules into an executable program. Here we will briefly discuss working with object modules, which can be implemented as static or dynamic libraries.
3.5.1 Simple linking
In the simplest case, object modules from separate files in a directory of a program are combined to form an executable program.
For our test program, we first compile the source files into object files using g++ with the -c option.
Next, these object files are linked into the executable program test by passing them to g++ together with the option -o test.
3.5.2 Linking with a static library
A static library is a collection of object files created by the compiler. The names of static libraries usually begin with lib (prefix) and end with .a (suffix). The ar utility is used to work with the contents of a static library.
For example, after compilation of the file int.cpp with the -c option, we obtain the object file int.o.
The ar command with the -r option is employed to create a new library from object files. In our case, we include the object file int.o into the library libint.a using the command ar -r libint.a int.o.
Further, we compile test.cpp and link it with the static library libint.a by passing the library file to g++ together with the source file.
A short form of the library name can be used by applying the option -l: the library libint.a is then referred to as -lint.
In this case, the additional option -L. clearly indicates that the current directory contains the library.
A module of a static library is included in a program only if it defines functions or data referenced by a module that already takes part in the linking. Therefore, linking with a static library can generate a smaller executable file than linking with separate object files.
3.5.3 Linking with a shared library
A shared (dynamic) library is a collection of object files in which references to variables and function calls are relative rather than absolute. This allows shared modules to be loaded and executed dynamically during the start and execution of a program. Executable files are therefore small, and moreover, several programs can simultaneously employ object code from one shared library.
To build a shared library (suffix .so), the object files to be included in the library must be prepared in a special way. In our test program, we compile int.cpp with the -c option together with the -fPIC option.
Compilation into position-independent code is provided by the -fPIC option.
The next step is the creation of the shared library libint.so from the object file int.o using the -shared option of g++.
Compilation and creation of the shared library may be combined in a single command that passes both -fPIC and -shared.
Further, we compile test.cpp and link it with the shared library libint.so to obtain the executable file test_d.
If we link a program with a static library, then all the object modules are located in a single executable file. This makes for portability of the program. In contrast, shared libraries must be available both during linking and during execution of the program. If the library cannot be located at run time, any attempt to run the program test_d will be unsuccessful.
The corresponding error message is issued by the dynamic linker, which cannot find the library libint.so.
The search for shared libraries is carried out in
- the directories mentioned in the environment variable LD_LIBRARY_PATH;
- the directories listed in the file /etc/ld.so.conf;
- the directory /lib;
- the directory /usr/lib.
For example, we can add our current directory to the list in /etc/ld.so.conf. After changes in the configuration file /etc/ld.so.conf, the settings are updated by the command ldconfig.
3.6 Debugging
In the development of programs, the debugging stage is applied to identify and correct errors. Errors are localized by executing a program while monitoring the current values of its variables. In the following we demonstrate the usage of the GNU Debugger (GDB).
3.6.1 Compilation with GDB
GDB can debug any application. Full debugging is possible only if the program was compiled and linked with debugging information about its source code. Using a debugger, we can run a program line by line, inspect the values of variables, and run a program up to a prescribed point and stop there.
Special compiler options can be employed to set the level and type of debugging information. To generate minimal debugging information (tracing of function calls, global variables), we compile a program with the -g1 option (the first level). The second level, which corresponds to using the -g2 option (-g), generates the information about local variables and code lines. Full debugging information is available by using -g3.
It is not recommended to apply options for debugging together with options for optimization, because optimization complicates tracing the execution of the program.
We compile our test program with debugging information by adding the -g option to the compilation command.
Further manipulations are performed with the executable file test.
3.6.2 Getting started with GDB
To debug a program using GDB, we pass the name of the program as the first argument of gdb. We can also invoke the debugger first and then load a program using the command file. In our example, we run GDB using the command gdb test.
After its startup messages, the debugger displays (gdb), the regular command prompt, which indicates that the debugger is waiting for a new command.
3.6.3 Source code
The command list displays the source code of the program.
By default, ten lines of code are displayed. If the name of a function is given in the command list, the listing is centered around the beginning of that function. If a line number is given, the listing is centered around that line.
The gdb with the -tui option (GDB text user interface) provides advanced features for debugging. The text interface has a separate text window to test a program. Figure 3.4 demonstrates an example of using TUI.
In TUI mode we can use the PgUp key to scroll a text one page up, and the PgDn key is employed to scroll one page down. The Up key makes it possible to move one line up, the Down key is applied to move one line down. The Left and Right keys are used to move left and right, respectively.
3.6.4 Breakpoints
The basic strategy for debugging with GDB relies on setting breakpoints in a running program and observing its internal data.
Using the break command, we can set a breakpoint referring to a function or line. If a program consists of several files, we specify the name of the file followed by the : delimiter. For example, the command break int.cpp:8 sets a breakpoint at the 8th line of the int.cpp file.
The clear command removes breakpoints.
We can specify several breakpoints. The info breakpoints command displays the position and description of all breakpoints and indicates their numbers.
We can also enable or disable certain breakpoints by their numbers (enable, disable commands).
After setting breakpoints, we run the program using the run command.
Resuming execution of the program stopped by the debugger is performed by means of the continue command.
3.6.5 Displaying data
GDB allows us to easily check the data of a running program. The display command shows the values of the indicated variables each time the program stops.
We can employ the print command to display the value of an expression. The ptype command prints the type of any variable.
3.6.6 How to step through a program
To execute the next line of the code, we can use the step command. In this case, all machine instructions corresponding to one line of source code are executed, and the debugger steps into called functions. The next command performs a similar action, but a function call is treated as one line: the call is executed until the function returns.
The commands nexti and stepi differ from next and step in that they execute only one machine instruction.
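The commands of this section can be collected in a file and replayed in batch mode, which is convenient for experiments. The sketch below uses a made-up program sum.cpp and a made-up command file session.gdb; it runs the session only if gdb is installed, and in restricted environments (for example, some containers) the run command may not be permitted.

```shell
cat > sum.cpp <<'EOF'
#include <iostream>
int add(int a, int b) {
    int result = a + b;
    return result;
}
int main() {
    int total = add(2, 3);
    std::cout << total << std::endl;
    return 0;
}
EOF

# -g adds debugging information; optimization is left off.
g++ -g sum.cpp -o sum

# Commands for the session; the same lines can be typed at the (gdb) prompt.
cat > session.gdb <<'EOF'
break add
info breakpoints
run
print a
ptype result
next
print result
continue
EOF

# Replay the session non-interactively if gdb is available.
if command -v gdb > /dev/null; then
    gdb -batch -x session.gdb ./sum || echo "gdb session did not complete"
fi
```

For interactive work, the same program is started with gdb sum, or with gdb -tui sum for the text user interface.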
3.7 The make utility
The main purpose of the make utility is to provide automatic compilation of source codes composed of many files into object files and the following linking into executable files or libraries. On the basis of the information about the time of the last change of a single file, the make utility determines and runs the necessary programs.
3.7.1 Usage of the utility
As a rule, a developed program consists of many files. When we change a single file, the corrected version of an executable file can be obtained by recompiling all the files of a project. This is irrational because we have changed only one file. The make utility easily resolves this problem.
The make utility first searches for a file named makefile in the current directory. If it is not found, then the file Makefile is searched for. If this file also does not exist, then make stops with an error. If the instruction file has a different name, e.g. project, then make is run with the -f option: make -f project.
3.7.2 Test project
The test program consists of three files: int.cpp, int.h, and test.cpp. In the project directory, these source files are placed into the src directory (see Figure 3.5). The executable file test and the instruction file makefile are also contained in the project directory.
To compile the source files and create the executable file test in the project directory, we can go to the project directory and run a single g++ command that compiles both source files from the src directory and links them into test.
This single-command approach is acceptable for our project, if extra recompilation costs do not matter. But this is impractical for large projects with many large-size files.
3.7.3 Simple makefile
Now we create a makefile for our project. This file contains rules, which consist of targets, dependencies, and commands. In a makefile, they are arranged as follows: first, the name of the target is given (usually it is the name of an executable or object file), followed by the : delimiter; secondly, the names of the dependencies (files required for this target) are specified. The lines that follow contain the list of commands that must be executed to achieve this target.
The structure of a rule is thus a line of the form target: dependencies followed by one or more command lines.
Each command line must start with a tab character.
For our project, the simplest makefile contains one rule that links the executable file test from the object files test.o and int.o, and one rule for compiling each of these object files.
Then the make utility is run in the project directory by typing make.
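A makefile of this kind can be sketched as follows. This is an illustration under assumptions, not the original listing: the sources are shortened stand-ins placed in src, and the shell helper rule is introduced here only so that the mandatory tab character is written reliably.

```shell
# Stand-in project (file names follow the chapter; the contents are assumptions).
mkdir -p src
cat > src/int.h <<'EOF'
double intm(double (*f)(double), double a, double b, int n);
EOF
cat > src/int.cpp <<'EOF'
#include "int.h"
double intm(double (*f)(double), double a, double b, int n) {
    double h = (b - a) / n, sum = 0.0;
    for (int i = 0; i < n; ++i) sum += f(a + (i + 0.5) * h);
    return sum * h;
}
EOF
cat > src/test.cpp <<'EOF'
#include <iostream>
#include "int.h"
double f(double x) { return 2.0 * x; }
int main() { std::cout << intm(f, 0.0, 1.0, 100) << std::endl; }
EOF

# Helper (hypothetical): print one rule as "target: dependencies", newline, tab, command.
rule() { printf '%s\n\t%s\n' "$1" "$2"; }

{
    rule 'test: test.o int.o'             'g++ test.o int.o -o test'
    rule 'test.o: src/test.cpp src/int.h' 'g++ -c src/test.cpp'
    rule 'int.o: src/int.cpp src/int.h'   'g++ -c src/int.cpp'
} > makefile

make                # first run: compiles both files and links test
sleep 1
touch src/int.cpp   # pretend that int.cpp was edited
make                # second run: only int.o is rebuilt, then test is relinked
./test
```

The first rule in the file is the default target, so a plain make builds test and everything it depends on.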
3.7.4 Phony targets
When we apply the above makefile, the files test, test.o, and int.o are created in the project directory. After debugging we need to clean the directory of auxiliary files. This can be done by adding to the makefile a rule with the target clean and no dependencies, whose command deletes these files.
No file named clean is created when this rule is applied. To delete the files, we run the command make clean. Targets of this kind, which are not represented as files, are called phony targets.
This rule does not work if a file named clean exists in the directory with the makefile: since the clean target does not have any dependencies, it is always considered up to date, and the command for deleting files is never performed. To resolve this problem, the special target .PHONY is used, which explicitly declares the phony target: .PHONY: clean.
It seems reasonable to include in a makefile the following standard phony targets:
- all: execute all tasks to create a program,
- install: install programs from compiled binary files,
- clean: delete generated binary files etc.
3.7.5 Variables
String variables are actively used in makefiles. By convention, the names of variables are written in upper case, e.g. CFLAGS.
To get the value of a variable, we enclose the name of the variable in parentheses with the $ character at the beginning, e.g. $(CFLAGS).
To avoid duplicating file names many times, special automatic variables are employed. For instance, $@ is replaced by the current target, and $^ is replaced by the list of all dependencies with their directories.
The use of variables can be illustrated by a variant of our makefile with the following features.
The VPATH variable defines the directory with the project files. The names of the object files (the OBJ_FILES variable) are obtained from the names of the source files of the project by means of the patsubst function. The CFLAGS and LDFLAGS variables define the options for compiling and linking with inclusion of debugging information.