C/C++ tips


Overview

In a nutshell...

C is ubiquitous on GNU/Linux and Unix. Back in frontier times, when computer scientists programmed mostly in assembly, C was considered a high-level language. Nowadays, most folks think of C as a low-level language, and comparatively few people (except compiler writers, OS kernel authors, and their ilk) use assembly language. C is a procedural language and is pretty easy to learn but has many pitfalls for the unwary.

C++ was originally conceived as a superset of C (back then it was even sometimes referred to as "a better C"). It is a multi-paradigm language but is frequently used to write object-oriented software. C and C++ are not the same language: C++ is not an exact superset of C, but if you can program in C++, you can program in C. Speaking of supersets of C, Objective-C is an object-oriented superset of C. C++ is large, complex, and powerful -- and not for the faint of heart. It takes considerable time to master C++.

Compiled C code runs very fast. Some would say optimized C code can be as fast as hand-written assembly. Compiled C++ code may run just about as fast as C depending on how carefully the code is written. Programs written in interpreted languages like Python and Perl typically won't run nearly as fast as C or C++ programs.

If you are interested in writing system software, device drivers, compilers, operating systems, virtual machines, or graphics-intensive code (or similar software which needs to execute very quickly), you very likely want to learn and use C and/or C++. If you are just interested in GUI application software, C++ might be useful, but Python might be even better. If you are more interested in system administration, try Perl. If you want to write GUI software that will also be able to build and run on Mac OS X, have a look at GNUstep and Objective-C. If you just want to be different, maybe look into Ruby, the D programming language, Lisp, or any number of other languages that have implementations available for GNU/Linux.

You compile C and C++ code on GNU/Linux using the GNU Compiler Collection ("GCC"). A C program gets compiled with the gcc command, and a C++ program gets compiled with the g++ command. Note that GCC (the software package) compiles more than just C and C++. It can also build Objective-C, Java, Ada, and Fortran source code.
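For example, given a trivial source file (the file names here are just placeholders):

hello.c:

#include <stdio.h>

int main(void)
{
    printf("Hello, world\n");
    return 0;
}

Compile and run it with:

 gcc -Wall -o hello hello.c
 ./hello

A C++ source file (say, hello.cpp) would be compiled with g++ -Wall -o hello hello.cpp instead.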

Comments: Early and often

Do comment your code. It rarely hurts to describe what you are doing, and more importantly, why.

Documentation is the most important part: trying to figure out how component X is supposed to be used by reading the code alone is tedious and error-prone. Documenting requirements matters especially when the code happens to work on platform Z even if you leave initfoo() out. If the requirement isn't written down, you can be sure that someone on that platform will omit the call.

Also, if you're about to do some complex task, briefly write down your intent in natural language first. This will help you organize your thoughts, and it will let someone else later grok what you were trying to do if the code itself doesn't make that clear. Sometimes the prose ends up clumsier than the code, though. Use common sense.

An example of a good comment:

/*
 * Insert the values in buckets by their Nth byte. Bucket pointers are 
 * set up so that they will automatically form a concatenated list this way.
 */

Here we describe a perhaps non-obvious part of an algorithm. Very good.


However, good code needs fewer comments than bad code because it is more readable. Thus, the need for a comment is often a sign of another problem:

 obscure++; /* this is pointer to the last element */

Renaming the variable accomplishes the same thing:

 last_el++; /* increment last element */

Now, that's pretty obvious.

 /* last_el points to the first_el+N now */

Otherwise a good comment, but we could write assert(last_el == first_el + N) instead; it documents the same fact and also verifies it at runtime.
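For instance, a minimal sketch (first_el, last_el, and N are the hypothetical names from the comment above):

#include <assert.h>

int main(void)
{
    int buf[8];
    int *first_el = buf;
    int *last_el = first_el;
    int N = 8;

    for (int i = 0; i < N; i++)
        *last_el++ = i;   /* fill the buffer, advancing last_el */

    /* States the invariant and checks it at runtime; compiled
     * out entirely when NDEBUG is defined. */
    assert(last_el == first_el + N);
    return 0;
}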

 /*
  * Insert the values in buckets by their Nth byte. Bucket pointers are 
  * set up so that they will automatically form a concatenated list this way.
  */
 for (i=0; i<cnt; i++) for (j=0; j<i; j++) if (a[i]<a[j]) SWAP(a[i], a[j]);

Uh, that isn't really what we're doing, now is it? ;) Yes, comments get outdated when code is replaced by a better implementation. You need to keep them up to date. In this case, a note that the algorithm was changed because the old one didn't work for negative values would be helpful.

Debugging

Let the Compiler Nitpick

When compiling with GCC or G++, always use the -Wall command-line argument to tell GCC/G++ to print extra warnings and errors. This can often help you locate potential problems in your code. But it's not enough to let the compiler spit the nitpicks at you; you have to take them seriously. Warnings are often a sign that your code is doing something it shouldn't.

Other useful gcc options are -Wstrict-prototypes and -Wmissing-prototypes. These make sure your functions are declared with full prototypes, so the compiler can verify that every call passes the right number and types of arguments. (Format-string checking for printf() and friends comes from -Wformat, which -Wall already enables.)

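For example, take a deliberately buggy file (warn.c is a made-up name):

#include <stdio.h>

int main(void)
{
    int count = 5;
    /* Bug: %s expects a char *, but count is an int */
    printf("count is %s\n", count);
    return 0;
}

Compiling it with gcc -Wall warn.c produces a warning along the lines of format '%s' expects argument of type 'char *', but argument 2 has type 'int', pointing straight at the bug.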

Remote Debugging with DDD and gdb

1. Make sure gdbserver and libthread_db.so.1 are present on the remote machine.

2. Run gdbserver on the target: gdbserver host:2345 /path/application

The "host" is ignored, so this can be anything. 2345 is an arbitrary TCP port not in use. 2345 ususally works, just make sure it does not conflict with other IP ports in use.

The target application on the remote machine does not require symbols (debugging information).

3. On the host (your local PC):

   a. ddd program (this local copy must have symbols)

   b. At the (gdb) prompt:

   target remote xxx.xxx.xxx.xxx:2345 

Where the IP address is that of the remote machine.

That's it, remote debugging is up and running.

Optimization

Optimizing a program is often more an art than a science. But even in art, there are some rules that should be carefully followed (especially for new practitioners who aren't quite sure why the rules are important).

Optimization Rule #1 : DON'T!

You're coding along, minding your own business, when you notice a simple, quick optimization just begging to be done. Maybe it's as simple as replacing d = c * 2; with d = c << 1;. Why do an expensive multiply when you could do a cheap left-shift and come up with precisely the same result?

Why indeed? This optimization should do the right thing, but at the expense of a tiny bit of program clarity. Program clarity is a precious commodity, which isn't something that can be said for your computer's processor time. It may be that you'll end up making precisely this optimization later on. But right now, the important thing is to get the program working correctly.

Optimization Rule #2 : DON'T!

So now the program runs. It runs like a fifteen-year-old wiener dog. Sometimes it sits down on its haunches, staring off into space as though it's waiting for a doggie treat. So it's time to go in and replace that multiply with the bitshift, right?

Patience, grasshopper.

Before you decide if it's worth investing even one second in optimizing the program, decide how valuable your effort is going to be. The value derived from the effort is directly linked to the purpose of the program. Is it a tiny app you wrote for your own personal use, that you'll only use once a week? Is it part of a critical, high-performance application or library that will be used by millions, or a scientific number-cruncher which will take years to solve a problem? Most likely, it will be somewhere between these two extrema. To some extent, "fast enough" is a subjective judgment.

Optimization Rule #3 : DON'T!

Okay, I see you're getting impatient. You want to go in and make that aged dachshund run like a young Rottweiler chasing a mailman smothered in barbecue sauce. But the bad news is, it's still not time to replace that line of code. If you go back in and start "optimizing" right now, you may end up with hundreds of little tweaks, whose overall effect is to make the code difficult to understand and impossible to maintain. Even worse, it may not speed your program up noticeably.

Before you make a single change, take advantage of a profiler such as gprof. A profiler will tell you exactly where your code is spending its time. The rationale for this is referred to as Amdahl's law: the time saved by optimizing a section of code cannot be greater than the time the program spent running that code. For example, if a single for loop accounts for 7% of a program's running time, then no optimization of that loop can cut the total running time by more than 7%.

Most programs spend most of their time executing a few very small areas of code. Find out which areas, and you'll likely get huge boosts from small changes.
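A sketch of the typical gprof workflow (prog.c is a placeholder):

 gcc -Wall -pg -o prog prog.c    # -pg adds profiling instrumentation
 ./prog                          # run normally; writes gmon.out on exit
 gprof prog gmon.out > profile.txt

The flat profile at the top of the report lists the functions that consume the most running time; start your optimization efforts there.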

Optimization Rule #4 : Never Assume an Optimization Works

It is important that you know exactly how the unoptimized version of your program was performing, because many optimizations don't do anything. Some "optimizations" will even slow your program.

Part of this is due to the fact that, without being told, the compiler will try to do whatever optimizations it considers "safe", and a programmer's attempts to optimize the code may interfere with the optimizations the compiler is trying to perform. The best thing is to perform one optimization, compare the new performance to the old performance, decide whether to keep it, and then move on to the next potential improvement.

Optimization Rule #5 : Three Words: Algorithms

Very frequently, the best way to speed up a program is to change the way you solve the problem. No amount of pointer arithmetic or loop unwinding is going to give you a thousand-fold improvement in runtime, but changing the algorithm frequently does just that. For example, changing a sorting algorithm from bubble sort (O(n²) comparisons) to quicksort (O(n log n) on average) can make an hour-long sort take just seconds.
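Often the better algorithm has already been written for you. A minimal sketch using the C library's qsort() in place of a hand-rolled bubble sort:

#include <stdio.h>
#include <stdlib.h>

/* Comparison callback for qsort(): negative, zero, or positive. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int a[] = { 42, 7, 19, 3, 88 };
    size_t n = sizeof a / sizeof a[0];

    qsort(a, n, sizeof a[0], cmp_int);   /* O(n log n) on average */

    for (size_t i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}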

That's really 90% of optimization, but there are other techniques worth knowing about too: pre-caching, just-in-time calculation, optimizing for space vs. optimizing for time, and so on.


Save Compiler Optimizations for the End

Also, don't use any -O flags when debugging: optimization can compound existing errors and make the program's behavior under a debugger really weird. It can even introduce new bugs, due to the way the compiler does low-level optimization. The higher the optimization level, the riskier the optimizations the compiler will attempt.

Once you know the program works in an unoptimized state, then it may be time to crank the -O flag up a couple of notches. Compare both performance and stability between the unoptimized and optimized versions.
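For example (prog.c is a placeholder):

 gcc -Wall -O0 -g -o prog_debug prog.c   # unoptimized, with debug symbols
 gcc -Wall -O2 -o prog_fast prog.c       # optimized build
 time ./prog_debug
 time ./prog_fast

If the optimized build misbehaves where the unoptimized build did not, suspect undefined behavior in your own code before suspecting the compiler.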

Memory management

Valgrind

Many new programmers have difficulty with pointers, and careless usage leads to sudden segmentation faults. Valgrind is an excellent program for getting more information about what is causing the problem.
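For instance, take a program with a deliberate leak (leak.c is a made-up name):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(64);     /* allocated... */
    strcpy(buf, "forgotten");   /* ...used... */
    return 0;                   /* ...but never freed */
}

Build it with debugging symbols and run it under Valgrind:

 gcc -Wall -g -o leak leak.c
 valgrind --leak-check=full ./leak

Valgrind will report roughly "64 bytes in 1 blocks are definitely lost", along with a stack trace of the allocation (thanks to -g).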

new and malloc: Arch-nemeses

Don't mix malloc and new: never free() memory that came from new, and never delete memory that came from malloc(). They might appear to work together temporarily, but the behavior is undefined and bound to cause errors. Pick one from the start and stay with it. Valgrind will also point out this kind of error.

(Note: The same might be said for both <iostream> and <stdio.h> in the same program.)
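A minimal C++ sketch of the mismatches Valgrind flags as "Mismatched free() / delete / delete []":

#include <cstdlib>

int main()
{
    int *a = new int[10];
    std::free(a);        // WRONG: new[] memory must be released with delete[]

    int *b = static_cast<int *>(std::malloc(10 * sizeof(int)));
    delete[] b;          // WRONG: malloc() memory must be released with free()

    int *c = new int[10];
    delete[] c;          // correct pairing
    return 0;
}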

Use Macros With Care

Macros are simply preprocessor directives that replace certain strings of text with others before the compiler proper ever sees the code. If used carelessly, they can make a program terribly confusing.

For example, if you wanted to be intentionally misleading, you could write:

#define CLOSE_BLOCK {
#define OPEN_BLOCK }
#define CIN cout
#define COUT cin

and then write the rest of the program as follows:

for( int i = 0; i < 100; ++i )
CLOSE_BLOCK
    for( int j = 0; j < 100; ++j )
    CLOSE_BLOCK
        CIN << data[ i ][ j ];
    OPEN_BLOCK
OPEN_BLOCK

But intentional obfuscation isn't the real danger. The real problem comes from accidental obfuscation, as in the following example.

Example: Proving the Answer

The Answer is elusive. Here is a prime example of why parentheses are needed around macro definitions.

life.c:


#include <stdio.h>

#define SIX 1+5      /* Should be (1+5) */
#define SEVEN 6+1    /* Should be (6+1) */

int main()
{
    printf("The meaning of life: %d x %d = %d\n", SIX, SEVEN, SIX * SEVEN);
    return 0;
}

Rather than evaluating (1+5) * (6+1) as we would hope, the compiler will now read the statement as 1+(5*6)+1. Result: 32 rather than 42, and thousands of years of time wasted by the galaxy's biggest supercomputer.
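With the parentheses in place, the macros expand the way we intended:

#include <stdio.h>

#define SIX (1+5)
#define SEVEN (6+1)

int main()
{
    /* SIX * SEVEN now expands to (1+5) * (6+1), which is 42 */
    printf("The meaning of life: %d x %d = %d\n", SIX, SEVEN, SIX * SEVEN);
    return 0;
}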