Algorithms are at the core of computing. Being able to write an algorithm once and have it work with any type of sequence makes your programs both simpler and safer. The ability to customize algorithms at runtime has revolutionized software development.
The subset of the standard C++ library known as the Standard Template Library (STL) was originally designed around generic algorithms: code that processes sequences of any type of values in a type-safe manner. The goal was to use predefined algorithms for almost every task, instead of hand-coding loops every time you need to process a collection of data.
Stream iterators
Like any good software library, the Standard C++ Library attempts to provide convenient ways to automate common tasks. We mentioned in the beginning of this tutorial that you can use generic algorithms in place of looping constructs. So far, however, our examples have still used an explicit loop to print their output. Since printing output is one of the most common tasks, you would hope for a way to automate that too.
That's where stream iterators come in. A stream iterator allows you to use a stream as either an input or an output sequence. To eliminate the output loop in the CopyInts2.cpp program, for instance, you can do something like the following.
// Uses an output stream iterator
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
using namespace std;

bool gt15(int x) { return 15 < x; }

int main() {
  int a[] = {10, 20, 30};
  const size_t SIZE = sizeof a / sizeof a[0];
  remove_copy_if(a, a + SIZE,
                 ostream_iterator<int>(cout, "\n"), gt15);
} ///:~
In this example we've replaced the output sequence b in the third argument to remove_copy_if( ) with an output stream iterator, which is an instance of the ostream_iterator class template declared in the <iterator> header. Output stream iterators overload their assignment operators so that assigning a value writes it to their stream. This particular instance of ostream_iterator is attached to the output stream cout. Every time remove_copy_if( ) assigns an integer from the sequence a to cout through this iterator, the iterator writes the integer to cout and also automatically writes an instance of the separator string found in its second argument, which in this case contains just the newline character.
It is just as easy to write to a file instead of to cout, of course. All you have to do is provide an output file stream instead of cout:
// Uses an output file stream iterator
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
using namespace std;

bool gt15(int x) { return 15 < x; }

int main() {
  int a[] = {10, 20, 30};
  const size_t SIZE = sizeof a / sizeof a[0];
  ofstream outf("ints.out");
  remove_copy_if(a, a + SIZE,
                 ostream_iterator<int>(outf, "\n"), gt15);
} ///:~
An input stream iterator allows an algorithm to get its input sequence from an input stream. This is accomplished by having both the constructor and operator++( ) read the next element from the underlying stream and by overloading operator*( ) to yield the value previously read. Since algorithms require two iterators to delimit an input sequence, you can construct an istream_iterator in two ways, as you can see in the program that follows.
// Uses an input stream iterator
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include "../require.h"
using namespace std;

bool gt15(int x) { return 15 < x; }

int main() {
  ifstream inf("someInts.dat");
  assure(inf, "someInts.dat");
  remove_copy_if(istream_iterator<int>(inf),
                 istream_iterator<int>(),
                 ostream_iterator<int>(cout, "\n"), gt15);
} ///:~
The first argument to remove_copy_if( ) in this program attaches an istream_iterator object to the input file stream containing ints. The second argument uses the default constructor of the istream_iterator class. This call constructs a special value of istream_iterator that indicates end-of-file, so that when the first iterator finally encounters the end of the physical file, it compares equal to the value istream_iterator<int>( ), allowing the algorithm to terminate correctly. Note that this example avoids using an explicit array altogether.
Algorithm complexity
Using a software library is a matter of trust. You trust the implementers not only to provide correct functionality but also to make the functions execute as efficiently as possible. It's better to write your own loops than to use algorithms that degrade performance.
To guarantee quality library implementations, the C++ standard specifies not only what an algorithm should do, but also how fast it should do it and sometimes how much space it should use. Any algorithm that does not meet the performance requirements does not conform to the standard. The measure of an algorithm's operational efficiency is called its complexity.
When possible, the standard specifies the exact number of operations an algorithm should use. The count_if( ) algorithm, for example, returns the number of elements in a sequence satisfying a given predicate. The following call to count_if( ), if applied to a sequence of integers similar to the examples earlier in this tutorial, yields the number of integer elements that are greater than 15:
size_t n = count_if(a, a + SIZE, gt15);
Since count_if( ) must look at every element exactly once, it is specified to apply its predicate exactly as many times as there are elements in the sequence. Naturally, the copy( ) algorithm has an analogous specification: it performs exactly one assignment per element.
Other algorithms can be specified to take at most a certain number of operations. The find( ) algorithm searches through a sequence in order until it encounters an element equal to its third argument:
int* p = find(a, a + SIZE, 20);
It stops as soon as the element is found and returns a pointer to that first occurrence. If it doesn't find one, it returns a pointer one position past the end of the sequence (a+SIZE in this example). Therefore, find is said to make at most a number of comparisons equal to the number of elements in the sequence.
Sometimes the number of operations an algorithm takes cannot be measured with such precision. In such cases, the standard specifies the algorithm's asymptotic complexity, which is a measure of how the algorithm behaves with large sequences compared to well-known formulas. A good example is the sort( ) algorithm, which the standard says takes "approximately n log n comparisons on average" (n is the number of elements in the sequence). Such complexity measures give a "feel" for the cost of an algorithm and at least give a meaningful basis for comparing algorithms. As you'll see, the find( ) member function for the set container has logarithmic complexity, which means that the cost of searching for an element in a set will, for large sets, be proportional to the logarithm of the number of elements. This is much smaller than the number of elements for large n, so it is always better to search a set by using its find( ) member function rather than by using the generic find( ) algorithm.
Function objects
As you study some of the examples earlier in this tutorial, you will probably notice the limited utility of the function gt15( ). What if you want to use a number other than 15 as a comparison threshold? You may need a gt20( ) or gt25( ) or others as well. Having to write a separate function for each such comparison has two distasteful difficulties:
1. You may have to write a lot of functions!
2. You must know all required values when you write your application code.
The second limitation means that you can't use runtime values to govern your searches, which is downright unacceptable. Overcoming this difficulty requires a way to pass information to predicates at runtime. For example, you would need a greater-than function that you can initialize with an arbitrary comparison value. Unfortunately, you can't pass that value as a function parameter, because unary predicates, such as our gt15( ), are applied to each value in a sequence individually and must therefore take only one parameter.
The way out of this dilemma is, as always, to create an abstraction. In this case, we need an abstraction that can act like a function as well as store state, without disturbing the number of function parameters it accepts when used. This abstraction is called a function object.
A function object is an instance of a class that overloads operator( ), the function call operator. This operator allows an object to be used with function call syntax. As with any other object, you can initialize it via its constructors. Here is a function object that can be used in place of gt15( ):
#include <iostream>
using namespace std;

class gt_n {
  int value;
public:
  gt_n(int val) : value(val) {}
  bool operator()(int n) { return n > value; }
};

int main() {
  gt_n f(4);
  cout << f(3) << endl;  // Prints 0 (for false)
  cout << f(5) << endl;  // Prints 1 (for true)
} ///:~
The fixed value to compare against (4) is passed when the function object f is created. The expression f(3) is then evaluated by the compiler as the following function call:
f.operator()(3);
which returns the value of the expression 3 > value, which is false when value is 4, as it is in this example.
Since such comparisons apply to types other than int, it would make sense to define gt_n as a class template. It turns out you don't have to do it yourself, though: the standard library has already done it for you. The following descriptions of function objects should not only make that topic clear, but also give you a better understanding of how the generic algorithms work.
Classification of function objects
The standard C++ library classifies function objects based on the number of arguments that their operator( ) takes and the kind of value it returns. This classification is organized according to whether a function object's operator( ) takes zero, one, or two arguments, as the following definitions illustrate.
Generator: A type of function object that takes no arguments and returns a value of an arbitrary type. A random number generator is an example of a generator. The standard library provides one generator, the function rand( ) declared in <cstdlib>, and has some algorithms, such as generate_n( ), which apply generators to a sequence.
Unary Function: A type of function object that takes a single argument of any type and returns a value that may be of a different type (which may be void).
Binary Function: A type of function object that takes two arguments of any two (possibly distinct) types and returns a value of any type (including void).
Unary Predicate: A Unary Function that returns a bool.
Binary Predicate: A Binary Function that returns a bool.
Strict Weak Ordering: A binary predicate that allows for a more general interpretation of "equality." Some of the standard containers consider two elements equivalent if neither is less than the other (using operator<( )). This is important when comparing floating-point values, and objects of other types where operator==( ) is unreliable or unavailable. This notion also applies if you want to sort a sequence of data records (structs) on a subset of the struct's fields. That comparison scheme is considered a strict weak ordering because two records with equal keys are not really "equal" as total objects, but they are equal as far as the comparison you're using is concerned.
In addition, certain algorithms make assumptions about the operations available for the types of objects they process. We will use the following terms to indicate these assumptions:
LessThanComparable: A class that has a less-than operator<.
Assignable: A class that has a copy-assignment operator= for its own type.
EqualityComparable: A class that has an equality operator== for its own type.
Adaptable function objects
Standard function adapters such as bind1st( ) and bind2nd( ) make some assumptions about the function objects they process. To illustrate, consider the following expression from the last line of the earlier CountNotEqual.cpp program:
not1(bind1st(equal_to<int>(), 20))
The bind1st( ) adapter creates a unary function object of type binder1st, which simply stores an instance of equal_to<int> and the value 20. The binder1st::operator() function needs to know its argument type and its return type; otherwise, it will not have a valid declaration. The convention to solve this problem is to expect all function objects to provide nested type definitions for these types. For unary functions, the type names are argument_type and result_type; for binary function objects they are first_argument_type, second_argument_type, and result_type. Looking at the implementation of bind1st( ) and binder1st in the <functional> header reveals these expectations. First inspect bind1st( ), as it might appear in a typical library implementation:
template <class Op, class T>
binder1st<Op> bind1st(const Op& f, const T& val) {
  typedef typename Op::first_argument_type Arg1_t;
  return binder1st<Op>(f, Arg1_t(val));
}
Note that the template parameter, Op, which represents the type of the binary function being adapted by bind1st( ), must have a nested type named first_argument_type. Now notice how binder1st uses the type names in Op in its declaration of its function call operator:
// Inside the implementation for binder1st<Op>...
typename Op::result_type
operator()(const typename Op::second_argument_type& x) const;
Function objects whose classes provide these type names are called adaptable function objects.
Since these names are expected of all standard function objects as well as of any function objects you create that you want to use with the function object adapters, the <functional> header provides two templates that define these types for you: unary_function and binary_function. You simply derive from these classes while filling in the argument types as template parameters. Suppose, for example, that we want to make the function object gt_n, defined earlier in this tutorial, adaptable. All we need to do is the following:
class gt_n : public unary_function<int, bool> {
  int value;
public:
  gt_n(int val) : value(val) {}
  bool operator()(int n) {
    return n > value;
  }
};
All unary_function does is provide the appropriate type definitions, which it takes directly from its template parameters, as you can see in its definition:
template <class Arg, class Result>
struct unary_function {
  typedef Arg argument_type;
  typedef Result result_type;
};
These types become accessible through gt_n because it derives publicly from unary_function. The binary_function template behaves in a similar manner.