C and C++ Programming Resources

Custom Search

Generic containers

Vector

The vector class template is intentionally made to look like a souped-up array, since it has array-style indexing, but also can expand dynamically. The vector class template is so fundamentally useful that it was introduced in a primitive way early in this book and was used regularly in previous examples. This section will give a more in-depth look at vector.

To achieve maximally fast indexing and iteration, the vector maintains its storage as a single contiguous array of objects. This is a critical point to observe in understanding the behavior of vector. It means that indexing and iteration are lightning-fast, being basically the same as indexing and iterating over an array of objects. But it also means that inserting an object anywhere but at the end (that is, appending) is not really an acceptable operation for a vector. In addition, when a vector runs out of preallocated storage, to maintain its contiguous array it must allocate a whole new (larger) chunk of storage elsewhere and copy the objects to the new storage. This approach produces a number of unpleasant side-effects.

Cost of overflowing allocated storage

A vector starts by grabbing a block of storage, as if it?s taking a guess at how many objects you plan to put in it. As long as you don?t try to put in more objects than can be held in the initial block of storage, everything is rapid and efficient. (If you do know how many objects to expect, you can preallocate storage using reserve( ).) But eventually you will put in one too many objects, and, unbeknownst to you, the vector responds by:

1. Allocating a new, bigger piece of storage
2. Copying all the objects from the old storage to the new (using the copy-constructor)
3. Destroying all the old objects (the destructor is called for each one)
4. Releasing the old memory
For complex objects, this copy-construction and destruction can end up being expensive if you overfill your vector a lot. To see what happens when you?re filling a vector, here is the Noisy class mentioned earlier that prints out information about its creations, destructions, assignments, and copy-constructions:

// A class to track various object activities
#ifndef NOISY_H
#define NOISY_H
#include <iostream>

class Noisy {
static long create, assign, copycons, destroy;
long id;
public:
Noisy() : id(create++) {
std::cout << "d[" << id << "]" << std::endl;
}
Noisy(const Noisy& rv) : id(rv.id) {
std::cout << "c[" << id << "]" << std::endl;
copycons++;
}
Noisy& operator=(const Noisy& rv) {
std::cout << "(" << id << ")=[" <<
rv.id << "]" << std::endl;
id = rv.id;
assign++;
return *this;
}
friend bool
operator<(const Noisy& lv, const Noisy& rv) {
return lv.id < rv.id;
}
friend bool
operator==(const Noisy& lv, const Noisy& rv) {
return lv.id == rv.id;
}
~Noisy() {
std::cout << "~[" << id << "]" << std::endl;
destroy++;
}
friend std::ostream&
operator<<(std::ostream& os, const Noisy& n) {
return os << n.id;
}
friend class NoisyReport;
};

struct NoisyGen {
Noisy operator()() { return Noisy(); }
};

// A singleton. Will automatically report the
// statistics as the program terminates:
class NoisyReport {
static NoisyReport nr;
NoisyReport() {} // Private constructor
public:
~NoisyReport() {
std::cout << "\n-------------------\n"
<< "Noisy creations: " << Noisy::create
<< "\nCopy-Constructions: "
<< Noisy::copycons
<< "\nAssignments: " << Noisy::assign
<< "\nDestructions: " << Noisy::destroy
<< std::endl;
}
};

// Because of the following definitions, this file
// can only be used in simple test situations. Move
// them to a .cpp file for more complex programs:
long Noisy::create = 0, Noisy::assign = 0,
Noisy::copycons = 0, Noisy::destroy = 0;
NoisyReport NoisyReport::nr;
#endif // NOISY_H ///:~

Each Noisy object has its own identifier, and static variables keep track of all the creations, assignments (using operator=), copy-constructions, and destructions. The id is initialized using the create counter inside the default constructor; the copy-constructor and assignment operator take their id values from the rvalue. Of course, with operator= the lvalue is already an initialized object, so the old value of id is printed before it is overwritten with the id from the rvalue.

To support certain operations such as sorting and searching (which are used implicitly by some of the containers), Noisy must have an operator< and operator==. These simply compare the id values. The ostream inserter follows the standard form and simply prints the id.

NoisyGen produces a function object (since it has an operator( )) that is used to automatically generate Noisy objects during testing.
NoisyReport is a singleton object, because it is responsible for printing out the results at program termination. It has a private constructor, therefore, so no additional NoisyReport objects can be created, and it has a single static instance of NoisyReport called nr. The only executable statements are in the destructor, which is called as the program exits and the static destructors are called; this destructor prints out the statistics captured by the static variables in Noisy.

The one snag to this header file is the inclusion of the definitions for the statics at the end. If you include this header in more than one place in your project, you?ll get multiple-definition errors at link time. Of course, you can put the static definitions in a separate cpp file and link it in, but that is less convenient, and since Noisy is just intended for quick-and-dirty experiments, the header file should be reasonable for most situations.

Using Noisy.h, the following program will show the behaviors that occur when a vector overflows its currently allocated storage:

//{-bor}
// Shows the copy-construction and destruction
// That occurs when a vector must reallocate
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include "Noisy.h"
using namespace std;

int main(int argc, char* argv[]) {
int size = 1000;
if(argc >= 2) size = atoi(argv[1]);
vector<Noisy> vn;
Noisy n;
for(int i = 0; i < size; i++)
vn.push_back(n);
cout << "\n cleaning up \n";
} ///:~

You can use the default value of 1000, or you can use your own value by putting it on the command line.

When you run this program, you?ll see a single default constructor call (for n), then a lot of copy-constructor calls, then some destructor calls, then some more copy-constructor calls, and so on. When the vector runs out of space in the linear array of bytes it has allocated, it must (to maintain all the objects in a linear array, which is an essential part of its job) get a bigger piece of storage and move everything over, copying first and then destroying the old objects. You can imagine that if you store a lot of large and complex objects, this process could rapidly become prohibitive.

There are two solutions to this problem. The nicest one requires that you know beforehand how many objects you?re going to make. In that case, you can use reserve( ) to tell the vector how much storage to preallocate, thus eliminating all the copies and destructions and making everything very fast (especially random access to the objects with operator[ ]). Note that the use of reserve( ) is different from using the vector constructor with an integral first argument; the latter initializes each element using the default copy-constructor.

However, in the more general case you won?t know how many objects you?ll need. If vector reallocations are slowing things down, you can change sequence containers. You could use a list, but as you?ll see, the deque allows speedy insertions at either end of the sequence and never needs to copy or destroy objects as it expands its storage. The deque also allows random access with operator[ ], but it?s not quite as fast as vector?s operator[ ]. So if you?re creating all your objects in one part of the program and randomly accessing them in another, you may find yourself filling a deque and then creating a vector from the deque and using the vector for rapid indexing. Of course, you don?t want to program this way habitually; just be aware of these issues (avoid premature optimization).

There is a darker side to vector?s reallocation of memory, however. Because vector keeps its objects in a nice, neat array (allowing, for one thing, maximally fast random access), the iterators used by vector are generally just pointers. This is a good thing?of all the sequence containers, these pointers allow the fastest selection and manipulation. However, consider what happens when you?re holding onto an iterator (that is, a pointer) and then you add the one additional object that causes the vector to reallocate storage and move it elsewhere. Your pointer is now pointing off into nowhere:

// Invalidating an iterator
#include <iterator>
#include <iostream>
#include <vector>
using namespace std;

int main() {
vector<int> vi(10, 0);
ostream_iterator<int> out(cout, " ");
vector<int>::iterator i = vi.begin();
*i = 47;
copy(vi.begin(), vi.end(), out);
cout << endl;
// Force it to move memory (could also just add
// enough objects):
vi.resize(vi.capacity() + 1);
// Now i points to wrong memory:
*i = 48; // Access violation
copy(vi.begin(), vi.end(), out); // No change to vi[0]
} ///:~

This illustrates the concept of iterator invalidation. Certain operations cause internal changes to a container?s underlying data, so any iterators in effect before such changes may no longer be valid afterward. If your program is breaking mysteriously, look for places where you hold onto an iterator while adding more objects to a vector. You?ll need to get a new iterator after adding elements or use operator[ ] instead for element selections. If you combine this observation with the awareness of the potential expense for adding new objects to a vector, you may conclude that the safest way to use one is to fill it up all at once (ideally, knowing first how many objects you?ll need) and then just use it (without adding more objects) elsewhere in the program. This is the way vector has been used in the book up to this point. The standard C++ library documents which container operations invalidate iterators.

You may observe that using vector as the ?basic? container. This is a fundamental issue in containers and in data structures in general: the ?best? choice varies according to the way the container is used. The reason vector has been the ?best? choice up until now is that it looks a lot like an array and was thus familiar and easy for you to adopt. But from now on it?s also worth thinking about other issues when choosing containers.

Inserting and erasing elements
The vector is most efficient if:

5. You reserve( ) the correct amount of storage at the beginning so the vector never has to reallocate.
6. You only add and remove elements from the back end.
It is possible to insert and erase elements from the middle of a vector using an iterator, but the following program demonstrates what a bad idea this is:

// Erasing an element from a vector
//{-bor}
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
#include "Noisy.h"
using namespace std;

int main() {
vector<Noisy> v;
v.reserve(11);
cout << "11 spaces have been reserved" << endl;
generate_n(back_inserter(v), 10, NoisyGen());
ostream_iterator<Noisy> out(cout, " ");
cout << endl;
copy(v.begin(), v.end(), out);
cout << "Inserting an element:" << endl;
vector<Noisy>::iterator it =
v.begin() + v.size() / 2; // Middle
v.insert(it, Noisy());
cout << endl;
copy(v.begin(), v.end(), out);
cout << "\nErasing an element:" << endl;
// Cannot use the previous value of it:
it = v.begin() + v.size() / 2;
v.erase(it);
cout << endl;
copy(v.begin(), v.end(), out);
cout << endl;
} ///:~

When you run the program, you?ll see that the call to reserve( ) really does only allocate storage?no constructors are called. The generate_n( ) call is busy: each call to NoisyGen::operator( ) results in a construction, a copy-construction (into the vector), and a destruction of the temporary. But when an object is inserted into the vector in the middle, it must shove everything down to maintain the linear array, and, since there is enough space, it does this with the assignment operator. (If the argument of reserve( ) is 10 instead of 11, it would have to reallocate storage.) When an object is erased from the vector, the assignment operator is once again used to move everything up to cover the place that is being erased. (Notice that this requires that the assignment operator properly clean up the lvalue.) Last, the object on the end of the array is deleted.

You can imagine how enormous the overhead can become if objects are inserted and removed from the middle of a vector if the number of elements is large and the objects are complicated. It?s obviously a practice to avoid.

Pages: [Page - 1] [Page - 2] [Page - 3] [Page - 4] [Page - 5] [Page - 6] [Page - 7] [Page - 8] [Page - 9] [Page - 10] [Page - 11] [Page - 12] [Page - 13]

Tags: , ,

There are 1 Comment to this post. You can follow any responses to this entry through the RSS 2.0 feed. You can skip to the end and leave a response or TrackBack from your own site.

  • Jimson says:

    Sorry to disappoint you, but this is actually bull. STL containers are not meant to be used as base classes. There are no virtual functions to overwrite, so you can’t alter the functionality. Data members are private, so no luck there either – these two points are make container inheritance questionable, but the fact that no virtual destructor is offered makes it just plain wrong. Deleting a derived class via a pointer to the container base class will result in undefined behaviour.

    If you want to add data members, just use composition. If you want to add functionality, write a new algorithm, but don’t inherit from STL containers.

    You might want to look this issue up, the internet is actually full of advice on this, as many novice STL users make this mistake…


Leave a Reply

You must be logged in to post a comment.