String in C/C++ is not more than a array of characters. String encapsulates the functionality of traversing, replacing, copying of one character array to another.

One of the biggest time-wasters in C is using character arrays for string processing: keeping track of the difference between static quoted strings and arrays created on the stack and the heap, and the fact that sometimes you’re passing around a char* and sometimes you must copy the whole array.

Especially because string manipulation is so common, character arrays are a great source of misunderstandings and bugs. Despite this, creating string classes remained a common exercise for beginning C++ programmers for many years. The Standard C++ library string class solves the problem of character array manipulation once and for all, keeping track of memory even during assignments and copy-constructions. You simply don’t need to think about it.

Handling text is perhaps one of the oldest of all programming applications, so it’s not surprising that the C++ string draws heavily on the ideas and terminology that have long been used for this purpose in C and other languages. As you begin to acquaint yourself with C++ strings, this fact should be reassuring. No matter which programming idiom you choose, there are really only about three things you want to do with a string:

  • Create or modify the sequence of characters stored in the string.
  • Detect the presence or absence of elements within the string.
  • Translate between various schemes for representing string characters.

What’s in a string?

In C, a string is simply an array of characters that always includes a binary zero (often called the null terminator) as its final array element. There are significant differences between C++ strings and their C progenitors. First, and most important, C++ strings hide the physical representation of the sequence of characters they contain. You don’t have to be concerned at all about array dimensions or null terminators. A string also contains certain ‘housekeeping’ information about the size and storage location of its data. Specifically, a C++ string object knows its starting location in memory, its content, its length in characters, and the length in characters to which it can grow before the string object must resize its internal data buffer. C++ strings therefore greatly reduce the likelihood of making three of the most common and destructive C programming errors: overwriting array bounds, trying to access arrays through uninitialized or incorrectly valued pointers, and leaving pointers ‘dangling’ after an array ceases to occupy the storage that was once allocated to it.

The exact implementation of memory layout for the string class is not defined by the C++ Standard. This architecture is intended to be flexible enough to allow differing implementations by compiler vendors, yet guarantee predictable behavior for users. In particular, the exact conditions under which storage is allocated to hold data for a string object are not defined. String allocation rules were formulated to allow but not require a reference-counted implementation, but whether or not the implementation uses reference counting, the semantics must be the same. To put this a bit differently, in C, every char array occupies a unique physical region of memory. In C++, individual string objects may or may not occupy unique physical regions of memory, but if reference counting is used to avoid storing duplicate copies of data, the individual objects must look and act as though they do exclusively own unique regions of storage. For example: