Question

I'm trying to iterate over the words of a string.

The string can be assumed to be composed of words separated by whitespace.

Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.

The best solution I have right now is:

#include <iostream>

#include <sstream>

#include <string>



using namespace std;



int main()

{

    string s = "Somewhere down the road";

    istringstream iss(s);



    do

    {

        string subs;

        iss >> subs;

        cout << "Substring: " << subs << endl;

    } while (iss);

}

Is there a more elegant way to do this?

Dude... Elegance is just a fancy way to say "efficiency-that-looks-pretty" in my book. Don't shy away from using C functions and quick methods to accomplish anything just because it is not contained within a template ;) — Oct 25 '08 at 9:04
while (iss) { string subs; iss >> subs; cout << "Substring: " << sub << endl; } — Sep 29 '09 at 15:47
@Eduardo: that's wrong too... you need to test iss between trying to stream another value and using that value, i.e. string sub; while (iss >> sub) cout << "Substring: " << sub << '\n'; — Apr 11 '12 at 2:24
Various options in C++ to do this by default: cplusplus.com/faq/sequences/strings/split — Oct 31 '13 at 0:23
There's more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility. IMHO Elegance is not a proxy for efficiency but maintainability. — Mar 31 '17 at 13:22
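
The fix described in the comments can be sketched as follows. The loop in the question prints one extra, empty substring because the stream is only tested after the failed extraction has already been used; moving the extraction into the loop condition avoids that. The helper name `collect_words` is made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the whitespace-separated words of `text`.
std::vector<std::string> collect_words(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> words;
    std::string word;
    // The extraction itself is the loop condition, so a failed read
    // ends the loop before `word` is used.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

Calling `collect_words("Somewhere down the road")` should give the four words and no trailing empty entry.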

score 1225 · Accepted Answer · 2016-06-09 17:47:05Z

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>

#include <string>

#include <sstream>

#include <algorithm>

#include <iterator>



int main() {

    using namespace std;

    string sentence = "And I feel fine...";

    istringstream iss(sentence);

    copy(istream_iterator<string>(iss),

         istream_iterator<string>(),

         ostream_iterator<string>(cout, "\n"));

}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;

copy(istream_iterator<string>(iss),

     istream_iterator<string>(),

     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},

                      istream_iterator<string>{}};

edited Jun 9 '16 at 17:47

community wiki

8 revs, 8 users 71%
Zunino

145

Is it possible to specify a delimiter for this? Like for instance splitting on commas?
– l3dx
Aug 6 '09 at 11:49

14

@Jonathan: \n is not the delimiter in this case, it's the delimiter for outputting to cout.
– huy
Feb 3 '10 at 12:37

728

This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable.
– SmallChess
Jan 10 '11 at 3:57

34

Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.
– Jerry Coffin
Dec 19 '12 at 20:30

35

@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm, doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one.
– Christian Rau
Feb 7 '13 at 15:08
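
The facet approach mentioned in the comments might look roughly like this. It is only a sketch: the names `comma_is_space` and `split_on_commas` are invented here, and it assumes a narrow `char` stream where a comma should count as whitespace in addition to the usual whitespace characters.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: the classic "C" classification table,
// with ',' additionally classified as whitespace.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Copy the classic table once, then mark ',' as a space character.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] = space;
        return table.data();
    }
};

// Hypothetical helper: extracts tokens separated by commas and/or whitespace.
std::vector<std::string> split_on_commas(const std::string& text) {
    std::istringstream iss(text);
    // The locale takes ownership of the facet and deletes it when done.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    std::vector<std::string> tokens;
    std::string token;
    while (iss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Because `operator>>` skips runs of whitespace, consecutive delimiters never produce empty tokens with this technique.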

score 2332 · Answer 2 · 2018-02-28 23:32:54Z

I use this to split a string by a delimiter. The first function writes the results through an output iterator (for example, into a pre-constructed vector); the second returns a new vector.

#include <string>

#include <sstream>

#include <vector>

#include <iterator>



template<typename Out>

void split(const std::string &s, char delim, Out result) {

    std::stringstream ss(s);

    std::string item;

    while (std::getline(ss, item, delim)) {

        *(result++) = item;

    }

}



std::vector<std::string> split(const std::string &s, char delim) {

    std::vector<std::string> elems;

    split(s, delim, std::back_inserter(elems));

    return elems;

}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');
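
To make the empty-token behaviour concrete, here is a small sketch of the same `std::getline` loop with an opt-in filter. The name `split_getline` and the `skip_empty` parameter are assumptions for this example, not part of the answer's code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: std::getline-based split with an optional filter
// that drops empty tokens.
std::vector<std::string> split_getline(const std::string& s, char delim,
                                       bool skip_empty = false) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (skip_empty && item.empty()) {
            continue;  // ignore the empty token between two adjacent delimiters
        }
        elems.push_back(item);
    }
    return elems;
}
```

With `skip_empty` left at `false`, `split_getline("one:two::three", ':')` yields four items, the third of which is empty; with `true` it yields three. Note that a single trailing delimiter does not produce a final empty token with this approach.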

edited Feb 28 at 23:32

community wiki

20 revs, 15 users 43%
Evan Teran

73

In order to avoid it skipping empty tokens, do an empty() check: if (!item.empty()) elems.push_back(item)
– 0x499602D2
Nov 9 '13 at 22:33

11

How about the delim contains two chars as ->?
– herohuyongtao
Dec 26 '13 at 8:15

7

@herohuyongtao, this solution only works for single char delimiters.
– Evan Teran
Dec 27 '13 at 6:11

4

@JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this: f(split(s, d, v)) while still having the benefit of a pre-allocated vector if you like.
– Evan Teran
Jan 25 '14 at 17:50

6

Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value.
– dshin
Sep 9 '15 at 19:04

score 805 · Answer 3 · 2015-08-03 23:20:33Z

A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>

std::vector<std::string> strs;

boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.

edited Aug 3 '15 at 23:20

community wiki

3 revs, 3 users 67%
ididak

32

Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.
– Tom
Mar 1 '09 at 16:51

40

And for those who don't already have boost... bcp copies over 1,000 files for this :)
– Roman Starkov
Jun 9 '10 at 20:12

78

strtok is a trap. its thread unsafe.
– tuxSlayer
Apr 23 '11 at 3:30

28

@Ian Embedded developers aren't all using boost.
– ACK_stoverflow
Jan 31 '12 at 18:23

28

as an addendum: I use boost only when I must, normally I prefer to add to my own library of code which is standalone and portable so that I can achieve small precise specific code, which accomplishes a given aim. That way the code is non-public, performant, trivial and portable. Boost has its place but I would suggest that it's a bit of overkill for tokenising strings: you wouldn't have your whole house transported to an engineering firm to get a new nail hammered into the wall to hang a picture.... they may do it extremely well, but the pros are by far outweighed by the cons.
– GMasucci
May 22 '13 at 8:19

score 333 · Answer 4 · 2018-05-19 10:01:31Z

#include <vector>

#include <string>

#include <sstream>



int main()

{

    std::string str("Split me by whitespaces");

    std::string buf;                 // Have a buffer string

    std::stringstream ss(str);       // Insert the string into a stream



    std::vector<std::string> tokens; // Create vector to hold our words



    while (ss >> buf)

        tokens.push_back(buf);



    return 0;

}

edited May 19 at 10:01

community wiki

2 revs, 2 users 82%
kev

52

too bad it only splits on spaces ' '...
– Offirmo
Jan 31 '13 at 18:47

You can also split on other delimiters if you use getline in the while condition e.g. to split by commas, use while(getline(ss, buf, ',')).
– Ali
Oct 6 at 20:20

score 172 · Answer 5 · 2016-09-19 13:00:24Z

For those with whom it does not sit well to sacrifice all efficiency for code size and see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition.):

template < class ContainerT >

void tokenize(const std::string& str, ContainerT& tokens,

              const std::string& delimiters = " ", bool trimEmpty = false)

{

   std::string::size_type pos, lastPos = 0, length = str.length();



   using value_type = typename ContainerT::value_type;

   using size_type  = typename ContainerT::size_type;



   while(lastPos < length + 1)

   {

      pos = str.find_first_of(delimiters, lastPos);

      if(pos == std::string::npos)

      {

         pos = length;

      }



      if(pos != lastPos || !trimEmpty)

         tokens.push_back(value_type(str.data()+lastPos,

               (size_type)pos-lastPos ));



      lastPos = pos + 1;

   }

}

I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)... but list<> is way faster than vector<> for when direct access is not needed, and you can even create your own string class and use something like std::list<subString> where subString does not do any copies for incredible speed increases.

It's more than twice as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also with the perfect parameter types you can eliminate all string and list copies for additional speed increases.

Additionally it does not do the (extremely inefficient) return of result, but rather it passes the tokens as a reference, thus also allowing you to build up tokens using multiple calls if you so wished.

Lastly it allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string... the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.

I'm quite a fan of this, but for g++ (and probably good practice) anyone using this will want typedefs and typenames: typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType; Then to substitute out the value_type and size_types accordingly. — Nov 28 '11 at 21:41
For those of us for whom the template stuff and the first comment are completely foreign, a usage example complete with required includes would be lovely. — Aug 17 '12 at 11:51
Ahh well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited the tokens.push_back() lines to change the ContainerT::value_type to just ValueType and changed (ContainerT::value_type::size_type) to (SizeType). Fixed the bits g++ had been whining about. Just invoke it as tokenize( some_string, some_vector ); — Aug 17 '12 at 14:23
Apart from running a few performance tests on sample data, primarily I've reduced it to as few as possible instructions and also as little as possible memory copies enabled by the use of a substring class that only references offsets/lengths in other strings. (I rolled my own, but there are some other implementations). Unfortunately there is not too much else one can do to improve on this, but incremental increases were possible. — Nov 29 '12 at 14:50
That's the correct output for when trimEmpty = true. Keep in mind that "abo" is not a delimiter in this answer, but the list of delimiter characters. It would be simple to modify it to take a single delimiter string of characters (I think str.find_first_of should change to str.find_first, but I could be wrong... can't test) — Aug 28 '15 at 15:24
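
The "substring class that does no copies" idea from this answer can be approximated with `std::string_view`, assuming a C++17 compiler (the answer describes a hand-rolled class instead). The sketch below mirrors the answer's loop; the name `tokenize_views` is hypothetical, and the returned views are only valid while the source string is alive.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant of tokenize(): returns non-owning views into `str`
// instead of copying each token into a new std::string.
std::vector<std::string_view> tokenize_views(std::string_view str,
                                             std::string_view delimiters = " ",
                                             bool trimEmpty = false) {
    std::vector<std::string_view> tokens;
    const std::size_t length = str.length();
    std::size_t lastPos = 0;

    while (lastPos < length + 1) {
        std::size_t pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string_view::npos) {
            pos = length;  // no more delimiters: the token runs to the end
        }
        if (pos != lastPos || !trimEmpty) {
            tokens.push_back(str.substr(lastPos, pos - lastPos));
        }
        lastPos = pos + 1;
    }
    return tokens;
}
```

For example, `tokenize_views("a,b,,c", ",")` keeps the empty token between the two commas, while passing `true` as the last argument drops it.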

score 154 · Answer 6 · 2016-10-04 22:33:35Z

Here's another solution. It's compact and reasonably efficient:

std::vector<std::string> split(const std::string &text, char sep) {

  std::vector<std::string> tokens;

  std::size_t start = 0, end = 0;

  while ((end = text.find(sep, start)) != std::string::npos) {

    tokens.push_back(text.substr(start, end - start));

    start = end + 1;

  }

  tokens.push_back(text.substr(start));

  return tokens;

}

It can easily be templatised to handle string separators, wide strings, etc.

Note that splitting "" results in a single empty string and splitting "," (ie. sep) results in two empty strings.

It can also be easily expanded to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {

    std::vector<std::string> tokens;

    std::size_t start = 0, end = 0;

    while ((end = text.find(sep, start)) != std::string::npos) {

        if (end != start) {

          tokens.push_back(text.substr(start, end - start));

        }

        start = end + 1;

    }

    if (start != text.size()) {

       tokens.push_back(text.substr(start));

    }

    return tokens;

}

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:

std::vector<std::string> split(const std::string& text, const std::string& delims)

{

    std::vector<std::string> tokens;

    std::size_t start = text.find_first_not_of(delims), end = 0;



    while((end = text.find_first_of(delims, start)) != std::string::npos)

    {

        tokens.push_back(text.substr(start, end - start));

        start = text.find_first_not_of(delims, end);

    }

    if(start != std::string::npos)

        tokens.push_back(text.substr(start));



    return tokens;

}

The first version is simple and gets the job done perfectly. The only change I would make would be to return the result directly, instead of passing it as a parameter. — Jan 19 '12 at 2:25
The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed. — Feb 6 '12 at 18:56
A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics. — Jun 27 '13 at 1:20
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct) — Aug 17 '13 at 11:54
Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a delimiter, although its a less obvious solution. Does the c++11 standard not have anything for this? Does c++11 support punch cards these days? — Aug 11 '15 at 15:15

score 110 · Answer 7 · 2018-04-12 11:37:30Z

This is my favorite way to iterate through a string. You can do whatever you want per word.

string line = "a line of text to iterate through";

string word;



istringstream iss(line, istringstream::in);



while( iss >> word )     

{

    // Do something on `word` here...

}

edited Apr 12 at 11:37

community wiki

4 revs, 2 users 86%
gnomed

Is it possible to declare word as a char?
– abatishchev
Jun 26 '10 at 17:23

Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop through every character in each word. But right now I believe the current loop depends on spaces for word separation. Unless you know that there is only a single character between every space, in which case you can just cast "word" to a char... sorry I cant be of more help, ive been meaning to brush up on my C++
– gnomed
Jun 30 '10 at 22:18

9

if you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try: stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;
– Wayne Werner
Aug 4 '10 at 18:03
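
The behaviour described in the last comment can be checked with a small sketch like this one. The helper name `non_space_chars` is made up; it relies on formatted extraction into a `char` skipping whitespace by default.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: collects every non-whitespace character of `text`
// by extracting one char at a time from a stream.
std::string non_space_chars(const std::string& text) {
    std::istringstream iss(text);
    std::string out;
    char c;
    while (iss >> c) {  // operator>> skips leading whitespace before each char
        out += c;
    }
    return out;
}
```

So declaring the loop variable as `char` iterates over characters rather than words, which is probably not what the original question wants.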

score 77 · Answer 8 · 2017-05-23 12:34:53Z

This is similar to Stack Overflow question How do I tokenize a string in C++?.

#include <iostream>

#include <string>

#include <boost/tokenizer.hpp>



using namespace std;

using namespace boost;



int main(int argc, char** argv)

{

    string text = "token  test\tstring";



    char_separator<char> sep(" \t");

    tokenizer<char_separator<char>> tokens(text, sep);

    for (const string& t : tokens)

    {

        cout << t << "." << endl;

    }

}

Does this materialize a copy of all of the tokens, or does it only keep the start and end position of the current token? — Apr 9 at 19:47

score 66 · Answer 9 · 2017-01-08 04:33:22Z

I like the following because it puts the results into a vector, supports a string as a delimiter, and gives control over keeping empty values. The trade-off is that it doesn't look as tidy.

#include <iostream>

#include <string>

#include <vector>

#include <algorithm>

#include <iterator>

using namespace std;



vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {

    vector<string> result;

    if (delim.empty()) {

        result.push_back(s);

        return result;

    }

    string::const_iterator substart = s.begin(), subend;

    while (true) {

        subend = search(substart, s.end(), delim.begin(), delim.end());

        string temp(substart, subend);

        if (keep_empty || !temp.empty()) {

            result.push_back(temp);

        }

        if (subend == s.end()) {

            break;

        }

        substart = subend + delim.size();

    }

    return result;

}



int main() {

    const vector<string> words = split("So close no matter how far", " ");

    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));

}

Of course, Boost has a split() that works partially like that. And, if by 'white-space', you really do mean any type of white-space, using Boost's split with is_any_of() works great.

Finally a solution that is handling empty tokens correctly at both sides of the string — Sep 9 '15 at 20:38

score 50 · Answer 10 · 2018-04-12 11:35:55Z

The STL does not have such a method available already.

However, you can either use C's strtok() function by using the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):

void Tokenize(const string& str,

              vector<string>& tokens,

              const string& delimiters = " ")

{

    // Skip delimiters at beginning.

    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find first "non-delimiter".

    string::size_type pos     = str.find_first_of(delimiters, lastPos);



    while (string::npos != pos || string::npos != lastPos)

    {

        // Found a token, add it to the vector.

        tokens.push_back(str.substr(lastPos, pos - lastPos));

        // Skip delimiters.  Note the "not_of"

        lastPos = str.find_first_not_of(delimiters, pos);

        // Find next "non-delimiter"

        pos = str.find_first_of(delimiters, lastPos);

    }

}

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.

And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf both are faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.

Don't get sold on this "Elegance over performance" deal.

I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my question). However, for this specific question, I'm looking for an elegant C++ solution. — Oct 25 '08 at 9:16
@Nelson LaQuet: Let me guess: Because strtok is not reentrant? — Oct 25 '08 at 9:52
@Nelson don't ever pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each found delimiter) and c_str() returns a non-modifiable string. — Oct 25 '08 at 18:19
@Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's silly to avoid C functions for "aesthetic" reasons. — Aug 24 '09 at 9:08
@paulm: No, the slowness of C++ streams is caused by facets. They're still slower than stdio.h functions even when synchronization is disabled (and on stringstreams, which can't synchronize). — Apr 12 '15 at 23:55
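
Following the warnings in the comments, a strtok-based version would need to work on a modifiable copy of the string, sized `str.size() + 1` for the terminator. The sketch below assumes exactly that; `split_with_strtok` is a hypothetical name, and `std::strtok` keeps hidden static state, so this is not safe to run concurrently from several threads.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `text`, because strtok overwrites
// each delimiter it finds with '\0' and must not be given c_str().
std::vector<std::string> split_with_strtok(const std::string& text,
                                           const char* delims) {
    // Modifiable buffer of size text.size() + 1, including the terminator.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Like `strtok` itself, this skips runs of delimiters, so it never returns empty tokens.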

score 39 · Answer 11 · 2017-05-23 22:17:34Z

Here is a split function that:

is generic

uses standard C++ (no boost)

accepts multiple delimiters

ignores empty tokens (can easily be changed)

template<typename T>

vector<T> 

split(const T & str, const T & delimiters) {

    vector<T> v;

    typename T::size_type start = 0;

    auto pos = str.find_first_of(delimiters, start);

    while(pos != T::npos) {

        if(pos != start) // ignore empty tokens

            v.emplace_back(str, start, pos - start);

        start = pos + 1;

        pos = str.find_first_of(delimiters, start);

    }

    if(start < str.length()) // ignore trailing delimiter

        v.emplace_back(str, start, str.length() - start); // add what's left of the string

    return v;

}

Example usage:

    vector<string> v = split<string>("Hello, there; World", ";,");

    vector<wstring> v = split<wstring>(L"Hello, there; World", L";,");

edited May 23 '17 at 22:17

community wiki

6 revs
Marco M.

You forgot to add to use list: "extremely inefficient"
– Xander Tulip
Mar 19 '12 at 0:20

@XanderTulip, can you be more constructive and explain how or why?
– Marco M.
Mar 21 '12 at 11:57

2

@XanderTulip: I assume you are referring to it returning the vector by value. The Return-Value-Optimization (RVO, google it) should take care of this. Also in C++11 you could return by move reference.
– Joseph Garvin
May 7 '12 at 13:56

3

This can actually be optimized further: instead of .push_back(str.substr(...)) one can use .emplace_back(str, start, pos - start). This way the string object is constructed in the container and thus we avoid a move operation + other shenanigans done by the .substr function.
– Mihai Bişog
Sep 5 '12 at 13:50

@zoopp yes. Good idea. VS10 didn't have emplace_back support when I wrote this. I will update my answer. Thanks
– Marco M.
Sep 12 '12 at 13:03

score 33 · Answer 12 · 2013-01-15 00:12:16Z

I have a 2 lines solution to this problem:

char sep = ' ';

std::string s="1 This is an example";



for(size_t p=0, q=0; p!=s.npos; p=q)

  std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;

Then instead of printing you can put it in a vector.
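
As a rough sketch of that suggestion, the same loop can push each piece into a vector. The function name `split_to_vector` and the local `begin` are introduced here for readability; like the two-liner, it assumes the string does not start with the separator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the loop above, collecting pieces instead of printing.
std::vector<std::string> split_to_vector(const std::string& s, char sep) {
    std::vector<std::string> parts;
    for (std::size_t p = 0, q = 0; p != s.npos; p = q) {
        // After the first piece, p sits on a separator, so start one past it.
        const std::size_t begin = p + (p != 0);
        q = s.find(sep, p + 1);
        // When q is npos the count is huge and substr clamps it to the end.
        parts.push_back(s.substr(begin, q - begin));
    }
    return parts;
}
```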

score 33 · Answer 13 · 2013-09-11 08:11:28Z

Yet another flexible and fast way

template<typename Operator>

void tokenize(Operator& op, const char* input, const char* delimiters) {

  const char* s = input;

  const char* e = s;

  while (*e != 0) {

    e = s;

    while (*e != 0 && strchr(delimiters, *e) == 0) ++e;

    if (e - s > 0) {

      op(s, e - s);

    }

    s = e + 1;

  }

}

To use it with a vector of strings (Edit: Since someone pointed out not to inherit STL classes... hrmf ;) ) :

template<class ContainerType>

class Appender {

public:

  Appender(ContainerType& container) : container_(container) {;}

  void operator() (const char* s, unsigned length) { 

    container_.push_back(std::string(s,length));

  }

private:

  ContainerType& container_;

};



std::vector<std::string> strVector;

Appender<std::vector<std::string> > v(strVector);

tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer, like how to just
count words:

class WordCounter {

public:

  WordCounter() : noOfWords(0) {}

  void operator() (const char*, unsigned) {

    ++noOfWords;

  }

  unsigned noOfWords;

};



WordCounter wc;

tokenize(wc, "A number of words to be counted", " \t"); 

ASSERT( wc.noOfWords == 7 );

Limited by imagination ;)

Nice. Regarding Appender note "Why shouldn't we inherit a class from STL classes?" — Sep 10 '13 at 12:07

score 29 · Answer 14 · 2015-06-24 09:31:50Z

Here's a simple solution that uses only the standard regex library

#include <regex>

#include <string>

#include <vector>



std::vector<std::string> Tokenize( const std::string str, const std::regex regex )

{

    using namespace std;



    std::vector<string> result;



    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );

    sregex_token_iterator reg_end;



    for ( ; it != reg_end; ++it ) {

        if ( !it->str().empty() ) //token could be empty:check

            result.emplace_back( it->str() );

    }



    return result;

}

The regex argument allows splitting on multiple delimiters (spaces, commas, etc.)

I usually only check to split on spaces and commas, so I also have this default function:

std::vector<std::string> TokenizeDefault( const std::string str )

{

    using namespace std;



    regex re( "[\\s,]+" );



    return Tokenize( str, re );

}

The "[\\s,]+" checks for spaces (\\s) and commas (,).

Note, if you want to split wstring instead of string,

change all std::regex to std::wregex

change all sregex_token_iterator to wsregex_token_iterator

Note, you might also want to take the string argument by reference, depending on your compiler.

This would have been my favourite answer, but std::regex is broken in GCC 4.8. They said that they implemented it correctly in GCC 4.9. I am still giving you my +1 — Aug 19 '14 at 12:27
This is my favorite with minor changes: vector returned as reference as you said, and the arguments "str" and "regex" passed by references also. thx. — Oct 16 '15 at 15:06
Raw strings are pretty useful while dealing with regex patterns. That way, you don't have to use the escape sequences... You can just use R"([\s,]+)". — Feb 17 at 17:42
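
Putting the suggestions from these comments together, a variant that takes its argument by reference and uses a raw string literal for the pattern might look like this. It is only a sketch; the name `tokenize_raw` is hypothetical and the separator pattern is fixed to whitespace and commas.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on runs of whitespace and/or commas,
// using a raw string literal so the backslash needs no escaping.
std::vector<std::string> tokenize_raw(const std::string& text) {
    static const std::regex separators(R"([\s,]+)");
    std::vector<std::string> result;

    // Submatch index -1 selects the text between matches of the pattern.
    std::sregex_token_iterator it(text.begin(), text.end(), separators, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0) {  // a leading separator yields an empty token
            result.push_back(it->str());
        }
    }
    return result;
}
```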

score 24 · Answer 15 · 2012-02-09 09:32:17Z

If you like to use boost, but want to use a whole string as delimiter (instead of single characters as in most of the previously proposed solutions), you can use the boost_split_iterator.

Example code including convenient template:

#include <iostream>

#include <vector>

#include <boost/algorithm/string.hpp>



template<typename _OutputIterator>

inline void split(

    const std::string& str, 

    const std::string& delim, 

    _OutputIterator result)

{

    using namespace boost::algorithm;

    typedef split_iterator<std::string::const_iterator> It;



    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));

            iter!=It();

            ++iter)

    {

        *(result++) = boost::copy_range<std::string>(*iter);

    }

}



int main(int argc, char* argv[])

{

    using namespace std;



    vector<string> splitted;

    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));



    // or directly to console, for example

    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));

    return 0;

}

score 23 · Answer 16 · 2018-04-12 11:42:30Z

Using std::stringstream as you have works perfectly fine and does exactly what you wanted. If you're just looking for a different way of doing things, though, you can use std::find()/std::find_first_of() and std::string::substr().

Here's an example:

#include <iostream>

#include <string>



int main()

{

    std::string s("Somewhere down the road");

    std::string::size_type prev_pos = 0, pos = 0;



    while( (pos = s.find(' ', pos)) != std::string::npos )

    {

        std::string substring( s.substr(prev_pos, pos-prev_pos) );



        std::cout << substring << '\n';



        prev_pos = ++pos;

    }



    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word

    std::cout << substring << '\n';



    return 0;

}

This only works for single character delimiters. A simple change lets it work with multicharacter: prev_pos = pos += delimiter.length(); — Feb 5 '16 at 14:48
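
The multi-character change suggested in the comment can be sketched as below. The function name `split_on` is an assumption, and the guard for an empty delimiter is an addition of this sketch (without it the search would never advance).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the find/substr loop above, generalised to a
// delimiter of any length.
std::vector<std::string> split_on(const std::string& s,
                                  const std::string& delimiter) {
    std::vector<std::string> parts;
    if (delimiter.empty()) {  // assumption: treat "no delimiter" as "no split"
        parts.push_back(s);
        return parts;
    }

    std::string::size_type prev_pos = 0, pos = 0;
    while ((pos = s.find(delimiter, pos)) != std::string::npos) {
        parts.push_back(s.substr(prev_pos, pos - prev_pos));
        prev_pos = pos += delimiter.length();  // jump past the whole delimiter
    }
    parts.push_back(s.substr(prev_pos));  // last piece (may be empty)
    return parts;
}
```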

score 18 · Answer 17 · 2014-05-02 14:49:30Z

There is a function named strtok.

#include <string.h>  // strtok_r (POSIX)

#include <string>

#include <vector>

using namespace std;



vector<string> split(char* str,const char* delim)

{

    char* saveptr;

    char* token = strtok_r(str,delim,&saveptr);



    vector<string> result;



    while(token != NULL)

    {

        result.push_back(token);

        token = strtok_r(NULL,delim,&saveptr);

    }

    return result;

}

edited May 2 '14 at 14:49

community wiki

3 revs, 2 users 91%
Pratik Deoghare

3

strtok is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the input string.
– Kevin Panko
Jun 14 '10 at 14:07

12

Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL is passed, it remembers what pointer should be used. If a second thread calls strtok when another thread is still processing, this char pointer will be overwritten, and both threads will then have incorrect results. mkssoftware.com/docs/man3/strtok.3.asp
– Kevin Panko
Jun 14 '10 at 17:27

1

as mentioned before strtok is unsafe and even in C strtok_r is recommended for use
– systemsfault
Jul 6 '10 at 12:17

4

strtok_r can be used if you are in a section of code that may be accessed. this is the only solution of all of the above that isn't "line noise", and is a testament to what, exactly, is wrong with c++
– Erik Aronesty
Oct 10 '11 at 18:04

Updated so there can be no objections on the grounds of thread safety from C++ wonks.
– Erik Aronesty
May 2 '14 at 14:50

score 17 · Answer 18 · 2012-10-29 16:15:47Z

Here's a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea)

#include <regex>

#include <string>

#include <vector>



using namespace std;



vector<string> split(const string& s){

    regex r ("\\w+"); //regex matches whole words, (greedy, so no fragment words)

    sregex_iterator rit ( s.begin(), s.end(), r );

    sregex_iterator rend; //iterators to iterate thru words

    vector<string> result;

    for ( ; rit != rend; ++rit ) //iterates through the matches to fill the vector

        result.push_back( rit->str() );

    return result;

}

Similar responses with maybe better regex approach: here, and here. — Dec 5 '14 at 23:25

score 15 · Answer 19 · 2015-06-22 17:02:21Z

The stringstream can be convenient if you need to parse the string by non-space symbols:

string s = "Name:JAck; Spouse:Susan; ...";

string dummy, name, spouse;



istringstream iss(s);

getline(iss, dummy, ':');

getline(iss, name, ';');

getline(iss, dummy, ':');

getline(iss, spouse, ';');

edited Jun 22 '15 at 17:02

community wiki

2 revs, 2 users 95%
lukmac

That's a good working.
– spritecodej
Jan 11 at 6:20
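
The same getline technique can be looped to read any number of `key:value;` fields. This is a sketch under the assumption that every field has that exact shape; the name `parse_fields` and the use of `std::map` are choices made for this example.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "key:value;" pairs into a map.
std::map<std::string, std::string> parse_fields(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::istringstream iss(text);
    std::string key, value;
    // std::ws discards the blank after each ';' so keys have no leading space.
    while (std::getline(iss >> std::ws, key, ':') &&
           std::getline(iss, value, ';')) {
        fields[key] = value;
    }
    return fields;
}
```

For an input such as `"Name:Jack; Spouse:Susan;"` this should produce two entries, keyed by `Name` and `Spouse`.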

score 14 · Answer 20 · 2011-05-22 23:02:42Z

So far I used the one in Boost, but I needed something that doesn't depend on it, so I came up with this:

static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)

{

    std::ostringstream word;

    for (size_t n = 0; n < input.size(); ++n)

    {

        if (std::string::npos == separators.find(input[n]))

            word << input[n];

        else

        {

            if (!word.str().empty() || !remove_empty)

                lst.push_back(word.str());

            word.str("");

        }

    }

    if (!word.str().empty() || !remove_empty)

        lst.push_back(word.str());

}

A good point is that in separators you can pass more than one character.

score 13 · Answer 21 · 2014-01-07 20:28:03Z

I've rolled my own using strtok and used boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.

#include <iostream>

#include <vector>

#include <string>

#include <strtk.hpp>



const char *whitespace  = " \t\r\n\f";

const char *whitespace_and_punctuation  = " \t\r\n\f;,=";



int main()

{

    {   // normal parsing of a string into a vector of strings

        std::string s("Somewhere down the road");

        std::vector<std::string> result;

        if( strtk::parse( s, whitespace, result ) )

        {

            for(size_t i = 0; i < result.size(); ++i )

                std::cout << result[i] << std::endl;

        }

    }



    {  // parsing a string into a vector of floats with other separators

        // besides spaces



        std::string s("3.0, 3.14; 4.0");

        std::vector<float> values;

        if( strtk::parse( s, whitespace_and_punctuation, values ) )

        {

            for(size_t i = 0; i < values.size(); ++i )

                std::cout << values[i] << std::endl;

        }

    }



    {  // parsing a string into specific variables



        std::string s("angle = 45; radius = 9.9");

        std::string w1, w2;

        float v1, v2;

        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )

        {

            std::cout << "word " << w1 << ", value " << v1 << std::endl;

            std::cout << "word " << w2 << ", value " << v2 << std::endl;

        }

    }



    return 0;

}

The toolkit has much more flexibility than this simple example shows but its utility in parsing a string into useful elements is incredible.

score 13 · Answer 22 · 2016-07-14 20:17:10Z

Short and elegant

#include <vector>

#include <string>

using namespace std;



vector<string> split(string data, string token)

{

    vector<string> output;

    size_t pos = string::npos; // size_t to avoid improbable overflow

    do

    {

        pos = data.find(token);

        output.push_back(data.substr(0, pos));

        if (string::npos != pos)

            data = data.substr(pos + token.size());

    } while (string::npos != pos);

    return output;

}

It can use any string as the delimiter, and it also works with binary data (std::string supports binary data, including nulls).

using:

auto a = split("this!!is!!!example!string", "!!");

output:

this

is

!example!string

I like this solution because it allows the separator to be a string and not a char, however, it is modifying in place the string, so it is forcing the creation of a copy of the original string. — Aug 1 '16 at 15:30
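Regarding the copy mentioned in the comment: one possible variant searches from a moving start offset instead of shrinking the string, so the input can be taken by const reference. This is only a sketch; the name `split_no_copy` and the guard for an empty delimiter are my own assumptions, not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `data` on every occurrence of the string `token` without modifying `data`.
std::vector<std::string> split_no_copy(const std::string& data, const std::string& token)
{
    std::vector<std::string> output;
    if (token.empty()) {          // assumption: an empty delimiter yields the whole input
        output.push_back(data);
        return output;
    }
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = data.find(token, start)) != std::string::npos) {
        output.push_back(data.substr(start, pos - start));
        start = pos + token.size();  // continue right after the delimiter
    }
    output.push_back(data.substr(start)); // remainder after the last delimiter
    return output;
}

void example_usage()
{
    const std::vector<std::string> a = split_no_copy("this!!is!!!example!string", "!!");
    assert(a.size() == 3);
    assert(a[0] == "this" && a[1] == "is" && a[2] == "!example!string");
}
```

The pieces themselves are still copied into the vector; only the repeated `data = data.substr(...)` reassignment goes away.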

score 11 · Answer 23 · 2017-02-19 17:47:57Z

I made this because I needed an easy way to split strings and c-based strings... Hopefully someone else can find it useful as well. Also it doesn't rely on tokens and you can use fields as delimiters, which is another key I needed.

I'm sure there are improvements that can be made to further improve its elegance, so by all means please make them.

StringSplitter.hpp:

#include <vector>

#include <iostream>

#include <string.h>



using namespace std;



class StringSplit

{

private:

    void copy_fragment(char*, char*, char*);

    void copy_fragment(char*, char*, char);

    bool match_fragment(char*, char*, int);

    int untilnextdelim(char*, char);

    int untilnextdelim(char*, char*);

    void assimilate(char*, char);

    void assimilate(char*, char*);

    bool string_contains(char*, char*);

    long calc_string_size(char*);

    void copy_string(char*, char*);



public:

    vector<char*> split_cstr(char);

    vector<char*> split_cstr(char*);

    vector<string> split_string(char);

    vector<string> split_string(char*);

    char* String;

    bool do_string;

    bool keep_empty;

    vector<char*> Container;

    vector<string> ContainerS;



    StringSplit(char * in)

    {

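        String = in;

        do_string = false;   // the buffer belongs to the caller

        keep_empty = false;  // empty entries are excluded by default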

    }



    StringSplit(string in)

    {

        size_t len = calc_string_size((char*)in.c_str());

        String = new char[len + 1];

        memset(String, 0, len + 1);

        copy_string(String, (char*)in.c_str());

        do_string = true;

        keep_empty = false;  // empty entries are excluded by default

    }



    ~StringSplit()

    {

        for (int i = 0; i < Container.size(); i++)

        {

            if (Container[i] != NULL)

            {

                delete[] Container[i];  // allocated with new char[]

            }

        }

        if (do_string)

        {

            delete[] String;  // allocated with new char[] in the string constructor

        }

    }

};

StringSplitter.cpp:

#include <string.h>

#include <iostream>

#include <vector>

#include "StringSplitter.hpp"



using namespace std;



void StringSplit::assimilate(char*src, char delim)

{

    int until = untilnextdelim(src, delim);

    if (until > 0)

    {

        char * temp = new char[until + 1];

        memset(temp, 0, until + 1);

        copy_fragment(temp, src, delim);

        if (keep_empty || *temp != 0)

        {

            if (!do_string)

            {

                Container.push_back(temp);

            }

            else

            {

                string x = temp;

                ContainerS.push_back(x);

            }



        }

        else

        {

            delete[] temp;

        }

    }

}



void StringSplit::assimilate(char*src, char* delim)

{

    int until = untilnextdelim(src, delim);

    if (until > 0)

    {

        char * temp = new char[until + 1];

        memset(temp, 0, until + 1);

        copy_fragment(temp, src, delim);

        if (keep_empty || *temp != 0)

        {

            if (!do_string)

            {

                Container.push_back(temp);

            }

            else

            {

                string x = temp;

                ContainerS.push_back(x);

            }

        }

        else

        {

            delete[] temp;

        }

    }

}



long StringSplit::calc_string_size(char* _in)

{

    long i = 0;

    while (*_in++)

    {

        i++;

    }

    return i;

}



bool StringSplit::string_contains(char* haystack, char* needle)

{

    size_t len = calc_string_size(needle);

    size_t lenh = calc_string_size(haystack);

    while (lenh--)

    {

        if (match_fragment(haystack + lenh, needle, len))

        {

            return true;

        }

    }

    return false;

}



bool StringSplit::match_fragment(char* _src, char* cmp, int len)

{

    while (len--)

    {

        if (*(_src + len) != *(cmp + len))

        {

            return false;

        }

    }

    return true;

}



int StringSplit::untilnextdelim(char* _in, char delim)

{

    size_t len = calc_string_size(_in);

    if (*_in == delim)

    {

        _in += 1;

        return len - 1;

    }



    int c = 0;

    while (*(_in + c) != delim && c < len)

    {

        c++;

    }



    return c;

}



int StringSplit::untilnextdelim(char* _in, char* delim)

{

    int s = calc_string_size(delim);

    int c = 1 + s;



    if (!string_contains(_in, delim))

    {

        return calc_string_size(_in);

    }

    else if (match_fragment(_in, delim, s))

    {

        _in += s;

        return calc_string_size(_in);

    }



    while (!match_fragment(_in + c, delim, s))

    {

        c++;

    }



    return c;

}



void StringSplit::copy_fragment(char* dest, char* src, char delim)

{

    if (*src == delim)

    {

        src++;

    }



    int c = 0;

    while (*(src + c) != delim && *(src + c))

    {

        *(dest + c) = *(src + c);

        c++;

    }

    *(dest + c) = 0;

}



void StringSplit::copy_string(char* dest, char* src)

{

    int i = 0;

    while (*(src + i))

    {

        *(dest + i) = *(src + i);

        i++;

    }

}



void StringSplit::copy_fragment(char* dest, char* src, char* delim)

{

    size_t len = calc_string_size(delim);

    size_t lens = calc_string_size(src);



    if (match_fragment(src, delim, len))

    {

        src += len;

        lens -= len;

    }



    int c = 0;

    while (!match_fragment(src + c, delim, len) && (c < lens))

    {

        *(dest + c) = *(src + c);

        c++;

    }

    *(dest + c) = 0;

}



vector<char*> StringSplit::split_cstr(char Delimiter)

{

    int i = 0;

    while (*String)

    {

        if (*String != Delimiter && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (*String == Delimiter)

        {

            assimilate(String, Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    // String is not freed here: the destructor releases it when do_string is set.



    return Container;

}



vector<string> StringSplit::split_string(char Delimiter)

{

    do_string = true;



    int i = 0;

    while (*String)

    {

        if (*String != Delimiter && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (*String == Delimiter)

        {

            assimilate(String, Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    // String is not freed here: the destructor releases it when do_string is set.



    return ContainerS;

}



vector<char*> StringSplit::split_cstr(char* Delimiter)

{

    int i = 0;

    size_t LenDelim = calc_string_size(Delimiter);



    while(*String)

    {

        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (match_fragment(String, Delimiter, LenDelim))

        {

            assimilate(String,Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    // String is not freed here: the destructor releases it when do_string is set.



    return Container;

}



vector<string> StringSplit::split_string(char* Delimiter)

{

    do_string = true;

    int i = 0;

    size_t LenDelim = calc_string_size(Delimiter);



    while (*String)

    {

        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (match_fragment(String, Delimiter, LenDelim))

        {

            assimilate(String, Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    // String is not freed here: the destructor releases it when do_string is set.



    return ContainerS;

}

Examples:

int main(int argc, char* argv[])

{

    StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";

    vector<char*> Split = ss.split_cstr(":CUT:");



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}

Will output:

This

is

an

example

cstring

int main(int argc, char* argv[])

{

    StringSplit ss = "This:is:an:example:cstring";

    vector<char*> Split = ss.split_cstr(':');



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}



int main(int argc, char* argv[])

{

    string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";

    StringSplit ss = mystring;

    vector<string> Split = ss.split_string("[SPLIT]");



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}



int main(int argc, char* argv[])

{

    string mystring = "This|is|an|example|string";

    StringSplit ss = mystring;

    vector<string> Split = ss.split_string('|');



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}

To keep empty entries (by default empties will be excluded):

StringSplit ss = mystring;

ss.keep_empty = true;

vector<string> Split = ss.split_string(":DELIM:");

The goal was to make it similar to C#'s Split() method where splitting a string is as easy as:

string[] Split =

    "Hey:cut:what's:cut:your:cut:name?".Split(new[] {":cut:"}, StringSplitOptions.None);



foreach(String X in Split)

{

    Console.Write(X);

}

I hope someone else can find this as useful as I do.

score 10 · Answer 24 · 2012-12-19 22:05:24Z


What about this:

#include <string>

#include <vector>



using namespace std;



vector<string> split(string str, const char delim) {

    vector<string> v;

    string tmp;



    for(string::const_iterator i = str.begin(); i != str.end(); ++i) {

        if(*i != delim) {

            tmp += *i;

        } else {

            v.push_back(tmp);

            tmp = "";

        }

    }

    v.push_back(tmp); // the last token has no trailing delimiter



    return v;

}

edited Dec 19 '12 at 22:05

community wiki

3 revs, 3 users 89%
gibbz

This is the best answer here, if you only want to split on a single delimiter character. The original question wanted to split on whitespace though, meaning any combination of one or more consecutive spaces or tabs. You have actually answered stackoverflow.com/questions/53849
– Oktalist
Dec 19 '12 at 22:09


score 9 · Answer 25 · 2010-01-08 03:27:24Z

Here's another way of doing it..

void split_string(string text,vector<string>& words)

{

  int i=0;

  char ch;

  string word;



  while(ch=text[i++])

  {

    if (isspace(static_cast<unsigned char>(ch))) // cast avoids undefined behaviour for negative char values

    {

      if (!word.empty())

      {

        words.push_back(word);

      }

      word = "";

    }

    else

    {

      word += ch;

    }

  }

  if (!word.empty())

  {

    words.push_back(word);

  }

}

score 9 · Answer 26 · 2011-06-12 09:25:38Z

I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.

#include <iostream>

#include <string>

#include <boost/regex.hpp>



int main() {

    std::string line("A:::line::to:split");

    const boost::regex re(":+"); // one or more colons



    // -1 means find inverse matches aka split

    boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);

    boost::sregex_token_iterator end;



    for (; tokens != end; ++tokens)

        std::cout << *tokens << std::endl;

}
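If Boost is not available, roughly the same thing can be sketched with `std::regex` from C++11, which offers an `std::sregex_token_iterator` with the same `-1` convention. The function name `regex_split` is hypothetical, and this assumes a standard library with a working `<regex>`.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns the pieces of `line` that lie between matches of `pattern`.
std::vector<std::string> regex_split(const std::string& line, const std::string& pattern)
{
    const std::regex re(pattern);
    // -1 selects the text between matches, i.e. the split pieces.
    std::sregex_token_iterator first(line.begin(), line.end(), re, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}

void example_usage()
{
    const std::vector<std::string> parts = regex_split("A:::line::to:split", ":+");
    assert(parts.size() == 4);
    assert(parts[0] == "A" && parts[1] == "line" && parts[2] == "to" && parts[3] == "split");
}
```

The regex object has to outlive the iterators, which is why the vector is built before the function returns.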

score 9 · Answer 27 · 2011-09-14 09:47:57Z

Recently I had to split a camel-cased word into subwords. There are no delimiters, just uppercase characters.

#include <string>

#include <list>

#include <locale> // std::isupper



template<class String>

const std::list<String> split_camel_case_string(const String &s)

{

    std::list<String> R;

    String w;



    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {

        if (std::isupper(*i, std::locale())) {

            if (w.length()) {

                R.push_back(w);

                w.clear();

            }

        }

        w += *i;

    }



    if (w.length())

        R.push_back(w);

    return R;

}

For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".

Note that std::isupper should really be passed as a function template argument. Then the more generalized form of this function can split at delimiters like ",", ";" or " " too.
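A rough sketch of that generalization, with the test passed in as a callable. The name `split_before` and the lambda-based usage (which needs C++11) are assumptions of mine, not the author's code.

```cpp
#include <cassert>
#include <list>
#include <locale>
#include <string>

// Starts a new piece in front of every character for which `starts_new_piece` is true.
template <class String, class Pred>
std::list<String> split_before(const String& s, Pred starts_new_piece)
{
    std::list<String> pieces;
    String current;
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (starts_new_piece(*i) && !current.empty()) {
            pieces.push_back(current);
            current.clear();
        }
        current += *i;  // the matching character stays with the piece it starts
    }
    if (!current.empty())
        pieces.push_back(current);
    return pieces;
}

void example_usage()
{
    const std::list<std::string> words = split_before(
        std::string("AQueryTrades"),
        [](char c) { return std::isupper(c, std::locale::classic()); });
    assert(words.size() == 3);
    assert(words.front() == "A" && words.back() == "Trades");
}
```

With a predicate that matches ',' the comma stays at the front of each following piece, just as the uppercase letter stays with its word, so a true delimiter split would still need to drop that character.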

There have been 2 revs. That's nice. Seems as if my English had too much of a "German". However, the revisionist did not fix two minor bugs, maybe because they were obvious anyway: std::isupper could be passed as argument, not std::upper. Second, put a typename before the String::const_iterator. — Apr 28 '15 at 7:20

score 9 · Answer 28 · 2017-12-09 21:14:38Z


This answer takes the string and puts it into a vector of strings. It uses the boost library.

#include <boost/algorithm/string.hpp>

std::vector<std::string> strs;

boost::split(strs, "string to split", boost::is_any_of("\t "));

answered Dec 9 '17 at 21:14

community wiki

NL628


score 8 · Answer 29 · 2013-04-07 16:07:55Z

Get Boost ! : -)

#include <boost/algorithm/string/split.hpp>

#include <boost/algorithm/string.hpp>

#include <iostream>

#include <vector>



using namespace std;

using namespace boost;



int main(int argc, char**argv) {

    typedef vector < string > list_type;



    list_type list;

    string line;



    line = "Somewhere down the road";

    split(list, line, is_any_of(" "));



    for(int i = 0; i < list.size(); i++)

    {

        cout << list[i] << endl;

    }



    return 0;

}

This example gives the output -

Somewhere

down

the

road

score 8 · Answer 30 · 2015-04-29 15:06:31Z

The code below uses strtok() to split a string into tokens and stores the tokens in a vector.

#include <iostream>

#include <algorithm>

#include <vector>

#include <string>

#include <cstring> // strtok



using namespace std;





char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";

char seps[]   = " ,\t\n";

char *token;







int main()

{

   vector<string> vec_String_Lines;

   token = strtok( one_line_string, seps );



   cout << "Extracting and storing data in a vector..\n\n\n";



   while( token != NULL )

   {

      vec_String_Lines.push_back(token);

      token = strtok( NULL, seps );

   }

     cout << "Displaying end result in vector line storage..\n\n";



    for ( int i = 0; i < vec_String_Lines.size(); ++i)

        cout << vec_String_Lines[i] << "\n";

    cout << "\n\n\n";





return 0;

}

score 1225 · Accepted Answer · 2016-06-09 17:47:05Z


For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>

#include <string>

#include <sstream>

#include <algorithm>

#include <iterator>



int main() {

    using namespace std;

    string sentence = "And I feel fine...";

    istringstream iss(sentence);

    copy(istream_iterator<string>(iss),

         istream_iterator<string>(),

         ostream_iterator<string>(cout, "\n"));

}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;

copy(istream_iterator<string>(iss),

     istream_iterator<string>(),

     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},

                      istream_iterator<string>{}};

edited Jun 9 '16 at 17:47

community wiki

8 revs, 8 users 71%
Zunino

145

Is it possible to specify a delimiter for this? Like for instance splitting on commas?
– l3dx
Aug 6 '09 at 11:49

14

@Jonathan: \n is not the delimiter in this case, it's the delimiter for outputting to cout.
– huy
Feb 3 '10 at 12:37

728

This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable.
– SmallChess
Jan 10 '11 at 3:57

34

Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.
– Jerry Coffin
Dec 19 '12 at 20:30

35

@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm, doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one.
– Christian Rau
Feb 7 '13 at 15:08
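The ctype-facet idea from Jerry Coffin's comment can be sketched roughly as follows. The facet name `comma_is_space` and the helper `split_on_commas_and_spaces` are hypothetical; the sketch only adds ',' to the classic whitespace table and assumes `char` streams.

```cpp
#include <cassert>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies ',' as whitespace in addition to the classic set.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Start from the "C" locale table and mark ',' as a space character.
        static std::vector<mask> table(classic_table(), classic_table() + table_size);
        table[','] |= space;
        return table.data();
    }
};

std::vector<std::string> split_on_commas_and_spaces(const std::string& text)
{
    std::istringstream iss(text);
    // The locale takes ownership of the facet and deletes it when no longer used.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}

void example_usage()
{
    const std::vector<std::string> t = split_on_commas_and_spaces("a,b c,,d");
    assert(t.size() == 4);
    assert(t[0] == "a" && t[1] == "b" && t[2] == "c" && t[3] == "d");
}
```

Because `operator>>` skips runs of whitespace, consecutive commas collapse and no empty tokens are produced, which may or may not be what you want.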


score 2332 · Answer 32 · 2018-02-28 23:32:54Z


I use this to split a string by a delimiter. The first overload writes the results to an output iterator (for example a back_inserter on an existing vector), the second returns a new vector.

#include <string>

#include <sstream>

#include <vector>

#include <iterator>



template<typename Out>

void split(const std::string &s, char delim, Out result) {

    std::stringstream ss(s);

    std::string item;

    while (std::getline(ss, item, delim)) {

        *(result++) = item;

    }

}



std::vector<std::string> split(const std::string &s, char delim) {

    std::vector<std::string> elems;

    split(s, delim, std::back_inserter(elems));

    return elems;

}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');
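To make that behaviour concrete, here is a small self-check built on the two functions above (repeated so it compiles on its own; `example_usage` is a hypothetical name). It also shows that a trailing delimiter does not add a trailing empty token, because `std::getline` fails on the exhausted stream.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

template <typename Out>
void split(const std::string& s, char delim, Out result) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        *(result++) = item;
    }
}

std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

void example_usage() {
    const std::vector<std::string> x = split("one:two::three", ':');
    assert(x.size() == 4);   // "one", "two", "", "three"
    assert(x[2].empty());
    // A trailing delimiter gives the same four items.
    assert(split("one:two::three:", ':') == x);
}
```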

edited Feb 28 at 23:32

community wiki

20 revs, 15 users 43%
Evan Teran

73

In order to skip empty tokens, do an empty() check: if (!item.empty()) elems.push_back(item)
– 0x499602D2
Nov 9 '13 at 22:33

11

How about the delim contains two chars as ->?
– herohuyongtao
Dec 26 '13 at 8:15

7

@herohuyongtao, this solution only works for single char delimiters.
– Evan Teran
Dec 27 '13 at 6:11

4

@JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this: f(split(s, d, v)) while still having the benefit of a pre-allocated vector if you like.
– Evan Teran
Jan 25 '14 at 17:50

6

Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value.
– dshin
Sep 9 '15 at 19:04


score 805 · Answer 33 · 2015-08-03 23:20:33Z


A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>

std::vector<std::string> strs;

boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.

edited Aug 3 '15 at 23:20

community wiki

3 revs, 3 users 67%
ididak

32

Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.
– Tom
Mar 1 '09 at 16:51

40

And for those who don't already have boost... bcp copies over 1,000 files for this :)
– Roman Starkov
Jun 9 '10 at 20:12

78

strtok is a trap. its thread unsafe.
– tuxSlayer
Apr 23 '11 at 3:30

28

@Ian Embedded developers aren't all using boost.
– ACK_stoverflow
Jan 31 '12 at 18:23

28

as an addendum: I use boost only when I must, normally I prefer to add to my own library of code which is standalone and portable so that I can achieve small precise specific code, which accomplishes a given aim. That way the code is non-public, performant, trivial and portable. Boost has its place but I would suggest that it's a bit of overkill for tokenising strings: you wouldn't have your whole house transported to an engineering firm to get a new nail hammered into the wall to hang a picture.... they may do it extremely well, but the pros are by far outweighed by the cons.
– GMasucci
May 22 '13 at 8:19


score 333 · Answer 34 · 2018-05-19 10:01:31Z


#include <vector>

#include <string>

#include <sstream>



int main()

{

    std::string str("Split me by whitespaces");

    std::string buf;                 // Have a buffer string

    std::stringstream ss(str);       // Insert the string into a stream



    std::vector<std::string> tokens; // Create vector to hold our words



    while (ss >> buf)

        tokens.push_back(buf);



    return 0;

}

edited May 19 at 10:01

community wiki

2 revs, 2 users 82%
kev

52

too bad it only splits on spaces ' '...
– Offirmo
Jan 31 '13 at 18:47

You can also split on other delimiters if you use getline in the while condition, e.g. to split by commas, use while(getline(ss, buf, ',')).
– Ali
Oct 6 at 20:20


score 172 · Answer 35 · 2016-09-19 13:00:24Z

For those with whom it does not sit well to sacrifice all efficiency for code size and see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition.):

template < class ContainerT >

void tokenize(const std::string& str, ContainerT& tokens,

              const std::string& delimiters = " ", bool trimEmpty = false)

{

   std::string::size_type pos, lastPos = 0, length = str.length();



   using value_type = typename ContainerT::value_type;

   using size_type  = typename ContainerT::size_type;



   while(lastPos < length + 1)

   {

      pos = str.find_first_of(delimiters, lastPos);

      if(pos == std::string::npos)

      {

         pos = length;

      }



      if(pos != lastPos || !trimEmpty)

         tokens.push_back(value_type(str.data()+lastPos,

               (size_type)pos-lastPos ));



      lastPos = pos + 1;

   }

}

I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)... but list<> is way faster than vector<> for when direct access is not needed, and you can even create your own string class and use something like std::list<subString> where subString does not do any copies for incredible speed increases.

It's more than twice as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also with the perfect parameter types you can eliminate all string and list copies for additional speed increases.

Additionally it does not do the (extremely inefficient) return of result, but rather it passes the tokens as a reference, thus also allowing you to build up tokens using multiple calls if you so wished.

Lastly it allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string... the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.
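A usage sketch with the includes it needs, in case the template part looks foreign. `tokenize` is repeated from above so the example compiles on its own; `example_usage` and the sample input are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();

    using value_type = typename ContainerT::value_type;
    using size_type  = typename ContainerT::size_type;

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
            pos = length;

        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));

        lastPos = pos + 1;
    }
}

void example_usage()
{
    std::vector<std::string> kept;
    tokenize("a,b;;c", kept, ",;");           // each of ',' and ';' is a delimiter
    assert(kept.size() == 4 && kept[2].empty());

    std::vector<std::string> trimmed;
    tokenize("a,b;;c", trimmed, ",;", true);  // drop the empty token between ";;"
    assert(trimmed.size() == 3 && trimmed[2] == "c");
}
```

Since the tokens are appended to the container passed in, calling `tokenize` twice on the same vector accumulates the results of both calls.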

I'm quite a fan of this, but for g++ (and probably good practice) anyone using this will want typedefs and typenames: typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType; Then to substitute out the value_type and size_types accordingly. — Nov 28 '11 at 21:41
For those of us for whom the template stuff and the first comment are completely foreign, a usage example complete with required includes would be lovely. — Aug 17 '12 at 11:51
Ahh well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited the tokens.push_back() lines to change the ContainerT::value_type to just ValueType and changed (ContainerT::value_type::size_type) to (SizeType). Fixed the bits g++ had been whining about. Just invoke it as tokenize( some_string, some_vector ); — Aug 17 '12 at 14:23
Apart from running a few performance tests on sample data, primarily I've reduced it to as few as possible instructions and also as little as possible memory copies enabled by the use of a substring class that only references offsets/lengths in other strings. (I rolled my own, but there are some other implementations). Unfortunately there is not too much else one can do to improve on this, but incremental increases were possible. — Nov 29 '12 at 14:50
That's the correct output for when trimEmpty = true. Keep in mind that "abo" is not a delimiter in this answer, but the list of delimiter characters. It would be simple to modify it to take a single delimiter string of characters (I think str.find_first_of should change to str.find_first, but I could be wrong... can't test) — Aug 28 '15 at 15:24

score 154 · Answer 36 · 2016-10-04 22:33:35Z

Here's another solution. It's compact and reasonably efficient:

std::vector<std::string> split(const std::string &text, char sep) {

  std::vector<std::string> tokens;

  std::size_t start = 0, end = 0;

  while ((end = text.find(sep, start)) != std::string::npos) {

    tokens.push_back(text.substr(start, end - start));

    start = end + 1;

  }

  tokens.push_back(text.substr(start));

  return tokens;

}

It can easily be templatised to handle string separators, wide strings, etc.

Note that splitting "" results in a single empty string and splitting "," (i.e. sep) results in two empty strings.
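Those two edge cases can be checked with a few assertions. The function below is the first version from above under a different, hypothetical name (`split_keep_empty`) so the snippet stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same body as the first version above, renamed for this self-contained check.
std::vector<std::string> split_keep_empty(const std::string& text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}

void example_usage() {
    const std::vector<std::string> none = split_keep_empty("", ',');
    assert(none.size() == 1 && none[0].empty());   // one empty string

    const std::vector<std::string> two = split_keep_empty(",", ',');
    assert(two.size() == 2 && two[0].empty() && two[1].empty());

    const std::vector<std::string> ab = split_keep_empty("a,b", ',');
    assert(ab.size() == 2 && ab[0] == "a" && ab[1] == "b");
}
```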

It can also be easily expanded to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {

    std::vector<std::string> tokens;

    std::size_t start = 0, end = 0;

    while ((end = text.find(sep, start)) != std::string::npos) {

        if (end != start) {

          tokens.push_back(text.substr(start, end - start));

        }

        start = end + 1;

    }

    if (start != text.size()) { // skip the final token when it is empty

       tokens.push_back(text.substr(start));

    }

    return tokens;

}

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:

std::vector<std::string> split(const std::string& text, const std::string& delims)

{

    std::vector<std::string> tokens;

    std::size_t start = text.find_first_not_of(delims), end = 0;



    while((end = text.find_first_of(delims, start)) != std::string::npos)

    {

        tokens.push_back(text.substr(start, end - start));

        start = text.find_first_not_of(delims, end);

    }

    if(start != std::string::npos)

        tokens.push_back(text.substr(start));



    return tokens;

}

The first version is simple and gets the job done perfectly. The only change I would make would be to return the result directly, instead of passing it as a parameter. — Jan 19 '12 at 2:25
The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed. — Feb 6 '12 at 18:56
A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics. — Jun 27 '13 at 1:20
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct) — Aug 17 '13 at 11:54
Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a delimiter, although its a less obvious solution. Does the c++11 standard not have anything for this? Does c++11 support punch cards these days? — Aug 11 '15 at 15:15

score 110 · Answer 37 · 2018-04-12 11:37:30Z


This is my favorite way to iterate through a string. You can do whatever you want per word.

string line = "a line of text to iterate through";

string word;



istringstream iss(line, istringstream::in);



while( iss >> word )     

{

    // Do something on `word` here...

}

edited Apr 12 at 11:37

community wiki

4 revs, 2 users 86%
gnomed

Is it possible to declare word as a char?
– abatishchev
Jun 26 '10 at 17:23

Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop through every character in each word. But right now I believe the current loop depends on spaces for word separation. Unless you know that there is only a single character between every space, in which case you can just cast "word" to a char... sorry I cant be of more help, ive been meaning to brush up on my C++
– gnomed
Jun 30 '10 at 22:18

9

if you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try: stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;
– Wayne Werner
Aug 4 '10 at 18:03


score 77 · Answer 38 · 2017-05-23 12:34:53Z

This is similar to Stack Overflow question How do I tokenize a string in C++?.

#include <iostream>

#include <string>

#include <boost/tokenizer.hpp>



using namespace std;

using namespace boost;



int main(int argc, char** argv)

{

    string text = "token  test\tstring";



    char_separator<char> sep(" \t");

    tokenizer<char_separator<char>> tokens(text, sep);

    for (const string& t : tokens)

    {

        cout << t << "." << endl;

    }

}

Does this materialize a copy of all of the tokens, or does it only keep the start and end position of the current token? — Apr 9 at 19:47

score 66 · Answer 39 · 2017-01-08 04:33:22Z

I like the following because it puts the results into a vector, supports a string as a delimiter and gives control over keeping empty values. The trade-off is that it doesn't look as good.

#include <iostream>

#include <string>

#include <vector>

#include <algorithm>

#include <iterator>

using namespace std;



vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {

    vector<string> result;

    if (delim.empty()) {

        result.push_back(s);

        return result;

    }

    string::const_iterator substart = s.begin(), subend;

    while (true) {

        subend = search(substart, s.end(), delim.begin(), delim.end());

        string temp(substart, subend);

        if (keep_empty || !temp.empty()) {

            result.push_back(temp);

        }

        if (subend == s.end()) {

            break;

        }

        substart = subend + delim.size();

    }

    return result;

}



int main() {

    const vector<string> words = split("So close no matter how far", " ");

    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));

}

Of course, Boost has a split() that works partially like that. And, if by 'white-space', you really do mean any type of white-space, using Boost's split with is_any_of() works great.

Finally a solution that is handling empty tokens correctly at both sides of the string — Sep 9 '15 at 20:38

score 50 · Answer 40 · 2018-04-12 11:35:55Z

The STL does not have such a method available already.

However, you can either use C's strtok() function by using the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):

void Tokenize(const string& str,

              vector<string>& tokens,

              const string& delimiters = " ")

{

    // Skip delimiters at beginning.

    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find the first delimiter after that, i.e. the end of the first token.

    string::size_type pos     = str.find_first_of(delimiters, lastPos);



    while (string::npos != pos || string::npos != lastPos)

    {

        // Found a token, add it to the vector.

        tokens.push_back(str.substr(lastPos, pos - lastPos));

        // Skip delimiters.  Note the "not_of"

        lastPos = str.find_first_not_of(delimiters, pos);

        // Find the next delimiter, i.e. the end of the next token.

        pos = str.find_first_of(delimiters, lastPos);

    }

}

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.

And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf both are faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.

Don't get sold on this "Elegance over performance" deal.

I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my question). However, for this specific question, I'm looking for an elegant C++ solution. — Oct 25 '08 at 9:16
@Nelson LaQuet: Let me guess: Because strtok is not reentrant? — Oct 25 '08 at 9:52
@Nelson don't ever pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each found delimiter) and c_str() returns a non-modifiable string. — Oct 25 '08 at 18:19
@Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's silly to avoid C functions for "aesthetic" reasons. — Aug 24 '09 at 9:08
@paulm: No, the slowness of C++ streams is caused by facets. They're still slower than stdio.h functions even when synchronization is disabled (and on stringstreams, which can't synchronize). — Apr 12 '15 at 23:55
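
The warnings in the comments above can be made concrete. The following is only a sketch, assuming a hypothetical helper named strtok_split that is not part of the original answer: it copies the string into a writable buffer of size s.size() + 1 and lets strtok modify that copy, so c_str() is never passed to strtok. It still uses strtok's hidden static state, so it is not reentrant.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): tokenizes a copy of `s`,
// so the caller's std::string is never modified.
std::vector<std::string> strtok_split(const std::string& s,
                                      const char* delims = " \t\n")
{
    // Writable buffer of size s.size() + 1; the extra element holds the
    // terminating '\0' that strtok relies on.
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    // strtok overwrites each delimiter it finds with '\0' inside buf.
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling strtok_split("Somewhere down the road") would return the four words while leaving the argument untouched.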

score 39 · Answer 41 · 2017-05-23 22:17:34Z

Here is a split function that:

is generic

uses standard C++ (no boost)

accepts multiple delimiters

ignores empty tokens (can easily be changed)

template<typename T>

vector<T> 

split(const T & str, const T & delimiters) {

    vector<T> v;

    typename T::size_type start = 0;

    auto pos = str.find_first_of(delimiters, start);

    while(pos != T::npos) {

        if(pos != start) // ignore empty tokens

            v.emplace_back(str, start, pos - start);

        start = pos + 1;

        pos = str.find_first_of(delimiters, start);

    }

    if(start < str.length()) // ignore trailing delimiter

        v.emplace_back(str, start, str.length() - start); // add what's left of the string

    return v;

}

Example usage:

    vector<string> v = split<string>("Hello, there; World", ";,");

    vector<wstring> w = split<wstring>(L"Hello, there; World", L";,");

edited May 23 '17 at 22:17 · community wiki · 6 revs · Marco M.

You forgot to add to use list: "extremely inefficient" – Xander Tulip, Mar 19 '12 at 0:20
@XanderTulip, can you be more constructive and explain how or why? – Marco M., Mar 21 '12 at 11:57
@XanderTulip: I assume you are referring to it returning the vector by value. The Return-Value-Optimization (RVO, google it) should take care of this. Also in C++11 you could return by move reference. – Joseph Garvin, May 7 '12 at 13:56
This can actually be optimized further: instead of .push_back(str.substr(...)) one can use .emplace_back(str, start, pos - start). This way the string object is constructed in the container and thus we avoid a move operation + other shenanigans done by the .substr function. – Mihai Bişog, Sep 5 '12 at 13:50
@zoopp yes. Good idea. VS10 didn't have emplace_back support when I wrote this. I will update my answer. Thanks – Marco M., Sep 12 '12 at 13:03

score 33 · Answer 42 · 2013-01-15 00:12:16Z

I have a two-line solution to this problem:

char sep = ' ';

std::string s="1 This is an example";



for(size_t p=0, q=0; p!=s.npos; p=q)

  std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;

Then instead of printing you can put it in a vector.
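
As a sketch of the "put it in a vector" variant: the loop below is a more spelled-out rewrite of the same find/substr idea rather than the exact two-liner, and the name split_on_char is made up for illustration. Like the original, it yields an empty token between two consecutive separators.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` at every occurrence of `sep` and
// collects the pieces instead of printing them.
std::vector<std::string> split_on_char(const std::string& s, char sep = ' ')
{
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = s.find(sep, start);
        if (end == std::string::npos) {
            out.push_back(s.substr(start));        // last piece: rest of the string
            break;
        }
        out.push_back(s.substr(start, end - start));
        start = end + 1;                           // continue after the separator
    }
    return out;
}
```

For the string used above, split_on_char("1 This is an example") would collect five words.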

score 33 · Answer 43 · 2013-09-11 08:11:28Z

Yet another flexible and fast way

template<typename Operator>

void tokenize(Operator& op, const char* input, const char* delimiters) {

  const char* s = input;

  const char* e = s;

  while (*e != 0) {

    e = s;

    while (*e != 0 && strchr(delimiters, *e) == 0) ++e;

    if (e - s > 0) {

      op(s, e - s);

    }

    s = e + 1;

  }

}

To use it with a vector of strings (Edit: Since someone pointed out not to inherit STL classes... hrmf ;) ) :

template<class ContainerType>

class Appender {

public:

  Appender(ContainerType& container) : container_(container) {;}

  void operator() (const char* s, unsigned length) { 

    container_.push_back(std::string(s,length));

  }

private:

  ContainerType& container_;

};



std::vector<std::string> strVector;

Appender<std::vector<std::string> > v(strVector);

tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer. Another is to just count words:

class WordCounter {

public:

  WordCounter() : noOfWords(0) {}

  void operator() (const char*, unsigned) {

    ++noOfWords;

  }

  unsigned noOfWords;

};



WordCounter wc;

tokenize(wc, "A number of words to be counted", " \t"); 

ASSERT( wc.noOfWords == 7 );

Limited by imagination ;)

Nice. Regarding Appender note "Why shouldn't we inherit a class from STL classes?" — Sep 10 '13 at 12:07
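
With C++11 lambdas, the small functor classes can often be replaced by a closure. The sketch below is an assumption-laden variant, not the answer's code: for_each_token, collect_tokens and count_tokens are hypothetical names, and the tokenizer takes std::string arguments and its callback by value so that a lambda can be passed directly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback-style tokenizer: calls on_token(pointer, length)
// once for every non-empty token.
template <typename Callback>
void for_each_token(const std::string& input, const std::string& delimiters,
                    Callback on_token)
{
    std::string::size_type start = input.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        const std::string::size_type end = input.find_first_of(delimiters, start);
        const std::string::size_type length =
            (end == std::string::npos ? input.size() : end) - start;
        on_token(input.data() + start, length);
        // Skip the run of delimiters; yields npos when nothing is left.
        start = input.find_first_not_of(delimiters, end);
    }
}

// Appender equivalent: the lambda captures the output vector by reference.
std::vector<std::string> collect_tokens(const std::string& input,
                                        const std::string& delimiters)
{
    std::vector<std::string> out;
    for_each_token(input, delimiters,
                   [&out](const char* s, std::size_t n) { out.emplace_back(s, n); });
    return out;
}

// WordCounter equivalent: the lambda only increments a counter.
std::size_t count_tokens(const std::string& input, const std::string& delimiters)
{
    std::size_t n = 0;
    for_each_token(input, delimiters, [&n](const char*, std::size_t) { ++n; });
    return n;
}
```

The design choice is the same as in the answer: the tokenizer knows nothing about containers, and each caller decides what to do with a token.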

score 29 · Answer 44 · 2015-06-24 09:31:50Z

Here's a simple solution that uses only the standard regex library

#include <regex>

#include <string>

#include <vector>



std::vector<std::string> Tokenize( const std::string str, const std::regex regex )

{

    using namespace std;



    std::vector<string> result;



    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );

    sregex_token_iterator reg_end;



    for ( ; it != reg_end; ++it ) {

        if ( !it->str().empty() ) //token could be empty:check

            result.emplace_back( it->str() );

    }



    return result;

}

The regex argument allows splitting on multiple delimiters (spaces, commas, etc.)

I usually only check to split on spaces and commas, so I also have this default function:

std::vector<std::string> TokenizeDefault( const std::string str )

{

    using namespace std;



    regex re( "[\\s,]+" );



    return Tokenize( str, re );

}

The "[\\s,]+" pattern matches runs of whitespace (\s) and commas (,).

Note, if you want to split wstring instead of string,

change all std::regex to std::wregex

change all sregex_token_iterator to wsregex_token_iterator

Note, you might also want to take the string argument by reference, depending on your compiler.

This would have been my favourite answer, but std::regex is broken in GCC 4.8. They said that they implemented it correctly in GCC 4.9. I am still giving you my +1 — Aug 19 '14 at 12:27
This is my favorite with minor changes: vector returned as reference as you said, and the arguments "str" and "regex" passed by references also. thx. — Oct 16 '15 at 15:06
Raw strings are pretty useful while dealing with regex patterns. That way, you don't have to use the escape sequences... You can just use R"([\s,]+)". — Feb 17 at 17:42
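
The raw-string suggestion from the last comment can be sketched as follows. This is a minimal illustration assuming C++11; tokenize_raw is a hypothetical name, and the arguments are taken by const reference as the answer's closing note suggests.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on runs of whitespace and commas.
// The raw string literal R"(...)" keeps the pattern free of doubled backslashes.
std::vector<std::string> tokenize_raw(const std::string& text)
{
    static const std::regex separators(R"([\s,]+)");

    std::vector<std::string> result;
    // The -1 submatch index selects the text between the separator matches.
    std::sregex_token_iterator it(text.begin(), text.end(), separators, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0)          // a leading separator yields an empty token
            result.push_back(it->str());
    }
    return result;
}
```

For example, tokenize_raw("a, b  c") would produce the three tokens a, b and c.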

score 24 · Answer 45 · 2012-02-09 09:32:17Z

If you'd like to use Boost, but want to use a whole string as the delimiter (instead of single characters as in most of the previously proposed solutions), you can use boost::split_iterator.

Example code including convenient template:

#include <iostream>

#include <iterator>

#include <string>

#include <vector>

#include <boost/algorithm/string.hpp>



template<typename _OutputIterator>

inline void split(

    const std::string& str, 

    const std::string& delim, 

    _OutputIterator result)

{

    using namespace boost::algorithm;

    typedef split_iterator<std::string::const_iterator> It;



    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));

            iter!=It();

            ++iter)

    {

        *(result++) = boost::copy_range<std::string>(*iter);

    }

}



int main(int argc, char* argv[])

{

    using namespace std;



    vector<string> splitted;

    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));



    // or directly to console, for example

    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));

    return 0;

}

score 23 · Answer 46 · 2018-04-12 11:42:30Z

Using std::stringstream as you have works perfectly fine and does exactly what you wanted. If you're just looking for a different way of doing things, though, you can use std::find()/std::find_first_of() and std::string::substr().

Here's an example:

#include <iostream>

#include <string>



int main()

{

    std::string s("Somewhere down the road");

    std::string::size_type prev_pos = 0, pos = 0;



    while( (pos = s.find(' ', pos)) != std::string::npos )

    {

        std::string substring( s.substr(prev_pos, pos-prev_pos) );



        std::cout << substring << '\n';



        prev_pos = ++pos;

    }



    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word

    std::cout << substring << '\n';



    return 0;

}

This only works for single character delimiters. A simple change lets it work with multicharacter: prev_pos = pos += delimiter.length(); — Feb 5 '16 at 14:48
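
The change suggested in that comment could look roughly like this. It is a sketch, not the answer's code: split_on is a hypothetical name, the pieces are collected in a vector rather than printed, and an empty delimiter is treated as "no split" by assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: same find/substr loop as above, but the delimiter
// may be longer than one character.
std::vector<std::string> split_on(const std::string& s, const std::string& delimiter)
{
    std::vector<std::string> parts;
    if (delimiter.empty()) {           // assumption: nothing to split on
        parts.push_back(s);
        return parts;
    }

    std::string::size_type prev_pos = 0, pos = 0;
    while ((pos = s.find(delimiter, pos)) != std::string::npos) {
        parts.push_back(s.substr(prev_pos, pos - prev_pos));
        // Advance past the whole delimiter, not just one character.
        prev_pos = pos += delimiter.length();
    }
    parts.push_back(s.substr(prev_pos));   // last piece
    return parts;
}
```

With this, split_on("HelloFOOworldFOO!", "FOO") would return Hello, world and !.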

score 18 · Answer 47 · 2014-05-02 14:49:30Z

There is a C function named strtok (the code below uses its reentrant POSIX variant, strtok_r).

#include <string.h> // strtok_r (POSIX)

#include <string>

#include <vector>

using namespace std;



vector<string> split(char* str,const char* delim)

{

    char* saveptr;

    char* token = strtok_r(str,delim,&saveptr);



    vector<string> result;



    while(token != NULL)

    {

        result.push_back(token);

        token = strtok_r(NULL,delim,&saveptr);

    }

    return result;

}

edited May 2 '14 at 14:49 · community wiki · 3 revs, 2 users 91% · Pratik Deoghare

strtok is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the input string. – Kevin Panko, Jun 14 '10 at 14:07
Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL is passed, it remembers what pointer should be used. If a second thread calls strtok when another thread is still processing, this char pointer will be overwritten, and both threads will then have incorrect results. mkssoftware.com/docs/man3/strtok.3.asp – Kevin Panko, Jun 14 '10 at 17:27
as mentioned before strtok is unsafe and even in C strtok_r is recommended for use – systemsfault, Jul 6 '10 at 12:17
strtok_r can be used if you are in a section of code that may be accessed. this is the only solution of all of the above that isn't "line noise", and is a testament to what, exactly, is wrong with c++ – Erik Aronesty, Oct 10 '11 at 18:04
Updated so there can be no objections on the grounds of thread safety from C++ wonks. – Erik Aronesty, May 2 '14 at 14:50
score 17 · Answer 48 · 2012-10-29 16:15:47Z

Here's a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea)

#include <regex>

#include <string>

#include <vector>



using namespace std;



vector<string> split(string s){

    regex r ("\\w+"); //regex matches whole words, (greedy, so no fragment words)

    regex_iterator<string::iterator> rit ( s.begin(), s.end(), r );

    regex_iterator<string::iterator> rend; //iterators to iterate thru words

    vector<string> result;

    for (; rit != rend; ++rit)

        result.push_back(rit->str()); //iterates through the matches to fill the vector

    return result;

}

Similar responses with maybe better regex approach: here, and here. — Dec 5 '14 at 23:25

score 15 · Answer 49 · 2015-06-22 17:02:21Z

The stringstream can be convenient if you need to parse the string by non-space symbols:

string s = "Name:JAck; Spouse:Susan; ...";

string dummy, name, spouse;



istringstream iss(s);

getline(iss, dummy, ':');

getline(iss, name, ';');

getline(iss, dummy, ':');

getline(iss, spouse, ';');

edited Jun 22 '15 at 17:02 · community wiki · 2 revs, 2 users 95% · lukmac

That's a good working. – spritecodej, Jan 11 at 6:20
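
The same getline idea can be put in a loop when the number of fields is not known in advance. This is only a sketch under assumptions: parse_fields is a hypothetical name, the input is assumed to have the "key:value; key:value" shape of the example above, and only the spaces left over after each ';' are trimmed.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads "key:value" pairs separated by ';'.
std::vector<std::pair<std::string, std::string> > parse_fields(const std::string& s)
{
    std::vector<std::pair<std::string, std::string> > fields;
    std::istringstream iss(s);
    std::string key, value;

    // Read up to ':' for the key, then up to ';' (or end of input) for the value.
    while (std::getline(iss, key, ':') && std::getline(iss, value, ';')) {
        key.erase(0, key.find_first_not_of(' '));   // drop spaces left after "; "
        fields.push_back(std::make_pair(key, value));
    }
    return fields;
}
```

For "Name:Jack; Spouse:Susan" this would yield the pairs (Name, Jack) and (Spouse, Susan).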

score 14 · Answer 50 · 2011-05-22 23:02:42Z

So far I have used the one in Boost, but I needed something that doesn't depend on it, so I came up with this:

static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)

{

    std::ostringstream word;

    for (size_t n = 0; n < input.size(); ++n)

    {

        if (std::string::npos == separators.find(input[n]))

            word << input[n];

        else

        {

            if (!word.str().empty() || !remove_empty)

                lst.push_back(word.str());

            word.str("");

        }

    }

    if (!word.str().empty() || !remove_empty)

        lst.push_back(word.str());

}

A nice feature is that separators can contain more than one character.

score 13 · Answer 51 · 2014-01-07 20:28:03Z

I've rolled my own using strtok and used boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.

#include <iostream>

#include <vector>

#include <string>

#include <strtk.hpp>



const char *whitespace  = " \t\r\n\f";

const char *whitespace_and_punctuation  = " \t\r\n\f;,=";



int main()

{

    {   // normal parsing of a string into a vector of strings

        std::string s("Somewhere down the road");

        std::vector<std::string> result;

        if( strtk::parse( s, whitespace, result ) )

        {

            for(size_t i = 0; i < result.size(); ++i )

                std::cout << result[i] << std::endl;

        }

    }



    {  // parsing a string into a vector of floats with other separators

        // besides spaces



        std::string s("3.0, 3.14; 4.0");

        std::vector<float> values;

        if( strtk::parse( s, whitespace_and_punctuation, values ) )

        {

            for(size_t i = 0; i < values.size(); ++i )

                std::cout << values[i] << std::endl;

        }

    }



    {  // parsing a string into specific variables



        std::string s("angle = 45; radius = 9.9");

        std::string w1, w2;

        float v1, v2;

        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )

        {

            std::cout << "word " << w1 << ", value " << v1 << std::endl;

            std::cout << "word " << w2 << ", value " << v2 << std::endl;

        }

    }



    return 0;

}

The toolkit has much more flexibility than this simple example shows but its utility in parsing a string into useful elements is incredible.

score 13 · Answer 52 · 2016-07-14 20:17:10Z

Short and elegant

#include <vector>

#include <string>

using namespace std;



vector<string> split(string data, string token)

{

    vector<string> output;

    size_t pos = string::npos; // size_t to avoid improbable overflow

    do

    {

        pos = data.find(token);

        output.push_back(data.substr(0, pos));

        if (string::npos != pos)

            data = data.substr(pos + token.size());

    } while (string::npos != pos);

    return output;

}

It can use any string as the delimiter, and it can also be used with binary data (std::string supports binary data, including nulls).

using:

auto a = split("this!!is!!!example!string", "!!");

output:

this

is

!example!string

I like this solution because it allows the separator to be a string and not a char, however, it is modifying in place the string, so it is forcing the creation of a copy of the original string. — Aug 1 '16 at 15:30
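
One way to avoid the copy mentioned in that comment is to search from a moving start offset and return views into the original data. The sketch below assumes C++17 (std::string_view); split_views is a hypothetical name, and the returned views are only valid while the caller's string is alive and unchanged.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical variant of split(): no copy of the input and no substrings
// allocated; each element refers to a slice of `data`.
std::vector<std::string_view> split_views(std::string_view data, std::string_view token)
{
    std::vector<std::string_view> output;
    if (token.empty()) {               // assumption: an empty token means "no split"
        output.push_back(data);
        return output;
    }

    std::size_t start = 0;
    while (true) {
        const std::size_t pos = data.find(token, start);
        if (pos == std::string_view::npos) {
            output.push_back(data.substr(start));   // remainder after the last token
            break;
        }
        output.push_back(data.substr(start, pos - start));
        start = pos + token.size();
    }
    return output;
}
```

On the example above, split_views("this!!is!!!example!string", "!!") gives the same three pieces: this, is and !example!string.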

score 11 · Answer 53 · 2017-02-19 17:47:57Z

I made this because I needed an easy way to split strings and C-style strings. Hopefully someone else can find it useful as well. Also, it doesn't rely on tokens, and you can use fields as delimiters, which is another feature I needed.

I'm sure there are improvements that can be made to further improve its elegance; by all means, please make them.

StringSplitter.hpp:

#include <vector>

#include <iostream>

#include <string.h>



using namespace std;



class StringSplit

{

private:

    void copy_fragment(char*, char*, char*);

    void copy_fragment(char*, char*, char);

    bool match_fragment(char*, char*, int);

    int untilnextdelim(char*, char);

    int untilnextdelim(char*, char*);

    void assimilate(char*, char);

    void assimilate(char*, char*);

    bool string_contains(char*, char*);

    long calc_string_size(char*);

    void copy_string(char*, char*);



public:

    vector<char*> split_cstr(char);

    vector<char*> split_cstr(char*);

    vector<string> split_string(char);

    vector<string> split_string(char*);

    char* String;

    bool do_string;

    bool keep_empty;

    vector<char*> Container;

    vector<string> ContainerS;



    StringSplit(char * in)

    {

        String = in;

    }



    StringSplit(string in)

    {

        size_t len = calc_string_size((char*)in.c_str());

        String = new char[len + 1];

        memset(String, 0, len + 1);

        copy_string(String, (char*)in.c_str());

        do_string = true;

    }



    ~StringSplit()

    {

        for (int i = 0; i < Container.size(); i++)

        {

            if (Container[i] != NULL)

            {

                delete Container[i];

            }

        }

        if (do_string)

        {

            delete String;

        }

    }

};

StringSplitter.cpp:

#include <string.h>

#include <iostream>

#include <vector>

#include "StringSplitter.hpp"



using namespace std;



void StringSplit::assimilate(char*src, char delim)

{

    int until = untilnextdelim(src, delim);

    if (until > 0)

    {

        char * temp = new char[until + 1];

        memset(temp, 0, until + 1);

        copy_fragment(temp, src, delim);

        if (keep_empty || *temp != 0)

        {

            if (!do_string)

            {

                Container.push_back(temp);

            }

            else

            {

                string x = temp;

                ContainerS.push_back(x);

            }



        }

        else

        {

            delete temp;

        }

    }

}



void StringSplit::assimilate(char*src, char* delim)

{

    int until = untilnextdelim(src, delim);

    if (until > 0)

    {

        char * temp = new char[until + 1];

        memset(temp, 0, until + 1);

        copy_fragment(temp, src, delim);

        if (keep_empty || *temp != 0)

        {

            if (!do_string)

            {

                Container.push_back(temp);

            }

            else

            {

                string x = temp;

                ContainerS.push_back(x);

            }

        }

        else

        {

            delete temp;

        }

    }

}



long StringSplit::calc_string_size(char* _in)

{

    long i = 0;

    while (*_in++)

    {

        i++;

    }

    return i;

}



bool StringSplit::string_contains(char* haystack, char* needle)

{

    size_t len = calc_string_size(needle);

    size_t lenh = calc_string_size(haystack);

    while (lenh--)

    {

        if (match_fragment(haystack + lenh, needle, len))

        {

            return true;

        }

    }

    return false;

}



bool StringSplit::match_fragment(char* _src, char* cmp, int len)

{

    while (len--)

    {

        if (*(_src + len) != *(cmp + len))

        {

            return false;

        }

    }

    return true;

}



int StringSplit::untilnextdelim(char* _in, char delim)

{

    size_t len = calc_string_size(_in);

    if (*_in == delim)

    {

        _in += 1;

        return len - 1;

    }



    int c = 0;

    while (*(_in + c) != delim && c < len)

    {

        c++;

    }



    return c;

}



int StringSplit::untilnextdelim(char* _in, char* delim)

{

    int s = calc_string_size(delim);

    int c = 1 + s;



    if (!string_contains(_in, delim))

    {

        return calc_string_size(_in);

    }

    else if (match_fragment(_in, delim, s))

    {

        _in += s;

        return calc_string_size(_in);

    }



    while (!match_fragment(_in + c, delim, s))

    {

        c++;

    }



    return c;

}



void StringSplit::copy_fragment(char* dest, char* src, char delim)

{

    if (*src == delim)

    {

        src++;

    }



    int c = 0;

    while (*(src + c) != delim && *(src + c))

    {

        *(dest + c) = *(src + c);

        c++;

    }

    *(dest + c) = 0;

}



void StringSplit::copy_string(char* dest, char* src)

{

    int i = 0;

    while (*(src + i))

    {

        *(dest + i) = *(src + i);

        i++;

    }

}



void StringSplit::copy_fragment(char* dest, char* src, char* delim)

{

    size_t len = calc_string_size(delim);

    size_t lens = calc_string_size(src);



    if (match_fragment(src, delim, len))

    {

        src += len;

        lens -= len;

    }



    int c = 0;

    while (!match_fragment(src + c, delim, len) && (c < lens))

    {

        *(dest + c) = *(src + c);

        c++;

    }

    *(dest + c) = 0;

}



vector<char*> StringSplit::split_cstr(char Delimiter)

{

    int i = 0;

    while (*String)

    {

        if (*String != Delimiter && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (*String == Delimiter)

        {

            assimilate(String, Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    delete String;



    return Container;

}



vector<string> StringSplit::split_string(char Delimiter)

{

    do_string = true;



    int i = 0;

    while (*String)

    {

        if (*String != Delimiter && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (*String == Delimiter)

        {

            assimilate(String, Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    delete String;



    return ContainerS;

}



vector<char*> StringSplit::split_cstr(char* Delimiter)

{

    int i = 0;

    size_t LenDelim = calc_string_size(Delimiter);



    while(*String)

    {

        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (match_fragment(String, Delimiter, LenDelim))

        {

            assimilate(String,Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    delete String;



    return Container;

}



vector<string> StringSplit::split_string(char* Delimiter)

{

    do_string = true;

    int i = 0;

    size_t LenDelim = calc_string_size(Delimiter);



    while (*String)

    {

        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)

        {

            assimilate(String, Delimiter);

        }

        if (match_fragment(String, Delimiter, LenDelim))

        {

            assimilate(String, Delimiter);

        }

        i++;

        String++;

    }



    String -= i;

    delete String;



    return ContainerS;

}

Examples:

int main(int argc, char* argv[])

{

    StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";

    vector<char*> Split = ss.split_cstr(":CUT:");



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}

Will output:

This

is

an

example

cstring

int main(int argc, char* argv[])

{

    StringSplit ss = "This:is:an:example:cstring";

    vector<char*> Split = ss.split_cstr(':');



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}



int main(int argc, char* argv[])

{

    string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";

    StringSplit ss = mystring;

    vector<string> Split = ss.split_string("[SPLIT]");



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}



int main(int argc, char* argv[])

{

    string mystring = "This|is|an|example|string";

    StringSplit ss = mystring;

    vector<string> Split = ss.split_string('|');



    for (int i = 0; i < Split.size(); i++)

    {

        cout << Split[i] << endl;

    }



    return 0;

}

To keep empty entries (by default empties will be excluded):

StringSplit ss = mystring;

ss.keep_empty = true;

vector<string> Split = ss.split_string(":DELIM:");

The goal was to make it similar to C#'s Split() method where splitting a string is as easy as:

String[] Split = 

    "Hey:cut:what's:cut:your:cut:name?".Split(new[] {":cut:"}, StringSplitOptions.None);



foreach(String X in Split)

{

    Console.Write(X);

}

I hope someone else can find this as useful as I do.

score 10 · Answer 54 · 2012-12-19 22:05:24Z

What about this:

#include <string>

#include <vector>



using namespace std;



vector<string> split(string str, const char delim) {

    vector<string> v;

    string tmp;



    for(string::const_iterator i = str.begin(); i != str.end(); ++i) {

        if(*i != delim) {

            tmp += *i; 

        } else {

            v.push_back(tmp);

            tmp = ""; 

        }   

    }   

    v.push_back(tmp); // the last token has no trailing delimiter



    return v;

}

edited Dec 19 '12 at 22:05 · community wiki · 3 revs, 3 users 89% · gibbz

This is the best answer here, if you only want to split on a single delimiter character. The original question wanted to split on whitespace though, meaning any combination of one or more consecutive spaces or tabs. You have actually answered stackoverflow.com/questions/53849 – Oktalist, Dec 19 '12 at 22:09

score 9 · Answer 55 · 2010-01-08 03:27:24Z

Here's another way of doing it..

void split_string(string text,vector<string>& words)

{

  int i=0;

  char ch;

  string word;



  while(ch=text[i++])

  {

    if (isspace(ch))

    {

      if (!word.empty())

      {

        words.push_back(word);

      }

      word = "";

    }

    else

    {

      word += ch;

    }

  }

  if (!word.empty())

  {

    words.push_back(word);

  }

}

score 9 · Answer 56 · 2011-06-12 09:25:38Z

I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.

#include <iostream>

#include <string>

#include <boost/regex.hpp>



int main() {

    std::string line("A:::line::to:split");

    const boost::regex re(":+"); // one or more colons



    // -1 means find inverse matches aka split

    boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);

    boost::sregex_token_iterator end;



    for (; tokens != end; ++tokens)

        std::cout << *tokens << std::endl;

}

score 9 · Answer 57 · 2011-09-14 09:47:57Z

Recently I had to split a camel-cased word into subwords. There are no delimiters, just uppercase characters.

#include <string>

#include <list>

#include <locale> // std::isupper



template<class String>

const std::list<String> split_camel_case_string(const String &s)

{

    std::list<String> R;

    String w;



    for (typename String::const_iterator i = s.begin(); i < s.end(); ++i) {

        if (std::isupper(*i, std::locale())) {

            if (w.length()) {

                R.push_back(w);

                w.clear();

            }

        }

        w += *i;

    }



    if (w.length())

        R.push_back(w);

    return R;

}

For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".

Note that std::isupper should really be passed as a function template argument. Then the more generalized form of this function can split at delimiters like ",", ";" or " " too.

There have been 2 revs. That's nice. Seems as if my English had too much of a "German". However, the revisionist did not fix two minor bugs, maybe because they were obvious anyway: std::isupper could be passed as argument, not std::upper. Second, put a typename before the String::const_iterator. — Apr 28 '15 at 7:20
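
The generalization hinted at in the note, passing the test as an argument, might look like the sketch below. The names split_before and split_camel_case are hypothetical, and the camel-case wrapper uses the one-argument std::isupper from <cctype> (C locale) rather than the locale-aware overload, which is a simplification.

```cpp
#include <cctype>
#include <list>
#include <string>

// Hypothetical generalization: starts a new piece in front of every
// character for which starts_piece(c) is true.
template <class String, class Pred>
std::list<String> split_before(const String& s, Pred starts_piece)
{
    std::list<String> pieces;
    String w;
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (starts_piece(*i) && !w.empty()) {
            pieces.push_back(w);
            w.clear();
        }
        w += *i;   // the matching character stays at the front of its piece
    }
    if (!w.empty())
        pieces.push_back(w);
    return pieces;
}

// Camel-case splitting expressed through the general function.
std::list<std::string> split_camel_case(const std::string& s)
{
    return split_before(s, [](char c) {
        return std::isupper(static_cast<unsigned char>(c)) != 0;
    });
}
```

Note that with a delimiter predicate (for example one matching ',' or ';') the delimiter itself remains at the start of each following piece, because the function splits before the matching character rather than removing it.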

score 9 · Answer 58 · 2017-12-09 21:14:38Z

This answer takes the string and puts it into a vector of strings. It uses the boost library.

#include <boost/algorithm/string.hpp>

std::vector<std::string> strs;

boost::split(strs, "string to split", boost::is_any_of("\t "));

answered Dec 9 '17 at 21:14

community wiki

NL628


score 8 · Answer 59 · 2013-04-07 16:07:55Z

Get Boost! :-)

#include <boost/algorithm/string/split.hpp>

#include <boost/algorithm/string.hpp>

#include <iostream>

#include <vector>



using namespace std;

using namespace boost;



int main(int argc, char**argv) {

    typedef vector < string > list_type;



    list_type list;

    string line;



    line = "Somewhere down the road";

    split(list, line, is_any_of(" "));



    for(int i = 0; i < list.size(); i++)

    {

        cout << list[i] << endl;

    }



    return 0;

}

This example gives the output -

Somewhere

down

the

road

score 8 · Answer 60 · 2015-04-29 15:06:31Z

The code below uses strtok() to split a string into tokens and stores the tokens in a vector.

#include <iostream>

#include <algorithm>

#include <vector>

#include <string>

#include <cstring> // strtok



using namespace std;





char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";

char seps[]   = " ,\t\n";

char *token;







int main()

{

   vector<string> vec_String_Lines;

   token = strtok( one_line_string, seps );



   cout << "Extracting and storing data in a vector..\n\n\n";



   while( token != NULL )

   {

      vec_String_Lines.push_back(token);

      token = strtok( NULL, seps );

   }

     cout << "Displaying end result in vector line storage..\n\n";



    for ( int i = 0; i < vec_String_Lines.size(); ++i)

    cout << vec_String_Lines[i] << "\n";

    cout << "\n\n\n";





return 0;

}
