How do I iterate over the words of a string?
up vote
2711
down vote
favorite
I'm trying to iterate over the words of a string.
The string can be assumed to be composed of words separated by whitespace.
Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.
The best solution I have right now is:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
    string s = "Somewhere down the road";
    istringstream iss(s);
    do
    {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}
Is there a more elegant way to do this?
c++ string split
570
Dude... Elegance is just a fancy way to say "efficiency-that-looks-pretty" in my book. Don't shy away from using C functions and quick methods to accomplish anything just because it is not contained within a template ;)
– nlaq
Oct 25 '08 at 9:04
13
while (iss) { string subs; iss >> subs; cout << "Substring: " << sub << endl; }
– pyon
Sep 29 '09 at 15:47
18
@Eduardo: that's wrong too... you need to test iss between trying to stream another value and using that value, i.e. string sub; while (iss >> sub) cout << "Substring: " << sub << '\n';
– Tony Delroy
Apr 11 '12 at 2:24
8
Various options in C++ to do this by default: cplusplus.com/faq/sequences/strings/split
– hB0
Oct 31 '13 at 0:23
5
There's more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility. IMHO Elegance is not a proxy for efficiency but maintainability.
– Matt
Mar 31 '17 at 13:22
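The fix suggested in the comments above can be sketched as follows. The do/while loop in the question prints one extra, empty "Substring:" line, because the final iss >> subs fails after the last word and its empty result is still printed. Making the extraction itself the loop condition avoids that. The helper name words_of is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the whitespace-separated words of `text`.
std::vector<std::string> words_of(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> words;
    std::string word;
    // The extraction is the loop condition, so the body only runs
    // after a successful read.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

A caller could then write for (const auto& w : words_of(s)) std::cout << "Substring: " << w << '\n'; and get exactly one line per word.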
edited Oct 13 at 19:34
community wiki
27 revs, 14 users 30%
Ashwin Nanjappa
74 Answers
up vote
1225
down vote
accepted
For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}
Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.
vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));
... or create the vector directly:
vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};
145
Is it possible to specify a delimiter for this? Like for instance splitting on commas?
– l3dx
Aug 6 '09 at 11:49
14
@Jonathan: \n is not the delimiter in this case, it's the delimiter for outputting to cout.
– huy
Feb 3 '10 at 12:37
728
This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable.
– SmallChess
Jan 10 '11 at 3:57
34
Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.
– Jerry Coffin
Dec 19 '12 at 20:30
35
@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm, doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one.
– Christian Rau
Feb 7 '13 at 15:08
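Jerry Coffin's comment about a custom ctype facet can be sketched roughly like this. It is a minimal illustration, not a general solution: the facet name comma_is_space and the helper split_on_commas are invented here, and the facet keeps the usual whitespace characters as separators while adding ',' to them.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies ',' as whitespace in addition to the
// characters the classic "C" locale already treats as whitespace.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Start from a copy of the classic table, then mark ',' as a space.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[','] |= space;
        return table.data();
    }
};

// Hypothetical helper: splits on commas and ordinary whitespace.
std::vector<std::string> split_on_commas(const std::string& text) {
    std::istringstream iss(text);
    // The locale takes ownership of the facet and deletes it when done.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    return std::vector<std::string>{std::istream_iterator<std::string>{iss},
                                    std::istream_iterator<std::string>{}};
}
```

Because operator>> skips runs of "whitespace", consecutive delimiters never produce empty tokens with this approach.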
up vote
2332
down vote
I use this to split a string by a delimiter. The first overload writes the results through an output iterator (for example into a pre-constructed vector), the second returns a new vector.
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
template<typename Out>
void split(const std::string &s, char delim, Out result) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        *(result++) = item;
    }
}
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}
Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:
std::vector<std::string> x = split("one:two::three", ':');
73
To skip empty tokens, add an empty() check: if (!item.empty()) elems.push_back(item)
– 0x499602D2
Nov 9 '13 at 22:33
11
What if the delimiter contains two chars, such as ->?
– herohuyongtao
Dec 26 '13 at 8:15
7
@herohuyongtao, this solution only works for single char delimiters.
– Evan Teran
Dec 27 '13 at 6:11
4
@JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this: f(split(s, d, v)) while still having the benefit of a pre-allocated vector if you like.
– Evan Teran
Jan 25 '14 at 17:50
6
Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value.
– dshin
Sep 9 '15 at 19:04
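The empty() check from the first comment, and the caveat from the last one, can be illustrated with a small variant of the answer's function. The name split_skip_empty is hypothetical. std::getline does not report a final empty field after a trailing delimiter, which is why the two inputs in the caveat compare equal even before empty tokens are filtered out.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of the answer's split() that drops empty tokens.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (!item.empty()) {  // skip tokens produced by adjacent delimiters
            elems.push_back(item);
        }
    }
    return elems;
}
```

With this variant, "one:two::three" yields three items instead of four.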
up vote
805
down vote
A possible solution using Boost might be:
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.
See the documentation for details.
32
Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.
– Tom
Mar 1 '09 at 16:51
40
And for those who don't already have boost... bcp copies over 1,000 files for this :)
– Roman Starkov
Jun 9 '10 at 20:12
78
strtok is a trap. its thread unsafe.
– tuxSlayer
Apr 23 '11 at 3:30
28
@Ian Embedded developers aren't all using boost.
– ACK_stoverflow
Jan 31 '12 at 18:23
28
As an addendum: I use boost only when I must; normally I prefer to add to my own library of code, which is standalone and portable, so that I can achieve small, precise, specific code which accomplishes a given aim. That way the code is non-public, performant, trivial and portable. Boost has its place, but I would suggest that it's a bit of overkill for tokenising strings: you wouldn't have your whole house transported to an engineering firm to get a new nail hammered into the wall to hang a picture.... they may do it extremely well, but the pros are by far outweighed by the cons.
– GMasucci
May 22 '13 at 8:19
up vote
333
down vote
#include <vector>
#include <string>
#include <sstream>
int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                 // Have a buffer string
    std::stringstream ss(str);       // Insert the string into a stream
    std::vector<std::string> tokens; // Create vector to hold our words
    while (ss >> buf)
        tokens.push_back(buf);
    return 0;
}
52
too bad it only splits on spaces ' '...
– Offirmo
Jan 31 '13 at 18:47
You can also split on other delimiters if you use getline in the while condition, e.g. to split by commas, use while (getline(ss, buf, ',')).
– Ali
Oct 6 at 20:20
up vote
172
down vote
For those with whom it does not sit well to sacrifice all efficiency for code size and see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition.):
#include <string>
template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    while(lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if(pos == std::string::npos)
        {
            pos = length;
        }
        if(pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data()+lastPos,
                                        (size_type)pos-lastPos ));
        lastPos = pos + 1;
    }
}
I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)... but list<> is way faster than vector<> for when direct access is not needed, and you can even create your own string class and use something like std::list<subString> where subString does not do any copies for incredible speed increases.
It's more than twice as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also with the perfect parameter types you can eliminate all string and list copies for additional speed increases.
Additionally it does not do the (extremely inefficient) return of result, but rather it passes the tokens as a reference, thus also allowing you to build up tokens using multiple calls if you so wished.
Lastly it allows you to specify whether to trim empty tokens from the results via a last optional parameter.
All it needs is std::string... the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.
5
I'm quite a fan of this, but for g++ (and probably good practice) anyone using this will want typedefs and typenames: typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType; Then substitute out the value_type and size_types accordingly.
– aws
Nov 28 '11 at 21:41
10
For those of us for whom the template stuff and the first comment are completely foreign, a usage example complete with required includes would be lovely.
– Wes Miller
Aug 17 '12 at 11:51
3
Ahh well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited the tokens.push_back() lines to change the ContainerT::value_type to just ValueType and changed (ContainerT::value_type::size_type) to (SizeType). Fixed the bits g++ had been whining about. Just invoke it as tokenize( some_string, some_vector );
– Wes Miller
Aug 17 '12 at 14:23
2
Apart from running a few performance tests on sample data, primarily I've reduced it to as few as possible instructions and also as little as possible memory copies enabled by the use of a substring class that only references offsets/lengths in other strings. (I rolled my own, but there are some other implementations). Unfortunately there is not too much else one can do to improve on this, but incremental increases were possible.
– Marius
Nov 29 '12 at 14:50
3
That's the correct output for when trimEmpty = true. Keep in mind that "abo" is not a delimiter in this answer, but the list of delimiter characters. It would be simple to modify it to take a single delimiter string of characters (I think str.find_first_of should change to str.find, but I could be wrong... can't test)
– Marius
Aug 28 '15 at 15:24
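A usage example with the required includes, as requested in the comments, might look like the sketch below. The tokenize() body is repeated from the answer only so that the block compiles on its own. The helper tokens_of and the delimiter set ",;" are assumptions made for the example.

```cpp
#include <string>
#include <vector>

// tokenize() as given in the answer, repeated so this sketch is self-contained.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
            pos = length;
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}

// Hypothetical usage helper: splits on ',' and ';' into a vector of strings.
std::vector<std::string> tokens_of(const std::string& text, bool trimEmpty) {
    std::vector<std::string> tokens;
    tokenize(text, tokens, ",;", trimEmpty);
    return tokens;
}
```

Calling tokens_of("a,b;;c", true) drops the empty token between the two adjacent delimiters, while passing false keeps it.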
up vote
154
down vote
Here's another solution. It's compact and reasonably efficient:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
It can easily be templatised to handle string separators, wide strings, etc.
Note that splitting "" results in a single empty string and splitting "," (i.e. sep) results in two empty strings.
It can also be easily expanded to skip empty tokens:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    // end is always npos here, so compare start against the string length
    // instead; otherwise a trailing separator would add an empty token.
    if (start < text.size()) {
        tokens.push_back(text.substr(start));
    }
    return tokens;
}
If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;
    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));
    return tokens;
}
10
The first version is simple and gets the job done perfectly. The only change I would made would be to return the result directly, instead of passing it as a parameter.
– gregschlom
Jan 19 '12 at 2:25
2
The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed.
– Alec Thomas
Feb 6 '12 at 18:56
2
A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics.
– Alec Thomas
Jun 27 '13 at 1:20
6
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct)
– Marcelo Cantos
Aug 17 '13 at 11:54
11
Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a delimiter, although its a less obvious solution. Does the c++11 standard not have anything for this? Does c++11 support punch cards these days?
– Spacen Jasset
Aug 11 '15 at 15:15
up vote
110
down vote
This is my favorite way to iterate through a string. You can do whatever you want per word.
string line = "a line of text to iterate through";
string word;
istringstream iss(line, istringstream::in);
while( iss >> word )
{
    // Do something on `word` here...
}
Is it possible to declare word as a char?
– abatishchev
Jun 26 '10 at 17:23
Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop through every character in each word. But right now I believe the current loop depends on spaces for word separation. Unless you know that there is only a single character between every space, in which case you can just cast "word" to a char... sorry I cant be of more help, ive been meaning to brush up on my C++
– gnomed
Jun 30 '10 at 22:18
9
If you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try: stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;
– Wayne Werner
Aug 4 '10 at 18:03
up vote
77
down vote
This is similar to Stack Overflow question How do I tokenize a string in C++?.
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
    string text = "token test\tstring";
    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}
Does this materialize a copy of all of the tokens, or does it only keep the start and end position of the current token?
– einpoklum
Apr 9 at 19:47
up vote
66
down vote
I like the following because it puts the results into a vector, supports a string as a delimiter and gives control over keeping empty values. It doesn't look as elegant, though.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}
int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
Of course, Boost has a split() that works partially like that. And, if by 'white-space', you really do mean any type of white-space, using Boost's split with is_any_of() works great.
Finally a solution that is handling empty tokens correctly at both sides of the string
– fmuecke
Sep 9 '15 at 20:38
up vote
50
down vote
The STL does not have such a method available already.
However, you can either use C's strtok() function by using the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):
#include <string>
#include <vector>
using namespace std;
void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at the beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after the token.
    string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the delimiter that ends the next token.
        pos = str.find_first_of(delimiters, lastPos);
    }
}
Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html
If you have questions about the code sample, leave a comment and I will explain.
And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf both are faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.
Don't get sold on this "Elegance over performance" deal.
I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my question). However, for this specific question, I'm looking for an elegant C++ solution.
– Ashwin Nanjappa
Oct 25 '08 at 9:16
11
@Nelson LaQuet: Let me guess: Because strtok is not reentrant?
– paercebal
Oct 25 '08 at 9:52
35
@Nelson don't ever pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each found delimiter) and c_str() returns a non-modifiable string.
– Evan Teran
Oct 25 '08 at 18:19
3
@Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's silly to avoid C functions for "aesthetic" reasons.
– j_random_hacker
Aug 24 '09 at 9:08
2
@paulm: No, the slowness of C++ streams is caused by facets. They're still slower than stdio.h functions even when synchronization is disabled (and on stringstreams, which can't synchronize).
– Ben Voigt
Apr 12 '15 at 23:55
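The two points raised in the comments (strtok modifies its input, and the working buffer needs str.size() + 1 bytes) can be combined into the following sketch. It copies the string into a mutable, NUL-terminated buffer before tokenizing. The function name strtok_split is made up for illustration, and std::strtok keeps hidden static state, so this version is still not safe to call from several threads at once.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `str`, leaving the original intact.
std::vector<std::string> strtok_split(const std::string& str, const char* delims) {
    // Mutable copy including the terminating '\0': str.size() + 1 bytes.
    std::vector<char> buffer(str.c_str(), str.c_str() + str.size() + 1);
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);  // copy each token out of the buffer
    }
    return tokens;
}
```

Like strtok itself, this skips runs of delimiters, so it never produces empty tokens.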
up vote
39
down vote
Here is a split function that:
- is generic
- uses standard C++ (no boost)
- accepts multiple delimiters
- ignores empty tokens (can easily be changed)
template<typename T>
vector<T>
split(const T & str, const T & delimiters) {
    vector<T> v;
    typename T::size_type start = 0;
    auto pos = str.find_first_of(delimiters, start);
    while(pos != T::npos) {
        if(pos != start) // ignore empty tokens
            v.emplace_back(str, start, pos - start);
        start = pos + 1;
        pos = str.find_first_of(delimiters, start);
    }
    if(start < str.length()) // ignore trailing delimiter
        v.emplace_back(str, start, str.length() - start); // add what's left of the string
    return v;
}
Example usage:
vector<string> v = split<string>("Hello, there; World", ";,");
vector<wstring> v = split<wstring>(L"Hello, there; World", L";,");
You forgot to add to use list: "extremely inefficient"
– Xander Tulip
Mar 19 '12 at 0:20
@XanderTulip, can you be more constructive and explain how or why?
– Marco M.
Mar 21 '12 at 11:57
2
@XanderTulip: I assume you are referring to it returning the vector by value. The Return-Value-Optimization (RVO, google it) should take care of this. Also in C++11 you could return by move reference.
– Joseph Garvin
May 7 '12 at 13:56
3
This can actually be optimized further: instead of .push_back(str.substr(...)) one can use .emplace_back(str, start, pos - start). This way the string object is constructed in the container and thus we avoid a move operation + other shenanigans done by the .substr function.
– Mihai Bişog
Sep 5 '12 at 13:50
@zoopp yes. Good idea. VS10 didn't have emplace_back support when I wrote this. I will update my answer. Thanks
– Marco M.
Sep 12 '12 at 13:03
up vote
33
down vote
I have a two-line solution to this problem:
char sep = ' ';
std::string s="1 This is an example";
for(size_t p=0, q=0; p!=s.npos; p=q)
    std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;
Then instead of printing you can put it in a vector.
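Putting the pieces into a vector, as suggested, could look like the sketch below. The function name split_two_liner is invented here, and the loop is the answer's, unchanged apart from the push_back. Two quirks of this loop are worth knowing: consecutive separators produce empty tokens, and a separator at position 0 is not treated as a split point.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper around the answer's loop that collects the pieces.
std::vector<std::string> split_two_liner(const std::string& s, char sep) {
    std::vector<std::string> out;
    // p is the position of the previous separator (or 0 at the start),
    // q the position of the next one (npos once the string is exhausted).
    for (std::size_t p = 0, q = 0; p != s.npos; p = q)
        out.push_back(s.substr(p + (p != 0),
                               (q = s.find(sep, p + 1)) - p - (p != 0)));
    return out;
}
```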
up vote
33
down vote
Yet another flexible and fast way
#include <cstring>
template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        while (*e != 0 && std::strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s);
        }
        s = e + 1;
    }
}
To use it with a vector of strings (Edit: Since someone pointed out not to inherit STL classes... hrmf ;) ) :
#include <string>
#include <vector>
template<class ContainerType>
class Appender {
public:
    Appender(ContainerType& container) : container_(container) {;}
    void operator() (const char* s, unsigned length) {
        container_.push_back(std::string(s,length));
    }
private:
    ContainerType& container_;
};
std::vector<std::string> strVector;
Appender<std::vector<std::string>> v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");
That's it! And that's just one way to use the tokenizer. Another is to just count words:
class WordCounter {
public:
    WordCounter() : noOfWords(0) {}
    void operator() (const char*, unsigned) {
        ++noOfWords;
    }
    unsigned noOfWords;
};
WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );
Limited by imagination ;)
Nice. Regarding Appender, note "Why shouldn't we inherit a class from STL classes?"
– Andreas Spindler
Sep 10 '13 at 12:07
up vote
29
down vote
Here's a simple solution that uses only the standard regex library
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;
    std::vector<string> result;
    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;
    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) //token could be empty:check
            result.emplace_back( it->str() );
    }
    return result;
}
The regex argument allows checking for multiple arguments (spaces, commas, etc.)
I usually only check to split on spaces and commas, so I also have this default function:
std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;
    regex re( "[\\s,]+" );
    return Tokenize( str, re );
}
The pattern [\s,]+ (written "[\\s,]+" as an ordinary C++ string literal) matches runs of whitespace (\s) and commas (,).
Note, if you want to split wstring instead of string,
- change all std::regex to std::wregex
- change all sregex_token_iterator to wsregex_token_iterator
Note, you might also want to take the string argument by reference, depending on your compiler.
This would have been my favourite answer, but std::regex is broken in GCC 4.8. They said that they implemented it correctly in GCC 4.9. I am still giving you my +1
– mchiasson
Aug 19 '14 at 12:27
1
This is my favorite with minor changes: vector returned as reference as you said, and the arguments "str" and "regex" passed by references also. thx.
– QuantumKarl
Oct 16 '15 at 15:06
Raw strings are pretty useful while dealing with regex patterns. That way, you don't have to use the escape sequences... You can just use R"([\s,]+)".
– Sam
Feb 17 at 17:42
up vote
24
down vote
If you like to use boost, but want to use a whole string as delimiter (instead of single characters as in most of the previously proposed solutions), you can use the boost_split_iterator.
Example code including convenient template:
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
template<typename OutputIterator>
inline void split(
    const std::string& str,
    const std::string& delim,
    OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;
    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
        iter!=It();
        ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}
int main(int argc, char* argv[])
{
    using namespace std;
    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));
    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}
up vote
23
down vote
Using std::stringstream as you have works perfectly fine, and does exactly what you wanted. If you're just looking for a different way of doing things though, you can use std::find()/std::find_first_of() and std::string::substr().
Here's an example:
#include <iostream>
#include <string>
int main()
{
    std::string s("Somewhere down the road");
    std::string::size_type prev_pos = 0, pos = 0;
    while( (pos = s.find(' ', pos)) != std::string::npos )
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );
        std::cout << substring << '\n';
        prev_pos = ++pos;
    }
    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
    std::cout << substring << '\n';
    return 0;
}
This only works for single character delimiters. A simple change lets it work with multicharacter: prev_pos = pos += delimiter.length();
– David Doria
Feb 5 '16 at 14:48
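The change described in this comment could be applied along the following lines. This is a sketch: the function name split_on and the decision to return the whole input for an empty delimiter are assumptions, not part of the answer.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on a delimiter of any length.
std::vector<std::string> split_on(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> out;
    if (delimiter.empty()) {  // assumption: an empty delimiter means "no split"
        out.push_back(s);
        return out;
    }
    std::string::size_type prev_pos = 0, pos = 0;
    while ((pos = s.find(delimiter, pos)) != std::string::npos) {
        out.push_back(s.substr(prev_pos, pos - prev_pos));
        // Step over the whole delimiter, not just one character.
        prev_pos = pos += delimiter.length();
    }
    out.push_back(s.substr(prev_pos));  // last piece
    return out;
}
```

Like the answer's loop, this keeps empty pieces, so "a->->b" split on "->" yields three entries with an empty one in the middle.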
up vote
18
down vote
There is a function named strtok.
#include <string.h> // strtok_r (POSIX)
#include <string>
#include <vector>
using namespace std;
vector<string> split(char* str, const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str, delim, &saveptr);
    vector<string> result;
    while(token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL, delim, &saveptr);
    }
    return result;
}
3
strtok is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the input string.
– Kevin Panko
Jun 14 '10 at 14:07
12
Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL is passed, it remembers what pointer should be used. If a second thread calls strtok when another thread is still processing, this char pointer will be overwritten, and both threads will then have incorrect results. mkssoftware.com/docs/man3/strtok.3.asp
– Kevin Panko
Jun 14 '10 at 17:27
1
As mentioned before, strtok is unsafe, and even in C strtok_r is recommended instead.
– systemsfault
Jul 6 '10 at 12:17
4
strtok_r can be used if you are in a section of code that may be accessed. this is the only solution of all of the above that isn't "line noise", and is a testament to what, exactly, is wrong with c++
– Erik Aronesty
Oct 10 '11 at 18:04
Updated so there can be no objections on the grounds of thread safety from C++ wonks.
– Erik Aronesty
May 2 '14 at 14:50
|
show 2 more comments
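The two objections raised in the comments (thread safety and modification of the input) can both be sidestepped by running strtok_r on a private copy. The following is a minimal sketch assuming a POSIX platform; the wrapper name split_copy is hypothetical.

```cpp
#include <string.h>   // strtok_r is POSIX, not ISO C or ISO C++
#include <string>
#include <vector>

// Hypothetical wrapper: tokenizes a private copy, so the caller's string is untouched.
std::vector<std::string> split_copy(const std::string& input, const char* delims)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');   // strtok_r needs a writable, NUL-terminated buffer
    std::vector<std::string> result;
    char* saveptr = NULL;     // per-call state instead of strtok's hidden static
    for (char* token = strtok_r(buffer.data(), delims, &saveptr);
         token != NULL;
         token = strtok_r(NULL, delims, &saveptr))
    {
        result.push_back(token);
    }
    return result;
}
```

Like strtok itself, this treats runs of delimiters as one separator, so it never produces empty tokens.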
up vote
17
down vote
Here's a regex solution that uses only the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea.)
#include <regex>
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& s) {
    regex r("\\w+"); // matches whole words (greedy, so no word fragments)
    sregex_iterator rit(s.begin(), s.end(), r);
    sregex_iterator rend; // default-constructed end-of-sequence iterator
    vector<string> result;
    for (; rit != rend; ++rit)
        result.push_back(rit->str()); // copy each matched word into the vector
    return result;
}
Similar responses with maybe better regex approach: here, and here.
– nobar
Dec 5 '14 at 23:25
up vote
15
down vote
The stringstream can be convenient if you need to parse the string by non-space symbols:
string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;
istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';');
That's a good working.
– spritecodej
Jan 11 at 6:20
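The same getline pattern can be put in a loop when the number of fields is not known in advance. This is a sketch under assumptions: the helper name parse_pairs and the use of std::map are illustrative choices, not part of the answer.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses "Key:Value; Key:Value" records into a map.
std::map<std::string, std::string> parse_pairs(const std::string& s)
{
    std::map<std::string, std::string> fields;
    std::istringstream iss(s);
    std::string key, value;
    // Skip leading whitespace, read up to ':' for the key,
    // then up to ';' (or end of input) for the value.
    while (std::getline(iss >> std::ws, key, ':') && std::getline(iss, value, ';'))
        fields[key] = value;
    return fields;
}
```

A trailing fragment with no ':' separator is read as a key with no value and is silently dropped.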
up vote
14
down vote
So far I have used the one in Boost, but I needed something that doesn't depend on it, so I came up with this:
static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)
{
std::ostringstream word;
for (size_t n = 0; n < input.size(); ++n)
{
if (std::string::npos == separators.find(input[n]))
word << input[n];
else
{
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
word.str("");
}
}
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
}
A good point is that in separators you can pass more than one character.
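For comparison, a similar "any of these characters is a separator" split can be sketched with std::string::find_first_of and find_first_not_of instead of a stream. This is an alternative technique, not the answer's code; the name split_any is hypothetical, and this version always drops empty tokens.

```cpp
#include <string>
#include <vector>

// Hypothetical alternative: any character in `separators` ends a token;
// runs of separators are skipped, so no empty tokens are produced.
std::vector<std::string> split_any(const std::string& input, const std::string& separators)
{
    std::vector<std::string> tokens;
    std::string::size_type start = input.find_first_not_of(separators);
    while (start != std::string::npos) {
        std::string::size_type end = input.find_first_of(separators, start);
        // When end is npos, substr clamps the length and returns the rest.
        tokens.push_back(input.substr(start, end - start));
        start = input.find_first_not_of(separators, end);
    }
    return tokens;
}
```

For example, split_any("3.0, 3.14; 4.0", " ,;") should yield 3.0, 3.14 and 4.0.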
up vote
13
down vote
I've rolled my own using strtok and used boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
{ // normal parsing of a string into a vector of strings
std::string s("Somewhere down the road");
std::vector<std::string> result;
if( strtk::parse( s, whitespace, result ) )
{
for(size_t i = 0; i < result.size(); ++i )
std::cout << result[i] << std::endl;
}
}
{ // parsing a string into a vector of floats with other separators
// besides spaces
std::string s("3.0, 3.14; 4.0");
std::vector<float> values;
if( strtk::parse( s, whitespace_and_punctuation, values ) )
{
for(size_t i = 0; i < values.size(); ++i )
std::cout << values[i] << std::endl;
}
}
{ // parsing a string into specific variables
std::string s("angle = 45; radius = 9.9");
std::string w1, w2;
float v1, v2;
if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
{
std::cout << "word " << w1 << ", value " << v1 << std::endl;
std::cout << "word " << w2 << ", value " << v2 << std::endl;
}
}
return 0;
}
The toolkit has much more flexibility than this simple example shows but its utility in parsing a string into useful elements is incredible.
up vote
13
down vote
Short and elegant
#include <vector>
#include <string>
using namespace std;
vector<string> split(string data, string token)
{
vector<string> output;
size_t pos = string::npos; // size_t to avoid improbable overflow
do
{
pos = data.find(token);
output.push_back(data.substr(0, pos));
if (string::npos != pos)
data = data.substr(pos + token.size());
} while (string::npos != pos);
return output;
}
It can use any string as the delimiter, and it also works with binary data (std::string supports binary data, including nulls).
using:
auto a = split("this!!is!!!example!string", "!!");
output:
this
is
!example!string
I like this solution because it allows the separator to be a string and not a char, however, it is modifying in place the string, so it is forcing the creation of a copy of the original string.
– Alessandro Teruzzi
Aug 1 '16 at 15:30
up vote
11
down vote
I made this because I needed an easy way to split strings and C-style strings. Hopefully someone else can find it useful as well. It doesn't rely on tokens, and you can use fields as delimiters, which is another feature I needed.
I'm sure there are improvements that could make it more elegant; by all means, please make them.
StringSplitter.hpp:
#include <vector>
#include <iostream>
#include <string.h>
using namespace std;
class StringSplit
{
private:
void copy_fragment(char*, char*, char*);
void copy_fragment(char*, char*, char);
bool match_fragment(char*, char*, int);
int untilnextdelim(char*, char);
int untilnextdelim(char*, char*);
void assimilate(char*, char);
void assimilate(char*, char*);
bool string_contains(char*, char*);
long calc_string_size(char*);
void copy_string(char*, char*);
public:
vector<char*> split_cstr(char);
vector<char*> split_cstr(char*);
vector<string> split_string(char);
vector<string> split_string(char*);
char* String;
bool do_string;
bool keep_empty;
vector<char*> Container;
vector<string> ContainerS;
StringSplit(char * in)
{
String = in;
}
StringSplit(string in)
{
size_t len = calc_string_size((char*)in.c_str());
String = new char[len + 1];
memset(String, 0, len + 1);
copy_string(String, (char*)in.c_str());
do_string = true;
}
~StringSplit()
{
for (size_t i = 0; i < Container.size(); i++)
{
if (Container[i] != NULL)
{
delete[] Container[i]; // allocated with new[]
}
}
if (do_string)
{
delete[] String; // allocated with new[]
}
}
};
StringSplitter.cpp:
#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplitter.hpp"
using namespace std;
void StringSplit::assimilate(char*src, char delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete temp;
}
}
}
void StringSplit::assimilate(char*src, char* delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete temp;
}
}
}
long StringSplit::calc_string_size(char* _in)
{
long i = 0;
while (*_in++)
{
i++;
}
return i;
}
bool StringSplit::string_contains(char* haystack, char* needle)
{
size_t len = calc_string_size(needle);
size_t lenh = calc_string_size(haystack);
while (lenh--)
{
if (match_fragment(haystack + lenh, needle, len))
{
return true;
}
}
return false;
}
bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
while (len--)
{
if (*(_src + len) != *(cmp + len))
{
return false;
}
}
return true;
}
int StringSplit::untilnextdelim(char* _in, char delim)
{
size_t len = calc_string_size(_in);
if (*_in == delim)
{
_in += 1;
return len - 1;
}
int c = 0;
while (*(_in + c) != delim && c < len)
{
c++;
}
return c;
}
int StringSplit::untilnextdelim(char* _in, char* delim)
{
int s = calc_string_size(delim);
int c = 1 + s;
if (!string_contains(_in, delim))
{
return calc_string_size(_in);
}
else if (match_fragment(_in, delim, s))
{
_in += s;
return calc_string_size(_in);
}
while (!match_fragment(_in + c, delim, s))
{
c++;
}
return c;
}
void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
if (*src == delim)
{
src++;
}
int c = 0;
while (*(src + c) != delim && *(src + c))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
void StringSplit::copy_string(char* dest, char* src)
{
int i = 0;
while (*(src + i))
{
*(dest + i) = *(src + i);
i++;
}
}
void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
size_t len = calc_string_size(delim);
size_t lens = calc_string_size(src);
if (match_fragment(src, delim, len))
{
src += len;
lens -= len;
}
int c = 0;
while (!match_fragment(src + c, delim, len) && (c < lens))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
vector<char*> StringSplit::split_cstr(char Delimiter)
{
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete String;
return Container;
}
vector<string> StringSplit::split_string(char Delimiter)
{
do_string = true;
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete String;
return ContainerS;
}
vector<char*> StringSplit::split_cstr(char* Delimiter)
{
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while(*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String,Delimiter);
}
i++;
String++;
}
String -= i;
delete String;
return Container;
}
vector<string> StringSplit::split_string(char* Delimiter)
{
do_string = true;
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while (*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete String;
return ContainerS;
}
Examples:
int main(int argc, char* argv[])
{
StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
vector<char*> Split = ss.split_cstr(":CUT:");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
Will output:
This
is
an
example
cstring
int main(int argc, char* argv[])
{
StringSplit ss = "This:is:an:example:cstring";
vector<char*> Split = ss.split_cstr(':');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char* argv[])
{
string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string("[SPLIT]");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char* argv[])
{
string mystring = "This|is|an|example|string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string('|');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
To keep empty entries (by default empties will be excluded):
StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");
The goal was to make it similar to C#'s Split() method where splitting a string is as easy as:
string[] Split =
"Hey:cut:what's:cut:your:cut:name?".Split(new[] {":cut:"}, StringSplitOptions.None);
foreach (string X in Split)
{
Console.Write(X);
}
I hope someone else can find this as useful as I do.
up vote
10
down vote
What about this:
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& str, const char delim) {
    vector<string> v;
    string tmp;
    for (string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if (*i != delim) {
            tmp += *i;
        } else {
            v.push_back(tmp); // delimiter reached: store the token
            tmp.clear();
        }
    }
    v.push_back(tmp); // token after the last delimiter
    return v;
}
This is the best answer here, if you only want to split on a single delimiter character. The original question wanted to split on whitespace though, meaning any combination of one or more consecutive spaces or tabs. You have actually answered stackoverflow.com/questions/53849
– Oktalist
Dec 19 '12 at 22:09
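For the whitespace case that Oktalist describes (any run of spaces or tabs counts as one separator), stream extraction already does the work. A minimal sketch, with the hypothetical name split_whitespace:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: operator>> skips any run of whitespace between words.
std::vector<std::string> split_whitespace(const std::string& s)
{
    std::istringstream iss(s);
    std::vector<std::string> words;
    std::string word;
    while (iss >> word)        // test the stream after each extraction
        words.push_back(word);
    return words;
}
```

Because the stream is tested after each extraction, no empty trailing word is produced.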
up vote
9
down vote
Here's another way of doing it:
#include <cctype>
#include <string>
#include <vector>
using namespace std;
void split_string(string text,vector<string>& words)
{
int i=0;
char ch;
string word;
while(ch=text[i++])
{
if (isspace(ch))
{
if (!word.empty())
{
words.push_back(word);
}
word = "";
}
else
{
word += ch;
}
}
if (!word.empty())
{
words.push_back(word);
}
}
up vote
9
down vote
I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string line("A:::line::to:split");
const boost::regex re(":+"); // one or more colons
// -1 means find inverse matches aka split
boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);
boost::sregex_token_iterator end;
for (; tokens != end; ++tokens)
std::cout << *tokens << std::endl;
}
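Since C++11 the same inverse-match technique is also available in the standard library through std::sregex_token_iterator. The sketch below assumes a C++11 compiler; the function name split_regex is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the pieces of `line` that lie between matches of `pattern`.
std::vector<std::string> split_regex(const std::string& line, const std::string& pattern)
{
    const std::regex re(pattern);
    // Submatch index -1 selects the text between matches, i.e. a split.
    std::sregex_token_iterator first(line.begin(), line.end(), re, -1), last;
    return std::vector<std::string>(first, last);
}
```

With the line and pattern from the Boost example, split_regex("A:::line::to:split", ":+") should produce A, line, to and split. A line that starts with a delimiter produces an empty first token.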
up vote
9
down vote
Recently I had to split a camel-cased word into subwords. There are no delimiters, just uppercase characters.
#include <string>
#include <list>
#include <locale> // std::isupper
template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
std::list<String> R;
String w;
for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
if (std::isupper(*i, std::locale())) { // <locale> overload takes a locale argument
if (w.length()) {
R.push_back(w);
w.clear();
}
}
w += *i;
}
if (w.length())
R.push_back(w);
return R;
}
For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".
Note that std::isupper should really be passed as a function template argument. A more generalized form of this function could then split at delimiters like ",", ";" or " " too.
1
There have been 2 revs. That's nice. Seems as if my English had too much "German" in it. However, the editors did not fix two minor bugs, maybe because they were obvious anyway: std::isupper could be passed as an argument, not std::upper. Second, put a typename before the String::const_iterator.
– Andreas Spindler
Apr 28 '15 at 7:20
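The generalization mentioned in the answer, passing the test as a template argument, might look roughly like this. It is a sketch, not the author's code: split_before and is_upper_ascii are hypothetical names, and the example predicate uses the C locale's isupper rather than a std::locale.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical generalization: starts a new token before every character
// for which `pred` returns true. The matching character stays in the token.
template <class String, class Pred>
std::vector<String> split_before(const String& s, Pred pred)
{
    std::vector<String> result;
    String w;
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (pred(*i) && !w.empty()) {
            result.push_back(w);
            w.clear();
        }
        w += *i;
    }
    if (!w.empty())
        result.push_back(w);
    return result;
}

// Example predicate for narrow strings.
inline bool is_upper_ascii(char c)
{
    return std::isupper(static_cast<unsigned char>(c)) != 0;
}
```

Note that with a predicate such as "is a comma", the comma remains at the start of the following token, so a true delimiter split would need to drop the matching character instead of appending it.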
up vote
9
down vote
This answer takes the string and puts it into a vector of strings. It uses the boost library.
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
up vote
8
down vote
Get Boost ! : -)
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <vector>
using namespace std;
using namespace boost;
int main(int argc, char**argv) {
typedef vector < string > list_type;
list_type list;
string line;
line = "Somewhere down the road";
split(list, line, is_any_of(" "));
for(int i = 0; i < list.size(); i++)
{
cout << list[i] << endl;
}
return 0;
}
This example gives the output -
Somewhere
down
the
road
up vote
8
down vote
The code below uses strtok() to split a string into tokens and stores the tokens in a vector.
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// strtok modifies its input, so the text must live in a writable array.
char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[] = " ,\t\n";
char *token;
int main()
{
    vector<string> vec_String_Lines;
    token = strtok( one_line_string, seps );
    cout << "Extracting and storing data in a vector..\n\n\n";
    while( token != NULL )
    {
        vec_String_Lines.push_back(token);
        token = strtok( NULL, seps );
    }
    cout << "Displaying end result in vector line storage..\n\n";
    for ( size_t i = 0; i < vec_String_Lines.size(); ++i)
        cout << vec_String_Lines[i] << "\n";
    cout << "\n\n\n";
    return 0;
}
protected by Blorgbeard Dec 4 '12 at 23:26
74 Answers
up vote
1225
down vote
accepted
For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
int main() {
using namespace std;
string sentence = "And I feel fine...";
istringstream iss(sentence);
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
ostream_iterator<string>(cout, "\n"));
}
Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.
vector<string> tokens;
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter(tokens));
... or create the vector directly:
vector<string> tokens{istream_iterator<string>{iss},
istream_iterator<string>{}};
145
Is it possible to specify a delimiter for this? Like for instance splitting on commas?
– l3dx
Aug 6 '09 at 11:49
14
@Jonathan: \n is not the delimiter in this case; it's the delimiter for outputting to cout.
– huy
Feb 3 '10 at 12:37
728
This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable.
– SmallChess
Jan 10 '11 at 3:57
34
Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.
– Jerry Coffin
Dec 19 '12 at 20:30
35
@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm, doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one.
– Christian Rau
Feb 7 '13 at 15:08
|
show 19 more comments
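Jerry Coffin's comment about a ctype facet can be sketched as follows. This is an illustration under assumptions: the facet name comma_is_space and the choice of ',' as the extra delimiter are hypothetical, and the code targets narrow (char) streams only.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies ',' as whitespace in addition to the classic set.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(get_table()) {}
    static const mask* get_table()
    {
        // Copy the classic "C" classification table, then mark ',' as space.
        static std::vector<mask> table(classic_table(), classic_table() + table_size);
        table[','] |= space;
        return &table[0];
    }
};

// Hypothetical helper: extracts words separated by whitespace or commas.
std::vector<std::string> split_commas_or_spaces(const std::string& s)
{
    std::istringstream iss(s);
    // The locale takes ownership of the facet and deletes it when done.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

Since the stream skips runs of whitespace, consecutive commas collapse and no empty tokens appear. To split on commas only, the table would have to be built from scratch rather than copied from the classic one.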
edited Jun 9 '16 at 17:47
community wiki
8 revs, 8 users 71%
Zunino
up vote
2332
down vote
I use this to split string by a delimiter. The first puts the results in a pre-constructed vector, the second returns a new vector.
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
template<typename Out>
void split(const std::string &s, char delim, Out result) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
*(result++) = item;
}
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, std::back_inserter(elems));
return elems;
}
Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:
std::vector<std::string> x = split("one:two::three", ':');
73
To make it skip empty tokens, add an empty() check: if (!item.empty()) elems.push_back(item)
– 0x499602D2
Nov 9 '13 at 22:33
11
What if the delimiter contains two chars, such as ->?
– herohuyongtao
Dec 26 '13 at 8:15
7
@herohuyongtao, this solution only works for single char delimiters.
– Evan Teran
Dec 27 '13 at 6:11
4
@JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this: f(split(s, d, v)) while still having the benefit of a pre-allocated vector if you like.
– Evan Teran
Jan 25 '14 at 17:50
6
Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value.
– dshin
Sep 9 '15 at 19:04
|
show 20 more comments
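The points raised in these comments (skipping empty tokens, and the trailing-delimiter caveat) can be seen in a small variant of the getline approach. This is a sketch: the name split_getline and the skip_empty parameter are assumptions added for illustration, not part of the answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of split(): `skip_empty` is an added option.
std::vector<std::string> split_getline(const std::string& s, char delim,
                                       bool skip_empty = false)
{
    std::vector<std::string> elems;
    std::istringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (skip_empty && item.empty())
            continue;               // drop tokens between adjacent delimiters
        elems.push_back(item);
    }
    return elems;
}
```

With skip_empty left false, "one:two::three" gives four items. A trailing delimiter adds nothing, because getline fails once no characters remain, which is why "one:two::three:" gives the same result.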
edited Feb 28 at 23:32
community wiki
20 revs, 15 users 43%
Evan Teran
up vote
805
down vote
A possible solution using Boost might be:
#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.
See the documentation for details.
edited Aug 3 '15 at 23:20
community wiki
3 revs, 3 users 67%
ididak
32
Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.
– Tom
Mar 1 '09 at 16:51
40
And for those who don't already have boost... bcp copies over 1,000 files for this :)
– Roman Starkov
Jun 9 '10 at 20:12
78
strtok is a trap. It's thread-unsafe.
– tuxSlayer
Apr 23 '11 at 3:30
28
@Ian Embedded developers aren't all using boost.
– ACK_stoverflow
Jan 31 '12 at 18:23
28
As an addendum: I use boost only when I must; normally I prefer to add to my own library of code, which is standalone and portable, so that I can achieve small, precise, specific code which accomplishes a given aim. That way the code is non-public, performant, trivial and portable. Boost has its place, but I would suggest that it's a bit of overkill for tokenising strings: you wouldn't have your whole house transported to an engineering firm to get a new nail hammered into the wall to hang a picture... they may do it extremely well, but the pros are by far outweighed by the cons.
– GMasucci
May 22 '13 at 8:19
up vote
333
down vote
#include <vector>
#include <string>
#include <sstream>
int main()
{
    std::string str("Split me by whitespaces");
    std::string buf; // Have a buffer string
    std::stringstream ss(str); // Insert the string into a stream
    std::vector<std::string> tokens; // Create vector to hold our words
    while (ss >> buf)
        tokens.push_back(buf);
    return 0;
}
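The same whitespace splitting can be written without an explicit loop by handing a pair of std::istream_iterator objects to the vector constructor. The following is a sketch under that assumption. The function name split_whitespace is hypothetical and not part of the answer.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated words from a string
// stream and collects them into a vector.
std::vector<std::string> split_whitespace(const std::string& str) {
    std::istringstream iss(str);
    // 'first' extracts words with operator>>; a default-constructed
    // istream_iterator acts as the end-of-stream marker.
    std::istream_iterator<std::string> first(iss), last;
    return std::vector<std::string>(first, last);
}
```

Because extraction with operator>> skips any run of whitespace, split_whitespace("Split  me\tby whitespaces") gives the four words, and a string of only spaces gives an empty vector.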
edited May 19 at 10:01
community wiki
2 revs, 2 users 82%
kev
52
too bad it only splits on spaces ' '...
– Offirmo
Jan 31 '13 at 18:47
You can also split on other delimiters if you use getline in the while condition, e.g. to split by commas, use while(getline(ss, buf, ',')).
– Ali
Oct 6 at 20:20
up vote
172
down vote
For those who are not comfortable sacrificing all efficiency for code size, and who see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition):
template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    while(lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if(pos == std::string::npos)
        {
            pos = length;
        }
        if(pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data()+lastPos,
                                        (size_type)pos-lastPos ));
        lastPos = pos + 1;
    }
}
I usually use std::vector<std::string> as my second parameter (ContainerT), but list<> can be faster than vector<> when direct access is not needed, and you can even create your own string class and use something like std::list<subString>, where subString does not do any copies, for further speed increases.
It's more than twice as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also, with the right parameter types you can eliminate all string and list copies for additional speed increases.
Additionally, it does not return the result (which is extremely inefficient) but rather takes the tokens container by reference, which also allows you to build up tokens using multiple calls if you so wish.
Lastly, it allows you to specify whether to trim empty tokens from the results via a last optional parameter.
All it needs is std::string... the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.
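A usage sketch with the required includes may help here. It reproduces the tokenize template from this answer so that it compiles on its own. The wrapper tokenize_to_vector is a hypothetical convenience added only for illustration and is not part of the answer.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// The tokenize() template from the answer, reproduced for self-containment.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    std::string::size_type pos, lastPos = 0, length = str.length();
    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
        {
            pos = length;  // no more delimiters: the token runs to the end
        }
        if (pos != lastPos || !trimEmpty)
        {
            tokens.push_back(value_type(str.data() + lastPos,
                                        static_cast<size_type>(pos - lastPos)));
        }
        lastPos = pos + 1;
    }
}

// Hypothetical wrapper for callers who prefer a returned vector.
inline std::vector<std::string> tokenize_to_vector(
    const std::string& str, const std::string& delimiters = " ",
    bool trimEmpty = false)
{
    std::vector<std::string> tokens;
    tokenize(str, tokens, delimiters, trimEmpty);
    return tokens;
}
```

Called as tokenize("a,b;;c", v, ",;") with a std::vector<std::string> v, this appends "a", "b", an empty string and "c". Passing true as the fourth argument drops the empty string. Because the container is taken by reference, a second call such as tokenize("d e", v) appends to the tokens already present, and a std::list<std::string> works in place of the vector.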
edited Sep 19 '16 at 13:00
community wiki
11 revs, 5 users 78%
Marius
5
I'm quite a fan of this, but for g++ (and probably good practice) anyone using this will want typedefs and typenames: typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType; Then substitute out the value_type and size_types accordingly.
– aws
Nov 28 '11 at 21:41
10
For those of us for whom the template stuff and the first comment are completely foreign, a usage example complete with required includes would be lovely.
– Wes Miller
Aug 17 '12 at 11:51
3
Ahh well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited the tokens.push_back() lines to change the ContainerT::value_type to just ValueType and changed (ContainerT::value_type::size_type) to (SizeType). Fixed the bits g++ had been whining about. Just invoke it as tokenize( some_string, some_vector );
– Wes Miller
Aug 17 '12 at 14:23
2
Apart from running a few performance tests on sample data, primarily I've reduced it to as few as possible instructions and also as little as possible memory copies enabled by the use of a substring class that only references offsets/lengths in other strings. (I rolled my own, but there are some other implementations). Unfortunately there is not too much else one can do to improve on this, but incremental increases were possible.
– Marius
Nov 29 '12 at 14:50
3
That's the correct output for when trimEmpty = true. Keep in mind that "abo" is not a delimiter in this answer, but the list of delimiter characters. It would be simple to modify it to take a single delimiter string of characters (I think str.find_first_of should change to str.find_first, but I could be wrong... can't test)
– Marius
Aug 28 '15 at 15:24
up vote
154
down vote
Here's another solution. It's compact and reasonably efficient:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
It can easily be templatised to handle string separators, wide strings, etc.
Note that splitting "" results in a single empty string and splitting "," (i.e. sep) results in two empty strings.
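As a sketch of the string-separator variant, the same loop can advance by the separator's length instead of by one. The name split_on is hypothetical, and returning the whole input as a single token when the separator is empty is an assumption made here to avoid an endless loop, not something the answer specifies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant of split() that takes a multi-character separator.
std::vector<std::string> split_on(const std::string& text, const std::string& sep) {
    std::vector<std::string> tokens;
    if (sep.empty()) {
        // Assumption: an empty separator means "do not split".
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();  // skip the whole separator
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
```

Like the single-character version, this keeps empty tokens: split_on("a->b->->c", "->") gives "a", "b", an empty string and "c", and split_on("->", "->") gives two empty strings.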
It can also be easily expanded to skip empty tokens:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    // After the loop end is npos, so compare start against the string length
    // to avoid pushing an empty trailing token.
    if (start != text.size()) {
        tokens.push_back(text.substr(start));
    }
    return tokens;
}
If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;
    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));
    return tokens;
}
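To show how the multi-delimiter version treats leading, trailing and consecutive delimiters, here is the same logic again under the hypothetical name split_any (renamed only so that it can sit next to the char overloads in one file), followed by a note on what it returns for a few inputs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same logic as the multi-delimiter split() in this answer, under a
// hypothetical name. Every character in 'delims' acts as a delimiter.
std::vector<std::string> split_any(const std::string& text, const std::string& delims) {
    std::vector<std::string> tokens;
    // Skip leading delimiters; start is npos if there is nothing else.
    std::size_t start = text.find_first_not_of(delims), end = 0;
    while ((end = text.find_first_of(delims, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        // Jump over the whole run of delimiters to the next token.
        start = text.find_first_not_of(delims, end);
    }
    if (start != std::string::npos) {
        tokens.push_back(text.substr(start));  // last token, no trailing delimiter
    }
    return tokens;
}
```

For example, split_any("  one, two,,three ", " ,") returns "one", "two" and "three", while an empty string or a string made up only of delimiters returns an empty vector.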
10
The first version is simple and gets the job done perfectly. The only change I would make would be to return the result directly, instead of passing it as a parameter.
– gregschlom
Jan 19 '12 at 2:25
2
The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed.
– Alec Thomas
Feb 6 '12 at 18:56
2
A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics.
– Alec Thomas
Jun 27 '13 at 1:20
6
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct)
– Marcelo Cantos
Aug 17 '13 at 11:54
11
Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a delimiter, although its a less obvious solution. Does the c++11 standard not have anything for this? Does c++11 support punch cards these days?
– Spacen Jasset
Aug 11 '15 at 15:15
|
show 9 more comments
up vote
154
down vote
Here's another solution. It's compact and reasonably efficient:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
tokens.push_back(text.substr(start, end - start));
start = end + 1;
}
tokens.push_back(text.substr(start));
return tokens;
}
It can easily be templatised to handle string separators, wide strings, etc.
Note that splitting "" results in a single empty string and splitting "," (ie. sep) results in two empty strings.
It can also be easily expanded to skip empty tokens:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
if (end != start) {
tokens.push_back(text.substr(start, end - start));
}
start = end + 1;
}
if (end != start) {
tokens.push_back(text.substr(start));
}
return tokens;
}
If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
std::vector<std::string> tokens;
std::size_t start = text.find_first_not_of(delims), end = 0;
while((end = text.find_first_of(delims, start)) != std::string::npos)
{
tokens.push_back(text.substr(start, end - start));
start = text.find_first_not_of(delims, end);
}
if(start != std::string::npos)
tokens.push_back(text.substr(start));
return tokens;
}
10
The first version is simple and gets the job done perfectly. The only change I would made would be to return the result directly, instead of passing it as a parameter.
– gregschlom
Jan 19 '12 at 2:25
2
The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed.
– Alec Thomas
Feb 6 '12 at 18:56
2
A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics.
– Alec Thomas
Jun 27 '13 at 1:20
6
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct)
– Marcelo Cantos
Aug 17 '13 at 11:54
11
Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a delimiter, although its a less obvious solution. Does the c++11 standard not have anything for this? Does c++11 support punch cards these days?
– Spacen Jasset
Aug 11 '15 at 15:15
|
show 9 more comments
up vote
154
down vote
up vote
154
down vote
Here's another solution. It's compact and reasonably efficient:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
tokens.push_back(text.substr(start, end - start));
start = end + 1;
}
tokens.push_back(text.substr(start));
return tokens;
}
It can easily be templatised to handle string separators, wide strings, etc.
Note that splitting "" results in a single empty string and splitting "," (ie. sep) results in two empty strings.
It can also be easily expanded to skip empty tokens:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
if (end != start) {
tokens.push_back(text.substr(start, end - start));
}
start = end + 1;
}
if (end != start) {
tokens.push_back(text.substr(start));
}
return tokens;
}
If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
std::vector<std::string> tokens;
std::size_t start = text.find_first_not_of(delims), end = 0;
while((end = text.find_first_of(delims, start)) != std::string::npos)
{
tokens.push_back(text.substr(start, end - start));
start = text.find_first_not_of(delims, end);
}
if(start != std::string::npos)
tokens.push_back(text.substr(start));
return tokens;
}
Here's another solution. It's compact and reasonably efficient:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
It can easily be templatised to handle string separators, wide strings, etc.
Note that splitting "" results in a single empty string and splitting "," (i.e. sep) results in two empty strings.
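As a rough sketch of what the templatised variant could look like: the name split_on and the handling of an empty separator are assumptions made for illustration, not part of the original answer. It keeps the same find/substr loop but works on any std::basic_string and takes a whole string as the separator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generalisation of split(): works for std::string,
// std::wstring, etc., and uses a multi-character separator.
template <typename CharT>
std::vector<std::basic_string<CharT>>
split_on(const std::basic_string<CharT>& text,
         const std::basic_string<CharT>& sep) {
    std::vector<std::basic_string<CharT>> tokens;
    if (sep.empty()) {
        // Assumption: an empty separator means "do not split".
        // Without this check the loop below would never advance.
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::basic_string<CharT>::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();  // skip the whole separator
    }
    tokens.push_back(text.substr(start));  // last token, possibly empty
    return tokens;
}
```

Both arguments have to be actual string objects (for example std::string("a::b") and std::string("::")), because the character type cannot be deduced from a bare string literal. Like the char version, this keeps empty tokens.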
It can also be easily expanded to skip empty tokens:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    // end is npos here, so compare start with the length to skip an empty final token
    if (start < text.size()) {
        tokens.push_back(text.substr(start));
    }
    return tokens;
}
If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;
    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));
    return tokens;
}
edited Oct 4 '16 at 22:33
community wiki
13 revs, 7 users 48%
Alec Thomas
10
The first version is simple and gets the job done perfectly. The only change I would make would be to return the result directly, instead of passing it as a parameter.
– gregschlom
Jan 19 '12 at 2:25
2
The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed.
– Alec Thomas
Feb 6 '12 at 18:56
2
A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics.
– Alec Thomas
Jun 27 '13 at 1:20
6
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct)
– Marcelo Cantos
Aug 17 '13 at 11:54
11
Out of all the answers, this appears to be one of the most appealing and flexible, together with getline with a delimiter, although it's a less obvious solution. Does the C++11 standard not have anything for this? Does C++11 support punch cards these days?
– Spacen Jasset
Aug 11 '15 at 15:15
up vote
110
down vote
This is my favorite way to iterate through a string. You can do whatever you want per word.
string line = "a line of text to iterate through";
string word;
istringstream iss(line, istringstream::in);
while( iss >> word )
{
    // Do something on `word` here...
}
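For anyone who wants something that compiles as-is, here is a minimal sketch wrapping that loop in a function. The name words_of and the choice to return a vector are assumptions made for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: returns the whitespace-separated words of line.
std::vector<std::string> words_of(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> words;
    std::string word;
    // operator>> skips leading whitespace and stops at the next whitespace;
    // the loop ends when no further word can be extracted.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

Because the extraction is tested in the loop condition, no empty word is produced at the end of the input, and runs of spaces, tabs or newlines count as a single separator.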
edited Apr 12 at 11:37
community wiki
4 revs, 2 users 86%
gnomed
Is it possible to declare word as a char?
– abatishchev
Jun 26 '10 at 17:23
Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop through every character in each word. But right now I believe the current loop depends on spaces for word separation. Unless you know that there is only a single character between every space, in which case you can just cast "word" to a char... sorry I can't be of more help, I've been meaning to brush up on my C++.
– gnomed
Jun 30 '10 at 22:18
9
If you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try: stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;
– Wayne Werner
Aug 4 '10 at 18:03
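To make the char behaviour easy to check, here is a small sketch; the function name non_whitespace_chars is an assumption for illustration. It collects everything that extraction into a char produces.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: gathers the characters yielded by `ss >> c`.
std::string non_whitespace_chars(const std::string& text) {
    std::istringstream ss(text);
    std::string out;
    char c;
    while (ss >> c) {  // formatted extraction skips whitespace by default
        out += c;
    }
    return out;
}
```

So with a char target the loop visits one character at a time and the word boundaries are lost, which is probably not what you want when iterating over words.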
up vote
77
down vote
This is similar to Stack Overflow question How do I tokenize a string in C++?.
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
    string text = "token test\tstring";
    // Split on spaces and tabs.
    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}
edited May 23 '17 at 12:34
community wiki
5 revs, 3 users 77%
Ferruccio
Does this materialize a copy of all of the tokens, or does it only keep the start and end position of the current token?
– einpoklum
Apr 9 at 19:47
up vote
66
down vote
I like the following because it puts the results into a vector, supports a string as a delim and gives control over keeping empty values. The downside is that it doesn't look as elegant.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        // Find the next occurrence of the whole delimiter string.
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}
int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
Of course, Boost has a split() that works partially like that. And, if by 'white-space', you really do mean any type of white-space, using Boost's split with is_any_of() works great.
edited Jan 8 '17 at 4:33
community wiki
3 revs, 2 users 70%
Shadow2531
Finally a solution that is handling empty tokens correctly at both sides of the string
– fmuecke
Sep 9 '15 at 20:38
up vote
50
down vote
The STL does not have such a method available already.
However, you can either use C's strtok() function by using the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):
#include <string>
#include <vector>
using namespace std;
void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after that, i.e. the end of the first token.
    string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the delimiter that ends the next token.
        pos = str.find_first_of(delimiters, lastPos);
    }
}
Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html
If you have questions about the code sample, leave a comment and I will explain.
And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf both are faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.
Don't get sold on this "Elegance over performance" deal.
edited Apr 12 at 11:35
community wiki
3 revs, 2 users 82%
nlaq
I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my question). However, for this specific question, I'm looking for an elegant C++ solution.
– Ashwin Nanjappa
Oct 25 '08 at 9:16
11
@Nelson LaQuet: Let me guess: Because strtok is not reentrant?
– paercebal
Oct 25 '08 at 9:52
35
@Nelson don't ever pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each found delimiter) and c_str() returns a non-modifiable string.
– Evan Teran
Oct 25 '08 at 18:19
3
@Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's silly to avoid C functions for "aesthetic" reasons.
– j_random_hacker
Aug 24 '09 at 9:08
2
@paulm: No, the slowness of C++ streams is caused by facets. They're still slower than stdio.h functions even when synchronization is disabled (and on stringstreams, which can't synchronize).
– Ben Voigt
Apr 12 '15 at 23:55
up vote
39
down vote
Here is a split function that:
- is generic
- uses standard C++ (no boost)
- accepts multiple delimiters
- ignores empty tokens (can easily be changed)
#include <string>
#include <vector>
using namespace std;
template<typename T>
vector<T>
split(const T & str, const T & delimiters) {
    vector<T> v;
    typename T::size_type start = 0;
    auto pos = str.find_first_of(delimiters, start);
    while(pos != T::npos) {
        if(pos != start) // ignore empty tokens
            v.emplace_back(str, start, pos - start);
        start = pos + 1;
        pos = str.find_first_of(delimiters, start);
    }
    if(start < str.length()) // ignore trailing delimiter
        v.emplace_back(str, start, str.length() - start); // add what's left of the string
    return v;
}
Example usage:
vector<string> v = split<string>("Hello, there; World", ";,");
vector<wstring> w = split<wstring>(L"Hello, there; World", L";,");
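Regarding "can easily be changed": one way to keep the empty tokens is to drop the two checks that skip them. The following is only a sketch of that variation; the name split_keep_empty is made up here so that it does not clash with split above.

```cpp
#include <string>
#include <vector>

// Hypothetical variant (name is an assumption): same scan as split(),
// but every field between two delimiters is kept, even when it is empty.
template <typename T>
std::vector<T> split_keep_empty(const T& str, const T& delimiters)
{
    std::vector<T> v;
    typename T::size_type start = 0;
    auto pos = str.find_first_of(delimiters, start);
    while (pos != T::npos) {
        v.emplace_back(str, start, pos - start);  // may be an empty token
        start = pos + 1;
        pos = str.find_first_of(delimiters, start);
    }
    // The field after the last delimiter (empty if the string ends with one).
    v.emplace_back(str, start, str.length() - start);
    return v;
}
```

With this version an input with N delimiters always yields N + 1 tokens, which is usually what you want for CSV-like data.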
edited May 23 '17 at 22:17
community wiki
6 revs
Marco M.
You forgot to add to use list: "extremely inefficient"
– Xander Tulip
Mar 19 '12 at 0:20
@XanderTulip, can you be more constructive and explain how or why?
– Marco M.
Mar 21 '12 at 11:57
2
@XanderTulip: I assume you are referring to it returning the vector by value. The Return-Value-Optimization (RVO, google it) should take care of this. Also in C++11 you could return by move reference.
– Joseph Garvin
May 7 '12 at 13:56
3
This can actually be optimized further: instead of .push_back(str.substr(...)) one can use .emplace_back(str, start, pos - start). This way the string object is constructed in the container and thus we avoid a move operation + other shenanigans done by the .substr function.
– Mihai Bişog
Sep 5 '12 at 13:50
@zoopp yes. Good idea. VS10 didn't have emplace_back support when I wrote this. I will update my answer. Thanks
– Marco M.
Sep 12 '12 at 13:03
up vote
33
down vote
I have a two-line solution to this problem:
char sep = ' ';
std::string s="1 This is an example";
for(size_t p=0, q=0; p!=s.npos; p=q)
    std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;
Then, instead of printing, you can put each word in a vector.
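For the vector variant, a sketch could look like the following. The function name words_of is made up for illustration. It keeps the same loop, so it shares its limits: it assumes the string does not start with the separator, and two separators in a row produce an empty string in the result.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper (name is an assumption): same loop as above,
// but each piece is appended to a vector instead of being printed.
std::vector<std::string> words_of(const std::string& s, char sep = ' ')
{
    std::vector<std::string> words;
    for (std::size_t p = 0, q = 0; p != s.npos; p = q)
        words.push_back(
            s.substr(p + (p != 0), (q = s.find(sep, p + 1)) - p - (p != 0)));
    return words;
}
```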
edited Jan 15 '13 at 0:12
community wiki
2 revs, 2 users 94%
rhomu
up vote
33
down vote
Yet another flexible and fast way:
#include <cstring>
#include <string>
#include <vector>
template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        // advance e to the next delimiter or to the end of the input
        while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s);
        }
        s = e + 1;
    }
}
To use it with a vector of strings (Edit: Since someone pointed out not to inherit STL classes... hrmf ;) ) :
template<class ContainerType>
class Appender {
public:
    Appender(ContainerType& container) : container_(container) {}
    void operator() (const char* s, unsigned length) {
        container_.push_back(std::string(s,length));
    }
private:
    ContainerType& container_;
};
std::vector<std::string> strVector;
// before C++17 the template argument has to be spelled out
Appender<std::vector<std::string> > v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");
That's it! And that's just one way to use the tokenizer. Another is to just
count words:
class WordCounter {
public:
    WordCounter() : noOfWords(0) {}
    void operator() (const char*, unsigned) {
        ++noOfWords;
    }
    unsigned noOfWords;
};
WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );
Limited by imagination ;)
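Since tokenize only needs something callable with a pointer and a length, a C++11 lambda should work as the operator too, as long as it is stored in a named variable (the parameter is a non-const reference). Below is a small sketch that finds the longest word. tokenize is repeated so the snippet compiles on its own, and longest_word is a made-up name for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Same tokenizer as above, repeated so this sketch is self-contained.
template <typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        while (*e != 0 && std::strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s);
        }
        s = e + 1;
    }
}

// Hypothetical helper (an assumption): returns the longest token in input,
// splitting on spaces and tabs; the first one wins on a tie.
std::string longest_word(const char* input) {
    std::string longest;
    auto keep_longest = [&longest](const char* s, std::ptrdiff_t length) {
        const std::size_t n = static_cast<std::size_t>(length);
        if (n > longest.size())
            longest.assign(s, n);
    };
    tokenize(keep_longest, input, " \t");  // named lvalue binds to Operator&
    return longest;
}
```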
edited Sep 11 '13 at 8:11
community wiki
2 revs
Robert
Nice. Regarding Appender, note "Why shouldn't we inherit a class from STL classes?"
– Andreas Spindler
Sep 10 '13 at 12:07
up vote
29
down vote
Here's a simple solution that uses only the standard regex library:
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;
    vector<string> result;
    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;
    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) // token could be empty: check
            result.emplace_back( it->str() );
    }
    return result;
}
The regex argument allows splitting on multiple delimiters (spaces, commas, etc.).
I usually only split on spaces and commas, so I also have this default function:
std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;
    regex re( "[\\s,]+" );
    return Tokenize( str, re );
}
The "[\\s,]+" matches runs of whitespace (\\s) and commas (,).
Note, if you want to split wstring instead of string,
- change all std::regex to std::wregex
- change all sregex_token_iterator to wsregex_token_iterator
Note, you might also want to take the string argument by reference, depending on your compiler.
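For reference, here is roughly what the wstring version could look like after those two substitutions. This is just a sketch; the name TokenizeW is made up, and it takes both arguments by const reference as suggested in the last note.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wide-character variant (name is an assumption):
// std::wregex and std::wsregex_token_iterator replace the narrow types.
std::vector<std::wstring> TokenizeW(const std::wstring& str,
                                    const std::wregex& regex)
{
    std::vector<std::wstring> result;
    // -1 selects the text between matches, i.e. the tokens themselves.
    std::wsregex_token_iterator it(str.begin(), str.end(), regex, -1);
    std::wsregex_token_iterator reg_end;
    for (; it != reg_end; ++it) {
        if (it->length() != 0)  // skip the empty token before a leading delimiter
            result.emplace_back(it->str());
    }
    return result;
}
```

A call would then look like TokenizeW(text, std::wregex(L"[\\s,]+")), where text is a std::wstring.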
edited Jun 24 '15 at 9:31
community wiki
2 revs, 2 users 99%
dk123
This would have been my favourite answer, but std::regex is broken in GCC 4.8. They said that they implemented it correctly in GCC 4.9. I am still giving you my +1
– mchiasson
Aug 19 '14 at 12:27
1
This is my favorite with minor changes: vector returned as reference as you said, and the arguments "str" and "regex" passed by references also. thx.
– QuantumKarl
Oct 16 '15 at 15:06
Raw strings are pretty useful while dealing with regex patterns. That way, you don't have to use the escape sequences... You can just use R"([\s,]+)".
– Sam
Feb 17 at 17:42
up vote
24
down vote
If you like to use boost, but want to use a whole string as delimiter (instead of single characters as in most of the previously proposed solutions), you can use Boost's split_iterator.
Example code including a convenient template:
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
template<typename OutputIterator>
inline void split(
    const std::string& str,
    const std::string& delim,
    OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;
    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
        iter!=It();
        ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}
int main(int argc, char* argv[])
{
    using namespace std;
    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));
    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}
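If Boost is not an option, splitting on a whole-string delimiter can also be sketched with nothing but std::string::find. This is a swapped-in technique, not the Boost approach above, and the name split_on is made up for illustration. Like the split_iterator version, it keeps empty pieces.

```cpp
#include <string>
#include <vector>

// Hypothetical std-only alternative (an assumption, not Boost's API):
// splits str at every occurrence of the whole string delim.
std::vector<std::string> split_on(const std::string& str,
                                  const std::string& delim)
{
    std::vector<std::string> parts;
    if (delim.empty()) {  // an empty delimiter would never advance the search
        parts.push_back(str);
        return parts;
    }
    std::string::size_type start = 0, pos;
    while ((pos = str.find(delim, start)) != std::string::npos) {
        parts.push_back(str.substr(start, pos - start));
        start = pos + delim.length();  // continue after the delimiter
    }
    parts.push_back(str.substr(start));  // remainder after the last delimiter
    return parts;
}
```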
edited Feb 9 '12 at 9:32
community wiki
3 revs, 2 users 71%
zerm
up vote
23
down vote
Using std::stringstream as you have works perfectly fine, and do exactly what you wanted. If you're just looking for different way of doing things though, you can use std::find()/std::find_first_of() and std::string::substr().
Here's an example:
#include <iostream>
#include <string>
int main()
{
std::string s("Somewhere down the road");
std::string::size_type prev_pos = 0, pos = 0;
while( (pos = s.find(' ', pos)) != std::string::npos )
{
std::string substring( s.substr(prev_pos, pos-prev_pos) );
std::cout << substring << 'n';
prev_pos = ++pos;
}
std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
std::cout << substring << 'n';
return 0;
}
Using std::stringstream as you have works perfectly fine and does exactly what you wanted. If you're just looking for a different way of doing things, though, you can use std::string::find()/std::string::find_first_of() and std::string::substr().
Here's an example:
#include <iostream>
#include <string>
int main()
{
    std::string s("Somewhere down the road");
    std::string::size_type prev_pos = 0, pos = 0;
    while( (pos = s.find(' ', pos)) != std::string::npos )
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );
        std::cout << substring << '\n';
        prev_pos = ++pos;
    }
    // pos is npos here, so substr() simply takes the rest of the string
    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
    std::cout << substring << '\n';
    return 0;
}
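The loop above emits an empty substring whenever two spaces are adjacent. As a minimal sketch of the find_first_of() route mentioned in the answer, the helper below (split_words is a hypothetical name, and the default delimiter set is an assumption) skips runs of delimiters by alternating find_first_not_of() and find_first_of():

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on any character in delims and skips runs of them.
std::vector<std::string> split_words(const std::string& s,
                                     const std::string& delims = " \t\n")
{
    std::vector<std::string> words;
    // Start of the first word, or npos if the string holds only delimiters.
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        // One past the end of the current word, or npos for the last word.
        std::string::size_type end = s.find_first_of(delims, start);
        words.push_back(s.substr(start, end - start)); // npos count takes the rest
        start = s.find_first_not_of(delims, end);
    }
    return words;
}
```

Calling split_words("Somewhere  down the road") should yield the four words with no empty entries, even though the input contains a double space.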
edited Apr 12 at 11:42
community wiki
2 revs, 2 users 81%
KTC
This only works for single-character delimiters. A simple change lets it work with multi-character ones: prev_pos = pos += delimiter.length();
– David Doria
Feb 5 '16 at 14:48
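The change suggested in this comment might look like the following sketch. The function name split_on and the guard for an empty delimiter are assumptions added for illustration; the advance step is the one quoted in the comment.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits s on every occurrence of the whole string delimiter.
std::vector<std::string> split_on(const std::string& s, const std::string& delimiter)
{
    std::vector<std::string> parts;
    if (delimiter.empty()) {            // an empty delimiter would never advance
        parts.push_back(s);
        return parts;
    }
    std::string::size_type prev_pos = 0, pos = 0;
    while ((pos = s.find(delimiter, pos)) != std::string::npos) {
        parts.push_back(s.substr(prev_pos, pos - prev_pos));
        prev_pos = pos += delimiter.length();   // skip the whole delimiter
    }
    parts.push_back(s.substr(prev_pos));        // last piece
    return parts;
}
```

Adjacent delimiters produce empty pieces here, so split_on("a----b", "--") gives "a", "" and "b"; filter them afterwards if that is not wanted.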
up vote
18
down vote
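There is a function named strtok. The code below uses strtok_r, its reentrant POSIX variant:
#include <string.h>   // strtok_r (POSIX)
#include <string>
#include <vector>
using namespace std;
// Note: str is modified in place; strtok_r overwrites delimiters with '\0'.
vector<string> split(char* str, const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str, delim, &saveptr);
    vector<string> result;
    while (token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL, delim, &saveptr);
    }
    return result;
}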
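Because strtok_r writes into its input, the function above cannot be called on a string literal or a const std::string. One possible workaround, sketched here under the assumption that an extra copy is acceptable (split_copy is a hypothetical name), is to tokenize a private buffer:

```cpp
#include <string.h>   // strtok_r is POSIX, not ISO C++
#include <string>
#include <vector>

// Hypothetical wrapper: tokenizes a private copy so the caller's string is untouched.
std::vector<std::string> split_copy(const std::string& input, const char* delims)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');             // strtok_r needs a NUL-terminated buffer
    std::vector<std::string> result;
    char* saveptr = NULL;
    for (char* token = strtok_r(buffer.data(), delims, &saveptr);
         token != NULL;
         token = strtok_r(NULL, delims, &saveptr))
    {
        result.push_back(token);        // copies the token out of the buffer
    }
    return result;
}
```

Like strtok itself, this treats runs of delimiters as one separator, so it never returns empty tokens.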
edited May 2 '14 at 14:49
community wiki
3 revs, 2 users 91%
Pratik Deoghare
3
strtok is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the input string.
– Kevin Panko
Jun 14 '10 at 14:07
12
Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL is passed, it remembers what pointer should be used. If a second thread calls strtok when another thread is still processing, this char pointer will be overwritten, and both threads will then have incorrect results. mkssoftware.com/docs/man3/strtok.3.asp
– Kevin Panko
Jun 14 '10 at 17:27
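The same hidden state gets in the way in single-threaded code as soon as two tokenizations are interleaved. The sketch below (parse_pairs and the "k=v;k=v" input format are assumptions made for illustration) keeps one saved pointer per scan with strtok_r; with plain strtok, the inner calls would overwrite the position of the outer scan.

```cpp
#include <string.h>   // strtok_r (POSIX)
#include <string>
#include <utility>
#include <vector>

// Hypothetical example: splits "k=v;k=v" into pairs with two independent cursors.
std::vector<std::pair<std::string, std::string> > parse_pairs(std::string text)
{
    std::vector<std::pair<std::string, std::string> > pairs;
    char* outer_save = NULL;
    // &text[0] is a writable, NUL-terminated buffer since C++11.
    for (char* item = strtok_r(&text[0], ";", &outer_save);
         item != NULL;
         item = strtok_r(NULL, ";", &outer_save))
    {
        char* inner_save = NULL;        // separate state for the inner scan
        char* key = strtok_r(item, "=", &inner_save);
        char* value = strtok_r(NULL, "=", &inner_save);
        if (key != NULL && value != NULL)
            pairs.push_back(std::make_pair(std::string(key), std::string(value)));
    }
    return pairs;
}
```

The parameter is taken by value on purpose, so the caller's string is not modified.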
1
as mentioned before strtok is unsafe and even in C strtok_r is recommended for use
– systemsfault
Jul 6 '10 at 12:17
4
strtok_r can be used if you are in a section of code that may be accessed. this is the only solution of all of the above that isn't "line noise", and is a testament to what, exactly, is wrong with c++
– Erik Aronesty
Oct 10 '11 at 18:04
Updated so there can be no objections on the grounds of thread safety from C++ wonks.
– Erik Aronesty
May 2 '14 at 14:50
up vote
17
down vote
Here's a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea.)
#include <regex>
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& s){
    regex r("\\w+"); // matches whole words (greedy, so no word fragments)
    sregex_iterator rit(s.begin(), s.end(), r);
    sregex_iterator rend; // default-constructed iterator marks the end
    vector<string> result;
    for (; rit != rend; ++rit) // walk the matches to fill the vector
        result.push_back(rit->str());
    return result;
}
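Matching \w+ drops punctuation, so "don't" comes back as two words. If the goal is strictly whitespace-separated words, one alternative is to match the separators instead and ask std::sregex_token_iterator for the text between them. This is a sketch of that technique, not the answer's own method; split_on_whitespace is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the pieces between runs of whitespace.
std::vector<std::string> split_on_whitespace(const std::string& s)
{
    static const std::regex separator("\\s+");
    // Submatch index -1 selects the text between matches instead of the matches.
    std::sregex_token_iterator first(s.begin(), s.end(), separator, -1);
    std::sregex_token_iterator last;
    std::vector<std::string> words;
    for (; first != last; ++first) {
        if (first->length() > 0)        // leading whitespace yields an empty piece
            words.push_back(first->str());
    }
    return words;
}
```

The regex is built once and reused, since constructing a std::regex is comparatively expensive.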
answered Oct 29 '12 at 16:15
community wiki
AJMansfield
Similar responses with maybe better regex approach: here, and here.
– nobar
Dec 5 '14 at 23:25
up vote
15
down vote
A stringstream can be convenient if you need to split the string on non-space symbols:
string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;
istringstream iss(s);
getline(iss, dummy, ':');  // skip "Name"
getline(iss, name, ';');   // name == "JAck"
getline(iss, dummy, ':');  // skip " Spouse"
getline(iss, spouse, ';'); // spouse == "Susan"
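When the number of fields is not known in advance, the same two getline calls can be put in a loop. The sketch below assumes a "key:value; key:value" layout like the string above; parse_fields is a hypothetical name, and std::ws is used to skip the space that follows each semicolon.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses "key:value; key:value" records.
std::vector<std::pair<std::string, std::string> > parse_fields(const std::string& s)
{
    std::vector<std::pair<std::string, std::string> > fields;
    std::istringstream iss(s);
    std::string key, value;
    // Read up to ':' for the key, then up to ';' (or end of input) for the value.
    while (std::getline(iss >> std::ws, key, ':') && std::getline(iss, value, ';'))
        fields.push_back(std::make_pair(key, value));
    return fields;
}
```

A trailing fragment without a colon-separated value, such as the "..." in the original string, is read as a key with no value and is not stored.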
edited Jun 22 '15 at 17:02
community wiki
2 revs, 2 users 95%
lukmac
This works well.
– spritecodej
Jan 11 at 6:20
up vote
14
down vote
Until now I used the one in Boost, but I needed something that doesn't depend on it, so I came up with this:
#include <sstream>
#include <string>
#include <vector>
// Appends to lst the pieces of input separated by any character in separators.
static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)
{
    std::ostringstream word;
    for (size_t n = 0; n < input.size(); ++n)
    {
        if (std::string::npos == separators.find(input[n]))
            word << input[n];
        else
        {
            if (!word.str().empty() || !remove_empty)
                lst.push_back(word.str());
            word.str("");
        }
    }
    if (!word.str().empty() || !remove_empty)
        lst.push_back(word.str());
}
A nice property is that separators can contain more than one character; each of them acts as a delimiter.
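To make the effect of several separator characters and of remove_empty concrete, here is a sketch of the same loop that returns its result and accumulates into a std::string rather than an ostringstream (which avoids copying the buffer on each str() call). The name split_any is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of Split: same logic, std::string accumulator.
std::vector<std::string> split_any(const std::string& input,
                                   const std::string& separators,
                                   bool remove_empty = true)
{
    std::vector<std::string> pieces;
    std::string word;
    for (std::string::size_type n = 0; n < input.size(); ++n) {
        if (separators.find(input[n]) == std::string::npos) {
            word += input[n];           // ordinary character: extend the word
        } else {
            if (!word.empty() || !remove_empty)
                pieces.push_back(word);
            word.clear();
        }
    }
    if (!word.empty() || !remove_empty)
        pieces.push_back(word);
    return pieces;
}
```

With separators ", ;" the input "a, b;c" splits into "a", "b" and "c". With remove_empty set to false, "a,,b" split on "," keeps the empty field between the two commas.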
edited May 22 '11 at 23:02
community wiki
3 revs, 2 users 64%
Goran
up vote
13
down vote
I've rolled my own using strtok, and I've used Boost to split strings. The best method I have found is the C++ String Toolkit Library (StrTk). It is incredibly flexible and fast.
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
    { // normal parsing of a string into a vector of strings
        std::string s("Somewhere down the road");
        std::vector<std::string> result;
        if( strtk::parse( s, whitespace, result ) )
        {
            for(size_t i = 0; i < result.size(); ++i )
                std::cout << result[i] << std::endl;
        }
    }
    { // parsing a string into a vector of floats with other separators
      // besides spaces
        std::string s("3.0, 3.14; 4.0");
        std::vector<float> values;
        if( strtk::parse( s, whitespace_and_punctuation, values ) )
        {
            for(size_t i = 0; i < values.size(); ++i )
                std::cout << values[i] << std::endl;
        }
    }
    { // parsing a string into specific variables
        std::string s("angle = 45; radius = 9.9");
        std::string w1, w2;
        float v1, v2;
        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
        {
            std::cout << "word " << w1 << ", value " << v1 << std::endl;
            std::cout << "word " << w2 << ", value " << v2 << std::endl;
        }
    }
    return 0;
}
The toolkit is far more flexible than this simple example shows, but even here its usefulness for parsing a string into typed elements is clear.
answered Jan 7 '14 at 20:28
community wiki
DannyK
up vote
13
down vote
Short and elegant:
#include <vector>
#include <string>
using namespace std;
// token must not be empty: find("") always returns 0, so the loop would never end
vector<string> split(string data, string token)
{
    vector<string> output;
    size_t pos = string::npos; // size_t to avoid improbable overflow
    do
    {
        pos = data.find(token);
        output.push_back(data.substr(0, pos));
        if (string::npos != pos)
            data = data.substr(pos + token.size());
    } while (string::npos != pos);
    return output;
}
It can use any string as the delimiter, and it also works with binary data (std::string supports binary data, including null bytes).
Usage:
auto a = split("this!!is!!!example!string", "!!");
Output:
this
is
!example!string
edited Jul 14 '16 at 20:17
community wiki
2 revs, 2 users 98%
user1438233
I like this solution because it allows the separator to be a string and not a char. However, it modifies the string in place, which forces a copy of the original string.
– Alessandro Teruzzi
Aug 1 '16 at 15:30
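One way to avoid both the copy of the input and the repeated re-assignment of data is to search from a moving offset and return views. The sketch below assumes a C++17 compiler for std::string_view; split_views is a hypothetical name, and the guard for an empty token is an addition.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical C++17 variant: returns views into data instead of copies.
// The views are valid only while the original character data is alive.
std::vector<std::string_view> split_views(std::string_view data, std::string_view token)
{
    std::vector<std::string_view> output;
    if (token.empty()) {                // guard: an empty token would never advance
        output.push_back(data);
        return output;
    }
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = data.find(token, start);
        if (pos == std::string_view::npos) {
            output.push_back(data.substr(start));   // final piece
            break;
        }
        output.push_back(data.substr(start, pos - start));
        start = pos + token.size();
    }
    return output;
}
```

The trade-off is lifetime: the returned views point into the caller's buffer, so they must not outlive it. Construct std::string objects from them if the pieces need to be kept longer.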
up vote
11
down vote
I made this because I needed an easy way to split both std::string objects and C-style strings. Hopefully someone else will find it useful as well. It doesn't rely on tokens, and you can use fields as delimiters, which is another key feature I needed.
I'm sure there are improvements that could make it more elegant; by all means, please make them.
StringSplitter.hpp:
#include <vector>
#include <string>
#include <iostream>
#include <string.h>
using namespace std;
class StringSplit
{
private:
void copy_fragment(char*, char*, char*);
void copy_fragment(char*, char*, char);
bool match_fragment(char*, char*, int);
int untilnextdelim(char*, char);
int untilnextdelim(char*, char*);
void assimilate(char*, char);
void assimilate(char*, char*);
bool string_contains(char*, char*);
long calc_string_size(char*);
void copy_string(char*, char*);
public:
vector<char*> split_cstr(char);
vector<char*> split_cstr(char*);
vector<string> split_string(char);
vector<string> split_string(char*);
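char* String;
bool do_string;     // true when results go to ContainerS
bool keep_empty;    // keep empty fields? (initialized to false; set before splitting to change)
bool owns_string;   // true when String was allocated by this object
vector<char*> Container;
vector<string> ContainerS;
StringSplit(char * in)
: String(in), do_string(false), keep_empty(false), owns_string(false)
{
}
StringSplit(string in)
: String(NULL), do_string(true), keep_empty(false), owns_string(true)
{
size_t len = calc_string_size((char*)in.c_str());
String = new char[len + 1];
memset(String, 0, len + 1);
copy_string(String, (char*)in.c_str());
}
~StringSplit()
{
for (size_t i = 0; i < Container.size(); i++)
{
delete[] Container[i]; // allocated with new char[]
}
if (owns_string)
{
delete[] String; // allocated with new char[]
}
}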
};
StringSplitter.cpp:
#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplitter.hpp"
using namespace std;
void StringSplit::assimilate(char*src, char delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp; // temp was allocated with new char[]
}
}
}
void StringSplit::assimilate(char*src, char* delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp; // temp was allocated with new char[]
}
}
}
long StringSplit::calc_string_size(char* _in)
{
long i = 0;
while (*_in++)
{
i++;
}
return i;
}
bool StringSplit::string_contains(char* haystack, char* needle)
{
size_t len = calc_string_size(needle);
size_t lenh = calc_string_size(haystack);
while (lenh--)
{
if (match_fragment(haystack + lenh, needle, len))
{
return true;
}
}
return false;
}
bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
while (len--)
{
if (*(_src + len) != *(cmp + len))
{
return false;
}
}
return true;
}
int StringSplit::untilnextdelim(char* _in, char delim)
{
size_t len = calc_string_size(_in);
if (*_in == delim)
{
_in += 1;
return len - 1;
}
int c = 0;
while (*(_in + c) != delim && c < len)
{
c++;
}
return c;
}
int StringSplit::untilnextdelim(char* _in, char* delim)
{
int s = calc_string_size(delim);
int c = 1 + s;
if (!string_contains(_in, delim))
{
return calc_string_size(_in);
}
else if (match_fragment(_in, delim, s))
{
_in += s;
return calc_string_size(_in);
}
while (!match_fragment(_in + c, delim, s))
{
c++;
}
return c;
}
void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
if (*src == delim)
{
src++;
}
int c = 0;
while (*(src + c) != delim && *(src + c))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
void StringSplit::copy_string(char* dest, char* src)
{
int i = 0;
while (*(src + i))
{
*(dest + i) = *(src + i);
i++;
}
}
void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
size_t len = calc_string_size(delim);
size_t lens = calc_string_size(src);
if (match_fragment(src, delim, len))
{
src += len;
lens -= len;
}
int c = 0;
while (!match_fragment(src + c, delim, len) && (c < lens))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
vector<char*> StringSplit::split_cstr(char Delimiter)
{
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
// Do not delete String here: it is either caller-owned or released by the destructor.
return Container;
}
vector<string> StringSplit::split_string(char Delimiter)
{
do_string = true;
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
// Do not delete String here: it is either caller-owned or released by the destructor.
return ContainerS;
}
vector<char*> StringSplit::split_cstr(char* Delimiter)
{
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while(*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String,Delimiter);
}
i++;
String++;
}
String -= i;
// Do not delete String here: it is either caller-owned or released by the destructor.
return Container;
}
vector<string> StringSplit::split_string(char* Delimiter)
{
do_string = true;
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while (*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
// Do not delete String here: it is either caller-owned or released by the destructor.
return ContainerS;
}
Examples:
int main(int argc, char* argv[])
{
StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
vector<char*> Split = ss.split_cstr(":CUT:");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
Will output:
This
is
an
example
cstring
int main(int argc, char* argv[])
{
StringSplit ss = "This:is:an:example:cstring";
vector<char*> Split = ss.split_cstr(':');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char* argv[])
{
string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string("[SPLIT]");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char* argv[])
{
string mystring = "This|is|an|example|string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string('|');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
To keep empty entries (by default empties will be excluded):
StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");
The goal was to make it similar to C#'s Split() method where splitting a string is as easy as:
string[] Split =
"Hey:cut:what's:cut:your:cut:name?".Split(new[] { ":cut:" }, StringSplitOptions.None);
foreach (string X in Split)
{
Console.Write(X);
}
I hope someone else can find this as useful as I do.
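For comparison, the same C#-like behaviour can be sketched with std::string::find alone. This is not the StringSplit class above but a minimal alternative under stated assumptions: the name split_on and its keep_empty parameter are made up for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of StringSplit: splits text at every
// occurrence of a (possibly multi-character) delimiter.
std::vector<std::string> split_on(const std::string& text,
                                  const std::string& delim,
                                  bool keep_empty = false)
{
    std::vector<std::string> parts;
    if (delim.empty())
    {
        // An empty delimiter would never advance; return the input as one piece.
        parts.push_back(text);
        return parts;
    }
    std::string::size_type start = 0;
    while (true)
    {
        const std::string::size_type pos = text.find(delim, start);
        // When no further delimiter exists, take everything up to the end.
        const std::string::size_type count =
            (pos == std::string::npos) ? std::string::npos : pos - start;
        const std::string piece = text.substr(start, count);
        if (keep_empty || !piece.empty())
        {
            parts.push_back(piece);
        }
        if (pos == std::string::npos)
        {
            break;
        }
        start = pos + delim.size(); // continue after the delimiter
    }
    return parts;
}
```

Calling split_on("Hey:cut:what's:cut:your:cut:name?", ":cut:") should give the four pieces "Hey", "what's", "your" and "name?". Passing true as the third argument keeps the empty pieces between adjacent delimiters.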
edited Feb 19 '17 at 17:47
community wiki
2 revs, 2 users 70%
Steve Dell
up vote
10
down vote
What about this:
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& str, const char delim) {
    vector<string> v;
    string tmp;
    for (string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if (*i != delim) {
            tmp += *i;
        } else {
            v.push_back(tmp);
            tmp.clear();
        }
    }
    v.push_back(tmp); // the token after the last delimiter
    return v;
}
This is the best answer here, if you only want to split on a single delimiter character. The original question wanted to split on whitespace though, meaning any combination of one or more consecutive spaces or tabs. You have actually answered stackoverflow.com/questions/53849
– Oktalist
Dec 19 '12 at 22:09
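The distinction drawn in the comment above can be made concrete with a short sketch. Here split_on_char is a hypothetical stand-in for a single-delimiter split (built on std::getline, so not identical to the answer's function), and split_whitespace uses stream extraction, which treats any run of whitespace as one separator. Both names are made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Single-delimiter split: consecutive delimiters produce empty tokens.
std::vector<std::string> split_on_char(const std::string& str, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream in(str);
    std::string token;
    while (std::getline(in, token, delim))
    {
        tokens.push_back(token);
    }
    return tokens;
}

// Whitespace split: operator>> skips any run of spaces, tabs or newlines.
std::vector<std::string> split_whitespace(const std::string& str)
{
    std::vector<std::string> words;
    std::istringstream in(str);
    std::string word;
    while (in >> word)
    {
        words.push_back(word);
    }
    return words;
}
```

For the input "a  b\tc" (two spaces, then a tab), split_on_char(input, ' ') returns three tokens, "a", an empty string and "b\tc", while split_whitespace(input) returns "a", "b" and "c".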
edited Dec 19 '12 at 22:05
community wiki
3 revs, 3 users 89%
gibbz
up vote
9
down vote
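Here's another way of doing it:
#include <cctype>
#include <string>
#include <vector>
using namespace std;
void split_string(const string& text, vector<string>& words)
{
    string word;
    for (string::size_type i = 0; i < text.size(); ++i)
    {
        const char ch = text[i];
        // Cast first: isspace is undefined for negative char values.
        if (isspace(static_cast<unsigned char>(ch)))
        {
            if (!word.empty())
            {
                words.push_back(word);
            }
            word.clear();
        }
        else
        {
            word += ch;
        }
    }
    // Flush the last word if the text does not end in whitespace.
    if (!word.empty())
    {
        words.push_back(word);
    }
}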
edited Jan 8 '10 at 3:27
community wiki
2 revs
user246110
up vote
9
down vote
I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string line("A:::line::to:split");
const boost::regex re(":+"); // one or more colons
// -1 means find inverse matches aka split
boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);
boost::sregex_token_iterator end;
for (; tokens != end; ++tokens)
std::cout << *tokens << std::endl;
}
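Since C++11 the same token-iterator interface is also available in the standard library, so a Boost-free variant is possible. The sketch below assumes a C++11 compiler with a working <regex> implementation; the helper name split_regex is made up for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the pieces of line that lie between matches of re.
std::vector<std::string> split_regex(const std::string& line, const std::regex& re)
{
    std::vector<std::string> pieces;
    // -1 selects the text between matches, as in the Boost example.
    std::sregex_token_iterator tokens(line.begin(), line.end(), re, -1);
    std::sregex_token_iterator end;
    for (; tokens != end; ++tokens)
    {
        pieces.push_back(tokens->str());
    }
    return pieces;
}
```

With the input from above, split_regex("A:::line::to:split", std::regex(":+")) should return "A", "line", "to" and "split".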
answered Jun 12 '11 at 9:25
community wiki
Marty B
up vote
9
down vote
Recently I had to split a camel-cased word into subwords. There are no delimiters, just uppercase characters.
#include <string>
#include <list>
#include <locale> // two-argument std::isupper
template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
    std::list<String> R;
    String w;
    const std::locale loc; // copy of the current global locale
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (std::isupper(*i, loc)) {
            if (w.length()) {
                R.push_back(w);
                w.clear();
            }
        }
        w += *i;
    }
    if (w.length())
        R.push_back(w);
    return R;
}
For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current global locale, it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung", provided that locale classifies "Ü" as uppercase.
Note that std::isupper should really be passed as a function template argument. The more generalized form of this function could then split at delimiters like ",", ";" or " " too.
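A sketch of that generalization, with the predicate passed as a template argument. The names split_before_if and UpperInLocale are hypothetical, introduced only for this illustration. Like the function above, it starts a new piece in front of every character the predicate accepts and keeps that character, so a "," would stay attached to the piece that follows it; dropping it would need a small change.

```cpp
#include <list>
#include <locale>
#include <string>

// Hypothetical generalization: begins a new piece before every character
// for which pred returns true. The matching character is kept.
template <class String, class Pred>
std::list<String> split_before_if(const String& s, Pred pred)
{
    std::list<String> pieces;
    String current;
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i)
    {
        if (pred(*i) && !current.empty())
        {
            pieces.push_back(current);
            current.clear();
        }
        current += *i;
    }
    if (!current.empty())
    {
        pieces.push_back(current);
    }
    return pieces;
}

// Predicate wrapping the two-argument std::isupper from <locale>.
struct UpperInLocale
{
    std::locale loc; // copy of the global locale at construction time

    template <class Char>
    bool operator()(Char c) const
    {
        return std::isupper(c, loc);
    }
};
```

Calling split_before_if(std::string("AQueryTrades"), UpperInLocale()) should again give "A", "Query" and "Trades".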
1
There have been 2 revs. That's nice. Seems as if my English had too much of a "German". However, the revisionist did not fix two minor bugs, maybe because they were obvious anyway: std::isupper could be passed as argument, not std::upper. Second, put a typename before the String::const_iterator.
– Andreas Spindler
Apr 28 '15 at 7:20
add a comment |
up vote
9
down vote
Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper characters.
#include <string>
#include <list>
#include <locale> // std::isupper
template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
std::list<String> R;
String w;
for (String::const_iterator i = s.begin(); i < s.end(); ++i) { {
if (std::isupper(*i)) {
if (w.length()) {
R.push_back(w);
w.clear();
}
}
w += *i;
}
if (w.length())
R.push_back(w);
return R;
}
For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".
Note std::upper should be really passed as function template argument. Then the more generalized from of this function can split at delimiters like ",", ";" or " " too.
1
There have been 2 revs. That's nice. Seems as if my English had to much of a "German". However, the revisionist did not fixed two minor bugs maybe because they were obvious anyway:std::isuppercould be passed as argument, notstd::upper. Second put atypenamebefore theString::const_iterator.
– Andreas Spindler
Apr 28 '15 at 7:20
add a comment |
up vote
9
down vote
up vote
9
down vote
Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper characters.
#include <string>
#include <list>
#include <locale> // std::isupper
template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
std::list<String> R;
String w;
for (String::const_iterator i = s.begin(); i < s.end(); ++i) { {
if (std::isupper(*i)) {
if (w.length()) {
R.push_back(w);
w.clear();
}
}
w += *i;
}
if (w.length())
R.push_back(w);
return R;
}
For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".
Note std::upper should be really passed as function template argument. Then the more generalized from of this function can split at delimiters like ",", ";" or " " too.
Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper characters.
#include <string>
#include <list>
#include <locale> // std::isupper
template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
std::list<String> R;
String w;
for (String::const_iterator i = s.begin(); i < s.end(); ++i) { {
if (std::isupper(*i)) {
if (w.length()) {
R.push_back(w);
w.clear();
}
}
w += *i;
}
if (w.length())
R.push_back(w);
return R;
}
For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".
Note std::upper should be really passed as function template argument. Then the more generalized from of this function can split at delimiters like ",", ";" or " " too.
edited Sep 14 '11 at 9:47
community wiki
2 revs
Andreas Spindler
1
There have been 2 revs. That's nice. Seems as if my English had too much of a "German". However, the revisionist did not fix two minor bugs, maybe because they were obvious anyway: std::isupper could be passed as argument, not std::upper. Second, put a typename before the String::const_iterator.
– Andreas Spindler
Apr 28 '15 at 7:20
up vote
9
down vote
This answer takes the string and puts it into a vector of strings. It uses the Boost library.
#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
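If Boost is not available, a comparable result can be sketched with std::string::find_first_of. This is a stand-in written for illustration, not Boost's implementation, and the name split_any_of is an assumption. Like boost::split with its default settings, it produces an empty token between two adjacent delimiters.

```cpp
#include <string>
#include <vector>

// Hypothetical standard-library stand-in for boost::split + boost::is_any_of:
// cuts `input` at every character that appears in `delims`.
std::vector<std::string> split_any_of(const std::string& input, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = input.find_first_of(delims, start);
        if (end == std::string::npos) {
            tokens.push_back(input.substr(start)); // last token, possibly empty
            break;
        }
        tokens.push_back(input.substr(start, end - start));
        start = end + 1; // continue after the delimiter
    }
    return tokens;
}
```

For instance, split_any_of("string to split", "\t ") returns "string", "to" and "split".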
answered Dec 9 '17 at 21:14
community wiki
NL628
up vote
8
down vote
Get Boost! :-)
#include <boost/algorithm/string.hpp> // split, is_any_of
#include <iostream>
#include <string>
#include <vector>
using namespace std;
using namespace boost;
int main(int argc, char**argv) {
    typedef vector < string > list_type;
    list_type list;
    string line;
    line = "Somewhere down the road";
    split(list, line, is_any_of(" "));
    for(size_t i = 0; i < list.size(); i++)
    {
        cout << list[i] << endl;
    }
    return 0;
}
This example gives the output:
Somewhere
down
the
road
answered Apr 7 '13 at 16:07
community wiki
Aleksey Bykov
up vote
8
down vote
The code below uses strtok() to split a string into tokens and stores the tokens in a vector.
#include <cstring> // strtok
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// strtok modifies its input, so the text must live in a writable array.
char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[] = " ,\t\n";
char *token;
int main()
{
    vector<string> vec_String_Lines;
    token = strtok( one_line_string, seps );
    cout << "Extracting and storing data in a vector..\n\n\n";
    while( token != NULL )
    {
        vec_String_Lines.push_back(token);
        token = strtok( NULL, seps );
    }
    cout << "Displaying end result in vector line storage..\n\n";
    for ( size_t i = 0; i < vec_String_Lines.size(); ++i)
        cout << vec_String_Lines[i] << "\n";
    cout << "\n\n\n";
    return 0;
}
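Because strtok writes '\0' characters into the buffer it scans, it cannot be applied directly to a std::string's const data. One possible way around that, sketched here with a made-up helper name (strtok_split), is to tokenize a private copy. Note that strtok also keeps hidden static state between calls, so this sketch should not be used from several threads at once or nested inside another strtok loop.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper: tokenizes a writable copy of `text`, because strtok
// overwrites the delimiters in its input buffer with '\0'.
std::vector<std::string> strtok_split(const std::string& text, const char* seps)
{
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0'); // strtok expects a NUL-terminated C string

    std::vector<std::string> tokens;
    for (char* token = std::strtok(&buffer[0], seps);
         token != NULL;
         token = std::strtok(NULL, seps))
    {
        tokens.push_back(token); // copies the token out of the buffer
    }
    return tokens;
}
```

For example, strtok_split("hello, hi\tthere\n", " ,\t\n") returns "hello", "hi" and "there"; runs of adjacent separators do not produce empty tokens.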
edited Apr 29 '15 at 15:06
community wiki
2 revs, 2 users 99%
Software_Developer
5
There's more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility. IMHO Elegance is not a proxy for efficiency but maintainability.
– Matt
Mar 31 '17 at 13:22