Combine rows while importing txt file if they don't meet a condition
ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns
I have a large pipe delimitated file, Approx 2 million rows (2gs). The file is supposed to have 418 columns (|) per row, however, many of the rows have been unnecessarily split causing problems when importing the data.
When importing I would like to combine rows until the count of pipes in the subsequent row == 418.
Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.
For example, the highlighted rows should be combined into one (rows with 0 still contain information
I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.
I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.
%%time
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for index, line in enumerate(fh):
if index<20:
if line.count('|')!=418:
incorrect.append(line)
python
|
show 1 more comment
ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns
I have a large pipe delimitated file, Approx 2 million rows (2gs). The file is supposed to have 418 columns (|) per row, however, many of the rows have been unnecessarily split causing problems when importing the data.
When importing I would like to combine rows until the count of pipes in the subsequent row == 418.
Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.
For example, the highlighted rows should be combined into one (rows with 0 still contain information
I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.
I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.
%%time
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for index, line in enumerate(fh):
if index<20:
if line.count('|')!=418:
incorrect.append(line)
python
I suggest that you use thecsv
package. You can specify what delimiter to use.
– Code-Apprentice
Nov 27 '18 at 19:26
Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.
– schlafdata
Nov 27 '18 at 19:31
Can you give an example of a few rows of input? Are there any text fields that contain new line characters?
– Code-Apprentice
Nov 27 '18 at 21:13
Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor
– schlafdata
Nov 27 '18 at 21:50
That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n
– schlafdata
Nov 27 '18 at 21:51
|
show 1 more comment
ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns
I have a large pipe delimitated file, Approx 2 million rows (2gs). The file is supposed to have 418 columns (|) per row, however, many of the rows have been unnecessarily split causing problems when importing the data.
When importing I would like to combine rows until the count of pipes in the subsequent row == 418.
Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.
For example, the highlighted rows should be combined into one (rows with 0 still contain information
I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.
I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.
%%time
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for index, line in enumerate(fh):
if index<20:
if line.count('|')!=418:
incorrect.append(line)
python
ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns
I have a large pipe delimitated file, Approx 2 million rows (2gs). The file is supposed to have 418 columns (|) per row, however, many of the rows have been unnecessarily split causing problems when importing the data.
When importing I would like to combine rows until the count of pipes in the subsequent row == 418.
Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.
For example, the highlighted rows should be combined into one (rows with 0 still contain information
I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.
I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.
%%time
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for index, line in enumerate(fh):
if index<20:
if line.count('|')!=418:
incorrect.append(line)
ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns
ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns
python
python
edited Nov 27 '18 at 19:22
dustinos3
4591717
4591717
asked Nov 27 '18 at 18:51
schlafdataschlafdata
112
112
I suggest that you use thecsv
package. You can specify what delimiter to use.
– Code-Apprentice
Nov 27 '18 at 19:26
Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.
– schlafdata
Nov 27 '18 at 19:31
Can you give an example of a few rows of input? Are there any text fields that contain new line characters?
– Code-Apprentice
Nov 27 '18 at 21:13
Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor
– schlafdata
Nov 27 '18 at 21:50
That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n
– schlafdata
Nov 27 '18 at 21:51
|
show 1 more comment
I suggest that you use thecsv
package. You can specify what delimiter to use.
– Code-Apprentice
Nov 27 '18 at 19:26
Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.
– schlafdata
Nov 27 '18 at 19:31
Can you give an example of a few rows of input? Are there any text fields that contain new line characters?
– Code-Apprentice
Nov 27 '18 at 21:13
Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor
– schlafdata
Nov 27 '18 at 21:50
That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n
– schlafdata
Nov 27 '18 at 21:51
I suggest that you use the
csv
package. You can specify what delimiter to use.– Code-Apprentice
Nov 27 '18 at 19:26
I suggest that you use the
csv
package. You can specify what delimiter to use.– Code-Apprentice
Nov 27 '18 at 19:26
Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.
– schlafdata
Nov 27 '18 at 19:31
Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.
– schlafdata
Nov 27 '18 at 19:31
Can you give an example of a few rows of input? Are there any text fields that contain new line characters?
– Code-Apprentice
Nov 27 '18 at 21:13
Can you give an example of a few rows of input? Are there any text fields that contain new line characters?
– Code-Apprentice
Nov 27 '18 at 21:13
Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor
– schlafdata
Nov 27 '18 at 21:50
Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor
– schlafdata
Nov 27 '18 at 21:50
That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n
– schlafdata
Nov 27 '18 at 21:51
That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n
– schlafdata
Nov 27 '18 at 21:51
|
show 1 more comment
1 Answer
1
active
oldest
votes
It turns on when you use Enumerate it takes drastically longer. I removed that part of my code, appended all of the problem rows to a list. It took 34 seconds Vs the 26 days I anticipated. Then I joined the entire list, and split the list every 418 pipes
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for line in fh:
if line.count('|')==418:
correct.append(line)
if line.count('|')!=418:
incorrect.append(line)
test_1 = ' '.join(incorrect)
i = iter(test_1.split('|'))
span = 418
words = test_1.split("|")
combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]
Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.
– Tomothy32
Nov 28 '18 at 1:28
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53506319%2fcombine-rows-while-importing-txt-file-if-they-dont-meet-a-condition%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
It turns on when you use Enumerate it takes drastically longer. I removed that part of my code, appended all of the problem rows to a list. It took 34 seconds Vs the 26 days I anticipated. Then I joined the entire list, and split the list every 418 pipes
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for line in fh:
if line.count('|')==418:
correct.append(line)
if line.count('|')!=418:
incorrect.append(line)
test_1 = ' '.join(incorrect)
i = iter(test_1.split('|'))
span = 418
words = test_1.split("|")
combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]
Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.
– Tomothy32
Nov 28 '18 at 1:28
add a comment |
It turns on when you use Enumerate it takes drastically longer. I removed that part of my code, appended all of the problem rows to a list. It took 34 seconds Vs the 26 days I anticipated. Then I joined the entire list, and split the list every 418 pipes
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for line in fh:
if line.count('|')==418:
correct.append(line)
if line.count('|')!=418:
incorrect.append(line)
test_1 = ' '.join(incorrect)
i = iter(test_1.split('|'))
span = 418
words = test_1.split("|")
combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]
Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.
– Tomothy32
Nov 28 '18 at 1:28
add a comment |
It turns on when you use Enumerate it takes drastically longer. I removed that part of my code, appended all of the problem rows to a list. It took 34 seconds Vs the 26 days I anticipated. Then I joined the entire list, and split the list every 418 pipes
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for line in fh:
if line.count('|')==418:
correct.append(line)
if line.count('|')!=418:
incorrect.append(line)
test_1 = ' '.join(incorrect)
i = iter(test_1.split('|'))
span = 418
words = test_1.split("|")
combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]
It turns on when you use Enumerate it takes drastically longer. I removed that part of my code, appended all of the problem rows to a list. It took 34 seconds Vs the 26 days I anticipated. Then I joined the entire list, and split the list every 418 pipes
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for line in fh:
if line.count('|')==418:
correct.append(line)
if line.count('|')!=418:
incorrect.append(line)
test_1 = ' '.join(incorrect)
i = iter(test_1.split('|'))
span = 418
words = test_1.split("|")
combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for line in fh:
if line.count('|')==418:
correct.append(line)
if line.count('|')!=418:
incorrect.append(line)
test_1 = ' '.join(incorrect)
i = iter(test_1.split('|'))
span = 418
words = test_1.split("|")
combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]
correct =
incorrect =
with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:
for line in fh:
if line.count('|')==418:
correct.append(line)
if line.count('|')!=418:
incorrect.append(line)
test_1 = ' '.join(incorrect)
i = iter(test_1.split('|'))
span = 418
words = test_1.split("|")
combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]
edited Nov 28 '18 at 15:28
answered Nov 27 '18 at 23:32
schlafdataschlafdata
112
112
Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.
– Tomothy32
Nov 28 '18 at 1:28
add a comment |
Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.
– Tomothy32
Nov 28 '18 at 1:28
Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.
– Tomothy32
Nov 28 '18 at 1:28
Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.
– Tomothy32
Nov 28 '18 at 1:28
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53506319%2fcombine-rows-while-importing-txt-file-if-they-dont-meet-a-condition%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I suggest that you use the
csv
package. You can specify what delimiter to use.– Code-Apprentice
Nov 27 '18 at 19:26
Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.
– schlafdata
Nov 27 '18 at 19:31
Can you give an example of a few rows of input? Are there any text fields that contain new line characters?
– Code-Apprentice
Nov 27 '18 at 21:13
Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor
– schlafdata
Nov 27 '18 at 21:50
That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n
– schlafdata
Nov 27 '18 at 21:51