Combine rows while importing txt file if they don't meet a condition

ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns

I have a large pipe delimitated file, Approx 2 million rows (2gs). The file is supposed to have 418 columns (|) per row, however, many of the rows have been unnecessarily split causing problems when importing the data.

When importing I would like to combine rows until the count of pipes in the subsequent row == 418.

Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.

For example, the highlighted rows should be combined into one (rows with 0 still contain information

I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.

I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.

%%time



correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for index, line in enumerate(fh):

        if index<20:

            if line.count('|')!=418:

                incorrect.append(line)

edited Nov 27 '18 at 19:22

dustinos3

4591717

asked Nov 27 '18 at 18:51

schlafdata

112

I suggest that you use the csv package. You can specify what delimiter to use.

– Code-Apprentice
Nov 27 '18 at 19:26

Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.

– schlafdata
Nov 27 '18 at 19:31

Can you give an example of a few rows of input? Are there any text fields that contain new line characters?

– Code-Apprentice
Nov 27 '18 at 21:13

Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor

– schlafdata
Nov 27 '18 at 21:50

That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n

– schlafdata
Nov 27 '18 at 21:51

|
show 1 more comment

ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns

When importing I would like to combine rows until the count of pipes in the subsequent row == 418.

Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.

For example, the highlighted rows should be combined into one (rows with 0 still contain information

I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.

I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.

%%time



correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for index, line in enumerate(fh):

        if index<20:

            if line.count('|')!=418:

                incorrect.append(line)

edited Nov 27 '18 at 19:22

dustinos3

4591717

asked Nov 27 '18 at 18:51

schlafdata

112

I suggest that you use the csv package. You can specify what delimiter to use.

– Code-Apprentice
Nov 27 '18 at 19:26

Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.

– schlafdata
Nov 27 '18 at 19:31

Can you give an example of a few rows of input? Are there any text fields that contain new line characters?

– Code-Apprentice
Nov 27 '18 at 21:13

Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor

– schlafdata
Nov 27 '18 at 21:50

That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n

– schlafdata
Nov 27 '18 at 21:51

|
show 1 more comment

ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns

When importing I would like to combine rows until the count of pipes in the subsequent row == 418.

Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.

For example, the highlighted rows should be combined into one (rows with 0 still contain information

I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.

I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.

%%time



correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for index, line in enumerate(fh):

        if index<20:

            if line.count('|')!=418:

                incorrect.append(line)

edited Nov 27 '18 at 19:22

dustinos3

4591717

asked Nov 27 '18 at 18:51

schlafdata

112

ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns

When importing I would like to combine rows until the count of pipes in the subsequent row == 418.

Most of the problems occur at column 90 and are followed by a row with 328. Others are split at 90, followed by a few rows of 0 and then 328. Ideally, all of these rows would be combined into one.

For example, the highlighted rows should be combined into one (rows with 0 still contain information

I thought about appending the incorrect rows to a list, and then combining them, but at 1 second per row, this would take approximately 26 days to complete.

I also tried to combine the lines before appending, but am afraid I will run into the same efficiency problems.

%%time



correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for index, line in enumerate(fh):

        if index<20:

            if line.count('|')!=418:

                incorrect.append(line)

ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns

ROW_ID|Quote Number|Status|Status Reason ADT|Name|Account|Alias ADT|......etc 418 Columns

python

edited Nov 27 '18 at 19:22

dustinos3

4591717

asked Nov 27 '18 at 18:51

schlafdata

112

edited Nov 27 '18 at 19:22

dustinos3

4591717

asked Nov 27 '18 at 18:51

schlafdata

112

edited Nov 27 '18 at 19:22

dustinos3

4591717

edited Nov 27 '18 at 19:22

dustinos3

4591717

edited Nov 27 '18 at 19:22

dustinos3

4591717

asked Nov 27 '18 at 18:51

schlafdata

112

asked Nov 27 '18 at 18:51

schlafdata

112

asked Nov 27 '18 at 18:51

schlafdata

112

I suggest that you use the csv package. You can specify what delimiter to use.

– Code-Apprentice
Nov 27 '18 at 19:26

Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.

– schlafdata
Nov 27 '18 at 19:31

Can you give an example of a few rows of input? Are there any text fields that contain new line characters?

– Code-Apprentice
Nov 27 '18 at 21:13

Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor

– schlafdata
Nov 27 '18 at 21:50

That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n

– schlafdata
Nov 27 '18 at 21:51

|
show 1 more comment

I suggest that you use the csv package. You can specify what delimiter to use.

– Code-Apprentice
Nov 27 '18 at 19:26

Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.

– schlafdata
Nov 27 '18 at 19:31

Can you give an example of a few rows of input? Are there any text fields that contain new line characters?

– Code-Apprentice
Nov 27 '18 at 21:13

Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor

– schlafdata
Nov 27 '18 at 21:50

That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n

– schlafdata
Nov 27 '18 at 21:51

I suggest that you use the csv package. You can specify what delimiter to use.

– Code-Apprentice
Nov 27 '18 at 19:26

Yes, I am using import csv package, and I can specify the Pipe delimiter. The problem is not all of the rows contain the correct number of delimiters.

– schlafdata
Nov 27 '18 at 19:31

Can you give an example of a few rows of input? Are there any text fields that contain new line characters?

– Code-Apprentice
Nov 27 '18 at 21:13

Install 2 EBKs, KP, install HUBs at teller and Access bar , use HUBs removed from Tellers. TCR connection . OT Labor

– schlafdata
Nov 27 '18 at 21:50

That is the end of a line that was cut off. Wish I could post more examples but the information is sensitive, there are no n

– schlafdata
Nov 27 '18 at 21:51

|
show 1 more comment

1 Answer
1

active

oldest

votes

It turns on when you use Enumerate it takes drastically longer. I removed that part of my code, appended all of the problem rows to a list. It took 34 seconds Vs the 26 days I anticipated. Then I joined the entire list, and split the list every 418 pipes

correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for line in fh:

        if line.count('|')==418:

                correct.append(line)

        if line.count('|')!=418:

                incorrect.append(line)

                

                

test_1 = ' '.join(incorrect)                

i = iter(test_1.split('|'))





span = 418

words = test_1.split("|")

combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]

edited Nov 28 '18 at 15:28

answered Nov 27 '18 at 23:32

schlafdata

112

Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.

– Tomothy32
Nov 28 '18 at 1:28

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53506319%2fcombine-rows-while-importing-txt-file-if-they-dont-meet-a-condition%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for line in fh:

        if line.count('|')==418:

                correct.append(line)

        if line.count('|')!=418:

                incorrect.append(line)

                

                

test_1 = ' '.join(incorrect)                

i = iter(test_1.split('|'))





span = 418

words = test_1.split("|")

combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]

edited Nov 28 '18 at 15:28

answered Nov 27 '18 at 23:32

schlafdata

112

Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.

– Tomothy32
Nov 28 '18 at 1:28

add a comment |

correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for line in fh:

        if line.count('|')==418:

                correct.append(line)

        if line.count('|')!=418:

                incorrect.append(line)

                

                

test_1 = ' '.join(incorrect)                

i = iter(test_1.split('|'))





span = 418

words = test_1.split("|")

combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]

edited Nov 28 '18 at 15:28

answered Nov 27 '18 at 23:32

schlafdata

112

Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.

– Tomothy32
Nov 28 '18 at 1:28

add a comment |

correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for line in fh:

        if line.count('|')==418:

                correct.append(line)

        if line.count('|')!=418:

                incorrect.append(line)

                

                

test_1 = ' '.join(incorrect)                

i = iter(test_1.split('|'))





span = 418

words = test_1.split("|")

combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]

edited Nov 28 '18 at 15:28

answered Nov 27 '18 at 23:32

schlafdata

112

correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for line in fh:

        if line.count('|')==418:

                correct.append(line)

        if line.count('|')!=418:

                incorrect.append(line)

                

                

test_1 = ' '.join(incorrect)                

i = iter(test_1.split('|'))





span = 418

words = test_1.split("|")

combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]

correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for line in fh:

        if line.count('|')==418:

                correct.append(line)

        if line.count('|')!=418:

                incorrect.append(line)

                

                

test_1 = ' '.join(incorrect)                

i = iter(test_1.split('|'))





span = 418

words = test_1.split("|")

combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]

correct = 

incorrect = 



with open('C:/Users/jschlajo/Desktop/export_all_quotes_compass.txt', 'r') as fh:

    for line in fh:

        if line.count('|')==418:

                correct.append(line)

        if line.count('|')!=418:

                incorrect.append(line)

                

                

test_1 = ' '.join(incorrect)                

i = iter(test_1.split('|'))





span = 418

words = test_1.split("|")

combined = ["|".join(words[i:i+span]) for i in range(0, len(words), span)]

edited Nov 28 '18 at 15:28

answered Nov 27 '18 at 23:32

schlafdata

112

edited Nov 28 '18 at 15:28

answered Nov 27 '18 at 23:32

schlafdata

112

answered Nov 27 '18 at 23:32

schlafdata

112

answered Nov 27 '18 at 23:32

schlafdata

112

Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.

– Tomothy32
Nov 28 '18 at 1:28

add a comment |

Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.

– Tomothy32
Nov 28 '18 at 1:28

Just a note for the future, please do not post images of code. Instead, paste it into your question/answer and format it as a code block.

– Tomothy32
Nov 28 '18 at 1:28

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Btukfyl