Get historical spelling corrected
Hello everyone, this is my first post here. I am writing a Python script that should return the standard form of words. I use rules to transform a historical text (spelling normalization). The code does not work properly: it displays only the modified words, not the entire file. Any ideas on how to solve this?
import re, string, unicodedata
from nltk.corpus import stopwords
import spacy
import codecs
nlp = spacy.load('fr')
with codecs.open(r'/home/m16/fatkab/RD_project/corpus.txt', encoding='utf8') as f:
    word = f.read()
tokens = re.split(r'\W+', word)
print(tokens)
for word in tokens:
    rule1 = word.replace('y', 'i')
    # to avoid modifying y as a word itself:
    if word.endswith('y') and len(word) >= 2:
        print(rule1)
My sample input:
Or puis que Dieu est ainsi descendu
ànous,qu'il luy a pleu de nous communiquer
ainsi sa bonté : n'est ce pas raison que nous
soyons du tout siens? Et d'autant qu'il nous a tendu
la main pour nous racheter, ne faut-il pas que
nous soyons son heritage, quand il nous a acquis
par sa vertu? Le peuple donc s'il eust eu vn grain
de prudence , deuoit bien se ranger en toute humilité
pour receuoir la doctrine qui luy estoit
preschee par Moyse. Et mesme quelle authorite
meritoit la Loy , qui estoit ainsi approuuee par
tant de miracles?Car Dieu ne commande pas simplement
àMoyse de parler, apres l'auoir choisi
pour son prophete:mais il le tire en la montagne,
il le separe de la compagnie des hommes,afin que
quand il viendra mettre en auant la Loy,qu'on le
tienne comme vn Ange,& non point comme vne creature mortelle.
Here is the output:
lui
lui
lui
ai
oui
Loi
lui
foi
Loi
hui
soi
lui
lui
lui
ci
Loi
soi
lui
ai
lui
lui
doi
quoi
soi
ai
lui
lui
soi
# the language is French
python python-3.x
Please add your code, your attempts and finally your error message or at least incorrect output. With your question we cannot reproduce your problem.
– Alex_P
Nov 27 '18 at 11:00
@Timat that should go in your question. :-)
– TrebuchetMS
Nov 27 '18 at 11:06
@Timat Please add your code in the post itself, not in comments.
– Mayank Porwal
Nov 27 '18 at 11:06
Can you also add sample input?
– planetmaker
Nov 27 '18 at 11:14
edited Nov 27 '18 at 14:51
Ivan Kolesnikov
asked Nov 27 '18 at 10:57
Timat
1 Answer
Use re.sub on the entire text.
One major benefit of regex is that you can run a rule across a large amount of text without having to manually tokenise it and rebuild the output.
import re
text = "ouy you are the best luy guy in the try"
sub_pattern = re.compile(r"y(\W+|$)")
print(re.sub(sub_pattern, r"i\1", text))
# oui you are the best lui gui in the tri
Here we use re.sub to replace each match of the pattern with our replacement, across the entire file.
To keep the spaces between the words, we use the backreference \1 in the replacement pattern. This puts the text from capture group 1 of the match back into the output.
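To see what the backreference is doing, here is a minimal sketch that runs the same pattern with and without \1 in the replacement. The sample string is made up for illustration; only the pattern comes from the answer.

```python
import re

# Hypothetical sample string; the pattern is the one used in the answer.
text = "luy a pleu, la Loy."
pattern = re.compile(r"y(\W+|$)")

# Without the backreference, the separator matched after the "y" is discarded.
swallowed = pattern.sub("i", text)

# With \1, the captured separator is written back after the "i".
preserved = pattern.sub(r"i\1", text)

print(swallowed)  # luia pleu, la Loi
print(preserved)  # lui a pleu, la Loi.
```

The first result loses the space and the full stop because they were part of the match; the second keeps them.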
Regex patterns explained:
re.compile - if you're using the same regex over and over, compiling it once saves the machine from re-computing it each time. In this case, it's just used to put the regex on its own line for clarity.
r"y(\W+|$)" - the r tells Python to treat the string as raw, so backslashes are passed through to the regex engine instead of being interpreted as string escapes. To match the "y"s at the end of words, the rule is "a 'y' followed by one or more non-word characters, or by the end of the string ($)". This is the pattern we use to match all the "incorrect" 'y' endings in the input. Note that the non-word characters (whitespace or punctuation) are captured in a group () so we can use them in the backreference later.
r"i\1" - first we replace the matched y plus separator with an "i", as per your rules. Then we put the separator back in with the backreference \1, which adds whatever was captured by group 1 of our pattern, (\W+|$).
Alternatively
Instead of capturing the separator and adding it back in, we can use a lookahead in the pattern, so that only the "y" is matched and replaced.
For this you could use the pattern:
sub_pattern = re.compile(r"y(?=\W+|$)")
print(re.sub(sub_pattern, r"i", text))
# oui you are the best lui gui in the tri
Note that the separator pattern is now prefixed with ?=, which makes it a lookahead: a zero-width assertion that captures nothing. It checks that these characters exist after the "y" but does not consume them, so they are not part of the match being replaced. As such, the replacement only needs to be "i", and the whitespace is left unmodified.
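One thing the patterns above do not handle is the requirement in the question that a standalone "y" (a French word in its own right) should stay unchanged. A possible variant, offered only as a sketch, adds a lookbehind so the "y" must be preceded by a word character, and uses \b for the end of the word. The first two lines of the sample are taken from the input in the question; the third line is invented to include a standalone "y".

```python
import re

# Two lines from the sample input in the question, plus a made-up
# third line containing the standalone word "y".
text = (
    "pour receuoir la doctrine qui luy estoit\n"
    "preschee par Moyse. Et mesme quelle authorite\n"
    "il y a la Loy"
)

# (?<=\w) requires a word character before the "y";
# \b requires the word to end right after it.
final_y = re.compile(r"(?<=\w)y\b")

normalized = final_y.sub("i", text)
print(normalized)
```

The whole text comes back, line breaks included, with "luy" and "Loy" rewritten while "Moyse" and the lone "y" are left alone. Both assertions are zero-width, so nothing around the "y" is consumed.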
This is very useful! Thank you so much for your great assistance. However, I have a question regarding other modifications. How do I use regex when the character to be changed is in the middle of the word, and the rule concerns many words from different lemmas, such as 'sauuage', 'gouuernement', 'inuoque', etc., where I need to turn one 'u' into 'v'? Since I am not good at regex, I was handling each word individually.
– Timat
Nov 27 '18 at 12:00
@Timat Often the easiest solution is to create a number of separate rules that each solve a specific issue (such as ending y -> i) and then run them one after another, rather than trying to make a single regex pattern that solves everything. For your 'uu' rule, for example, you might simply replace all 'uu' with 'uv', or even check something like (?<=\w)uu(?=\w) if you want to ensure the 'uu' has at least one letter before and after it. If you're still unsure, please just ask as a separate question, and mark this as accepted if it has solved the issue you originally posted.
– Bilkokuya
Nov 27 '18 at 12:06
Thank you so much, it is solved.
– Timat
Nov 27 '18 at 12:27
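The "separate rules run one after another" idea from the comments could look roughly like the sketch below. The names RULES and normalize are hypothetical, and the 'uu' rule assumes that the second 'u' becomes 'v' (so 'uu' turns into 'uv'), which fits 'sauuage' and 'gouuernement' but does not cover a single 'u' as in 'inuoque'; that would need a rule of its own.

```python
import re

# Hypothetical rule table: each entry is (compiled pattern, replacement).
RULES = [
    # Word-final y -> i; a standalone "y" is left untouched.
    (re.compile(r"(?<=\w)y\b"), "i"),
    # uu with a letter on both sides -> uv (assumed reading of the rule).
    (re.compile(r"(?<=\w)uu(?=\w)"), "uv"),
]


def normalize(text):
    """Apply every rule in order and return the whole rewritten text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


sample = "la Loy , qui estoit ainsi approuuee par luy"
print(normalize(sample))  # la Loi , qui estoit ainsi approuvee par lui
```

To process a corpus, the file contents can be read into a single string with open(path, encoding="utf-8"), passed through normalize, and written back out, so the full text is returned rather than only the changed words.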
answered Nov 27 '18 at 11:38
Bilkokuya