Get historical spelling corrected












Hello everyone, this is my first post here. I am writing a Python script that returns the standard (modern) form of words: I apply rules to normalize the spelling of a historical text. The code below does not work as intended. It prints only the modified words, not the entire text of the file. Any ideas on how to solve this would be appreciated.



    import re, string, unicodedata
    from nltk.corpus import stopwords
    import spacy
    import codecs

    nlp = spacy.load('fr')
    with codecs.open(r'/home/m16/fatkab/RD_project/corpus.txt', encoding='utf8') as f:
        word = f.read()
    tokens = re.split(r'\W+', word)
    print(tokens)

    for word in tokens:
        rule1 = word.replace('y', 'i')

        # to avoid modifying y as a word itself:
        if word.endswith('y') and len(word) >= 2:
            print(rule1)


My sample input:

Or puis que Dieu est ainsi descendu
à nous,qu'il luy a pleu de nous communiquer
ainsi sa bonté : n'est ce pas raison que nous
soyons du tout siens? Et d'autant qu'il nous a tendu
la main pour nous racheter, ne faut-il pas que
nous soyons son heritage, quand il nous a acquis
par sa vertu? Le peuple donc s'il eust eu vn grain
de prudence , deuoit bien se ranger en toute humilité
pour receuoir la doctrine qui luy estoit
preschee par Moyse. Et mesme quelle authorite
meritoit la Loy , qui estoit ainsi approuuee par
tant de miracles?Car Dieu ne commande pas simplement
à Moyse de parler, apres l'auoir choisi
pour son prophete:mais il le tire en la montagne,
il le separe de la compagnie des hommes,afin que
quand il viendra mettre en auant la Loy,qu'on le
tienne comme vn Ange,& non point comme vne creature mortelle.



Here is the output:



lui
lui
lui
ai
oui
Loi
lui
foi
Loi
hui
soi
lui
lui
lui
ci
Loi
soi
lui
ai
lui
lui
doi
quoi
soi
ai
lui
lui
soi

Note: the language is French.





























  • Please add your code, your attempts and finally your error message or at least incorrect output. With your question we cannot reproduce your problem.

    – Alex_P
    Nov 27 '18 at 11:00











  • @Timat that should go in your question. :-)

    – TrebuchetMS
    Nov 27 '18 at 11:06






  • @Timat Please add your code in the post itself, not in comments.

    – Mayank Porwal
    Nov 27 '18 at 11:06











  • can you also add sample input?

    – planetmaker
    Nov 27 '18 at 11:14
















python python-3.x














edited Nov 27 '18 at 14:51









Ivan Kolesnikov











asked Nov 27 '18 at 10:57









Timat








1 Answer














Use re.sub on the entire text.



One major benefit of regex is that you can run a rule across large amounts of text - without having to manually tokenise and rebuild the output.



    import re
    text = "ouy you are the best luy guy in the try"
    sub_pattern = re.compile(r"y(\W+|$)")
    print(re.sub(sub_pattern, r"i\1", text))
    # oui you are the best lui gui in the tri


Here we use the re.sub functionality to replace each match of the pattern with our replacement, across the entire file.



To keep the spaces and punctuation between the words, we use the backreference \1 in the replacement pattern. This puts the text captured by group 1 of the match back into the output.



Regex patterns explained:



re.compile - if you use the same regex over and over, compiling it once saves the machine from re-computing it each time. In this case, it is only used to put the regex on its own line for clarity.



r"y(\W+|$)" - the r prefix tells Python to treat the string as raw, so backslashes reach the regex engine unchanged instead of being read as string escapes. To match a "y" at the end of a word, the rule is "a 'y' followed by one or more non-word characters (\W+), or by the end of the string ($)". This is the pattern that matches all the "incorrect" 'y' endings in the input. Note that the non-word characters (whitespace and punctuation) are captured in a group () so we can use them in the backreference later.



r"i\1" - first we replace the matched 'y' plus its trailing non-word characters with an "i", as per your rules. Then we put the non-word characters back in with the backreference \1, which inserts whatever was captured by group 1 of our pattern, (\W+|$).
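
To make the role of the capture group concrete, here is a small sketch that prints what group 1 holds for each match. The sample string is made up for illustration (it borrows a few words from the question's text); it is not part of the original answer.

```python
import re

pattern = re.compile(r"y(\W+|$)")
text = "luy, Loy ; Moyse try"  # illustrative sample, not from the corpus file

# Show the full match and the content of group 1 for every hit.
for match in pattern.finditer(text):
    print(repr(match.group(0)), "->", repr(match.group(1)))

# Group 1 is written back by \1, so punctuation and spaces survive.
result = pattern.sub(r"i\1", text)
print(result)
```

Group 1 holds the punctuation and spaces after the 'y', or an empty string when the 'y' is the last character of the text. "Moyse" is untouched because its 'y' is followed by a letter.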





Alternatively



Instead of capturing the non-word characters and adding them back in, we can use a lookahead in the pattern, so that only the "y" is matched and replaced.



For this you could use the pattern:



    sub_pattern = re.compile(r"y(?=\W+|$)")
    print(re.sub(sub_pattern, r"i", text))
    # oui you are the best lui gui in the tri


Note that the non-word-character pattern is now prefixed with ?=, which makes it a lookahead: it checks that these characters follow the "y" but does not consume them, so they are not part of the match and are left untouched by the replacement. As such, the replacement only needs to be "i".
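
Applied to the original problem, the same idea could look like the sketch below: read the whole file, run the substitution on the full text, and write the full text back out. This is only a sketch under a few assumptions. The function names and the output path are hypothetical. The pattern is also a variant of the ones above, not the answer's own: it adds a lookbehind so that a standalone "y" (a word in its own right in French, which the question wanted to leave alone) is not changed.

```python
import re
from pathlib import Path

# Word-final "y" -> "i", but only when a word character comes before it,
# so the standalone word "y" is left as it is.
FINAL_Y = re.compile(r"(?<=\w)y\b")


def normalize_text(text: str) -> str:
    """Return the whole text with the rule applied, not just the changed words."""
    return FINAL_Y.sub("i", text)


def normalize_file(src: str, dst: str) -> None:
    """Read src, normalize it, and write the complete result to dst."""
    text = Path(src).read_text(encoding="utf8")
    Path(dst).write_text(normalize_text(text), encoding="utf8")


sample = "qu'il luy a pleu ; il y a la Loy,qu'on"
print(normalize_text(sample))
# Hypothetical usage on a real file:
# normalize_file("corpus.txt", "corpus_normalized.txt")
```

Because the substitution runs on the full string, punctuation, spacing and line breaks are preserved and there is no need to tokenise and rebuild the text.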






























  • This is very useful! Thank you so much for your great assistance. However, I have a question about other modifications: how do I use regex when the character to be changed is in the middle of the word, and the change concerns many words from different lemmas, such as 'sauuage', 'gouuernement', 'inuoque', etc., where I need to turn one 'u' into 'v'? Since I am not good at regex, I was handling each word individually.

    – Timat
    Nov 27 '18 at 12:00






  • @Timat Often the easiest solution is to create a number of separate rules that each solve one specific issue (such as ending y -> i) and then run them one after another, rather than trying to make a single regex pattern that solves everything. For your 'uu' rule, for example, you might simply replace every 'uu' with 'uv', or use something like (?<=\w)uu(?=\w) if you want to ensure the 'uu' has at least one letter before and after it. If you're still unsure, please ask a separate question, and mark this answer as accepted if it has solved the issue you originally posted.

    – Bilkokuya
    Nov 27 '18 at 12:06













  • Thank you so much, it is solved.

    – Timat
    Nov 27 '18 at 12:27
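
The "separate rules, run one after another" approach from the comments could be sketched as a list of (pattern, replacement) pairs. The rule list and the function name here are assumptions for illustration. The second rule covers only the doubled 'uu' cases the asker mentions ('sauuage', 'gouuernement'); a word like 'inuoque' has a single 'u' and would need a rule of its own.

```python
import re

# Each rule is (compiled pattern, replacement); rules are applied in order.
RULES = [
    (re.compile(r"(?<=\w)y\b"), "i"),        # word-final y -> i (luy -> lui)
    (re.compile(r"(?<=\w)uu(?=\w)"), "uv"),  # medial uu -> uv (sauuage -> sauvage)
]


def normalize(text, rules=RULES):
    """Apply every rule to the whole text, one after another."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(normalize("luy sauuage gouuernement approuuee Loy"))
# lui sauvage gouvernement approuvee Loi
```

Keeping the rules in a list makes it easy to add, reorder or switch off individual rules while checking their effect on the corpus.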











answered Nov 27 '18 at 11:38









Bilkokuya












