How to use JohnSnowLabs NLP Spell correction module NorvigSweetingModel?


I was going through the JohnSnowLabs SpellChecker here.

I found the implementation of Norvig's algorithm there, but the example section contains only the following two lines:

import com.johnsnowlabs.nlp.annotator.NorvigSweetingModel
NorvigSweetingModel.pretrained()

Can anyone please help me apply this pretrained model to my DataFrame (df) below to spell-correct the "names" column?

+----------------+---+------------+
|           names|age|       color|
+----------------+---+------------+
|      [abc, cde]| 19|    red, abc|
|[eefg, efa, efb]|192|efg, efz efz|
+----------------+---+------------+

I have tried to do it as follows:

val schk = NorvigSweetingModel.pretrained().setInputCols("names").setOutputCol("Corrected")
val cdf = schk.transform(df)

But the above code gave me the following error:

java.lang.IllegalArgumentException: requirement failed: Wrong or missing inputCols annotators in SPELL_a1f11bacb851. Received inputCols: names. Make sure such columns have following annotator types: token
  at scala.Predef$.require(Predef.scala:224)
  at com.johnsnowlabs.nlp.AnnotatorModel.transform(AnnotatorModel.scala:51)
  ... 49 elided

Thanks.

asked Nov 21 at 18:15 by user3243499 (edited Nov 28 at 5:13 by Community♦)

Tags: scala, apache-spark, nlp, apache-spark-ml, johnsnowlabs-spark-nlp
1 Answer

Accepted answer:

Spark NLP annotators are designed to be used in Spark NLP's own pipelines, and the input columns of the different transformers have to carry special metadata.
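To see what that metadata looks like, here is a minimal sketch of a hypothetical helper, `annotatorTypeOf`, that reads the annotator type recorded on a column. It assumes Spark NLP stores the type in the column metadata under the key `annotatorType`; a plain `array<string>` column such as `names` carries no such entry, which would explain the "Wrong or missing inputCols annotators" error.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: returns the Spark NLP annotator type recorded in a
// column's metadata, if any. Assumes the metadata key is "annotatorType".
def annotatorTypeOf(df: DataFrame, column: String): Option[String] = {
  val metadata = df.schema(column).metadata
  if (metadata.contains("annotatorType")) Some(metadata.getString("annotatorType"))
  else None
}
```

On the question's DataFrame, `annotatorTypeOf(df, "names")` returns `None`; on the output column of a Tokenizer stage it should return `Some("token")`, which is what NorvigSweetingModel checks for.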

The exception already tells you that input to the NorvigSweetingModel should be tokenized:

Make sure such columns have following annotator types: token

If I am not mistaken, at a minimum you'll have to assemble documents and tokenize them here:

import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.NorvigSweetingModel
import com.johnsnowlabs.nlp.annotators.Tokenizer
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for toDF; already in scope in spark-shell

val df = Seq(Seq("abc", "cde"), Seq("eefg", "efa", "efb")).toDF("names")

// DocumentAssembler -> Tokenizer -> spell checker: each stage produces the
// annotation type the next stage expects (document, then token)
val nlpPipeline = new Pipeline().setStages(Array(
  new DocumentAssembler().setInputCol("names").setOutputCol("document"),
  new Tokenizer().setInputCols("document").setOutputCol("tokens"),
  NorvigSweetingModel.pretrained().setInputCols("tokens").setOutputCol("corrected")
))

A pipeline like this can be applied to your data with a small adjustment: the input data has to be a string, not an array<string>*:

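import org.apache.spark.sql.functions.concat_ws

val result = df
  // join the array into one space-separated string for DocumentAssembler
  .transform(_.withColumn("names", concat_ws(" ", $"names")))
  .transform(input => nlpPipeline.fit(input).transform(input))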

result.show()

+------------+--------------------+--------------------+--------------------+
|       names|            document|              tokens|           corrected|
+------------+--------------------+--------------------+--------------------+
|     abc cde|[[document, 0, 6,...|[[token, 0, 2, ab...|[[token, 0, 2, ab...|
|eefg efa efb|[[document, 0, 11...|[[token, 0, 3, ee...|[[token, 0, 3, ee...|
+------------+--------------------+--------------------+--------------------+

If you want output that can be exported, extend your pipeline with a Finisher:

import com.johnsnowlabs.nlp.Finisher

new Finisher().setInputCols("corrected").transform(result).show

+------------+------------------+
|       names|finished_corrected|
+------------+------------------+
|     abc cde|        [abc, cde]|
|eefg efa efb|  [eefg, efa, efb]|
+------------+------------------+
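Instead of applying the Finisher separately, it can be appended as the last stage of the pipeline. A sketch using only the stages from this answer (`finishedPipeline` is an illustrative name); it assumes the 1.7.x package layout used above and that `NorvigSweetingModel.pretrained()` can download the model:

```scala
import com.johnsnowlabs.nlp.{DocumentAssembler, Finisher}
import com.johnsnowlabs.nlp.annotator.NorvigSweetingModel
import com.johnsnowlabs.nlp.annotators.Tokenizer
import org.apache.spark.ml.Pipeline

// Same stages as nlpPipeline, with a Finisher appended so the final output
// is a plain array<string> column (finished_corrected) rather than annotations.
val finishedPipeline = new Pipeline().setStages(Array(
  new DocumentAssembler().setInputCol("names").setOutputCol("document"),
  new Tokenizer().setInputCols("document").setOutputCol("tokens"),
  NorvigSweetingModel.pretrained().setInputCols("tokens").setOutputCol("corrected"),
  new Finisher().setInputCols("corrected")
))
```

It is fitted and applied the same way as `nlpPipeline`, on the string-typed `names` column. Judging by the Finisher output shown above, the intermediate annotation columns are dropped from the result.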

* According to the docs, DocumentAssembler "can read either a String column or an Array[String]", but that doesn't seem to work in practice in 1.7.3:

df.transform(df => nlpPipeline.fit(df).transform(df)).show()

org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(names)' due to data type mismatch: argument 1 requires string type, however, '`names`' is of array<string> type.;;
'Project [names#62, UDF(names#62) AS document#343]
+- AnalysisBarrier
      +- Project [value#60 AS names#62]
         +- LocalRelation [value#60]
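If the type of the input column can vary, the `concat_ws` workaround can be wrapped in a small helper. This is a sketch with a hypothetical name, `asDocumentInput`: it joins `array<string>` columns with spaces, passes string columns through unchanged, and rejects anything else.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws}
import org.apache.spark.sql.types.{ArrayType, StringType}

// Hypothetical helper: DocumentAssembler (1.7.3) needs a string column, so
// join array<string> columns with spaces and leave string columns as they are.
def asDocumentInput(df: DataFrame, column: String): DataFrame =
  df.schema(column).dataType match {
    case ArrayType(StringType, _) => df.withColumn(column, concat_ws(" ", col(column)))
    case StringType               => df
    case other =>
      throw new IllegalArgumentException(s"Column $column has unsupported type $other")
  }
```

Note that the Tokenizer then re-splits the joined string, so an array element that itself contains spaces would become several tokens.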

answered Nov 21 at 19:04 by user10465355 (edited Nov 21 at 23:26)

How do I get the spell-corrected values? The values under "corrected" come as [[token, 0, 3, eefg, [sentence -> 1]], [token, 5, 7, efa, [sentence -> 1]], [token, 9, 11, efb, [sentence -> 1]]]
– user3243499
Nov 21 at 19:21

Does each of these list items, like [sentence -> 1], have any standard structure/meaning?
– user3243499
Nov 21 at 19:23

@user3243499 How to get the spell corrected values - Please check the Finisher part.
– user10465355
Nov 21 at 20:18

Does each of these list items, like [sentence -> 1], have any standard structure/meaning? - It is metadata. It is a map<string,string>, so the structure is not fixed, but in this case it contains information about the sentence from the document.
– user10465355
Nov 21 at 20:21
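Without a Finisher, the corrected words can also be read straight out of the annotation structs. The annotations in the comment above have the shape (annotatorType, begin, end, result, metadata); this sketch assumes those are the struct field names and uses a hypothetical helper, `annotationResults`, to return one row per annotation with the corrected word and its sentence index.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper: one row per annotation, exposing the corrected word
// ("result") and the "sentence" entry of its metadata map.
def annotationResults(annotated: DataFrame, annotationCol: String): DataFrame =
  annotated
    .select(explode(col(annotationCol)).as("annotation"))
    .select(
      col("annotation.result").as("word"),
      col("annotation.metadata").getItem("sentence").as("sentence"))
```

For example, `annotationResults(result, "corrected")` on the pipeline output. To keep one row per input instead, selecting `$"corrected.result"` should return the `result` fields as an `array<string>`.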
