Solr Queries With Dashes
I am using Solr's edismax query parser for searches on our website. What I want is essentially for dashes to be ignored.



For example, if I search for "wi-fi adapter" and a document has the title "wifi adapter", I get no results.



I am using solr.MappingCharFilterFactory to map dashes to spaces. This is the text_general field type in my schema:



    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.ClassicTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.ClassicTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


My mapping.txt contains this line:



"-" => " "


This rule converts each dash to a space.



So if I search "wi fi adapter", it will always show the same results as "wi fi adapter", but won't show results for "wifi adapter".
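Because the charFilter turns the dash into a space, the tokenizer sees "wi fi" as two separate words, which can never match the single indexed token "wifi". A minimal variation, assuming a separate mapping file whose only rule maps the dash to an empty string (the file name mapping-dash.txt is made up for this sketch), deletes the dash instead, so that "wi-fi" and "wifi" both become "wifi". A single `<analyzer>` element without a `type` attribute is applied at both index and query time.

```xml
<!-- Sketch: delete dashes instead of turning them into spaces.
     "mapping-dash.txt" is a hypothetical file; its single rule would be:
         "-" => ""
     CharFilters always run before the tokenizer, so they are listed first. -->
<analyzer>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-dash.txt"/>
  <tokenizer class="solr.ClassicTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The trade-off is that this makes "wi-fi" and "wifi" equivalent but still does not match "wi fi" typed with a space, and it also joins every other hyphenated value, such as dates or part numbers. The documents need to be reindexed after changing index-time analysis.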



Is there any way to handle dashes like this? Essentially, I want "wifi adapter", "wi-fi adapter", and "wi fi adapter" to be treated the same.



-Paul
      solr solrnet
      asked Nov 27 '18 at 18:49 by Paul
          1 Answer
          You can use WordDelimiterGraphFilterFactory in your analyzer. It has many attributes; a few of the relevant ones are listed below.



          generateWordParts : (integer, default 1) If non-zero, splits words at delimiters. For example: "CamelCase", "hot-spot" → "Camel", "Case", "hot", "spot"



          preserveOriginal : (integer, default 0) If non-zero, the original token is preserved: "Zap-Master-9000" → "Zap-Master-9000", "Zap", "Master", "9000"



          catenateWords : (integer, default 0) If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor’s" → "hotspotsensor"
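To make the three attributes concrete, here is the filter with all of them set explicitly. The token lists in the comment are what these settings should produce for the input "wi-fi" according to the descriptions above; it is worth confirming them on your own Solr version with the Analysis screen of the Solr Admin UI.

```xml
<!-- Illustrative only: the same filter with the three attributes above spelled out. -->
<filter class="solr.WordDelimiterGraphFilterFactory"
        generateWordParts="1"
        preserveOriginal="1"
        catenateWords="1"/>
<!--
  Expected tokens for the input token "wi-fi":
    generateWordParts="1" gives "wi", "fi"
    preserveOriginal="1"  keeps "wi-fi"
    catenateWords="1"     adds "wifi"
  The input token "wifi" contains no delimiter and passes through unchanged.
-->
```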



          For your case, the field type could look like this:



          <fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
              <!-- Splits words based on whitespace characters -->
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
              <!-- Splits words at delimiters, keeps the original token and adds the joined form -->
              <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1" catenateWords="1"/>
              <!-- The indexer cannot consume a token graph, so flatten it after the graph filter -->
              <filter class="solr.FlattenGraphFilterFactory"/>
              <!-- Transforms text to lower case -->
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>

            <analyzer type="query">
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
              <!-- Splits the query the same way, so "wi-fi" also produces "wifi" -->
              <filter class="solr.WordDelimiterGraphFilterFactory" catenateWords="1"/>
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
          </fieldType>
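A field type only takes effect once a field uses it. A minimal sketch, assuming the document title is stored in a field called `title` (both `title` and `title_wd` are hypothetical names), copies the raw title text into a new field of this type. `title_wd` would then be added to the edismax `qf` parameter (for example `qf=title title_wd`), and the documents reindexed.

```xml
<!-- Hypothetical names: "title" is the existing source field, "title_wd" the new target.
     multiValued="true" because the question's text_general type is multiValued. -->
<field name="title_wd" type="text_wd" indexed="true" stored="false" multiValued="true"/>
<!-- Copies the raw (unanalyzed) title value into title_wd, where the text_wd analyzer runs. -->
<copyField source="title" dest="title_wd"/>
```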


          For more information, see the Solr reference documentation on the filters available in Solr.
          answered Nov 28 '18 at 4:08 by Abhijit Bashetti (edited Nov 28 '18 at 11:54)

          • This is a correct answer, of course, and it is the preferred way to do this. However, the original question might involve phrase queries, which in some complex cases are converted to span queries, so be aware of issues.apache.org/jira/projects/LUCENE/issues/LUCENE-7398.

            – Nikolay
            Nov 28 '18 at 12:57

          • @Nikolay: I will check again on my side whether anything else can be done. Waiting for Paul's feedback.

            – Abhijit Bashetti
            Nov 28 '18 at 13:54

          • Your answer is perfect; I just added a minor warning for Paul about a Lucene bug that breaks this in some rare cases.

            – Nikolay
            Nov 28 '18 at 18:18

          • This helped a lot, thank you.

            – Paul
            Dec 4 '18 at 16:31