Why do I see fielddata when doc_values is enabled in aggregations?












According to the Elastic documentation, every field type except text (an analyzed string) supports doc_values. I assumed that when doc_values are available, aggregations would not use fielddata at all.

However, this is not the case for me: whenever I run a terms aggregation on a keyword or ip field, I see it loaded as fielddata. This does not happen for other types (e.g. session_id, which is a long in this case).

Is this the correct behavior? If so, how can I prevent fielddata from being created?

I'm using Elasticsearch 6.5, and this is my mapping:



{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 0,
      "codec": "best_compression"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "time": {
          "type": "date",
          "format": "epoch_millis"
        },
        "session_token": {
          "type": "keyword"
        },
        "session_ref": {
          "type": "keyword"
        },
        "session_id": {
          "type": "long"
        },
        "src": {
          "type": "ip"
        },
        "version": {
          "type": "byte"
        }
      }
    }
  }
}


This is a sample aggregation that causes fielddata to be loaded:



GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "size": 100
      }
    }
  }
}


And here is the fielddata status after the aggregation:



"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  },
  "total" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  }
}
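
For reference, output of this shape comes from the index stats API. The requests below are only a sketch of how to ask for it, assuming the index name test_ind from the mapping above; the fields=* parameter is not shown anywhere in this post and is included here because it is what requests the per-field breakdown.

```console
# Per-field fielddata usage for a single index
GET test_ind/_stats/fielddata?fields=*

# The same breakdown reported per node
GET _nodes/stats/indices/fielddata?fields=*
```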


And here are the segment stats:



"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "total" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  }
}
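
The segment figures, including doc_values_memory_in_bytes, belong to the segments metric of the same stats API. A minimal request, again assuming the index name test_ind, would be along these lines:

```console
# Segment-level memory usage (terms, points, doc values, ...) for one index
GET test_ind/_stats/segments
```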









  • What value do you have in the ...segments.doc_values_memory_in_bytes of your index stats response?

    – Val
    Nov 28 '18 at 10:41











  • @Val the value is "doc_values_memory_in_bytes" : 4472

    – user3473830
    Nov 28 '18 at 11:18
















elasticsearch






edited Nov 28 '18 at 11:18







user3473830

















asked Nov 28 '18 at 10:28









user3473830













1 Answer
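Apparently, the memory used by global ordinals is reported under fielddata. Global ordinals are built for ordinal-based fields such as keyword and ip, which would explain why the long field session_id does not show up there.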



Global ordinals can be loaded either eagerly or lazily, depending on the mapping. Eager loading builds them at refresh time; lazy loading, the default, builds them at query time.
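
As a sketch of what that mapping setting looks like: the parameter is called eager_global_ordinals (the name is not given above; it is the one used in the Elasticsearch mapping documentation). Assuming the test_ind index and _doc type from the question, enabling it on session_token might look like this:

```console
# Build global ordinals for session_token at refresh time
# instead of on the first aggregation that needs them
PUT test_ind/_mapping/_doc
{
  "properties": {
    "session_token": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
```

Setting the value back to false restores the lazy default. Either way the ordinals are still built; the setting only changes when that cost is paid.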



To avoid global ordinals in a terms aggregation, we can set "execution_hint": "map", which in my case would be:



GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "execution_hint": "map",
        "size": 100
      }
    }
  }
}


This comes with its own caveats, though: it uses more memory during query execution and runs slower.
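
One way to check the effect is sketched below, under the assumption that the index is test_ind and the aggregated field is session_token: clear the fielddata cache, run the aggregation with the map hint, then read the stats again and compare memory_size_in_bytes for session_token with the earlier figure.

```console
# Drop whatever is currently held in the fielddata cache for this index
POST test_ind/_cache/clear?fielddata=true

# Run the terms aggregation without global ordinals
GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "execution_hint": "map",
        "size": 100
      }
    }
  }
}

# Re-read per-field fielddata usage after the aggregation
GET test_ind/_stats/fielddata?fields=*
```

Clearing the cache first matters because entries loaded by earlier queries would otherwise still be counted.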






answered Nov 29 '18 at 11:05

user3473830































