Python Elasticsearch Not Accepting Body of Data












I'm trying to index data from a DataFrame that comes from a CSV file.



I created an index successfully:



es.indices.create(index='hash_test', ignore=400)


And added a baseline document with the columns and sample data contained in my DataFrame:



es.index(index="hash_test", doc_type="hash-test", id=rand_id, body={
    'FILENAME': '6.js',
    'HASH': 'b4d44ed618112e41cb7e8f33bb19a414',
    'DATE': '2018-11-15'})


Which ran fine.



Below is how I parse my DataFrame into the proper format, iterate through the rows, and index the data into Elasticsearch, similar to the above.



import uuid

import pandas as pd
from elasticsearch import Elasticsearch

def index_data(data_path, chunksize, index_name, doc_type):
    f = open(data_path)
    csvfile = pd.read_csv(f, iterator=True, chunksize=chunksize, sep="£",
                          encoding="utf-8-sig", index_col=0, engine="python")
    dictionary = {'Â': ''}
    es = Elasticsearch('http://*.*.*.*:9200/')

    for i, df in enumerate(csvfile):
        rand_id = uuid.uuid4()
        df.replace(dictionary, regex=True, inplace=True)
        df.columns = df.columns.str.replace('Â', '')
        records = df.where(pd.notnull(df), None).T.to_dict()
        list_records = [records[it] for it in records]
        json_data = str(''.join(str(v) for v in list_records))
        try:
            es.index(index_name, doc_type, rand_id, json_data)
        except:
            print("error!")
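For reference, the intermediate records/list_records steps in that function behave like this on a one-row frame (a minimal sketch with made-up sample values):

```python
import pandas as pd

# One-row frame with a missing value, mimicking a single CSV chunk.
df = pd.DataFrame([{'FILENAME': '6.js', 'HASH': 'abc', 'DATE': None}])

# Replace NaN with None, then transpose so to_dict() maps
# row index -> {column: value}.
records = df.where(pd.notnull(df), None).T.to_dict()

# Flatten the outer mapping into a plain list of row dicts.
list_records = [records[it] for it in records]
```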


I had to do some parsing of the DataFrame because a stray character (Â) appeared in every row and column.



When I print the values I want to index



print(index_name, doc_type, rand_id, json_data)


I get exactly what I want



hash_test hash-test 51eacee2-e2b1-4886-82f5-1373ec59c640 {'FILENAME': '6.js', 'HASH': 'b4d44ed618112e41cb7e8f33bb19a414', 'DATE': '2018-11-15'}
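Note that this printed body is Python's repr of a dict, not JSON; json.dumps produces the double-quoted form that a JSON parser accepts (a minimal sketch using the same sample record):

```python
import json

record = {'FILENAME': '6.js',
          'HASH': 'b4d44ed618112e41cb7e8f33bb19a414',
          'DATE': '2018-11-15'}

as_str = str(record)          # single-quoted repr: not valid JSON
as_json = json.dumps(record)  # double-quoted: valid JSON
```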


However, I get the following error when I run it:



RequestError: RequestError(400, 'mapper_parsing_exception', 'failed to parse')


The client is attempting to PUT the following data:



{"_index":"hash_test","_type":"hash-test","_id":"{'FILENAME': '8.js', 'HASH': 'b4d44ed618112e41cb7e8f33bb19a414', 'DATE': '2018-11-15'}","found":false}


It completely ignores the rand_id parameter, and when I do the following:



es.index(index_name, doc_type, json_data, rand_id)


It ignores the json_data parameter:



{"_index":"hash_test","_type":"hash-test","_id":"93eadd1b-6859-474b-9750-b618b800b4d5","found":false}


I don't understand the differences in the output I'm getting, and I'm stumped as to how the body is ending up in the _id field when I specified the id parameter.
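For what it's worth, in older (pre-7.x) versions of the Python client the positional parameter order is roughly index, doc_type, body, id (this is an assumption about the client version); the swap can be sketched with a hypothetical stand-in function:

```python
# Hypothetical stand-in for the pre-7.x client method: the THIRD
# positional argument is the body and the FOURTH is the id.
def index(index, doc_type, body, id=None):
    return {'_index': index, '_type': doc_type, '_id': id, 'body': body}

# Calling index(index_name, doc_type, rand_id, json_data) therefore
# binds rand_id to body and json_data to id, so the data string
# ends up in _id:
result = index('hash_test', 'hash-test',
               'some-uuid', "{'FILENAME': '6.js'}")
```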



Cheers in advance for any help.










python elasticsearch






asked Nov 28 '18 at 17:03









F.Terrie

1 Answer






So, not surprisingly, I was overcomplicating what I needed to do by hand-building a clean JSON string from my DataFrame. Instead of going through a dictionary and then a list (which I imagine was the source of my errors), I learned it's much easier to just use the to_json function in pandas.



The code below clears this up and indexes my DataFrame into my Elasticsearch instance.



import uuid

import pandas as pd
from elasticsearch import Elasticsearch, TransportError

def index_data(data_path, chunksize, index_name, doc_type):
    f = open(data_path)
    csvfile = pd.read_csv(f, iterator=True, chunksize=chunksize, sep="£",
                          encoding="utf-8-sig", index_col=0, engine="python")
    es = Elasticsearch('http://*.*.*.*:9200/')

    for i, df in enumerate(csvfile):
        rand_id = uuid.uuid4()  # create a random id
        data = df.to_json(orient='records', lines=True)
        try:
            es.index(index=index_name, doc_type=doc_type, id=rand_id, body=data)
        except TransportError as e:
            print(e.info)
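For context, to_json(orient='records', lines=True) emits one double-quoted JSON object per row, newline-separated (a minimal sketch with the sample row from the question). Note that with chunksize > 1 the body would contain several such lines under a single id, so for larger chunks the elasticsearch.helpers.bulk helper may be a better fit:

```python
import json

import pandas as pd

df = pd.DataFrame([
    {'FILENAME': '6.js',
     'HASH': 'b4d44ed618112e41cb7e8f33bb19a414',
     'DATE': '2018-11-15'},
])

# One JSON object per row, each on its own line.
data = df.to_json(orient='records', lines=True)

# Each line parses back into the original record.
doc = json.loads(data.splitlines()[0])
```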





answered Nov 29 '18 at 11:27









F.Terrie
