Python Elasticsearch Not Accepting Body of Data

I'm trying to index data from a pandas dataframe that is loaded from a CSV file.

I created an index successfully:

es.indices.create(index='hash_test', ignore=400)

Then I added a baseline document with the columns and sample data contained in my dataframe:

es.index(index="hash_test", doc_type="hash-test", id=rand_id, body={
    'FILENAME': '6.js',
    'HASH': 'b4d44ed618112e41cb7e8f33bb19a414',
    'DATE': '2018-11-15'})

This ran fine.

Below is how I want to convert my dataframe into the proper format, iterate through the rows, and index the data into Elasticsearch in the same way as above.

def index_data(data_path, chunksize, index_name, doc_type):
    f = open(data_path)
    csvfile = pd.read_csv(f, iterator=True, chunksize=chunksize, sep="£", encoding="utf-8-sig", index_col=0, engine="python")
    dictionary = {'Â': ''}
    es = Elasticsearch('http://*.*.*.*:9200/')

    for i, df in enumerate(csvfile):
        rand_id = uuid.uuid4()
        df.replace(dictionary, regex=True, inplace=True)
        df.columns = df.columns.str.replace('Â', '')
        records = df.where(pd.notnull(df), None).T.to_dict()
        list_records = [records[it] for it in records]
        json_data = str(''.join(str(v) for v in list_records))
        try:
            es.index(index_name, doc_type, rand_id, json_data)
        except:
            print("error!")
            pass

I had to clean the dataframe because a stray character (Â) appeared in every row and column.

When I print the values I want to index:

print(index_name, doc_type, rand_id, json_data)

I get exactly what I want:

hash_test hash-test 51eacee2-e2b1-4886-82f5-1373ec59c640 {'FILENAME': '6.js', 'HASH': 'b4d44ed618112e41cb7e8f33bb19a414', 'DATE': '2018-11-15'}

However, I get the following error when I run it:

RequestError: RequestError(400, 'mapper_parsing_exception', 'failed to parse')

It is attempting to PUT the following data:

{"_index":"hash_test","_type":"hash-test","_id":"{'FILENAME': '8.js', 'HASH': 'b4d44ed618112e41cb7e8f33bb19a414', 'DATE': '2018-11-15'}","found":false}

It completely ignores the rand_id parameter, and when I do the following:

es.index(index_name, doc_type, json_data, rand_id)

it ignores the json_data parameter:

{"_index":"hash_test","_type":"hash-test","_id":"93eadd1b-6859-474b-9750-b618b800b4d5","found":false}

I don't understand the differences in the output I'm getting, and I'm stumped as to how the body is ending up in the _id field when I specified the id parameter.

Cheers in advance for any help.

asked Nov 28 '18 at 17:03

F.Terrie

133

python elasticsearch


1 Answer

So, not surprisingly, I was overcomplicating things by trying to build a clean JSON string from my dataframe by hand. Instead of converting to a dictionary and then to a list (which I imagine is the source of my errors), I learned it's much easier to just use the to_json function in pandas.
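To illustrate why the hand-built string was probably rejected: calling str() on a Python dict gives its repr with single quotes, which is not valid JSON, whereas json.dumps (and, similarly, pandas' to_json) gives a string that parses. This is only a minimal sketch using the standard library; the sample record is the one from the question, and is_valid_json is a helper name made up for the example.

```python
import json

record = {
    'FILENAME': '6.js',
    'HASH': 'b4d44ed618112e41cb7e8f33bb19a414',
    'DATE': '2018-11-15',
}

as_repr = str(record)         # Python repr: single-quoted keys and values
as_json = json.dumps(record)  # JSON text: double-quoted keys and values


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_valid_json(as_repr))  # False
print(is_valid_json(as_json))  # True
```

The printed output in the question looked right only because the repr of a dict resembles JSON at a glance.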

The code below clears this up and indexes my dataframe into my Elasticsearch instance.

import uuid

import pandas as pd
from elasticsearch import Elasticsearch, TransportError

def index_data(data_path, chunksize, index_name, doc_type):
    f = open(data_path)
    csvfile = pd.read_csv(f, iterator=True, chunksize=chunksize, sep="£", encoding="utf-8-sig", index_col=0, engine="python")
    es = Elasticsearch('http://*.*.*.*:9200/')

    for i, df in enumerate(csvfile):
        rand_id = uuid.uuid4()  # create a random id
        # serialize the chunk as JSON, one record per line
        data = df.to_json(orient='records', lines=True)
        try:
            # keyword arguments make sure id and body go where intended
            es.index(index=index_name, doc_type=doc_type, id=rand_id, body=data)
        except TransportError as e:
            print(e.info)
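The keyword arguments in that es.index call may matter as much as to_json. As far as I can tell, in the 6.x Python client the positional order of index() is index, doc_type, body, id, so the positional call from the question sends the UUID as the body and the data as the id. The sketch below does not touch a cluster: fake_index is a hypothetical stand-in written for this example with that assumed parameter order, not a function from the client.

```python
import uuid


def fake_index(index, doc_type, body, id=None):
    """Stand-in with the assumed parameter order of the client's index() method."""
    return {'_index': index, '_type': doc_type, '_id': str(id), 'body': body}


rand_id = uuid.uuid4()
json_data = '{"FILENAME": "6.js"}'

# Positional call as in the question: rand_id lands in body, json_data in id.
swapped = fake_index('hash_test', 'hash-test', rand_id, json_data)

# Keyword call: each value reaches the parameter it was meant for.
correct = fake_index(index='hash_test', doc_type='hash-test',
                     id=rand_id, body=json_data)

print(swapped['_id'])  # the JSON string, not the UUID
print(correct['_id'])  # the UUID
```

This would match the request shown in the question, where the whole record appears in the _id field.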

answered Nov 29 '18 at 11:27

F.Terrie

133
