Python Elasticsearch Not Accepting Body of Data
I'm basically trying to index data from a DataFrame that comes from a CSV file.
I created an index successfully:
es.indices.create(index='hash_test', ignore=400)
and then indexed a baseline document with the columns and sample data contained in my DataFrame:
es.index(index="hash_test", doc_type="hash-test", id=rand_id, body={
'FILENAME': '6.js',
'HASH': 'b4d44ed618112e41cb7e8f33bb19a414',
'DATE': '2018-11-15'})
Which ran fine.
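(As a sanity check, the stored document can be fetched back with es.get, reusing the same client and rand_id from above:)
doc = es.get(index="hash_test", doc_type="hash-test", id=rand_id)
print(doc["_source"])  # the FILENAME/HASH/DATE document indexed above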
Below is how I want to parse my DataFrame into the proper format, iterate through the rows, and index the data into Elasticsearch, similar to the example above.
def index_data(data_path, chunksize, index_name, doc_type):
    f = open(data_path)
    csvfile = pd.read_csv(f, iterator=True, chunksize=chunksize, sep="£",
                          encoding="utf-8-sig", index_col=0, engine="python")
    dictionary = {'Â': ''}
    es = Elasticsearch('http://*.*.*.*:9200/')
    for i, df in enumerate(csvfile):
        rand_id = uuid.uuid4()
        df.replace(dictionary, regex=True, inplace=True)
        df.columns = df.columns.str.replace('Â', '')
        records = df.where(pd.notnull(df), None).T.to_dict()
        list_records = [records[it] for it in records]
        json_data = str(''.join(str(v) for v in list_records))
        try:
            es.index(index_name, doc_type, rand_id, json_data)
        except:
            print("error!")
            pass
I had to do some cleaning of the DataFrame because a stray character (Â) appeared in every row and column.
When I print the values I want to index:
print(index_name, doc_type, rand_id, json_data)
I get exactly what I want
hash_test hash-test 51eacee2-e2b1-4886-82f5-1373ec59c640 {'FILENAME': '6.js', 'HASH': 'b4d44ed618112e41cb7e8f33bb19a414', 'DATE': '2018-11-15'}
However, I get the following error when I run it:
RequestError: RequestError(400, 'mapper_parsing_exception', 'failed to parse')
The failing request attempts to PUT the following data:
{"_index":"hash_test","_type":"hash-test","_id":"{'FILENAME': '8.js', 'HASH': 'b4d44ed618112e41cb7e8f33bb19a414', 'DATE': '2018-11-15'}","found":false}
It ignores the rand_id parameter completely, and when I swap the last two arguments:
es.index(index_name, doc_type, json_data, rand_id)
it ignores the json_data parameter:
{"_index":"hash_test","_type":"hash-test","_id":"93eadd1b-6859-474b-9750-b618b800b4d5","found":false}
I don't understand the differences in the output I'm getting, and I'm stumped as to how the body is ending up in the _id field when I specified the id parameter.
Cheers in advance for any help.
python elasticsearch
1 Answer
So, not surprisingly, I was overcomplicating what I needed to do by hand-building a JSON string from my DataFrame. Instead of going through a dictionary and then a list (which I imagine was the source of my errors), I learned it's much easier to just use the to_json function in pandas.
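To illustrate the difference: str() on a Python dict keeps single quotes, which is not valid JSON, while to_json emits real JSON. A small sketch with a hypothetical one-row frame built from the sample data above:
import pandas as pd

df = pd.DataFrame([['6.js', 'b4d44ed618112e41cb7e8f33bb19a414', '2018-11-15']],
                  columns=['FILENAME', 'HASH', 'DATE'])

# str() of the row dict keeps Python's single quotes, which Elasticsearch cannot parse as JSON
print(str(df.T.to_dict()[0]))
# {'FILENAME': '6.js', 'HASH': 'b4d44ed618112e41cb7e8f33bb19a414', 'DATE': '2018-11-15'}

# to_json produces proper double-quoted JSON that the index call accepts
print(df.to_json(orient='records', lines=True))
# {"FILENAME":"6.js","HASH":"b4d44ed618112e41cb7e8f33bb19a414","DATE":"2018-11-15"}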
The code below clears this up and indexes my DataFrame into my Elasticsearch instance:
import uuid

import pandas as pd
from elasticsearch import Elasticsearch, TransportError

def index_data(data_path, chunksize, index_name, doc_type):
    f = open(data_path)
    csvfile = pd.read_csv(f, iterator=True, chunksize=chunksize, sep="£",
                          encoding="utf-8-sig", index_col=0, engine="python")
    es = Elasticsearch('http://*.*.*.*:9200/')
    for i, df in enumerate(csvfile):
        rand_id = uuid.uuid4()  # create a random id for this chunk
        data = df.to_json(orient='records', lines=True)  # real JSON, not a str() of a dict
        try:
            es.index(index=index_name, doc_type=doc_type, id=rand_id, body=data)
        except TransportError as e:
            print(e.info)
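As for why the original calls misbehaved: in the 6.x-era elasticsearch-py client, the positional parameter order of index() is (index, doc_type, body, id), so es.index(index_name, doc_type, rand_id, json_data) sent rand_id as the body and json_data as the id, which is exactly why the dict string showed up in the _id field. Passing everything by keyword, as above, sidesteps that.
One more note: with a chunksize greater than 1, to_json(orient='records', lines=True) emits one JSON object per line, which a single index call may not accept as one body. For multi-row chunks, the bulk helper from elasticsearch-py (one action per row) may be a better fit. A rough sketch, with index_chunk as a hypothetical helper name:
from elasticsearch import helpers

def index_chunk(es, df, index_name, doc_type):
    # Build one indexing action per DataFrame row; Elasticsearch assigns an _id
    # automatically when the action does not include one.
    actions = ({"_index": index_name,
                "_type": doc_type,
                "_source": record}
               for record in df.to_dict(orient="records"))
    helpers.bulk(es, actions)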