PySpark: Load a CSV file into a DataFrame with a schema containing an array
I am writing a function that returns a Spark DataFrame. It takes a CSV file containing the data as one argument and a schema as another.
I have used:
df = (sqlContext.read.format('csv').options(quote='"', escape='"')
      .schema(input_schema).load(filename))
Here input_schema is the schema and filename is the file location, both received as arguments to the function. This works fine when the schema doesn't contain an ArrayType, but it fails when the schema does contain one: the error reports array as an unknown type.
Why is ArrayType not working? How can I handle ArrayType in CSV when the schema is dynamic (meaning any column could be defined as an array type)?
apache-spark pyspark
1
It is not working because complex types, including arrays, are not supported by the CSV reader and writer. You have to load these columns as strings and parse the content later.
– user10465355
Nov 27 '18 at 13:44
edited Nov 27 '18 at 14:54 by Daniel Larsson
asked Nov 27 '18 at 13:28 by Yuvaraj Sankaran