PySpark: Load a csv file into a dataframe with a schema containing array



























I am writing a function that returns a Spark DataFrame. It takes the path to a CSV file as one argument and a schema as another.



I have used:



df = (sqlContext.read.format('csv')
      .options(quote='"', escape='"')
      .schema(input_schema).load(filename))


Here input_schema is the schema and filename is the file location, both received as arguments to the method. This works fine when the schema doesn't contain an ArrayType, but it fails when the schema does contain one: the error reports array as an unknown type.



Why is ArrayType not working? How can I handle ArrayType in a CSV when the schema is dynamic (meaning any column could be defined as an array type)?






























  • 1





    It is not working because complex types, including arrays, are not supported by the CSV reader and writer. You have to load these as strings and parse the content later.

    – user10465355
    Nov 27 '18 at 13:44
















apache-spark pyspark














edited Nov 27 '18 at 14:54









Daniel Larsson











asked Nov 27 '18 at 13:28









Yuvaraj Sankaran




















