Spark - Python textFile creates a weird RDD [duplicate]
This question already has an answer here:
How to save a spark dataframe as a text file without Rows in pyspark?
1 answer
How to restore RDD of (key,value) pairs after it has been stored/read from a text file
2 answers
I saved an RDD with
rdd.saveAsTextFile("file_dir")
When I read it back with
rdd = sc.textFile("path/to/file_dir")
an RDD is created, but it isn't usable: every element comes back as a single string instead of the original tuple.
tail file
"('a', (('a1', '1'), ('a2', '2')))"
rdd.collect()[1]
"('a', (('a1', '1'), ('a2', '2')))"
rdd.collect()[1][0]
"("
How can I change the output format to something I can use?
python apache-spark pyspark
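One way to get usable tuples back (a minimal sketch, assuming every line is the repr() of a tuple exactly as in the output above, and that "path/to/file_dir" stands in for the real directory) is to parse each line with ast.literal_eval:

import ast
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Read the saveAsTextFile output back; at this point each element is a plain string.
lines = sc.textFile("path/to/file_dir")

# ast.literal_eval safely turns "('a', (('a1', '1'), ('a2', '2')))" back
# into the nested tuple it represents.
parsed = lines.map(ast.literal_eval)

parsed.collect()[1]      # ('a', (('a1', '1'), ('a2', '2'))) as a real tuple
parsed.collect()[1][0]   # 'a' instead of '('

Unlike eval, literal_eval only accepts Python literals (strings, numbers, tuples, lists, dicts), which makes it a safer fit for parsing this kind of output.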
edited Nov 25 '18 at 10:31 by TrebuchetMS
asked Nov 25 '18 at 9:38 by J Doe
marked as duplicate by user6910411 Nov 25 '18 at 12:57
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
In which form would you like the RDD to be? What would be a useful format for you?
– Yaron
Nov 25 '18 at 10:32
I'd like to be able to use the individual values to perform transformations, i.e. rdd.collect()[1][0] = ('a1', '1')
– J Doe
Nov 25 '18 at 11:14
It's called save as text file, not for nothing.
– thebluephantom
Nov 25 '18 at 11:37
So I have to manually convert it back to an RDD of tuples? Or is there a method I can use within PySpark?
– J Doe
Nov 25 '18 at 12:04
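As for the last comment, there is a PySpark-native round trip worth sketching here (the directory name below is a placeholder): saveAsPickleFile serializes the elements themselves, so sc.pickleFile returns the tuples without any string parsing.

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([('a', (('a1', '1'), ('a2', '2')))])

# Pickle the elements instead of writing their str() representation.
rdd.saveAsPickleFile("pickled_dir")      # "pickled_dir" is a placeholder path

restored = sc.pickleFile("pickled_dir")
restored.collect()[0]                    # ('a', (('a1', '1'), ('a2', '2'))), structure intact

The trade-off is that the output is no longer human-readable text, which is exactly what saveAsTextFile gives up in the other direction.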
0 Answers