Spark - Python textFile creates a weird RDD [duplicate]
This question already has an answer here:
How to save a spark dataframe as a text file without Rows in pyspark?
1 answer
How to restore RDD of (key,value) pairs after it has been stored/read from a text file
2 answers
I saved an RDD with
rdd.saveAsTextFile("file_dir")
When I read it back with
rdd = sc.textFile("path/to/file_dir")
an RDD is created, but it isn't usable: every element comes back as a single string instead of the original tuple.
tail file
"('a', (('a1', '1'), ('a2', '2')))"
rdd.collect()[1]
"('a', (('a1', '1'), ('a2', '2')))"
rdd.collect()[1][0]
"("
How can I change the output format to something I can use?
python apache-spark pyspark
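One way to get usable tuples back (a minimal sketch, assuming every line is the repr() of a tuple exactly as in the output above, and that "path/to/file_dir" stands in for the real directory) is to parse each line with ast.literal_eval:

import ast
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Read the saveAsTextFile output back; at this point each element is a plain string.
lines = sc.textFile("path/to/file_dir")

# ast.literal_eval safely turns "('a', (('a1', '1'), ('a2', '2')))" back
# into the nested tuple it represents.
parsed = lines.map(ast.literal_eval)

parsed.collect()[1]      # ('a', (('a1', '1'), ('a2', '2'))) as a real tuple
parsed.collect()[1][0]   # 'a' instead of '('

Unlike eval, literal_eval only accepts Python literals (strings, numbers, tuples, lists, dicts), which makes it a safer fit for parsing this kind of output.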
edited Nov 25 '18 at 10:31 by TrebuchetMS
asked Nov 25 '18 at 9:38 by J Doe
marked as duplicate by user6910411 Nov 25 '18 at 12:57
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
In which form would you like the RDD to be? What would be a useful format for you?
– Yaron
Nov 25 '18 at 10:32
I'd like to be able to use the individual values to perform transformations, i.e. rdd.collect()[1][0] = ('a1', '1')
– J Doe
Nov 25 '18 at 11:14
It's called save as text file, not for nothing.
– thebluephantom
Nov 25 '18 at 11:37
So I have to manually convert it back to an RDD of tuples? Or is there a method I can use within PySpark?
– J Doe
Nov 25 '18 at 12:04
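As for the last comment, there is a PySpark-native round trip worth sketching here (the directory name below is a placeholder): saveAsPickleFile serializes the elements themselves, so sc.pickleFile returns the tuples without any string parsing.

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([('a', (('a1', '1'), ('a2', '2')))])

# Pickle the elements instead of writing their str() representation.
rdd.saveAsPickleFile("pickled_dir")      # "pickled_dir" is a placeholder path

restored = sc.pickleFile("pickled_dir")
restored.collect()[0]                    # ('a', (('a1', '1'), ('a2', '2'))), structure intact

The trade-off is that the output is no longer human-readable text, which is exactly what saveAsTextFile gives up in the other direction.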
0 Answers