Spark Scala: generating a random RDD of 1s and 0s?

How does one create an RDD filled with values from an array, say (0, 1), with 1000 randomly chosen elements set to 1 and the rest set to 0?

I know I can do this with a filter, but the result won't be random. I want it to be as random as possible.

var populationMatrix = new IndexedRowMatrix(RandomRDDs.uniformVectorRDD(sc, populationSize, chromosomeLength))

I was exploring the random RDDs in Spark but could not find anything that meets my needs.

asked yesterday

Adurthi Ashwin Swarup

3251623

scala apache-spark

1 Answer

I'm not sure this is exactly what you are looking for, but the following code creates an RDD containing a fixed number of 1s and 0s in random order:

import scala.util.Random



val arraySize = 15 // Total number of elements that you want

val numberOfOnes = 10 // From that total, how many do you want to be ones

val listOfOnes = List.fill(numberOfOnes)(1) // List of 1s

val listOfZeros = List.fill(arraySize - numberOfOnes)(0) // Rest list of 0s

val listOfOnesAndZeros = listOfOnes ::: listOfZeros // Merge lists

val randomList = Random.shuffle(listOfOnesAndZeros) // Random shuffle

val randomRDD = sc.parallelize(randomList) // RDD creation

randomRDD.collect() // Array[Int] = Array(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1)
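For repeatable runs, the same driver-side idea can be wrapped in a small function that takes a seed. This is only a sketch meant for the Scala REPL or spark-shell: `randomBits` and its parameters are hypothetical names introduced here for illustration, not part of Spark or of the code above.

```scala
import scala.util.Random

// Hypothetical helper: returns `total` values, exactly `ones` of them equal to 1,
// in an order determined by `seed`.
def randomBits(total: Int, ones: Int, seed: Long): List[Int] = {
  require(ones >= 0 && ones <= total, "ones must be between 0 and total")
  val rng = new Random(seed)                                      // seeded generator
  rng.shuffle(List.fill(ones)(1) ::: List.fill(total - ones)(0))  // 1s then 0s, shuffled
}

val bits = randomBits(total = 15, ones = 10, seed = 42L)
println(bits.sum)    // 10, the number of 1s
println(bits.length) // 15
```

The resulting list can be handed to `sc.parallelize(bits)` exactly like `randomList` above. The whole list is still built on the driver, so this only suits sizes that fit in driver memory.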

Or, if you want to use only RDDs:

import org.apache.spark.HashPartitioner

import spark.implicits._ // Already in scope in spark-shell; required elsewhere for Dataset.map

val arraySize = 15

val numberOfOnes = 10



val rddOfOnes = spark.range(numberOfOnes).map(_ => 1).rdd // RDD of 1s

val rddOfZeros = spark.range(arraySize - numberOfOnes).map(_ => 0).rdd // RDD of 0s

val rddOfOnesAndZeros = rddOfOnes ++ rddOfZeros // Union of both RDDs

// Pair each element with a random key, redistribute by key, then drop the keys
val shuffleResult = rddOfOnesAndZeros.mapPartitions(iter => {

  val rng = new scala.util.Random()

  iter.map((rng.nextInt(), _))

}).partitionBy(new HashPartitioner(rddOfOnesAndZeros.partitions.length)).values



shuffleResult.collect() // Array[Int] = Array(0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1)

Let me know if this is what you need.
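When the number of 1s is small compared with the RDD (1000 in the question), a different technique may also be worth considering: draw the positions of the 1s on the driver, broadcast them, and let the executors generate the values. The sketch below is an assumption-laden illustration, not the method shown above. It assumes an active SparkContext named `sc` (as in spark-shell); `total`, `ones`, `positions`, `onesAt` and `bitsRDD` are names made up for this example.

```scala
import scala.util.Random
import org.apache.spark.rdd.RDD

// Assumes an active SparkContext named `sc`, as provided by spark-shell.
val total = 1000000       // illustrative RDD size
val ones  = 1000          // how many elements should be 1
val rng   = new Random(7L) // fixed seed only to make the run repeatable

// Draw distinct positions on the driver; only `ones` values are held here.
val positions = scala.collection.mutable.Set.empty[Long]
while (positions.size < ones) positions += rng.nextInt(total).toLong

// Ship the (small) set of positions to the executors once.
val onesAt = sc.broadcast(positions.toSet)

// Each executor emits 1 at the chosen positions and 0 everywhere else.
val bitsRDD: RDD[Int] =
  sc.range(0L, total.toLong).map(i => if (onesAt.value.contains(i)) 1 else 0)

bitsRDD.sum() // computed on the executors; equals `ones` as a Double
```

Only the set of positions lives on the driver, and `sum()` aggregates on the executors, so nothing the size of the RDD is ever collected.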

edited yesterday

answered yesterday

Joss

439618

Once you collect, isn't the data sent to the driver, Joss? I want to avoid moving it to the driver and keep it at the executor level.
– Adurthi Ashwin Swarup
17 hours ago

Then you should use my second example, which does everything on the workers. The collect is not required; I only used it to show the result. What matters is that shuffleResult holds the RDD you are looking for.
– Joss
15 hours ago
