Spark Scala: Generating a random RDD of 1s and 0s?
How does one create an RDD filled with values from an array, say (0, 1), setting 1000 randomly chosen values to 1 and the remaining values to 0?
I know I can filter to do this, but the result won't be random. I want it to be as random as possible.
var populationMatrix = new IndexedRowMatrix(RandomRDDs.uniformVectorRDD(sc, populationSize, chromosomeLength))
I was exploring random RDDs in Spark but couldn't find anything that meets my needs.
scala apache-spark
asked yesterday
Adurthi Ashwin Swarup
1 Answer
I'm not sure this is exactly what you are looking for, but the following code creates an RDD containing a fixed number of 1s and 0s in random order:
import scala.util.Random
val arraySize = 15 // Total number of elements that you want
val numberOfOnes = 10 // Of that total, how many should be ones
val listOfOnes = List.fill(numberOfOnes)(1) // List of 1s
val listOfZeros = List.fill(arraySize - numberOfOnes)(0) // The rest: a list of 0s
val listOfOnesAndZeros = listOfOnes ::: listOfZeros // Concatenate the two lists
val randomList = Random.shuffle(listOfOnesAndZeros) // Random shuffle (done on the driver)
val randomRDD = sc.parallelize(randomList) // RDD creation
randomRDD.collect() // e.g. Array[Int] = Array(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1)
Or, if you want to use only RDDs, without building the list on the driver:
import spark.implicits._ // Encoder[Int] for Dataset.map (already in scope in spark-shell)
val arraySize = 15
val numberOfOnes = 10
val rddOfOnes = spark.range(numberOfOnes).map(_ => 1).rdd
val rddOfZeros = spark.range(arraySize - numberOfOnes).map(_ => 0).rdd
val rddOfOnesAndZeros = rddOfOnes ++ rddOfZeros
val shuffleResult = rddOfOnesAndZeros.mapPartitions { iter =>
  val rng = new scala.util.Random() // One generator per partition
  iter.map(value => (rng.nextInt(), value)) // Pair each value with a random key
}.partitionBy(new org.apache.spark.HashPartitioner(rddOfOnesAndZeros.partitions.length)).values // Redistribute by the random key, then drop it
shuffleResult.collect() // e.g. Array[Int] = Array(0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1)
Let me know if this is what you need.
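As a possible alternative to shuffling, one could pick the positions of the 1s directly: sample exactly numberOfOnes distinct indices with takeSample, broadcast them, and map every index to 1 or 0. This is only a sketch under assumptions: the names ExactOnesSketch, exactOnes and seed are hypothetical, and the sampled indices (but not the full data) do pass through the driver, which should be acceptable when the number of 1s is small, such as the 1000 in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of the original answer.
object ExactOnesSketch {

  // Builds an RDD of `arraySize` elements in which exactly `numberOfOnes`
  // randomly chosen positions hold 1 and all other positions hold 0.
  def exactOnes(sc: SparkContext, arraySize: Long, numberOfOnes: Int, seed: Long): RDD[Int] = {
    val indices = sc.range(0L, arraySize) // 0, 1, ..., arraySize - 1
    // Sample distinct positions without replacement; only these reach the driver.
    val chosen = indices.takeSample(withReplacement = false, num = numberOfOnes, seed = seed).toSet
    val chosenBc = sc.broadcast(chosen) // Ship the small set to every executor once
    indices.map(i => if (chosenBc.value.contains(i)) 1 else 0)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("exact-ones-sketch")
    val sc = SparkContext.getOrCreate(conf)
    val onesAndZeros = exactOnes(sc, arraySize = 15L, numberOfOnes = 10, seed = 42L)
    // Count on the executors; only two numbers come back to the driver.
    println(s"total = ${onesAndZeros.count()}, ones = ${onesAndZeros.filter(_ == 1).count()}")
    sc.stop()
  }
}
```

Passing a fixed seed makes the chosen positions reproducible between runs; dropping it would give a different layout each time.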
Once you collect, isn't the data sent to the driver, Joss? I want to avoid moving it to the driver and keep it at the executor level.
– Adurthi Ashwin Swarup
17 hours ago
Then you should use my second example, which does everything on the workers. The collect is not required; I did it only to show the result. What matters is that shuffleResult holds the RDD that you are looking for.
– Joss
15 hours ago
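To illustrate the point about collect, here is a minimal sketch of working with such an RDD without ever collecting it: transformations stay lazy and run on the executors, and an action such as count returns only a single number to the driver. The object NoCollectSketch, the helpers countOnes and flipBits, and the small stand-in RDD are assumptions made for this illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, used only for illustration.
object NoCollectSketch {

  // An action: the filtering runs on the executors, only a Long reaches the driver.
  def countOnes(bits: RDD[Int]): Long = bits.filter(_ == 1).count()

  // A lazy transformation: nothing is computed or moved until an action is called.
  def flipBits(bits: RDD[Int]): RDD[Int] = bits.map(bit => 1 - bit)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("no-collect-sketch")
    val sc = SparkContext.getOrCreate(conf)
    // Stand-in for shuffleResult from the answer: 10 ones and 5 zeros.
    val bits = sc.parallelize(Seq(0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1), numSlices = 3)
    val flipped = flipBits(bits)
    println(s"ones = ${countOnes(bits)}, ones after flipping = ${countOnes(flipped)}")
    sc.stop()
  }
}
```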
edited yesterday
answered yesterday
Joss