Spark Scala: generating a random RDD of 1s and 0s

How can I create an RDD whose values come from (0, 1), with 1000 randomly chosen entries set to 1 and all remaining entries set to 0?



I know I could do this with a filter, but the result would not be random. I want the positions of the 1s to be as random as possible.



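import org.apache.spark.mllib.random.RandomRDDs, org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
val populationMatrix = new IndexedRowMatrix(RandomRDDs.uniformVectorRDD(sc, populationSize, chromosomeLength).zipWithIndex().map { case (v, i) => IndexedRow(i, v) })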


I have been exploring random RDDs in Spark but couldn't find anything that meets my needs.

Tags: scala, apache-spark

asked yesterday by Adurthi Ashwin Swarup

1 Answer

Not sure if this is exactly what you are looking for, but this code creates an RDD containing a fixed number of 1s, with all remaining elements 0, in random order:



          import scala.util.Random

          val arraySize = 15 // Total number of elements that you want
          val numberOfOnes = 10 // From that total, how many do you want to be ones
          val listOfOnes = List.fill(numberOfOnes)(1) // List of 1s
          val listOfZeros = List.fill(arraySize - numberOfOnes)(0) // Rest list of 0s
          val listOfOnesAndZeros = listOfOnes ::: listOfZeros // Merge lists
          val randomList = Random.shuffle(listOfOnesAndZeros) // Random shuffle
          val randomRDD = sc.parallelize(randomList) // RDD creation
          randomRDD.collect() // Array[Int] = Array(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1)
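The first example builds and shuffles the whole list on the driver before `sc.parallelize` ships it out, which is fine for 15 elements but wasteful for large sizes. A hedged alternative for the exact-count case in the question: draw only the positions of the 1s on the driver with Robert Floyd's sampling algorithm (a swapped-in technique, not part of the answer above), broadcast that small set, and let the executors produce the 0/1 values. The helper names `samplePositions` and `onesAtRandomPositions` are hypothetical, and the sketch assumes the total size fits in an `Int`.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.collection.mutable
import scala.util.Random

// Floyd's algorithm: picks `k` distinct positions from [0, n) uniformly at
// random, using memory proportional to k rather than n.
def samplePositions(n: Int, k: Int, rng: Random): Set[Int] = {
  require(k >= 0 && k <= n, "need 0 <= k <= n")
  val chosen = mutable.HashSet.empty[Int]
  for (j <- n - k until n) {
    val t = rng.nextInt(j + 1) // uniform in [0, j]
    // If t was already picked, j cannot have been (all picks so far are < j), so take j.
    if (!chosen.add(t)) chosen.add(j)
  }
  chosen.toSet
}

// Only the k chosen positions live on the driver; the executors build the 0/1 values.
def onesAtRandomPositions(sc: SparkContext, n: Int, k: Int, seed: Long): RDD[Int] = {
  val ones = sc.broadcast(samplePositions(n, k, new Random(seed)))
  sc.range(0L, n.toLong).map(i => if (ones.value.contains(i.toInt)) 1 else 0)
}
```

In spark-shell, `onesAtRandomPositions(sc, 1000000, 1000, 42L)` should give an `RDD[Int]` with exactly 1000 ones; reusing the same seed reproduces the same positions.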


Or, if you want to build it with RDDs only, so that nothing is created on the driver:



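import org.apache.spark.HashPartitioner
import scala.util.Random

val arraySize = 15
val numberOfOnes = 10

val rddOfOnes = sc.range(0, numberOfOnes).map(_ => 1)               // executors create the 1s
val rddOfZeros = sc.range(0, arraySize - numberOfOnes).map(_ => 0)  // ... and the 0s
val rddOfOnesAndZeros = rddOfOnes ++ rddOfZeros

val shuffleResult = rddOfOnesAndZeros
  // a random key sends each value to a random partition
  .mapPartitions { iter => val rng = new Random(); iter.map((rng.nextInt(), _)) }
  .partitionBy(new HashPartitioner(rddOfOnesAndZeros.partitions.length))
  .values
  // values arrive grouped by source partition, so also shuffle within each partition
  .mapPartitions(iter => Random.shuffle(iter.toVector).iterator)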

          shuffleResult.collect() // Array[Int] = Array(0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1)


Let me know if this is what you need.

answered yesterday by Joss (edited yesterday)

• @Joss, doesn't collect send the data to the driver? I want to avoid moving it to the driver and keep it at the executor level.
            – Adurthi Ashwin Swarup
            17 hours ago










• Then use my second example, which does everything on the workers. The collect is not required; I only used it to show the result. What matters is that shuffleResult is the RDD you are looking for.
            – Joss
            15 hours ago
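Following up on keeping everything on the executors: if an exact count of 1s is not required, the MLlib `RandomRDDs` generators mentioned in the question can produce the values directly, as in this sketch. Each entry is independently 1 with probability `probabilityOfOne` (for the question, roughly 1000 divided by the total size), so the number of 1s only averages to the target rather than hitting it exactly. The function names `bernoulliOnesAndZeros` and `binaryPopulation` are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.random.RandomRDDs
import org.apache.spark.rdd.RDD

// Each of `size` entries is 1 with probability `probabilityOfOne`, else 0.
// The number of 1s is random (about size * probabilityOfOne), not exact.
def bernoulliOnesAndZeros(sc: SparkContext, size: Long, probabilityOfOne: Double, seed: Long): RDD[Int] =
  RandomRDDs.uniformRDD(sc, size, seed = seed)
    .map(u => if (u < probabilityOfOne) 1 else 0)

// Same idea for the population matrix in the question: threshold every uniform
// entry so each row becomes a vector of 0.0/1.0 values.
def binaryPopulation(sc: SparkContext, populationSize: Long, chromosomeLength: Int,
                     probabilityOfOne: Double, seed: Long): RDD[Vector] =
  RandomRDDs.uniformVectorRDD(sc, populationSize, chromosomeLength, seed = seed)
    .map(v => Vectors.dense(v.toArray.map(x => if (x < probabilityOfOne) 1.0 else 0.0)))
```

To get an `IndexedRowMatrix`, the rows from `binaryPopulation` can be indexed with `.zipWithIndex().map { case (v, i) => IndexedRow(i, v) }` before constructing the matrix.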