How does awaitTermination() help for writeStream?












1















I have a Spark Structured Streaming job, it reads the offsets from a Kafka topic and writes it to the aerospike database. Currently I am in the process making this job production ready and implementing the SparkListener.

While going to through the documentation I stumbled upon this example:



    StreamingQuery query = wordCounts.writeStream()
.outputMode("complete")
.format("console")
.start();
query.awaitTermination();



After this code is executed, the streaming computation will have
started in the background. The query object is a handle to that active
streaming query, and we have decided to wait for the termination of
the query using awaitTermination() to prevent the process from exiting
while the query is active.




I understand that it waits for query to complete before terminating the process.
What does it mean exactly? It helps to avoid data loss written by the query.



How is it helpful when query is writing millions of records every day?



My code looks pretty simple though:



dataset
.writeStream()
.option("startingOffsets", "earliest")
.outputMode(OutputMode.Append())
.format("console")
.foreach(sink)
.trigger(Trigger.ProcessingTime(triggerInterval))
.option("checkpointLocation", checkpointLocation)
.start();









share|improve this question





























    1















    I have a Spark Structured Streaming job, it reads the offsets from a Kafka topic and writes it to the aerospike database. Currently I am in the process making this job production ready and implementing the SparkListener.

    While going to through the documentation I stumbled upon this example:



        StreamingQuery query = wordCounts.writeStream()
    .outputMode("complete")
    .format("console")
    .start();
    query.awaitTermination();



    After this code is executed, the streaming computation will have
    started in the background. The query object is a handle to that active
    streaming query, and we have decided to wait for the termination of
    the query using awaitTermination() to prevent the process from exiting
    while the query is active.




    I understand that it waits for query to complete before terminating the process.
    What does it mean exactly? It helps to avoid data loss written by the query.



    How is it helpful when query is writing millions of records every day?



    My code looks pretty simple though:



    dataset
    .writeStream()
    .option("startingOffsets", "earliest")
    .outputMode(OutputMode.Append())
    .format("console")
    .foreach(sink)
    .trigger(Trigger.ProcessingTime(triggerInterval))
    .option("checkpointLocation", checkpointLocation)
    .start();









    share|improve this question



























      1












      1








      1


      1






      I have a Spark Structured Streaming job, it reads the offsets from a Kafka topic and writes it to the aerospike database. Currently I am in the process making this job production ready and implementing the SparkListener.

      While going to through the documentation I stumbled upon this example:



          StreamingQuery query = wordCounts.writeStream()
      .outputMode("complete")
      .format("console")
      .start();
      query.awaitTermination();



      After this code is executed, the streaming computation will have
      started in the background. The query object is a handle to that active
      streaming query, and we have decided to wait for the termination of
      the query using awaitTermination() to prevent the process from exiting
      while the query is active.




      I understand that it waits for query to complete before terminating the process.
      What does it mean exactly? It helps to avoid data loss written by the query.



      How is it helpful when query is writing millions of records every day?



      My code looks pretty simple though:



      dataset
      .writeStream()
      .option("startingOffsets", "earliest")
      .outputMode(OutputMode.Append())
      .format("console")
      .foreach(sink)
      .trigger(Trigger.ProcessingTime(triggerInterval))
      .option("checkpointLocation", checkpointLocation)
      .start();









      share|improve this question
















      I have a Spark Structured Streaming job, it reads the offsets from a Kafka topic and writes it to the aerospike database. Currently I am in the process making this job production ready and implementing the SparkListener.

      While going to through the documentation I stumbled upon this example:



          StreamingQuery query = wordCounts.writeStream()
      .outputMode("complete")
      .format("console")
      .start();
      query.awaitTermination();



      After this code is executed, the streaming computation will have
      started in the background. The query object is a handle to that active
      streaming query, and we have decided to wait for the termination of
      the query using awaitTermination() to prevent the process from exiting
      while the query is active.




      I understand that it waits for query to complete before terminating the process.
      What does it mean exactly? It helps to avoid data loss written by the query.



      How is it helpful when query is writing millions of records every day?



      My code looks pretty simple though:



      dataset
      .writeStream()
      .option("startingOffsets", "earliest")
      .outputMode(OutputMode.Append())
      .format("console")
      .foreach(sink)
      .trigger(Trigger.ProcessingTime(triggerInterval))
      .option("checkpointLocation", checkpointLocation)
      .start();






      apache-spark spark-structured-streaming






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 20 '18 at 20:16









      Jacek Laskowski

      46k18136275




      46k18136275










      asked Nov 28 '18 at 16:39









      Himanshu YadavHimanshu Yadav

      5,99434121227




      5,99434121227
























          2 Answers
          2






          active

          oldest

          votes


















          1














          There are quite a few questions here, but answering just the one below should answer all.




          I understand that it waits for query to complete before terminating the process. What does it mean exactly?




          A streaming query runs in a separate daemon thread. In Java, daemon threads are used to allow for parallel processing until the main thread of your Spark application finishes (dies). Right after the last non-daemon thread finishes, the JVM shuts down and the entire Spark application finishes.



          That's why you need to keep the main non-daemon thread waiting for the other daemon threads so they can do their work.



          Read up on daemon threads in What is a daemon thread in Java?






          share|improve this answer
























          • I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.

            – Himanshu Yadav
            Dec 20 '18 at 21:04



















          2















          I understand that it waits for query to complete before terminating the process.
          What does it mean exactly




          Nothing more, nothing less. Since query is started in background, without explicit blocking instruction your code would simply reach the end of main function and exit immediately.




          How is it helpful when query is writing millions of records every day?




          It really doesn't. It instead ensure that query is execute at all.






          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53524198%2fhow-does-awaittermination-help-for-writestream%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            There are quite a few questions here, but answering just the one below should answer all.




            I understand that it waits for query to complete before terminating the process. What does it mean exactly?




            A streaming query runs in a separate daemon thread. In Java, daemon threads are used to allow for parallel processing until the main thread of your Spark application finishes (dies). Right after the last non-daemon thread finishes, the JVM shuts down and the entire Spark application finishes.



            That's why you need to keep the main non-daemon thread waiting for the other daemon threads so they can do their work.



            Read up on daemon threads in What is a daemon thread in Java?






            share|improve this answer
























            • I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.

              – Himanshu Yadav
              Dec 20 '18 at 21:04
















            1














            There are quite a few questions here, but answering just the one below should answer all.




            I understand that it waits for query to complete before terminating the process. What does it mean exactly?




            A streaming query runs in a separate daemon thread. In Java, daemon threads are used to allow for parallel processing until the main thread of your Spark application finishes (dies). Right after the last non-daemon thread finishes, the JVM shuts down and the entire Spark application finishes.



            That's why you need to keep the main non-daemon thread waiting for the other daemon threads so they can do their work.



            Read up on daemon threads in What is a daemon thread in Java?






            share|improve this answer
























            • I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.

              – Himanshu Yadav
              Dec 20 '18 at 21:04














            1












            1








            1







            There are quite a few questions here, but answering just the one below should answer all.




            I understand that it waits for query to complete before terminating the process. What does it mean exactly?




            A streaming query runs in a separate daemon thread. In Java, daemon threads are used to allow for parallel processing until the main thread of your Spark application finishes (dies). Right after the last non-daemon thread finishes, the JVM shuts down and the entire Spark application finishes.



            That's why you need to keep the main non-daemon thread waiting for the other daemon threads so they can do their work.



            Read up on daemon threads in What is a daemon thread in Java?






            share|improve this answer













            There are quite a few questions here, but answering just the one below should answer all.




            I understand that it waits for query to complete before terminating the process. What does it mean exactly?




            A streaming query runs in a separate daemon thread. In Java, daemon threads are used to allow for parallel processing until the main thread of your Spark application finishes (dies). Right after the last non-daemon thread finishes, the JVM shuts down and the entire Spark application finishes.



            That's why you need to keep the main non-daemon thread waiting for the other daemon threads so they can do their work.



            Read up on daemon threads in What is a daemon thread in Java?







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Dec 20 '18 at 20:16









            Jacek LaskowskiJacek Laskowski

            46k18136275




            46k18136275













            • I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.

              – Himanshu Yadav
              Dec 20 '18 at 21:04



















            • I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.

              – Himanshu Yadav
              Dec 20 '18 at 21:04

















            I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.

            – Himanshu Yadav
            Dec 20 '18 at 21:04





            I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.

            – Himanshu Yadav
            Dec 20 '18 at 21:04













            2















            I understand that it waits for query to complete before terminating the process.
            What does it mean exactly




            Nothing more, nothing less. Since query is started in background, without explicit blocking instruction your code would simply reach the end of main function and exit immediately.




            How is it helpful when query is writing millions of records every day?




            It really doesn't. It instead ensure that query is execute at all.






            share|improve this answer




























              2















              I understand that it waits for query to complete before terminating the process.
              What does it mean exactly




              Nothing more, nothing less. Since query is started in background, without explicit blocking instruction your code would simply reach the end of main function and exit immediately.




              How is it helpful when query is writing millions of records every day?




              It really doesn't. It instead ensure that query is execute at all.






              share|improve this answer


























                2












                2








                2








                I understand that it waits for query to complete before terminating the process.
                What does it mean exactly




                Nothing more, nothing less. Since query is started in background, without explicit blocking instruction your code would simply reach the end of main function and exit immediately.




                How is it helpful when query is writing millions of records every day?




                It really doesn't. It instead ensure that query is execute at all.






                share|improve this answer














                I understand that it waits for query to complete before terminating the process.
                What does it mean exactly




                Nothing more, nothing less. Since query is started in background, without explicit blocking instruction your code would simply reach the end of main function and exit immediately.




                How is it helpful when query is writing millions of records every day?




                It really doesn't. It instead ensure that query is execute at all.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 28 '18 at 17:14









                user10718453user10718453

                211




                211






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53524198%2fhow-does-awaittermination-help-for-writestream%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

                    Calculate evaluation metrics using cross_val_predict sklearn

                    Insert data from modal to MySQL (multiple modal on website)