How does awaitTermination() help for writeStream?

I have a Spark Structured Streaming job that reads the offsets from a Kafka topic and writes them to an Aerospike database. I am currently making this job production-ready and implementing a SparkListener.

While going through the documentation, I stumbled upon this example:

    StreamingQuery query = wordCounts.writeStream()
      .outputMode("complete")
      .format("console")
      .start();

    query.awaitTermination();

After this code is executed, the streaming computation will have
started in the background. The query object is a handle to that active
streaming query, and we have decided to wait for the termination of
the query using awaitTermination() to prevent the process from exiting
while the query is active.

I understand that it waits for query to complete before terminating the process.
What does it mean exactly? Does it help to avoid losing data written by the query?

How is it helpful when query is writing millions of records every day?

My code looks pretty simple though:

    dataset
        .writeStream()
        .option("startingOffsets", "earliest")
        .outputMode(OutputMode.Append())
        .format("console")
        .foreach(sink)
        .trigger(Trigger.ProcessingTime(triggerInterval))
        .option("checkpointLocation", checkpointLocation)
        .start();

edited Dec 20 '18 at 20:16 by Jacek Laskowski

asked Nov 28 '18 at 16:39 by Himanshu Yadav

apache-spark spark-structured-streaming

2 Answers

There are quite a few questions here, but answering just the one below should answer all.

I understand that it waits for query to complete before terminating the process. What does it mean exactly?

A streaming query runs in a separate daemon thread. In Java, daemon threads are used to allow for parallel processing until the main thread of your Spark application finishes (dies). Right after the last non-daemon thread finishes, the JVM shuts down and the entire Spark application finishes.

That's why you need to keep the main non-daemon thread waiting for the other daemon threads so they can do their work.
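
The same effect can be reproduced in plain Java without Spark. The sketch below is only an illustration: `DaemonThreadDemo`, `startWorker` and the simulated "batches" are hypothetical names, and `worker.join()` stands in for `query.awaitTermination()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonThreadDemo {

    // Starts a daemon thread that simulates `batches` micro-batches and
    // increments `done` after each one. The thread is a stand-in for the
    // background thread that runs a streaming query (an assumption made
    // for illustration, not Spark code).
    static Thread startWorker(int batches, long millisPerBatch, AtomicInteger done) {
        Thread worker = new Thread(() -> {
            for (int i = 0; i < batches; i++) {
                try {
                    Thread.sleep(millisPerBatch);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                done.incrementAndGet();
            }
        }, "stream-worker");
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        Thread worker = startWorker(3, 100, done);

        // Without this join(), main() would return right away, the JVM would
        // find no non-daemon threads left and exit, most likely before a
        // single batch had finished.
        worker.join();

        System.out.println("batches completed: " + done.get());
    }
}
```

Run as written, this prints `batches completed: 3`. A real streaming query normally has no final batch, so `awaitTermination()` keeps the main thread blocked until the query is stopped or fails.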

Read up on daemon threads in "What is a daemon thread in Java?".

answered Dec 20 '18 at 20:16 by Jacek Laskowski

I am reading your gitbook to implement Spark streaming. Unfortunately, my team is still using Spark 2.2.0 and encountering many bugs.

– Himanshu Yadav, Dec 20 '18 at 21:04

I understand that it waits for query to complete before terminating the process.
What does it mean exactly?

Nothing more, nothing less. Since the query is started in the background, without an explicit blocking instruction your code would simply reach the end of the main function and exit immediately.

How is it helpful when query is writing millions of records every day?

It really isn't. Instead, it ensures that the query is executed at all.

answered Nov 28 '18 at 17:14 by user10718453
