How does awaitTermination() help for writeStream?
I have a Spark Structured Streaming job that reads from a Kafka topic and writes the data to an Aerospike database. I am currently making this job production-ready and implementing a SparkListener.
While going through the documentation, I stumbled upon this example:
    StreamingQuery query = wordCounts.writeStream()
        .outputMode("complete")
        .format("console")
        .start();

    query.awaitTermination();
After this code is executed, the streaming computation will have
started in the background. The query object is a handle to that active
streaming query, and we have decided to wait for the termination of
the query using awaitTermination() to prevent the process from exiting
while the query is active.
I understand that it waits for the query to complete before terminating the process.
What does that mean exactly? Presumably it helps to avoid losing data written by the query.
How is it helpful when the query is writing millions of records every day?
My code looks pretty simple though:
    dataset
        .writeStream()
        .option("startingOffsets", "earliest")
        .outputMode(OutputMode.Append())
        .format("console")
        .foreach(sink)
        .trigger(Trigger.ProcessingTime(triggerInterval))
        .option("checkpointLocation", checkpointLocation)
        .start();
apache-spark spark-structured-streaming
edited Dec 20 '18 at 20:16
Jacek Laskowski
asked Nov 28 '18 at 16:39
Himanshu Yadav
2 Answers
There are quite a few questions here, but answering just the one below should answer them all.
"I understand that it waits for query to complete before terminating the process. What does it mean exactly?"
A streaming query runs in a separate daemon thread. In Java, daemon threads do background work but do not keep the JVM alive: as soon as the last non-daemon thread finishes (typically the main thread of your Spark application), the JVM shuts down and the entire Spark application ends, along with any daemon threads that are still running.
That's why you need to keep the main non-daemon thread waiting for the daemon threads so they can do their work.
Read up on daemon threads in "What is a daemon thread in Java?"
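The daemon-thread behavior described above can be sketched in plain Java, without Spark. This is only an illustration: the class `DaemonThreadDemo`, its method names, and the thread name are made up, and the counting loop merely stands in for a streaming query processing batches.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DaemonThreadDemo {

    // Starts a daemon thread that counts `batches` pretend micro-batches.
    // (Hypothetical stand-in for the thread a streaming query runs on.)
    static Thread startDaemonWorker(int batches, AtomicInteger processed) {
        Thread worker = new Thread(() -> {
            for (int i = 0; i < batches; i++) {
                processed.incrementAndGet();
            }
        }, "hypothetical-stream-worker");
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive
        worker.start();
        return worker;
    }

    // Starts the worker and blocks until it has finished, much as the main
    // thread blocks in awaitTermination(). Returns the batches processed.
    static int runAndWait(int batches) {
        AtomicInteger processed = new AtomicInteger();
        Thread worker = startDaemonWorker(batches, processed);
        try {
            worker.join(); // keep the calling (non-daemon) thread waiting
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("batches processed: " + runAndWait(5));
    }
}
```

If the `join()` call is removed, `main` may return before the worker has run at all, and because the worker is a daemon thread the JVM is free to exit without it.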
answered Dec 20 '18 at 20:16
Jacek Laskowski
I am reading your gitbook to implement spark-streaming. Unfortunately, my team is still using spark 2.2.0 and encountering many bugs.
– Himanshu Yadav
Dec 20 '18 at 21:04
"I understand that it waits for query to complete before terminating the process. What does it mean exactly"
Nothing more, nothing less. Since the query is started in the background, without an explicit blocking instruction your code would simply reach the end of the main function and exit immediately.
"How is it helpful when query is writing millions of records every day?"
It really isn't. Instead, it ensures that the query is executed at all.
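To make this concrete, here is a rough sketch of the handle pattern in plain Java. `SketchQuery` is a hypothetical stand-in, not Spark's `StreamingQuery`; only its shape is borrowed from the real API: a `start()` that returns immediately, an `awaitTermination` with a timeout that reports whether the query has ended, and a `stop()`. The 10 ms sleep and the counter are assumptions made purely for the demo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a streaming query handle (not Spark code).
class SketchQuery {
    private final CountDownLatch terminated = new CountDownLatch(1);
    private final AtomicLong batches = new AtomicLong();
    private volatile boolean running = true;

    // Kicks off the background loop and returns the handle immediately.
    static SketchQuery start() {
        SketchQuery query = new SketchQuery();
        Thread loop = new Thread(query::runLoop, "sketch-query");
        loop.setDaemon(true);
        loop.start();
        return query;
    }

    private void runLoop() {
        try {
            while (running) {
                batches.incrementAndGet(); // pretend to write one micro-batch
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            terminated.countDown(); // wake up anyone blocked in awaitTermination
        }
    }

    // Asks the background loop to finish after its current iteration.
    void stop() {
        running = false;
    }

    // Blocks the caller until the loop has ended or the timeout elapses.
    // Returns true if the query terminated within the timeout.
    boolean awaitTermination(long timeoutMs) {
        try {
            return terminated.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    long batchesWritten() {
        return batches.get();
    }

    public static void main(String[] args) {
        SketchQuery query = SketchQuery.start();
        // The loop is still running, so waiting briefly just times out.
        System.out.println("terminated after 100 ms: " + query.awaitTermination(100));
        query.stop();
        System.out.println("terminated after stop(): " + query.awaitTermination(1000));
        System.out.println("batches written: " + query.batchesWritten());
    }
}
```

Note that waiting does nothing to the writes themselves: the background loop produces the same batches whether or not anyone is blocked on the handle. Blocking only keeps the caller alive while the loop runs.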
answered Nov 28 '18 at 17:14
user10718453