Spark Structured Streaming getting messages for last Kafka partition
I am using Spark Structured Streaming to read from Kafka topic.
With a single-partition topic, the Spark Structured Streaming consumer can read the data.
But after I added partitions to the topic, the client shows messages from the last partition only.
That is, if the topic has 4 partitions and I push the numbers 1, 2, 3, 4 into it, the client prints only 4, not the other values.
I am using the latest samples and binaries from the Spark Structured Streaming website.
Dataset<Row> df = spark
  .readStream()
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load();
Am I missing anything?
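For debugging a case like this, the Kafka source also exposes a `partition` column, so the query itself can show which partitions records are actually arriving from. A minimal sketch, assuming an existing `SparkSession` named `spark`, the spark-sql-kafka-0-10 connector on the classpath, and the placeholder broker/topic names from the question:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Sketch: print each record's value together with the Kafka partition and
// offset it came from, so you can see which partitions are being consumed.
Dataset<Row> df = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
    .option("subscribe", "topic1")
    .option("startingOffsets", "earliest")  // replay from the beginning while debugging
    .load();

df.selectExpr("CAST(value AS STRING) AS value", "partition", "offset")
  .writeStream()
  .format("console")
  .start();  // in a real app, also await termination on the returned query
```

If the console sink shows only one `partition` value, the problem is on the producing or consuming side rather than in this query.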
java apache-spark apache-kafka spark-structured-streaming
Pushing messages how? Including the key? How do you know those messages are even going to other partitions? Also, Spark will not automatically pick up new partitions until you restart the app
– cricket_007
Nov 25 '18 at 0:06
I am using the Kafka console producer to push data into the topic manually.
– Ashish Nijai
Nov 25 '18 at 5:10
Sure, but are you using --parse-keys=true? If not, how are you checking which partitions your messages are going into?
– cricket_007
Nov 25 '18 at 16:14
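For reference, the stock console producer sends null keys unless key parsing is enabled, which affects how messages are spread across partitions. A sketch of a keyed invocation (broker and topic are the question's placeholders; `parse.key` and `key.separator` are the standard console-producer properties):

```shell
# Send keyed messages so the default partitioner distributes them by key hash.
# Without parse.key=true the producer sends null keys, and older clients can
# end up favoring a single partition.
kafka-console-producer.sh \
  --broker-list host1:port1 \
  --topic topic1 \
  --property "parse.key=true" \
  --property "key.separator=:"
# then type lines like:  k1:1  k2:2  k3:3  k4:4
```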
I am unable to check any particular partition. I have 4 partitions. If I send 4 messages to the topic, the Spark consumer prints only the 4th one; it prints only every 4th message, i.e. the 4th, 8th, 12th.
– Ashish Nijai
Nov 25 '18 at 16:42
You can use Kafka's GetOffsetShell to list the latest offsets of each partition. That'll tell you if messages are being sent to any/all partitions. Otherwise, if you have only one Spark executor, then it'll only consume from one Kafka partition, so you'll need to have more.
– cricket_007
Nov 25 '18 at 17:55
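The GetOffsetShell check mentioned above can be run roughly like this (broker address is a placeholder; `--time -1` asks for the latest offset of each partition):

```shell
# List the latest offset of every partition in the topic.
# If only one partition's offset grows as you produce, the producer is
# not spreading messages; if all grow, the problem is on the consumer side.
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list host1:port1 \
  --topic topic1 \
  --time -1
```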
edited Nov 26 '18 at 18:03 by cricket_007
asked Nov 24 '18 at 22:35 by Ashish Nijai
1 Answer
The issue was resolved by changing kafka-clients-0.10.1.1.jar to kafka-clients-0.10.0.1.jar.
Found the reference here: Spark Structured Stream get messages from only one partition of Kafka
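If the project is built with Maven (an assumption; the question doesn't say how the jars are managed), pinning the client version looks roughly like this:

```xml
<!-- Pin kafka-clients to 0.10.0.1 so it overrides the transitive
     0.10.1.1 pulled in by the Spark Kafka connector. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.1</version>
</dependency>
```

Note that downgrading the client is a workaround for a version mismatch, not a general fix; matching the client jar to the broker and connector versions is the underlying point.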
answered Nov 27 '18 at 17:01 by Ashish Nijai