Metrics collection and analysis architecture
We are working on HomeKit-enabled IoT devices. HomeKit is designed for consumer use and does not have the ability to collect metrics (power, temperature, etc.), so we need to implement it separately.
Let's say we have 10,000 devices, each sending one collection of metrics every 5 seconds, so we need to receive 10,000 / 5 = 2,000 collections per second. The end user needs to see graphs of each metric over a specified period of time (1 week, month, year, etc.). That means the system will receive 172.8 million records per day. This raises a lot of questions.
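The arithmetic above can be checked with a short script. This is only a sketch; the function and parameter names are illustrative and not part of any existing system.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate(devices, interval_s):
    """Return (collections per second, collections per day)."""
    per_second = devices / interval_s
    per_day = devices * SECONDS_PER_DAY // interval_s
    return per_second, per_day


per_second, per_day = ingest_rate(devices=10_000, interval_s=5)
print(f"{per_second:.0f} collections/s, {per_day:,} collections/day")
# prints: 2000 collections/s, 172,800,000 collections/day
```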
First of all, there's no need to store all of the data: the user only needs graphs for the specified period, so some aggregation is needed. What database solution fits this? I believe no RDBMS will handle this amount of data. Then, how do we compute the average values of the metrics to present to the end user?
AWS has shared a time-series data processing architecture:
In very simplified form, I think of it this way:
- Devices push data directly to DynamoDB using the HTTP API
- Metrics are stored in one table per 24 hours
- At the end of the day, a job runs on Elastic MapReduce and produces ready-made JSON files with the data required to show graphs per time period
- Old tables are stored in Redshift for further applications
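To make the steps above more concrete, here is a minimal in-memory sketch of the "one table per day" naming and the end-of-day rollup into graph-ready JSON. It makes no AWS calls; the record layout (`device`, `metric`, `ts`, `value`), the `metrics_` table prefix, and the hourly bucket size are all assumptions chosen for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def daily_table_name(ts, prefix="metrics"):
    """Name of the per-day table for a record with Unix timestamp ts (UTC)."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{prefix}_{day:%Y_%m_%d}"


def hourly_rollup(records):
    """Average each (device, metric) per hour and return a JSON string."""
    buckets = defaultdict(list)
    for r in records:
        hour = int(r["ts"]) // 3600 * 3600  # start of the hour, Unix seconds
        buckets[(r["device"], r["metric"], hour)].append(r["value"])
    points = [
        {"device": d, "metric": m, "hour": h, "avg": mean(values)}
        for (d, m, h), values in sorted(buckets.items())
    ]
    return json.dumps(points)


records = [
    {"device": "d1", "metric": "temperature", "ts": 0, "value": 20.0},
    {"device": "d1", "metric": "temperature", "ts": 5, "value": 22.0},
    {"device": "d1", "metric": "temperature", "ts": 3600, "value": 30.0},
]
print(daily_table_name(0))
print(hourly_rollup(records))
```

With hourly averages, one metric of one device shrinks from 17,280 raw points per day to 24, which is the kind of reduction the graphs would rely on.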
Has anyone done something similar before? Is there perhaps a simpler architecture?
database amazon-web-services architecture bigdata iot
edited Nov 26 '18 at 10:01
Nikita Zernov
asked Nov 26 '18 at 9:55
Nikita Zernov
1 Answer
This requires big data infrastructure such as:
1) Hadoop cluster
2) Spark
3) HDFS
4) HBase
You can use Spark to read the data as a stream. The streamed data can be stored in HDFS, a file system that lets you store large files across the Hadoop cluster. You can then use a MapReduce job to extract the required data set from HDFS and store it in HBase, which is the Hadoop database. HDFS is a distributed, scalable big data store for the raw records. Finally, you can use query tools to query HBase.
IoT data --> Spark --> HDFS --> Map/Reduce --> HBase --> Query HBase
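The Map/Reduce stage of this pipeline can be illustrated in plain Python, without Hadoop. This is only a sketch of the idea, not how the job would actually be written for a cluster; the record layout `(device_id, metric, unix_ts, value)` and the function names are assumptions. The map phase emits `(value, 1)` pairs per device, metric, and day; the reduce phase sums them and derives the daily average.

```python
from collections import defaultdict


def map_phase(records):
    """Emit ((device, metric, day), (value, 1)) for every raw record."""
    for device, metric, ts, value in records:
        day = ts // 86_400  # days since the Unix epoch
        yield (device, metric, day), (value, 1)


def reduce_phase(pairs):
    """Sum values and counts per key, then turn each sum into an average."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (value, count) in pairs:
        totals[key][0] += value
        totals[key][1] += count
    return {key: total / count for key, (total, count) in totals.items()}


records = [
    ("d1", "power", 0, 10.0),
    ("d1", "power", 5, 20.0),
    ("d1", "power", 86_400, 40.0),
]
daily_avg = reduce_phase(map_phase(records))
print(daily_avg)
```

Carrying `(sum, count)` pairs instead of averages matters because sums and counts from different nodes can be added together safely, while averages of averages would be wrong when the groups differ in size.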
The reason I am suggesting this architecture is scalability. The volume of input data grows with the number of IoT devices. In the above architecture, the infrastructure is distributed, and the cluster can be scaled out by adding nodes.
This is a proven architecture for big data analytics applications.
answered Dec 1 '18 at 4:14
challenger