Metrics collection and analysis architecture



























We are working on HomeKit-enabled IoT devices. HomeKit is designed for consumer use and has no facility for collecting metrics (power, temperature, etc.), so we need to implement metrics collection separately.



Let's say we have 10,000 devices. Each sends one collection of metrics every 5 seconds, so every second we need to receive 10,000 / 5 = 2,000 collections. The end user needs to see graphs of each metric over a specified period of time (1 week, a month, a year, etc.). So each day the system will receive 2,000 × 86,400 = 172.8 million records. This raises a lot of questions.
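
As a quick sanity check on those figures, a back-of-the-envelope calculation in plain Python, using only the numbers stated above:

    devices = 10_000      # fleet size from the question
    interval_s = 5        # one metrics collection per device every 5 seconds

    per_second = devices / interval_s        # 2,000 collections per second
    per_day = per_second * 24 * 60 * 60      # 172,800,000 records per day

    print(f"{per_second:,.0f}/s, {per_day:,.0f}/day")   # 2,000/s, 172,800,000/day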



First of all, there's no need to store all of the raw data, since the user only needs graphs over the chosen period, so some aggregation is required. What database solution fits this? I doubt any RDBMS will handle such a volume of data. And then, how do we compute the averaged metric values to present to the end user?
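
To make "aggregation" concrete, here is a minimal downsampling sketch (assuming the raw samples can be loaded into a pandas DataFrame; the column names are invented for the example). It reduces 5-second readings to per-minute averages per device, which is roughly the resolution a week/month/year graph needs:

    import pandas as pd

    # Hypothetical raw samples: one row per device reading (column names are illustrative).
    raw = pd.DataFrame({
        "ts": pd.to_datetime(["2018-11-26 10:00:01", "2018-11-26 10:00:06",
                              "2018-11-26 10:00:11", "2018-11-26 10:01:03"]),
        "device_id": ["dev-1", "dev-1", "dev-1", "dev-1"],
        "power_w": [12.0, 12.4, 11.8, 13.1],
    })

    # Downsample raw 5-second readings to per-minute averages per device.
    per_minute = (raw.set_index("ts")
                     .groupby("device_id")["power_w"]
                     .resample("1min")
                     .mean())
    print(per_minute)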



AWS has shared a time-series data processing architecture:
[AWS reference architecture diagram for time-series data processing]



Very much simplified, I think of it this way:




  1. Devices push data directly to DynamoDB using an HTTP API (a minimal write sketch follows this list).

  2. Metrics are stored in one table per 24 hours.

  3. At the end of the day, a scheduled job runs on Elastic MapReduce and
    produces ready-made JSON files with the data required to show graphs
    per time period.

  4. Old tables are archived to Redshift for further analysis.
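
As a concrete illustration of step 1, a minimal write sketch using boto3 (the table name, key schema, and attribute names here are assumptions for illustration, not anything prescribed by the AWS reference architecture):

    import time
    from decimal import Decimal

    import boto3

    # Hypothetical per-day table with partition key "device_id" and sort key "ts".
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("metrics-2018-11-26")

    def put_metric(device_id, power_w, temperature_c):
        """Write one metrics collection for one device (epoch milliseconds as sort key)."""
        table.put_item(Item={
            "device_id": device_id,
            "ts": int(time.time() * 1000),
            "power_w": Decimal(str(power_w)),            # boto3 expects Decimal, not float
            "temperature_c": Decimal(str(temperature_c)),
        })

    put_metric("dev-42", 12.4, 21.7)

On a constrained device the same write could instead go through DynamoDB's plain HTTPS API with SigV4-signed requests, without the SDK.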


Has anyone done something similar before? Is there perhaps a simpler architecture?










database amazon-web-services architecture bigdata iot






asked Nov 26 '18 at 9:55 by Nikita Zernov, edited Nov 26 '18 at 10:01
























1 Answer














This requires big-data infrastructure, for example:
1) a Hadoop cluster
2) Spark
3) HDFS
4) HBase



You can use Spark to read the data as a stream. The streamed data can be stored in HDFS, a file system that lets you store large files across the Hadoop cluster. You can then run a MapReduce job to derive the required data sets from HDFS and store them in HBase, the Hadoop database. HDFS is a distributed, scalable store well suited to this volume of records. Finally, you can use query tools to query HBase.



IoT data --> Spark --> HDFS --> MapReduce --> HBase --> query HBase
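
A minimal Spark Structured Streaming sketch of the ingest leg of that flow (the Kafka source, paths, and payload schema are assumptions for illustration; the MapReduce/HBase steps are not shown):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("iot-metrics-ingest").getOrCreate()

    # Assumed JSON payload sent by the devices.
    schema = (StructType()
              .add("device_id", StringType())
              .add("ts", TimestampType())
              .add("power_w", DoubleType())
              .add("temperature_c", DoubleType()))

    # Read the device stream (Kafka is assumed here as the transport).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "iot-metrics")
           .load())

    metrics = (raw.select(from_json(col("value").cast("string"), schema).alias("m"))
                  .select("m.*"))

    # Land the stream on HDFS as Parquet; downstream batch jobs aggregate into HBase.
    query = (metrics.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/iot/metrics")
             .option("checkpointLocation", "hdfs:///checkpoints/iot-metrics")
             .start())

    query.awaitTermination()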



The reason I am suggesting this architecture is scalability: the input volume grows with the number of IoT devices, and in the architecture above the infrastructure is distributed, so nodes can be added to the cluster as the load grows.



This is a proven architecture for big-data analytics applications.






answered Dec 1 '18 at 4:14 by challenger
