Accessing files newest to oldest in an efficient way












For a project I am working on, I am saving lots of JSON files locally. Each file has a time stored inside it, and I want to access the files newest to oldest without having to open each one, read its date, and sort. I was thinking of using a binary tree for this, but I can't think of a good way to implement it. Is there an npm module for this, or some other approach that would give better results?










javascript node.js json performance filesystems






asked Nov 25 '18 at 21:58









John Becker













  • Can you be more specific? What do you mean by access? Loading them from the file system in date order? Have you tried naming the actual files with the date in the name?

    – Matt Way
    Nov 25 '18 at 22:54











  • Basically, I want to be able to get all files in an array that fit a parameter, such as all files from time 1543192032682 to 1543191032682, but I don't want to have to open every single file and parse it to check its time (I expect this to reach around 0.5-1 GB in size, and without an SSD that gets slow). I could name the files by time, but that causes a new problem: they are all named by their id, which I also need to be able to look them up by.

    – John Becker
    Nov 26 '18 at 0:30





















2 Answers
fs.stat would be useful in this case, and it doesn't require any npm modules.

However, you will likely run into problems if you call the asynchronous fs.stat inside an ordinary synchronous loop; for this you might want to use async and await (see this for more details).

fs.stat provides an fs.Stats object that includes, among other things, when the file was last modified and when it was created.

If you put the JSON files in a folder of their own, I would use fs.readdir to list them; if you don't, you could still use fs.readdir to list all of the files in the current folder, then use the mime-type npm module to check whether each file is a JSON file.
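
As a rough sketch of the approach described above, the following lists the JSON files in a folder and orders them newest first by modification time. The function name `listJsonNewestFirst` and the `dir` argument are made up for illustration, and the check is a plain `.json` extension test rather than a MIME lookup. Note that it sorts by the filesystem's modification time, not by any date stored inside the files.

```javascript
const fs = require('fs');
const path = require('path');

// List the .json files in `dir`, newest first by modification time.
// `dir` is a hypothetical directory; mtimeMs is the filesystem's
// last-modified time, not a date stored inside the file.
async function listJsonNewestFirst(dir) {
  const names = (await fs.promises.readdir(dir))
    .filter((name) => path.extname(name).toLowerCase() === '.json');

  // Stat all files concurrently and wait for every result,
  // instead of firing callbacks from a synchronous loop.
  const entries = await Promise.all(
    names.map(async (name) => {
      const stats = await fs.promises.stat(path.join(dir, name));
      return { name, mtimeMs: stats.mtimeMs };
    })
  );

  return entries.sort((a, b) => b.mtimeMs - a.mtimeMs);
}
```

Calling `await listJsonNewestFirst('./data')` would resolve to an array of `{ name, mtimeMs }` objects, where `./data` stands in for wherever the files live.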

















answered Nov 26 '18 at 1:16









Sarah Cross













  • That would work, but the problem is that the files will be periodically appended to, which changes the last-modified date. I want to access files based on their internal creation date. (This can't be the creation date from fs.stat, because the data might arrive days to weeks after it was collected, and I want to look it up by collection date.) I like your solution, and I'm sure someone else could use it, but it doesn't work in my case :(

    – John Becker
    Nov 26 '18 at 1:24



















What you could do is store your own lookup file. This is a separate JSON file that contains the ids (filenames) and their associated internal dates. It would also let you add any additional data you might need to search or sort by. The downside of this method is that you need to ensure that the lookup file and the actual files stay in sync: wherever you create, update, or delete your data, you also need to update the lookup. An alternative is to build a program that periodically does a full scan of all your files and rebuilds the lookup file. That way you don't have to edit the lookup on every change, but the lookup is only as fresh as the last scan.
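
A minimal sketch of the periodic-scan variant might look like the following. It assumes each data file is named `<id>.json` and contains a numeric `time` field in epoch milliseconds; those details, along with the names `buildLookup` and `idsInRange`, are assumptions made for illustration and are not from the question.

```javascript
const fs = require('fs');
const path = require('path');

// Scan every .json file in `dataDir` once and record its internal time.
// Assumes files are named <id>.json and hold a numeric `time` field
// (hypothetical). Keep `lookupPath` outside `dataDir` so the scan does
// not pick up the lookup file itself.
function buildLookup(dataDir, lookupPath) {
  const lookup = {};
  for (const name of fs.readdirSync(dataDir)) {
    if (path.extname(name) !== '.json') continue;
    const record = JSON.parse(fs.readFileSync(path.join(dataDir, name), 'utf8'));
    lookup[path.basename(name, '.json')] = record.time;
  }
  fs.writeFileSync(lookupPath, JSON.stringify(lookup));
  return lookup;
}

// Return the ids whose time falls in [from, to], newest first,
// reading only the lookup file rather than the data files.
function idsInRange(lookupPath, from, to) {
  const lookup = JSON.parse(fs.readFileSync(lookupPath, 'utf8'));
  return Object.entries(lookup)
    .filter(([, time]) => time >= from && time <= to)
    .sort(([, a], [, b]) => b - a)
    .map(([id]) => id);
}
```

A query then only has to parse the one small lookup file; the matching data files can be opened afterwards by id.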



Performance gains are usually a trade-off between memory/caching and complexity.



The only other important question is: have you actually tested the performance of your system for real bottlenecks? Are you sure you even need to optimise this?

















answered Nov 26 '18 at 4:56









Matt Way













  • This would work perfectly. I haven't been able to test it because I don't have the data set yet, but I'm going to be pulling data from it about once a minute, and if the data set is around a gigabyte as I expect, reading it at 80-160 MB/s would be a really long wait.

    – John Becker
    Nov 27 '18 at 5:56











  • Cool. The question then becomes, how often is the dataset updated, and how is it updated? If it updates often, and you need very up to date and accurate results, then you'll have to incorporate lookup updating into the actual data update process. If the dataset is updated periodically (or you don't need very up to date data), then I would just start with a full lookup build process.

    – Matt Way
    Nov 27 '18 at 6:49
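
For the first case in the comment above, where the lookup has to be updated as part of the data update itself, one possible shape is sketched below. The function name `saveRecord`, the `id` and `time` fields, and the directory layout are all hypothetical; the temporary-file-then-rename step is a common precaution, not something taken from the thread.

```javascript
const fs = require('fs');
const path = require('path');

// Write one record and update the lookup in the same step.
// `dataDir`, `lookupPath`, and the `id`/`time` fields are hypothetical.
function saveRecord(dataDir, lookupPath, record) {
  fs.writeFileSync(path.join(dataDir, `${record.id}.json`), JSON.stringify(record));

  // Load the existing lookup, or start a new one on first use.
  const lookup = fs.existsSync(lookupPath)
    ? JSON.parse(fs.readFileSync(lookupPath, 'utf8'))
    : {};
  lookup[record.id] = record.time;

  // Write to a temporary file and rename it over the old lookup, so
  // readers are less likely to see a half-written file.
  const tmpPath = `${lookupPath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(lookup));
  fs.renameSync(tmpPath, lookupPath);
}
```

Rewriting the whole lookup on every save is simple but grows more expensive as the number of files increases; if writes are frequent, the periodic full rebuild may be the easier starting point.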




















