Accessing files newest to oldest in an efficient way

For a project I am working on, I am saving lots of JSON files locally. Each file has a time listed inside it, and I want to be able to access the files from newest to oldest without having to open each one, read its date, and sort. I was thinking of using a binary tree for this, but I can't think of a good way to implement it. Is there a module on npm for this, or some other way I could do this to get better results?

javascript node.js json performance filesystems

asked Nov 25 '18 at 21:58 by John Becker


  • Can you be more specific? What do you mean by access? Loading them from the file system in date order? Have you tried naming the actual files with the date in the name?

    – Matt Way
    Nov 25 '18 at 22:54

  • Basically I want to be able to get all files in an array that fit a parameter, like all files from time 1543192032682 to 1543191032682, but I don't want to have to open every single file and parse it to check the time on it (because I expect this to grow to around 0.5-1 GB in size, and without an SSD that gets slow). I could name them based on time, but that causes a new problem, because they are all named by their ID, which I also need to be able to look them up by.

    – John Becker
    Nov 26 '18 at 0:30

2 Answers

fs.stat would be useful in this case, and wouldn't require any modules from npm.


However, you will run into a lot of problems with synchronous loops here; for this you might want to use async and await (see this for more details).


fs.stat returns an object containing details such as when the file was last edited and when it was created.


If you put the JSON files in a folder of their own, I would use fs.readdir to list them; if you don't, you could still use fs.readdir to list all of the files in the current folder, then use the mime-type npm module to check which of the files you are ordering are actually JSON.
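
For illustration, here is a minimal sketch of that approach, assuming Node.js 10+ (for the promise-based fs API) and a hypothetical dataDir that contains only files; note that it sorts by the filesystem's modification time, not by any timestamp stored inside the files:

    // Sketch: list files newest-to-oldest by modification time.
    // Assumptions: Node.js >= 10; every entry in dataDir is a regular file.
    const fs = require('fs').promises;
    const path = require('path');

    async function listNewestToOldest(dataDir) {
      const names = await fs.readdir(dataDir);
      // Stat every file concurrently to get its mtime.
      const stats = await Promise.all(names.map(async name => {
        const { mtimeMs } = await fs.stat(path.join(dataDir, name));
        return { name, mtimeMs };
      }));
      return stats
        .sort((a, b) => b.mtimeMs - a.mtimeMs) // newest first
        .map(s => s.name);
    }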

answered Nov 26 '18 at 1:16 by Sarah Cross

  • That would work, but the problem is that the files will be periodically appended to, changing the last-edited date. I want to be able to access files based on their internal creation date (this can't be the creation date from fs.stat, because the data might arrive days to weeks after it was collected, and I want to get it based on the collection date). I like your solution and I'm sure someone else could use it, but it doesn't work in my case :(

    – John Becker
    Nov 26 '18 at 1:24

What you could do is store your own lookup file. This is a separate JSON file that contains the ids (filenames) and their associated internal date details. It would also let you add any additional data you might need to search or sort by. The downside of this method is that you need to ensure the lookup file and the actual files stay in sync: wherever you CRUD your data, you also need to make sure the lookup is updated. An alternative is a program that periodically does a long scan of all your files and rebuilds the lookup file. That saves you from editing the lookup on every change, but limits how up to date the lookup file can be.


Performance gains are usually a trade-off between memory/caching and complexity.


The only other important question is: have you actually tested the performance of your system for real bottlenecks? Are you sure you even need to optimise this?
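
As a rough sketch of that idea (lookup.json, addEntry, and filesBetween are assumed names, and synchronous I/O is used only to keep the example short), the lookup maps each file's id to its internal timestamp so range queries never touch the data files:

    // Sketch: a lookup file mapping file ids to internal timestamps.
    // Assumptions: ids are the filenames, timestamps are epoch milliseconds.
    const fs = require('fs');
    const LOOKUP = 'lookup.json';

    function loadLookup() {
      try { return JSON.parse(fs.readFileSync(LOOKUP, 'utf8')); }
      catch (e) { return {}; } // no lookup file yet
    }

    // Call this wherever you create/update a data file, to keep things in sync.
    function addEntry(id, timestamp) {
      const lookup = loadLookup();
      lookup[id] = timestamp;
      fs.writeFileSync(LOOKUP, JSON.stringify(lookup));
    }

    // All ids whose timestamp falls in [from, to], newest first,
    // without opening any of the data files themselves.
    function filesBetween(from, to) {
      return Object.entries(loadLookup())
        .filter(([, t]) => t >= from && t <= to)
        .sort(([, a], [, b]) => b - a)
        .map(([id]) => id);
    }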

answered Nov 26 '18 at 4:56 by Matt Way

  • This would work perfectly. I haven't been able to test it because I don't have the data set yet, but I'm going to be pulling data from it about once a minute, and if the data set is around a gig like I expect, then at 80-160 MB/s that would be a really long wait time.

    – John Becker
    Nov 27 '18 at 5:56

  • Cool. The question then becomes, how often is the dataset updated, and how is it updated? If it updates often, and you need very up to date and accurate results, then you'll have to incorporate lookup updating into the actual data update process. If the dataset is updated periodically (or you don't need very up to date data), then I would just start with a full lookup build process.

    – Matt Way
    Nov 27 '18 at 6:49
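
For the periodic full-rebuild variant mentioned above, a sketch might look like the following (it assumes each data file stores its collection time in a top-level "time" field; the field name and rebuildLookup are assumptions):

    // Sketch: rebuild lookup.json by scanning every data file once.
    // Run this on an interval instead of updating the lookup on every write.
    const fs = require('fs').promises;
    const path = require('path');

    async function rebuildLookup(dataDir) {
      const lookup = {};
      for (const name of await fs.readdir(dataDir)) {
        const raw = await fs.readFile(path.join(dataDir, name), 'utf8');
        lookup[name] = JSON.parse(raw).time; // assumed timestamp field
      }
      await fs.writeFile('lookup.json', JSON.stringify(lookup));
    }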