Database for product catalogue with multiple stores, hierarchical categories
I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.
The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.
The required queries:
- Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).
- Get some subtree by given category ID for a given store with products currently available.
- Get a tree for discounted-only categories for a given store with products currently available and discounted.
- Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.
- Get the single product details with full attributes list.
- Filter products by attributes.
Current solution is built on top of Oracle Database with the following schema:
Tables: stores
, products
, categories
(hierarchy via MPTT), products_categories
(references products
and categories
), productprices
(references stores
and products
), attributes
(references stores
), productattributes
(references attributes
and products
), attributevalues
(references productattributes
).
It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.
The query in question:
SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC
We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?
Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.
database performance database-design nosql rdbms
add a comment |
I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.
The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.
The required queries:
- Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).
- Get some subtree by given category ID for a given store with products currently available.
- Get a tree for discounted-only categories for a given store with products currently available and discounted.
- Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.
- Get the single product details with full attributes list.
- Filter products by attributes.
Current solution is built on top of Oracle Database with the following schema:
Tables: stores
, products
, categories
(hierarchy via MPTT), products_categories
(references products
and categories
), productprices
(references stores
and products
), attributes
(references stores
), productattributes
(references attributes
and products
), attributevalues
(references productattributes
).
It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.
The query in question:
SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC
We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?
Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.
database performance database-design nosql rdbms
Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.
– philipxy
Nov 28 '18 at 0:58
My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461
– philipxy
Nov 28 '18 at 0:58
add a comment |
I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.
The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.
The required queries:
- Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).
- Get some subtree by given category ID for a given store with products currently available.
- Get a tree for discounted-only categories for a given store with products currently available and discounted.
- Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.
- Get the single product details with full attributes list.
- Filter products by attributes.
Current solution is built on top of Oracle Database with the following schema:
Tables: stores
, products
, categories
(hierarchy via MPTT), products_categories
(references products
and categories
), productprices
(references stores
and products
), attributes
(references stores
), productattributes
(references attributes
and products
), attributevalues
(references productattributes
).
It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.
The query in question:
SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC
We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?
Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.
database performance database-design nosql rdbms
I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.
The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.
The required queries:
- Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).
- Get some subtree by given category ID for a given store with products currently available.
- Get a tree for discounted-only categories for a given store with products currently available and discounted.
- Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.
- Get the single product details with full attributes list.
- Filter products by attributes.
Current solution is built on top of Oracle Database with the following schema:
Tables: stores
, products
, categories
(hierarchy via MPTT), products_categories
(references products
and categories
), productprices
(references stores
and products
), attributes
(references stores
), productattributes
(references attributes
and products
), attributevalues
(references productattributes
).
It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.
The query in question:
SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC
We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?
Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.
database performance database-design nosql rdbms
database performance database-design nosql rdbms
asked Nov 27 '18 at 10:20
bossboss
1
1
Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.
– philipxy
Nov 28 '18 at 0:58
My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461
– philipxy
Nov 28 '18 at 0:58
add a comment |
Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.
– philipxy
Nov 28 '18 at 0:58
My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461
– philipxy
Nov 28 '18 at 0:58
Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.
– philipxy
Nov 28 '18 at 0:58
Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.
– philipxy
Nov 28 '18 at 0:58
My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461
– philipxy
Nov 28 '18 at 0:58
My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461
– philipxy
Nov 28 '18 at 0:58
add a comment |
1 Answer
1
active
oldest
votes
From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53497443%2fdatabase-for-product-catalogue-with-multiple-stores-hierarchical-categories%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.
add a comment |
From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.
add a comment |
From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.
From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.
answered Nov 28 '18 at 12:15
Boris SchegolevBoris Schegolev
3,21651629
3,21651629
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53497443%2fdatabase-for-product-catalogue-with-multiple-stores-hierarchical-categories%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.
– philipxy
Nov 28 '18 at 0:58
My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461
– philipxy
Nov 28 '18 at 0:58