Database for product catalogue with multiple stores, hierarchical categories

I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.

The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.

The required queries:

Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

Get some subtree by given category ID for a given store with products currently available.

Get a tree for discounted-only categories for a given store with products currently available and discounted.

Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

Get the single product details with full attributes list.

Filter products by attributes.

Current solution is built on top of Oracle Database with the following schema:

Tables: stores, products, categories (hierarchy via MPTT), products_categories (references products and categories), productprices (references stores and products), attributes (references stores), productattributes (references attributes and products), attributevalues (references productattributes).

It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.

The query in question:

SELECT "categories".* FROM "categories"

WHERE (

  NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)

  AND "categories"."is_promo" = 0

  AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND (EXISTS(

    SELECT U0."id" FROM "productprices" U0

    INNER JOIN "products" U1 ON (U0."product_id" = U1."id")

    INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")

    WHERE (

      U2."category_id" = ("categories"."id")

      AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."store_id" = 42

      AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)

      AND U1."is_adult" = 0))

  )

)

ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC

We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?

Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.

asked Nov 27 '18 at 10:20

boss

Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

– philipxy
Nov 28 '18 at 0:58

My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

– philipxy
Nov 28 '18 at 0:58

add a comment |

I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.

The required queries:

Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

Get some subtree by given category ID for a given store with products currently available.

Get a tree for discounted-only categories for a given store with products currently available and discounted.

Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

Get the single product details with full attributes list.

Filter products by attributes.

Current solution is built on top of Oracle Database with the following schema:

The query in question:

SELECT "categories".* FROM "categories"

WHERE (

  NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)

  AND "categories"."is_promo" = 0

  AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND (EXISTS(

    SELECT U0."id" FROM "productprices" U0

    INNER JOIN "products" U1 ON (U0."product_id" = U1."id")

    INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")

    WHERE (

      U2."category_id" = ("categories"."id")

      AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."store_id" = 42

      AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)

      AND U1."is_adult" = 0))

  )

)

ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC

asked Nov 27 '18 at 10:20

boss

Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

– philipxy
Nov 28 '18 at 0:58

My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

– philipxy
Nov 28 '18 at 0:58

add a comment |

I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.

The required queries:

Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

Get some subtree by given category ID for a given store with products currently available.

Get a tree for discounted-only categories for a given store with products currently available and discounted.

Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

Get the single product details with full attributes list.

Filter products by attributes.

Current solution is built on top of Oracle Database with the following schema:

The query in question:

SELECT "categories".* FROM "categories"

WHERE (

  NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)

  AND "categories"."is_promo" = 0

  AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND (EXISTS(

    SELECT U0."id" FROM "productprices" U0

    INNER JOIN "products" U1 ON (U0."product_id" = U1."id")

    INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")

    WHERE (

      U2."category_id" = ("categories"."id")

      AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."store_id" = 42

      AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)

      AND U1."is_adult" = 0))

  )

)

ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC

asked Nov 27 '18 at 10:20

boss

I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.

The required queries:

Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

Get some subtree by given category ID for a given store with products currently available.

Get a tree for discounted-only categories for a given store with products currently available and discounted.

Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

Get the single product details with full attributes list.

Filter products by attributes.

Current solution is built on top of Oracle Database with the following schema:

The query in question:

SELECT "categories".* FROM "categories"

WHERE (

  NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)

  AND "categories"."is_promo" = 0

  AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')

  AND (EXISTS(

    SELECT U0."id" FROM "productprices" U0

    INNER JOIN "products" U1 ON (U0."product_id" = U1."id")

    INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")

    WHERE (

      U2."category_id" = ("categories"."id")

      AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."store_id" = 42

      AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')

      AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)

      AND U1."is_adult" = 0))

  )

)

ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC

database performance database-design nosql rdbms

asked Nov 27 '18 at 10:20

boss

asked Nov 27 '18 at 10:20

boss

asked Nov 27 '18 at 10:20

boss

asked Nov 27 '18 at 10:20

boss

asked Nov 27 '18 at 10:20

boss

Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

– philipxy
Nov 28 '18 at 0:58

My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

– philipxy
Nov 28 '18 at 0:58

add a comment |

Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

– philipxy
Nov 28 '18 at 0:58

My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

– philipxy
Nov 28 '18 at 0:58

Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

– philipxy
Nov 28 '18 at 0:58

My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

– philipxy
Nov 28 '18 at 0:58

add a comment |

1 Answer
1

active

oldest

votes

From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53497443%2fdatabase-for-product-catalogue-with-multiple-stores-hierarchical-categories%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

add a comment |

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

add a comment |

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

answered Nov 28 '18 at 12:15

Boris Schegolev

3,21651629

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Btukfyl