Database for product catalogue with multiple stores, hierarchical categories












0















I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.



The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.



The required queries:




  • Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

  • Get some subtree by given category ID for a given store with products currently available.

  • Get a tree for discounted-only categories for a given store with products currently available and discounted.

  • Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

  • Get the single product details with full attributes list.

  • Filter products by attributes.


Current solution is built on top of Oracle Database with the following schema:



Tables: stores, products, categories (hierarchy via MPTT), products_categories (references products and categories), productprices (references stores and products), attributes (references stores), productattributes (references attributes and products), attributevalues (references productattributes).



It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.



The query in question:



SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC


We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?



Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.










share|improve this question























  • Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

    – philipxy
    Nov 28 '18 at 0:58













  • My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

    – philipxy
    Nov 28 '18 at 0:58
















0















I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.



The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.



The required queries:




  • Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

  • Get some subtree by given category ID for a given store with products currently available.

  • Get a tree for discounted-only categories for a given store with products currently available and discounted.

  • Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

  • Get the single product details with full attributes list.

  • Filter products by attributes.


Current solution is built on top of Oracle Database with the following schema:



Tables: stores, products, categories (hierarchy via MPTT), products_categories (references products and categories), productprices (references stores and products), attributes (references stores), productattributes (references attributes and products), attributevalues (references productattributes).



It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.



The query in question:



SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC


We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?



Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.










share|improve this question























  • Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

    – philipxy
    Nov 28 '18 at 0:58













  • My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

    – philipxy
    Nov 28 '18 at 0:58














0












0








0








I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.



The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.



The required queries:




  • Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

  • Get some subtree by given category ID for a given store with products currently available.

  • Get a tree for discounted-only categories for a given store with products currently available and discounted.

  • Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

  • Get the single product details with full attributes list.

  • Filter products by attributes.


Current solution is built on top of Oracle Database with the following schema:



Tables: stores, products, categories (hierarchy via MPTT), products_categories (references products and categories), productprices (references stores and products), attributes (references stores), productattributes (references attributes and products), attributevalues (references productattributes).



It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.



The query in question:



SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC


We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?



Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.










share|improve this question














I'm seeking for assistance on the decision of the best database, be that relational one or not, as well as the best schema for the given task.



The idea is the following: there is an offline store network with multiple stores (~10). Each store has multiple products (~100000), that are shared between them, but the availability and the prices/discounts differ. The products are structured into hierarchical categories (~1000). Some categories can be discount-categories. Some aren't. Some products may be adult-only. Products have various attributes.



The required queries:




  • Get full categories tree (limited by a nesting level) for a given store with products currently available (filtered by adult flag).

  • Get some subtree by given category ID for a given store with products currently available.

  • Get a tree for discounted-only categories for a given store with products currently available and discounted.

  • Get the paginated list of currently available products for a given store in a given category including subcategories, filtered by adult.

  • Get the single product details with full attributes list.

  • Filter products by attributes.


Current solution is built on top of Oracle Database with the following schema:



Tables: stores, products, categories (hierarchy via MPTT), products_categories (references products and categories), productprices (references stores and products), attributes (references stores), productattributes (references attributes and products), attributevalues (references productattributes).



It sort of works, but starts to get slow. For example the query to get the categories with available products sometimes executes for less than a second, and sometimes it takes more than 30, depending on the data currently imported in the database.



The query in question:



SELECT "categories".* FROM "categories"
WHERE (
NOT ("categories"."external_id" = '_reserved' AND "categories"."external_id" IS NOT NULL)
AND "categories"."is_promo" = 0
AND "categories"."begins_on" <= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND "categories"."ends_on" >= to_timestamp('2018-11-27', 'YYYY-MM-DD')
AND (EXISTS(
SELECT U0."id" FROM "productprices" U0
INNER JOIN "products" U1 ON (U0."product_id" = U1."id")
INNER JOIN "products_categories" U2 ON (U1."id" = U2."product_id")
WHERE (
U2."category_id" = ("categories"."id")
AND U0."updated_at" >= to_timestamp('2018-11-27 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."store_id" = 42
AND U0."begins_on" <= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND U0."ends_on" >= to_timestamp('2018-11-27 09:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND (U0."discount" IS NOT NULL OR U1."has_special_offer" = 1)
AND U1."is_adult" = 0))
)
)
ORDER BY "categories"."tree_id" ASC, "categories"."lft" ASC


We're currently about to start migrating away from Oracle Database. The primary app database is going to be Postgres, but for a catalogue part of application I'm willing to take a look at the different storage. Or maybe I should just optimize the schema/queries?



Currently there are 15 stores, with ~4 stores added yearly.
There should never be more than 200 000 products, as far as I know. With 25 stores it limits the prices table by 5 000 000 rows for a foreseeable future.







database performance database-design nosql rdbms






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 27 '18 at 10:20









bossboss

1




1













  • Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

    – philipxy
    Nov 28 '18 at 0:58













  • My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

    – philipxy
    Nov 28 '18 at 0:58



















  • Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

    – philipxy
    Nov 28 '18 at 0:58













  • My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

    – philipxy
    Nov 28 '18 at 0:58

















Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

– philipxy
Nov 28 '18 at 0:58







Optimization is so dependent on details & "best" is so vague that it's not clear how much we can help you in so general a question--it's "too broad". What do you image an answer to be like? See How to Ask & other asking help. If you want to optimize a particular query give a Minimal, Complete, and Verifiable example including EXPLAIN. (See other well-received optimization questions.) PS Look into 'temporal data' & SQL support. See Date, Darwen & Lorentzos re designs & to understand the temporal role of 6NF. (If not their operators.) Avoid Snodgrass, he doesn't undestand the relational model.

– philipxy
Nov 28 '18 at 0:58















My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

– philipxy
Nov 28 '18 at 0:58





My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless you define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". meta.stackexchange.com/q/204461

– philipxy
Nov 28 '18 at 0:58












1 Answer
1






active

oldest

votes


















0














From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53497443%2fdatabase-for-product-catalogue-with-multiple-stores-hierarchical-categories%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.






    share|improve this answer




























      0














      From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.






      share|improve this answer


























        0












        0








        0







        From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.






        share|improve this answer













        From what I know, several million rows is OK for Postgres. I would not bother migrating to some NoSQL solution. Instead, focus on optimizing (proper model, indexes, etc) and sort out the imports you mentioned (prevent long locks, large transactions, etc). The query you sent should be easy to index (use partial indexes on Postgres). That's what I would do.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 28 '18 at 12:15









        Boris SchegolevBoris Schegolev

        3,21651629




        3,21651629
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53497443%2fdatabase-for-product-catalogue-with-multiple-stores-hierarchical-categories%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

            Calculate evaluation metrics using cross_val_predict sklearn

            Insert data from modal to MySQL (multiple modal on website)