How do I split the training dataset into training, validation and test datasets?




















I have a custom dataset of images and their targets. I have created a training dataset in PyTorch. I want to split it into three parts: training, validation, and test. How do I do it?

















      machine-learning dataset conv-neural-network pytorch














      edited Jan 7 at 14:34 by nbro

      asked Nov 29 '18 at 5:22 by Sherlock
























          1 Answer














          Once you have the "master" dataset, you can use data.Subset to split it.

          Here's an example of a random split:



          import torch
          from torch.utils import data
          import random

          master = data.Dataset( ... )  # your "master" dataset
          n = len(master)  # total number of elements
          n_test = int(n * .05)  # number of elements in each of the validation and test sets
          n_train = n - 2 * n_test  # everything else goes to training
          idx = list(range(n))  # indices of all elements
          random.shuffle(idx)  # shuffle the indices in place so the split is random
          train_idx = idx[:n_train]
          val_idx = idx[n_train:(n_train + n_test)]
          test_idx = idx[(n_train + n_test):]

          train_set = data.Subset(master, train_idx)
          val_set = data.Subset(master, val_idx)
          test_set = data.Subset(master, test_idx)
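The index arithmetic above can be factored into a small helper. This is only a sketch: the name split_indices, its parameters, and the default fractions are hypothetical and not part of PyTorch. It uses nothing but the standard library and returns three disjoint index lists that could then be passed to data.Subset.

```python
import random


def split_indices(n, val_fraction=0.05, test_fraction=0.05, seed=None):
    """Return disjoint (train, val, test) index lists that together cover range(n).

    Hypothetical helper for illustration; not a PyTorch function.
    """
    n_val = int(n * val_fraction)
    n_test = int(n * test_fraction)
    n_train = n - n_val - n_test
    idx = list(range(n))
    # Use a private RNG so the global random state is left untouched;
    # passing the same seed reproduces the same split.
    random.Random(seed).shuffle(idx)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(1000, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed is a design choice that keeps the test set identical between runs, which matters if you compare models trained at different times.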


          This can also be achieved using data.random_split, here with validation and test sets of the same size n_test:



          train_set, val_set, test_set = data.random_split(master, (n_train, n_test, n_test))
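For a version that runs end to end, here is a minimal sketch of the random_split approach. The TensorDataset of random tensors is only a stand-in for your own "master" dataset, and the seed value is an arbitrary assumption; passing a seeded torch.Generator makes the split repeatable.

```python
import torch
from torch.utils import data

# Stand-in "master" dataset (an assumption for illustration):
# 100 fake samples with 3 features each and a binary target.
features = torch.randn(100, 3)
targets = torch.randint(0, 2, (100,))
master = data.TensorDataset(features, targets)

n = len(master)
n_test = int(n * 0.05)      # size of each of the validation and test sets
n_train = n - 2 * n_test    # everything else goes to training

# A seeded generator makes the random split repeatable across runs.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = data.random_split(
    master, [n_train, n_test, n_test], generator=generator
)

print(len(train_set), len(val_set), len(test_set))
```

Each returned object is a data.Subset of master, so it can be handed to a data.DataLoader like any other dataset.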













          edited Dec 2 '18 at 5:59

          answered Nov 29 '18 at 6:35 by Shai































