Number duplicate count























I have a dataframe:



df <- data.frame(sample = c('S1', 'S1', 'S2', 'S3', 'S4', 'S4'),
                 event  = c(1, 1, 4, 2, 3, 12),
                 start  = c(100, 20, 30, 500, 300, 200),
                 end    = c(350, 480, 60, 700, 300, 200))

sample event start end
S1 1 100 350
S1 1 20 480
S2 4 30 60
S3 2 500 700
S4 3 300 300
S4 12 200 200


I want to count the number of distinct events in each sample and, when a sample has more than one distinct event, append a number to the sample name so that each event gets its own label.



For example, sample S4 has two distinct events, 3 and 12, so its two rows should become S4.1 and S4.2. This is the result I want:



sample event start end
S1 1 100 350
S1 1 20 480
S2 4 30 60
S3 2 500 700
S4.1 3 300 300
S4.2 12 200 200


Here's what I'm trying, but it labels both S4 rows S4.2:



library(dplyr)
df %>%
  group_by(sample) %>%
  dplyr::mutate(event_count = n_distinct(event)) %>%
  # event_count is the same on every row of a group, so both S4 rows get the suffix ".2"
  dplyr::mutate(sample_mod = as.character(ifelse(event_count == 1, as.character(sample),
                                                 paste(sample, event_count, sep = '.'))))

sample event start end event_count sample_mod
1 S1 1 100 350 1 S1
2 S1 1 20 480 1 S1
3 S2 4 30 60 1 S2
4 S3 2 500 700 1 S3
5 S4 3 300 300 2 S4.2
6 S4 12 200 200 2 S4.2


How can I modify this mid-pipe to achieve my desired output?
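A minimal sketch of one possible fix to the pipeline above: event_count is constant within a group, so it cannot number the events, and a per-row index is needed instead. This assumes suffixes should follow the order in which each event first appears within a sample (dplyr::dense_rank(event) would number them by event value instead). The helper column name event_idx is hypothetical.

```r
library(dplyr)

df <- data.frame(sample = c('S1', 'S1', 'S2', 'S3', 'S4', 'S4'),
                 event  = c(1, 1, 4, 2, 3, 12),
                 start  = c(100, 20, 30, 500, 300, 200),
                 end    = c(350, 480, 60, 700, 300, 200),
                 stringsAsFactors = FALSE)

out <- df %>%
  group_by(sample) %>%
  mutate(
    # number of distinct events in this sample (the same value on every row of the group)
    event_count = n_distinct(event),
    # hypothetical helper: position of each row's event among the sample's distinct events
    event_idx = match(event, unique(event)),
    # only samples with more than one distinct event get a ".<index>" suffix
    sample_mod = ifelse(event_count == 1, sample, paste(sample, event_idx, sep = '.'))
  ) %>%
  ungroup()

print(out)
```

Because match() is evaluated per group, the index restarts at 1 for every sample, and repeated rows of the same event (as in S1) share one index.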










r dplyr


asked Nov 21 at 17:22
fugu


            2 Answers



            Accepted answer:

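            After grouping by 'sample', count the distinct values of 'event' with n_distinct(). Then ungroup and use that count as the condition in case_when(): rows of samples with more than one distinct event get names made unique with make.unique(), and all other rows keep their original sample name.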



            library(dplyr)

            df %>%
              group_by(sample) %>%
              mutate(n = n_distinct(event)) %>%
              ungroup() %>%
              mutate(sample_mod = case_when(n > 1 ~ make.unique(as.character(sample)),
                                            TRUE  ~ as.character(sample)))
            # A tibble: 6 x 6
            # sample event start end n sample_mod
            # <fct> <dbl> <dbl> <dbl> <int> <chr>
            #1 S1 1 100 350 1 S1
            #2 S1 1 20 480 1 S1
            #3 S2 4 30 60 1 S2
            #4 S3 2 500 700 1 S3
            #5 S4 3 300 300 2 S4
            #6 S4 12 200 200 2 S4.1
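For comparison with the output the question asks for: base R's make.unique() keeps the first copy of a name unchanged and appends ".1", ".2", ... to later copies, which is why this answer labels the S4 rows S4 and S4.1 rather than S4.1 and S4.2. It also numbers repeated rows, not distinct events. The S5 vector below is a hypothetical extra case, not part of the question's data, standing for a sample whose three rows hold only two distinct events (3, 3, 12):

```r
# make.unique() leaves the first copy of a string unchanged and appends ".1", ".2", ...
# to later copies (sep = "." by default)
samples <- c("S1", "S1", "S2", "S3", "S4", "S4")
unique_names <- make.unique(samples)
print(unique_names)
# "S1" "S1.1" "S2" "S3" "S4" "S4.1"

# Hypothetical sample S5: three rows but only two distinct events (3, 3, 12).
# make.unique() only sees the names, so it produces three different labels.
s5_names <- make.unique(c("S5", "S5", "S5"))
print(s5_names)
# "S5" "S5.1" "S5.2"
```

Where the exact S4.1/S4.2 labels matter, numbering rows by a per-group event index (for example match(event, unique(event)) inside the grouped mutate()) is one alternative.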






            edited Nov 21 at 17:33
            answered Nov 21 at 17:25
            akrun


            • But that also renames S1 as S1 and S1.1. I don't want to do this as these are not distinct events
              – fugu
              Nov 21 at 17:27










            • @fugu Please check the output. It is not renaming S1
              – akrun
              Nov 21 at 17:28












            • @fugu Are you sure you applied the code correctly? I am not able to replicate the issue you showed.
              – akrun
              Nov 21 at 17:29






            • @fugu Sorry, I can't replicate the issue. Are you also loading plyr along with dplyr? If so, use dplyr::mutate
              – akrun
              Nov 21 at 17:34






            • That was indeed the issue (re: reproducibility)
              – fugu
              Nov 21 at 17:35
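The problem resolved in these comments is function masking: when plyr is attached after dplyr, the bare name mutate refers to plyr::mutate, which does not respect group_by(). A minimal check using base R's find() and environment(), sketched with only dplyr attached (the plyr case is described rather than run, since it assumes plyr is installed):

```r
library(dplyr)

# Attached packages that export a function called `mutate`, in search order;
# if plyr were attached after dplyr, "package:plyr" would be listed first
where_mutate <- find("mutate")
print(where_mutate)

# Namespace that the bare name `mutate` currently resolves to
resolved_from <- environmentName(environment(mutate))
print(resolved_from)

# Writing dplyr::mutate(...) in the pipe avoids depending on the search order at all
```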


















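                library(data.table)
                setDT(df)  # convert df to a data.table by reference

                # Within each sample (rows ordered by event), rleid() gives 1, 2, ... for each
                # distinct event; the suffix is added only when a sample has more than one
                df[order(event),
                   sample := {
                     rid <- rleid(event)
                     if (any(rid > 1)) paste0(sample, '.', rid) else sample
                   },
                   by = sample]
                df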
                # sample event start end
                # 1: S1 1 100 350
                # 2: S1 1 20 480
                # 3: S2 4 30 60
                # 4: S3 2 500 700
                # 5: S4.1 3 300 300
                # 6: S4.2 12 200 200
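This answer relies on data.table's rleid(), which starts a new id each time the value changes. Because the rows are ordered by event first, equal events within a sample are adjacent, so the run ids become 1, 2, ... for each distinct event. The function run_id() below is a hypothetical base-R stand-in written only to illustrate that idea on a single vector; it is not data.table's implementation.

```r
# Hypothetical base-R stand-in for data.table::rleid() on a single vector:
# a new id starts wherever a value differs from the one before it
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  cumsum(c(TRUE, x[-1] != x[-length(x)]))
}

# S4's events after ordering by event: 3, 12 -> ids 1, 2
s4_ids <- run_id(sort(c(12, 3)))
s4_labels <- paste0("S4", ".", s4_ids)
print(s4_labels)

# Without ordering, a repeated event that is not adjacent gets a new id,
# which is why the answer orders by event before grouping
print(run_id(c(3, 12, 3)))        # three runs
print(run_id(sort(c(3, 12, 3))))  # two distinct events
```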


                Data used (note stringsAsFactors = F, which keeps sample as a character column rather than a factor):



                df <- data.frame(sample = c('S1', 'S1', 'S2', 'S3', 'S4', 'S4'), event = c(1,1,4,2,3,12), start = c(100, 20, 30, 500, 300, 200), end = c(350, 480, 60, 700, 300, 200), stringsAsFactors = F)
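The stringsAsFactors note matters mainly for R versions before 4.0.0, where data.frame() turned strings into factors by default. In base R, assigning a string that is not already a level of a factor yields NA with a warning, which is one reason these answers work with character values (the dplyr answer calls as.character()). A small base-R illustration on a toy factor; how data.table's := treats a factor column is not shown here:

```r
# A factor, which is what data.frame() produced for strings by default before R 4.0.0
f <- factor(c("S4", "S4"))

# "S4.1" is not an existing level, so base R stores NA and warns "invalid factor level"
f_relabelled <- f
suppressWarnings(f_relabelled[1] <- "S4.1")
print(f_relabelled)

# Working on a character vector avoids the problem
s <- as.character(f)
s[1] <- "S4.1"
print(s)
```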


                Benchmark:



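                library(data.table)
                library(dplyr)

                dt <- function(df) {
                  setDT(df)
                  df[order(event),
                     sample := {
                       rid <- rleid(event)
                       if (any(rid > 1)) paste0(sample, '.', rid) else sample
                     },
                     by = sample]
                }

                dply <- function(df) {
                  df %>%
                    group_by(sample) %>%
                    mutate(n = n_distinct(event)) %>%
                    ungroup() %>%
                    mutate(sample = case_when(n > 1 ~ make.unique(as.character(sample)),
                                              TRUE  ~ as.character(sample)))
                }

                # Stack the 6-row example 1000 times (6000 rows)
                df <- rbindlist(replicate(1000, df, simplify = FALSE))

                # Caution: setDT() and := modify their input by reference, so the first dt(df) call
                # likely relabels df itself, and later iterations of both expressions then run on
                # already-relabelled data. Timing each call on a fresh data.table::copy(df) would
                # avoid that, at the cost of including the copy in the timing.
                microbenchmark::microbenchmark(dt(df), dply(df))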
                # Unit: milliseconds
                # expr min lq mean median uq max neval
                # dt(df) 1.750972 1.970664 2.332920 2.075279 2.391176 8.306448 100
                # dply(df) 5.982349 6.277939 7.046036 6.566759 7.036501 15.112181 100






                edited Nov 21 at 17:48
                answered Nov 21 at 17:41
                IceCreamToucan


























