Number duplicate count

I have a dataframe:

df <- data.frame(sample = c('S1', 'S1', 'S2', 'S3', 'S4', 'S4'), event = c(1,1,4,2,3,12), start = c(100, 20, 30, 500, 300, 200), end = c(350, 480, 60, 700, 300, 200))

 sample event start end
     S1     1   100 350
     S1     1    20 480
     S2     4    30  60
     S3     2   500 700
     S4     3   300 300
     S4    12   200 200

I want to count the number of distinct events in each sample, and mutate the sample name to reflect this.

For example sample S4 has two distinct events, 3 and 12. Here I would want to achieve this result:

 sample event start end
     S1     1   100 350
     S1     1    20 480
     S2     4    30  60
     S3     2   500 700
   S4.1     3   300 300
   S4.2    12   200 200

Here's what I'm trying, which instead produces S4.2 and S4.2:

df %>%
    group_by(sample) %>%
    dplyr::mutate(event_count = n_distinct(event)) %>%
    dplyr::mutate(sample_mod = as.character(ifelse(event_count == 1, as.character(sample), paste(sample, event_count, sep = '.'))))

  sample event start   end event_count sample_mod
1 S1         1   100   350           1 S1
2 S1         1    20   480           1 S1
3 S2         4    30    60           1 S2
4 S3         2   500   700           1 S3
5 S4         3   300   300           2 S4.2
6 S4        12   200   200           2 S4.2

How can I modify this mid-pipe to achieve my desired output?

asked Nov 21 at 17:22

fugu


r dplyr

2 Answers

accepted

After grouping by 'sample', get the number of distinct elements in 'event', then use that count as a logical condition to change the values in 'sample' to unique values with make.unique:

library(dplyr)

df %>%
  group_by(sample) %>%
  mutate(n = n_distinct(event)) %>%   # distinct events per sample
  ungroup() %>%
  mutate(sample_mod = case_when(n > 1 ~ make.unique(as.character(sample)),
                                TRUE  ~ as.character(sample)))
# A tibble: 6 x 6
#  sample event start   end     n sample_mod
#  <fct>  <dbl> <dbl> <dbl> <int> <chr>
#1 S1         1   100   350     1 S1
#2 S1         1    20   480     1 S1
#3 S2         4    30    60     1 S2
#4 S3         2   500   700     1 S3
#5 S4         3   300   300     2 S4
#6 S4        12   200   200     2 S4.1
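The output above labels the two S4 rows S4 and S4.1, whereas the question asks for S4.1 and S4.2. Below is a minimal sketch of one way to get that numbering. It assumes events should be numbered in order of first appearance within each sample. The helper column event_idx and the use of match() are illustrative choices, not part of the answer above.

```r
library(dplyr)

df <- data.frame(
  sample = c("S1", "S1", "S2", "S3", "S4", "S4"),
  event  = c(1, 1, 4, 2, 3, 12),
  start  = c(100, 20, 30, 500, 300, 200),
  end    = c(350, 480, 60, 700, 300, 200),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(sample) %>%
  mutate(
    event_count = n_distinct(event),            # one value per group, recycled
    event_idx   = match(event, unique(event)),  # per-row index of the distinct event
    sample_mod  = if_else(event_count > 1,
                          paste(as.character(sample), event_idx, sep = "."),
                          as.character(sample))
  ) %>%
  ungroup()

res$sample_mod
```

The attempt in the question pastes event_count, which is the same for every row of a group, so both S4 rows end up as S4.2. Pasting a per-row index instead is what separates them. Rows that share an event within a sample receive the same suffix.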

edited Nov 21 at 17:33

answered Nov 21 at 17:25

akrun


But that also renames S1 as S1 and S1.1. I don't want to do this as these are not distinct events
– fugu
Nov 21 at 17:27

@fugu Please check the output. It is not renaming S1
– akrun
Nov 21 at 17:28

@fugu Are you sure that you applied the code correctly? I am not able to replicate the issue you showed.
– akrun
Nov 21 at 17:29

@fugu Sorry, I can't replicate the issue. Are you loading plyr along with dplyr? If so, use dplyr::mutate.
– akrun
Nov 21 at 17:34

That was indeed the issue (re: reproducibility)
– fugu
Nov 21 at 17:35


library(data.table)
setDT(df)

df[order(event),
   sample := {
     rid <- rleid(event)
     if (any(rid > 1)) paste0(sample, '.', rid) else sample
   },
   by = sample]
df
#    sample event start end
# 1:     S1     1   100 350
# 2:     S1     1    20 480
# 3:     S2     4    30  60
# 4:     S3     2   500 700
# 5:   S4.1     3   300 300
# 6:   S4.2    12   200 200
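The numbering above comes from rleid(event), which assigns a run id to each stretch of consecutive equal values. Because the rows are ordered by event first, each distinct event forms one run. As a rough illustration of that idea, here is a base R sketch. The helper name run_id and the example vector are hypothetical, and this is not data.table's implementation.

```r
# Hypothetical helper: run ids for consecutive equal values, built on base::rle()
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

events <- c(3, 3, 12, 12, 12, 20)  # already sorted, as after order(event)
ids <- run_id(events)
ids
```

On unsorted input, a value that reappears later starts a new run and gets a new id, which is why the ordering step matters.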

Data used: (note stringsAsFactors = FALSE)

df <- data.frame(sample = c('S1', 'S1', 'S2', 'S3', 'S4', 'S4'), event = c(1,1,4,2,3,12), start = c(100, 20, 30, 500, 300, 200), end = c(350, 480, 60, 700, 300, 200), stringsAsFactors = FALSE)

Benchmark:

library(dplyr)

dt <- function(df){
  setDT(df)
  # := updates df by reference, so repeated runs see the already-renamed samples
  df[order(event),
     sample := {
       rid <- rleid(event)
       if (any(rid > 1)) paste0(sample, '.', rid) else sample
     },
     by = sample]
}

dply <- function(df){
  df %>%
    group_by(sample) %>%
    mutate(n = n_distinct(event)) %>%
    ungroup() %>%
    mutate(sample = case_when(n > 1 ~ make.unique(as.character(sample)),
                              TRUE  ~ as.character(sample)))
}

# 1000 stacked copies of the example data
df <- rbindlist(replicate(1000, df, simplify = FALSE))

microbenchmark::microbenchmark(dt(df), dply(df))
# Unit: milliseconds
#      expr      min       lq     mean   median       uq       max neval
#    dt(df) 1.750972 1.970664 2.332920 2.075279 2.391176  8.306448   100
#  dply(df) 5.982349 6.277939 7.046036 6.566759 7.036501 15.112181   100

edited Nov 21 at 17:48

answered Nov 21 at 17:41

IceCreamToucan

