Problem with subtracting value from previous row in dataframe












0















I am having problems with using dplyr to subtract a value from the previous row.



The dataframe is as follows:



graph_data_frame1

# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3


The code I am using is:



atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))


The result created by mutate is all zeroes:



 atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0


Desired Result:



 graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA


I am able to get my desired result with base R but would like to use dplyr.



Data (ungrouped):



structure(list(Trade_Date = structure(c(1512950400, 1512950400, 
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))









share|improve this question




















  • 1





    Please include the actual data with dput(graph_data_frame1) so we can hep you and also an example of the desired output

    – RAB
    Nov 27 '18 at 12:11











  • Could you try it without the group_by statement, please

    – Russ Hyde
    Nov 27 '18 at 12:17











  • I took out the group_by statement and got the same result

    – ejg
    Nov 27 '18 at 12:21











  • please don't just post the dataframe. Use dput like I said. Much easier to read into R

    – RAB
    Nov 27 '18 at 12:25











  • This is my first post; how do I use dput?

    – ejg
    Nov 27 '18 at 12:26
















0















I am having problems with using dplyr to subtract a value from the previous row.



The dataframe is as follows:



graph_data_frame1

# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3


The code I am using is:



atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))


The result created by mutate is all zeroes:



 atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0


Desired Result:



 graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA


I am able to get my desired result with base R but would like to use dplyr.



Data (ungrouped):



structure(list(Trade_Date = structure(c(1512950400, 1512950400, 
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))









share|improve this question




















  • 1





    Please include the actual data with dput(graph_data_frame1) so we can hep you and also an example of the desired output

    – RAB
    Nov 27 '18 at 12:11











  • Could you try it without the group_by statement, please

    – Russ Hyde
    Nov 27 '18 at 12:17











  • I took out the group_by statement and got the same result

    – ejg
    Nov 27 '18 at 12:21











  • please don't just post the dataframe. Use dput like I said. Much easier to read into R

    – RAB
    Nov 27 '18 at 12:25











  • This is my first post; how do I use dput?

    – ejg
    Nov 27 '18 at 12:26














0












0








0








I am having problems with using dplyr to subtract a value from the previous row.



The dataframe is as follows:



graph_data_frame1

# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3


The code I am using is:



atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))


The result created by mutate is all zeroes:



 atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0


Desired Result:



 graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA


I am able to get my desired result with base R but would like to use dplyr.



Data (ungrouped):



structure(list(Trade_Date = structure(c(1512950400, 1512950400, 
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))









share|improve this question
















I am having problems with using dplyr to subtract a value from the previous row.



The dataframe is as follows:



graph_data_frame1

# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3


The code I am using is:



atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))


The result created by mutate is all zeroes:



 atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0


Desired Result:



 graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA


I am able to get my desired result with base R but would like to use dplyr.



Data (ungrouped):



structure(list(Trade_Date = structure(c(1512950400, 1512950400, 
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))






r dplyr mutate






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 27 '18 at 14:23









utubun

1,6831912




1,6831912










asked Nov 27 '18 at 12:08









ejgejg

12




12








  • 1





    Please include the actual data with dput(graph_data_frame1) so we can hep you and also an example of the desired output

    – RAB
    Nov 27 '18 at 12:11











  • Could you try it without the group_by statement, please

    – Russ Hyde
    Nov 27 '18 at 12:17











  • I took out the group_by statement and got the same result

    – ejg
    Nov 27 '18 at 12:21











  • please don't just post the dataframe. Use dput like I said. Much easier to read into R

    – RAB
    Nov 27 '18 at 12:25











  • This is my first post; how do I use dput?

    – ejg
    Nov 27 '18 at 12:26














  • 1





    Please include the actual data with dput(graph_data_frame1) so we can hep you and also an example of the desired output

    – RAB
    Nov 27 '18 at 12:11











  • Could you try it without the group_by statement, please

    – Russ Hyde
    Nov 27 '18 at 12:17











  • I took out the group_by statement and got the same result

    – ejg
    Nov 27 '18 at 12:21











  • please don't just post the dataframe. Use dput like I said. Much easier to read into R

    – RAB
    Nov 27 '18 at 12:25











  • This is my first post; how do I use dput?

    – ejg
    Nov 27 '18 at 12:26








1




1





Please include the actual data with dput(graph_data_frame1) so we can hep you and also an example of the desired output

– RAB
Nov 27 '18 at 12:11





Please include the actual data with dput(graph_data_frame1) so we can hep you and also an example of the desired output

– RAB
Nov 27 '18 at 12:11













Could you try it without the group_by statement, please

– Russ Hyde
Nov 27 '18 at 12:17





Could you try it without the group_by statement, please

– Russ Hyde
Nov 27 '18 at 12:17













I took out the group_by statement and got the same result

– ejg
Nov 27 '18 at 12:21





I took out the group_by statement and got the same result

– ejg
Nov 27 '18 at 12:21













please don't just post the dataframe. Use dput like I said. Much easier to read into R

– RAB
Nov 27 '18 at 12:25





please don't just post the dataframe. Use dput like I said. Much easier to read into R

– RAB
Nov 27 '18 at 12:25













This is my first post; how do I use dput?

– ejg
Nov 27 '18 at 12:26





This is my first post; how do I use dput?

– ejg
Nov 27 '18 at 12:26












1 Answer
1






active

oldest

votes


















0














The main problem is with group_by() call. Looking at your data (I did ungroup(dat) on your dput() output), I can see that for each group, defined by group_by(Trade_Date) your atr variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr)) within your mutate() function it returns two values identical to corresponding values in attr, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr, if length of attr within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0.



So look what is going on when you do group your data:



library(dplyr)

dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0


And without grouping:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118


See that the resulting vector is not zeros anymore.



It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA





share|improve this answer


























  • I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.

    – ejg
    Nov 28 '18 at 10:04













  • When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?

    – ejg
    Nov 28 '18 at 10:06













  • Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.

    – utubun
    Nov 28 '18 at 10:36











  • Thank you ungrouping produces the correct results

    – ejg
    Nov 28 '18 at 11:13











  • Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.

    – ejg
    Nov 28 '18 at 12:02











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53499357%2fproblem-with-subtracting-value-from-previous-row-in-dataframe%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














The main problem is with group_by() call. Looking at your data (I did ungroup(dat) on your dput() output), I can see that for each group, defined by group_by(Trade_Date) your atr variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr)) within your mutate() function it returns two values identical to corresponding values in attr, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr, if length of attr within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0.



So look what is going on when you do group your data:



library(dplyr)

dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0


And without grouping:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118


See that the resulting vector is not zeros anymore.



It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA





share|improve this answer


























  • I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.

    – ejg
    Nov 28 '18 at 10:04













  • When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?

    – ejg
    Nov 28 '18 at 10:06













  • Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.

    – utubun
    Nov 28 '18 at 10:36











  • Thank you ungrouping produces the correct results

    – ejg
    Nov 28 '18 at 11:13











  • Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.

    – ejg
    Nov 28 '18 at 12:02
















0














The main problem is with group_by() call. Looking at your data (I did ungroup(dat) on your dput() output), I can see that for each group, defined by group_by(Trade_Date) your atr variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr)) within your mutate() function it returns two values identical to corresponding values in attr, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr, if length of attr within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0.



So look what is going on when you do group your data:



library(dplyr)

dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0


And without grouping:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118


See that the resulting vector is not zeros anymore.



It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA





share|improve this answer


























  • I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.

    – ejg
    Nov 28 '18 at 10:04













  • When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?

    – ejg
    Nov 28 '18 at 10:06













  • Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.

    – utubun
    Nov 28 '18 at 10:36











  • Thank you ungrouping produces the correct results

    – ejg
    Nov 28 '18 at 11:13











  • Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.

    – ejg
    Nov 28 '18 at 12:02














0












0








0







The main problem is with group_by() call. Looking at your data (I did ungroup(dat) on your dput() output), I can see that for each group, defined by group_by(Trade_Date) your atr variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr)) within your mutate() function it returns two values identical to corresponding values in attr, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr, if length of attr within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0.



So look what is going on when you do group your data:



library(dplyr)

dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0


And without grouping:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118


See that the resulting vector is not zeros anymore.



It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA





share|improve this answer















The main problem is with group_by() call. Looking at your data (I did ungroup(dat) on your dput() output), I can see that for each group, defined by group_by(Trade_Date) your atr variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr)) within your mutate() function it returns two values identical to corresponding values in attr, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr, if length of attr within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0.



So look what is going on when you do group your data:



library(dplyr)

dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0


And without grouping:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118


See that the resulting vector is not zeros anymore.



It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:



dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))

# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 27 '18 at 13:20

























answered Nov 27 '18 at 12:54









utubunutubun

1,6831912




1,6831912













  • I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.

    – ejg
    Nov 28 '18 at 10:04













  • When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?

    – ejg
    Nov 28 '18 at 10:06













  • Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.

    – utubun
    Nov 28 '18 at 10:36











  • Thank you ungrouping produces the correct results

    – ejg
    Nov 28 '18 at 11:13











  • Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.

    – ejg
    Nov 28 '18 at 12:02



















  • I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.

    – ejg
    Nov 28 '18 at 10:04













  • When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?

    – ejg
    Nov 28 '18 at 10:06













  • Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.

    – utubun
    Nov 28 '18 at 10:36











  • Thank you ungrouping produces the correct results

    – ejg
    Nov 28 '18 at 11:13











  • Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.

    – ejg
    Nov 28 '18 at 12:02

















I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.

– ejg
Nov 28 '18 at 10:04







I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.

– ejg
Nov 28 '18 at 10:04















When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?

– ejg
Nov 28 '18 at 10:06







When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?

– ejg
Nov 28 '18 at 10:06















Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.

– utubun
Nov 28 '18 at 10:36





Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.

– utubun
Nov 28 '18 at 10:36













Thank you ungrouping produces the correct results

– ejg
Nov 28 '18 at 11:13





Thank you ungrouping produces the correct results

– ejg
Nov 28 '18 at 11:13













Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.

– ejg
Nov 28 '18 at 12:02





Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.

– ejg
Nov 28 '18 at 12:02




















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53499357%2fproblem-with-subtracting-value-from-previous-row-in-dataframe%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

Calculate evaluation metrics using cross_val_predict sklearn

Insert data from modal to MySQL (multiple modal on website)