Problem with subtracting value from previous row in dataframe
I am having problems with using dplyr
to subtract a value from the previous row.
The dataframe is as follows:
graph_data_frame1
# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3
The code I am using is:
atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))
The result created by mutate is all zeroes:
atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0
Desired Result:
graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA
I am able to get my desired result with base R but would like to use dplyr.
Data (ungrouped):
structure(list(Trade_Date = structure(c(1512950400, 1512950400,
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))
r dplyr mutate
|
show 1 more comment
I am having problems with using dplyr
to subtract a value from the previous row.
The dataframe is as follows:
graph_data_frame1
# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3
The code I am using is:
atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))
The result created by mutate is all zeroes:
atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0
Desired Result:
graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA
I am able to get my desired result with base R but would like to use dplyr.
Data (ungrouped):
structure(list(Trade_Date = structure(c(1512950400, 1512950400,
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))
r dplyr mutate
1
Please include the actual data withdput(graph_data_frame1)
so we can hep you and also an example of the desired output
– RAB
Nov 27 '18 at 12:11
Could you try it without thegroup_by
statement, please
– Russ Hyde
Nov 27 '18 at 12:17
I took out the group_by statement and got the same result
– ejg
Nov 27 '18 at 12:21
please don't just post the dataframe. Use dput like I said. Much easier to read into R
– RAB
Nov 27 '18 at 12:25
This is my first post; how do I use dput?
– ejg
Nov 27 '18 at 12:26
|
show 1 more comment
I am having problems with using dplyr
to subtract a value from the previous row.
The dataframe is as follows:
graph_data_frame1
# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3
The code I am using is:
atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))
The result created by mutate is all zeroes:
atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0
Desired Result:
graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA
I am able to get my desired result with base R but would like to use dplyr.
Data (ungrouped):
structure(list(Trade_Date = structure(c(1512950400, 1512950400,
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))
r dplyr mutate
I am having problems with using dplyr
to subtract a value from the previous row.
The dataframe is as follows:
graph_data_frame1
# A tibble: 13 x 5
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr
<dttm> <chr> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3
2 2017-12-11 00:00:00 S 88 3586 43.3
3 2017-12-08 00:00:00 B 176 3594 40.7
4 2017-12-08 00:00:00 S 176 3599. 40.7
5 2017-12-07 00:00:00 NA NA NA 40.3
6 2017-12-06 00:00:00 NA NA NA 41.8
7 2017-12-05 00:00:00 NA NA NA 40.3
8 2017-12-04 00:00:00 B 44 3563 39.5
9 2017-12-04 00:00:00 S 44 3569 39.5
10 2017-12-01 00:00:00 NA NA NA 42.7
11 2017-11-30 00:00:00 NA NA NA 46.2
12 2017-11-29 00:00:00 NA NA NA 47.2
13 2017-11-28 00:00:00 NA NA NA 47.3
The code I am using is:
atr_diff_df <- graph_data_frame1 %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(atr_diff = atr - lag(atr, default = first(atr)))
The result created by mutate is all zeroes:
atr_diff_df
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 0
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0
5 2017-12-07 00:00:00 NA NA NA 40.3 0
6 2017-12-06 00:00:00 NA NA NA 41.8 0
7 2017-12-05 00:00:00 NA NA NA 40.3 0
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 0
10 2017-12-01 00:00:00 NA NA NA 42.7 0
11 2017-11-30 00:00:00 NA NA NA 46.2 0
12 2017-11-29 00:00:00 NA NA NA 47.2 0
13 2017-11-28 00:00:00 NA NA NA 47.3 0
Desired Result:
graph_data_frame1
# A tibble: 13 x 6
# Groups: Trade_Date [10]
Trade_Date B_S Lots_Total Avg_Price atr atr_diff
<dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 2017-12-11 00:00:00 B 88 3591 43.3 0
2 2017-12-11 00:00:00 S 88 3586 43.3 2.61
3 2017-12-08 00:00:00 B 176 3594 40.7 0
4 2017-12-08 00:00:00 S 176 3599. 40.7 0.382
5 2017-12-07 00:00:00 NA NA NA 40.3 -1.53
6 2017-12-06 00:00:00 NA NA NA 41.8 1.55
7 2017-12-05 00:00:00 NA NA NA 40.3 0.754
8 2017-12-04 00:00:00 B 44 3563 39.5 0
9 2017-12-04 00:00:00 S 44 3569 39.5 -3.21
10 2017-12-01 00:00:00 NA NA NA 42.7 -3.47
11 2017-11-30 00:00:00 NA NA NA 46.2 -0.971
12 2017-11-29 00:00:00 NA NA NA 47.2 -0.118
13 2017-11-28 00:00:00 NA NA NA 47.3 NA
I am able to get my desired result with base R but would like to use dplyr.
Data (ungrouped):
structure(list(Trade_Date = structure(c(1512950400, 1512950400,
1512691200, 1512691200, 1512604800, 1512518400, 1512432000, 1512345600,
1512345600, 1512086400, 1.512e+09, 1511913600, 1511827200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), B_S = c("B", "S", "B", "S", NA, NA,
NA, "B", "S", NA, NA, NA, NA), Lots_Total = c(88, 88, 176, 176,
NA, NA, NA, 44, 44, NA, NA, NA, NA), Avg_Price = c(3591, 3586,
3594, 3598.85714285714, NA, NA, NA, 3563, 3569, NA, NA, NA, NA
), atr = c(43.2857142857143, 43.2857142857143, 40.6734693877551,
40.6734693877551, 40.2915451895044, 41.8213244481466, 40.2754209555542,
39.5217893904751, 39.5217893904751, 42.7329623346929, 46.1996820011654,
47.1711560009989, 47.2895622865705)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -13L))
r dplyr mutate
r dplyr mutate
edited Nov 27 '18 at 14:23
utubun
1,6831912
1,6831912
asked Nov 27 '18 at 12:08
ejgejg
12
12
1
Please include the actual data withdput(graph_data_frame1)
so we can hep you and also an example of the desired output
– RAB
Nov 27 '18 at 12:11
Could you try it without thegroup_by
statement, please
– Russ Hyde
Nov 27 '18 at 12:17
I took out the group_by statement and got the same result
– ejg
Nov 27 '18 at 12:21
please don't just post the dataframe. Use dput like I said. Much easier to read into R
– RAB
Nov 27 '18 at 12:25
This is my first post; how do I use dput?
– ejg
Nov 27 '18 at 12:26
|
show 1 more comment
1
Please include the actual data withdput(graph_data_frame1)
so we can hep you and also an example of the desired output
– RAB
Nov 27 '18 at 12:11
Could you try it without thegroup_by
statement, please
– Russ Hyde
Nov 27 '18 at 12:17
I took out the group_by statement and got the same result
– ejg
Nov 27 '18 at 12:21
please don't just post the dataframe. Use dput like I said. Much easier to read into R
– RAB
Nov 27 '18 at 12:25
This is my first post; how do I use dput?
– ejg
Nov 27 '18 at 12:26
1
1
Please include the actual data with
dput(graph_data_frame1)
so we can hep you and also an example of the desired output– RAB
Nov 27 '18 at 12:11
Please include the actual data with
dput(graph_data_frame1)
so we can hep you and also an example of the desired output– RAB
Nov 27 '18 at 12:11
Could you try it without the
group_by
statement, please– Russ Hyde
Nov 27 '18 at 12:17
Could you try it without the
group_by
statement, please– Russ Hyde
Nov 27 '18 at 12:17
I took out the group_by statement and got the same result
– ejg
Nov 27 '18 at 12:21
I took out the group_by statement and got the same result
– ejg
Nov 27 '18 at 12:21
please don't just post the dataframe. Use dput like I said. Much easier to read into R
– RAB
Nov 27 '18 at 12:25
please don't just post the dataframe. Use dput like I said. Much easier to read into R
– RAB
Nov 27 '18 at 12:25
This is my first post; how do I use dput?
– ejg
Nov 27 '18 at 12:26
This is my first post; how do I use dput?
– ejg
Nov 27 '18 at 12:26
|
show 1 more comment
1 Answer
1
active
oldest
votes
The main problem is with group_by()
call. Looking at your data (I did ungroup(dat)
on your dput()
output), I can see that for each group, defined by group_by(Trade_Date)
your atr
variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr))
within your mutate()
function it returns two values identical to corresponding values in attr
, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr
, if length of attr
within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0
.
So look what is going on when you do group your data:
library(dplyr)
dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0
And without grouping:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118
See that the resulting vector is not zeros anymore.
It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA
I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.
– ejg
Nov 28 '18 at 10:04
When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?
– ejg
Nov 28 '18 at 10:06
Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.
– utubun
Nov 28 '18 at 10:36
Thank you ungrouping produces the correct results
– ejg
Nov 28 '18 at 11:13
Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.
– ejg
Nov 28 '18 at 12:02
|
show 1 more comment
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53499357%2fproblem-with-subtracting-value-from-previous-row-in-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The main problem is with group_by()
call. Looking at your data (I did ungroup(dat)
on your dput()
output), I can see that for each group, defined by group_by(Trade_Date)
your atr
variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr))
within your mutate()
function it returns two values identical to corresponding values in attr
, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr
, if length of attr
within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0
.
So look what is going on when you do group your data:
library(dplyr)
dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0
And without grouping:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118
See that the resulting vector is not zeros anymore.
It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA
I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.
– ejg
Nov 28 '18 at 10:04
When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?
– ejg
Nov 28 '18 at 10:06
Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.
– utubun
Nov 28 '18 at 10:36
Thank you ungrouping produces the correct results
– ejg
Nov 28 '18 at 11:13
Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.
– ejg
Nov 28 '18 at 12:02
|
show 1 more comment
The main problem is with group_by()
call. Looking at your data (I did ungroup(dat)
on your dput()
output), I can see that for each group, defined by group_by(Trade_Date)
your atr
variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr))
within your mutate()
function it returns two values identical to corresponding values in attr
, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr
, if length of attr
within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0
.
So look what is going on when you do group your data:
library(dplyr)
dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0
And without grouping:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118
See that the resulting vector is not zeros anymore.
It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA
I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.
– ejg
Nov 28 '18 at 10:04
When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?
– ejg
Nov 28 '18 at 10:06
Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.
– utubun
Nov 28 '18 at 10:36
Thank you ungrouping produces the correct results
– ejg
Nov 28 '18 at 11:13
Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.
– ejg
Nov 28 '18 at 12:02
|
show 1 more comment
The main problem is with group_by()
call. Looking at your data (I did ungroup(dat)
on your dput()
output), I can see that for each group, defined by group_by(Trade_Date)
your atr
variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr))
within your mutate()
function it returns two values identical to corresponding values in attr
, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr
, if length of attr
within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0
.
So look what is going on when you do group your data:
library(dplyr)
dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0
And without grouping:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118
See that the resulting vector is not zeros anymore.
It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA
The main problem is with group_by()
call. Looking at your data (I did ungroup(dat)
on your dput()
output), I can see that for each group, defined by group_by(Trade_Date)
your atr
variable's length is either 1 or 2. In former case, both values are identical. So when you call lag(atr, default = first(atr))
within your mutate()
function it returns two values identical to corresponding values in attr
, within the defined group, when there are 2 values in group, or just one value identical to the first (and only one) value of attr
, if length of attr
within the defined group is equal to 1. Which means, that when you do the subtraction you always subtract identical values, which obviously produces 0
.
So look what is going on when you do group your data:
library(dplyr)
dat %>%
group_by(Trade_Date) %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# # Groups: Trade_Date [10]
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 40.7 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 41.8 0
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 40.3 0
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 39.5 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 42.7 0
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 46.2 0
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 47.2 0
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.3 0
And without grouping:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lag(atr, default = first(atr)))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 0
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 -2.61
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -0.382
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.53
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 -1.55
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 -0.754
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 0
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 3.21
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 3.47
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 0.971
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 0.118
See that the resulting vector is not zeros anymore.
It's absolutely unclear to me, what kind of transformation do you want to perform on your data, and why you do want to do it this way. But you can obtain your desired output this way:
dat %>%
arrange(desc(Trade_Date)) %>%
mutate(first_term = atr, second_term = lag(atr, default = first(atr)),
difference = atr - lead(atr))
# # A tibble: 13 x 8
# Trade_Date B_S Lots_Total Avg_Price atr first_term second_term difference
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-12-11 00:00:00 B 88 3591 43.3 43.3 43.3 0
# 2 2017-12-11 00:00:00 S 88 3586 43.3 43.3 43.3 2.61
# 3 2017-12-08 00:00:00 B 176 3594 40.7 40.7 43.3 0
# 4 2017-12-08 00:00:00 S 176 3599. 40.7 40.7 40.7 0.382
# 5 2017-12-07 00:00:00 NA NA NA 40.3 40.3 40.7 -1.53
# 6 2017-12-06 00:00:00 NA NA NA 41.8 41.8 40.3 1.55
# 7 2017-12-05 00:00:00 NA NA NA 40.3 40.3 41.8 0.754
# 8 2017-12-04 00:00:00 B 44 3563 39.5 39.5 40.3 0
# 9 2017-12-04 00:00:00 S 44 3569 39.5 39.5 39.5 -3.21
#10 2017-12-01 00:00:00 NA NA NA 42.7 42.7 39.5 -3.47
#11 2017-11-30 00:00:00 NA NA NA 46.2 46.2 42.7 -0.971
#12 2017-11-29 00:00:00 NA NA NA 47.2 47.2 46.2 -0.118
#13 2017-11-28 00:00:00 NA NA NA 47.3 47.3 47.2 NA
edited Nov 27 '18 at 13:20
answered Nov 27 '18 at 12:54
utubunutubun
1,6831912
1,6831912
I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.
– ejg
Nov 28 '18 at 10:04
When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?
– ejg
Nov 28 '18 at 10:06
Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.
– utubun
Nov 28 '18 at 10:36
Thank you ungrouping produces the correct results
– ejg
Nov 28 '18 at 11:13
Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.
– ejg
Nov 28 '18 at 12:02
|
show 1 more comment
I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.
– ejg
Nov 28 '18 at 10:04
When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?
– ejg
Nov 28 '18 at 10:06
Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.
– utubun
Nov 28 '18 at 10:36
Thank you ungrouping produces the correct results
– ejg
Nov 28 '18 at 11:13
Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.
– ejg
Nov 28 '18 at 12:02
I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.
– ejg
Nov 28 '18 at 10:04
I see the problem with grouping the data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros.
– ejg
Nov 28 '18 at 10:04
When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?
– ejg
Nov 28 '18 at 10:06
When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows. Do I need to ungroup the data before running the statements?
– ejg
Nov 28 '18 at 10:06
Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.
– utubun
Nov 28 '18 at 10:36
Sure, you must ungroup it first, otherwise it performs all the operations inside the group, that why it produce such output, as I assume. Let me know if it works.
– utubun
Nov 28 '18 at 10:36
Thank you ungrouping produces the correct results
– ejg
Nov 28 '18 at 11:13
Thank you ungrouping produces the correct results
– ejg
Nov 28 '18 at 11:13
Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.
– ejg
Nov 28 '18 at 12:02
Actually when I checked the results again ungroup doesn't help with my data. However, when I run the first solution: dat %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lag(atr, default = first(atr))) the difference column is all zeros. When I run the second solution: graph_data_frame1 %>% arrange(desc(Trade_Date)) %>% mutate(first_term = atr, second_term = lag(atr, default = first(atr)), difference = atr - lead(atr)) the difference column has 0 for 3 rows and NA for other rows.
– ejg
Nov 28 '18 at 12:02
|
show 1 more comment
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53499357%2fproblem-with-subtracting-value-from-previous-row-in-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Please include the actual data with
dput(graph_data_frame1)
so we can hep you and also an example of the desired output– RAB
Nov 27 '18 at 12:11
Could you try it without the
group_by
statement, please– Russ Hyde
Nov 27 '18 at 12:17
I took out the group_by statement and got the same result
– ejg
Nov 27 '18 at 12:21
please don't just post the dataframe. Use dput like I said. Much easier to read into R
– RAB
Nov 27 '18 at 12:25
This is my first post; how do I use dput?
– ejg
Nov 27 '18 at 12:26