Dictionary of Headers in R
Is there a way to keep a separate list of headers that acts like a dictionary, mapping each descriptive header to an easier-to-use short name, so that I can translate back and forth without having to keep the columns in a particular order? I'm not great with this, but here is an example of what I was thinking:
Original Data Set
|---------------------|------------------|------------------|
| Descriptive A | Descriptive B | Descriptive C |
|---------------------|------------------|------------------|
| 12 | 34 | 25 |
|---------------------|------------------|------------------|
Dictionary of Headers
|---------------------|------------------|
| long_name | short_name |
|---------------------|------------------|
| Descriptive A | A |
|---------------------|------------------|
| Descriptive B | B |
|---------------------|------------------|
| Descriptive C | C |
|---------------------|------------------|
Then I could have a piece of code that uses the short_name column of the dictionary to replace each long_name header with its short_name, so that I would not have to rely on the position of the headers.
I'm not sure if that is possible, but I have a table with 180 columns (and growing), and they all have descriptive names that don't translate well into R, so I thought this might be a solution I could keep adding to as the data set grows.
r
asked Nov 27 '18 at 15:39 by thejuanald
I do not believe there is functionality in R to automatically allow aliases of column names ... in place, that is. You can always rename the columns before the calculation and rename them back afterwards. The default $ operator does allow partial matches, but I believe they always match (unambiguously) from the left, not from the right as your example portrays. You might try to rewrite $.data.frame so that either name would work, but you risk lots of corner cases and unintended consequences when messing with that.
– r2evans
Nov 27 '18 at 15:49
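A small sketch of the left-to-right (prefix) partial matching described in the comment. The frame is made up: the column "Other B" is an assumption added only so that the prefix "Desc" has exactly one match. If I remember correctly, recent R versions stay silent about such matches unless options(warnPartialMatchDollar = TRUE) is set.

```r
# Made-up frame: "Other B" exists only so that the prefix "Desc" is unique
df <- data.frame("Descriptive A" = 12, "Other B" = 34, check.names = FALSE)

df$Desc   # no exact match, but "Desc" is a unique prefix of "Descriptive A": 12
df$A      # "A" is only a suffix of "Descriptive A", so nothing matches: NULL
```

So a short name can only ever reach a column through $ if it happens to be a unique prefix of the long name, which is why it cannot serve as a general alias.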
4 Answers
Using dict and DF, defined reproducibly in the Note at the end, run the for loop shown and then we can use A, B and C without quotes as column names.
# create variables A, B and C holding the corresponding long names
for(i in seq_len(nrow(dict))) assign(dict$short_name[i], dict$long_name[i])
# test - use DF[B] in place of DF["Descriptive B"]
DF[B]
## Descriptive B
## 1 34
As the test above shows, this is straightforward with conventional subscripting. If you want to use non-standard evaluation, as in dplyr, then you will need to use rlang in the usual way:
library(dplyr)
DF %>% mutate(D = !!sym(B))
## Descriptive A Descriptive B Descriptive C D
## 1 12 34 25 34
Note
We assume this input:
Lines1 <- "
long_name | short_name
Descriptive A | A
Descriptive B | B
Descriptive C | C"
dict <- read.table(text = Lines1, header = TRUE, sep = "|", as.is = TRUE,
strip.white = TRUE)
Lines2 <- "
Descriptive A | Descriptive B | Descriptive C
12 | 34 | 25"
DF <- read.table(text = Lines2, header = TRUE, sep = "|", as.is = TRUE,
strip.white = TRUE, check.names = FALSE)
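If you would rather not create one global variable per column with assign(), the same lookup can be held in a single named character vector. This is a sketch, not part of the answer: alias is a hypothetical name, and dict and DF are rebuilt here with data.frame() so that the block runs on its own.

```r
# dict and DF as in the Note, built directly with data.frame()
dict <- data.frame(
  long_name  = c("Descriptive A", "Descriptive B", "Descriptive C"),
  short_name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
DF <- data.frame("Descriptive A" = 12, "Descriptive B" = 34, "Descriptive C" = 25,
                 check.names = FALSE)

# Names are the short aliases, values are the real column names
alias <- setNames(dict$long_name, dict$short_name)

DF[alias["B"]]          # one-column data frame, like DF[B] above
DF[[alias[["B"]]]]      # the column itself
DF[alias[c("C", "A")]]  # several columns, in whatever order you ask for
```

Growing the table then only means adding rows to dict; nothing depends on column positions.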
Yes, you just need to keep the dictionary (or codebook) as a separate data frame, which can be read in from, say, a .csv file. Let's say you have a data frame like this:
df <- data.frame(matrix(rnorm(1000), ncol = 100))
names(df) <- paste0("a very long unfortunate name to be replaced_", 1:ncol(df))
You can create the codebook like this:
codebook <- data.frame(long_name = names(df), short_name = paste0("X_", 1:ncol(df)),
stringsAsFactors = FALSE)
head(codebook)
long_name short_name
1 a very long unfortunate name to be replaced_1 X_1
2 a very long unfortunate name to be replaced_2 X_2
3 a very long unfortunate name to be replaced_3 X_3
4 a very long unfortunate name to be replaced_4 X_4
5 a very long unfortunate name to be replaced_5 X_5
6 a very long unfortunate name to be replaced_6 X_6
Let's then change the names of df using the "short names". This step relies on the codebook rows still being in the same order as the columns of df:
names(df) <- codebook[ ,2]
For fun, let's randomise the rows of codebook to show that you can still use it:
codebook <- codebook[sample(nrow(codebook)), ]
Finally, you can use match() to retrieve the original long names:
codebook$long_name[match(names(df), codebook$short_name)]
[1] a very long unfortunate name to be replaced_1 a very long unfortunate name to be replaced_2
[3] a very long unfortunate name to be replaced_3 a very long unfortunate name to be replaced_4
[5] a very long unfortunate name to be replaced_5 a very long unfortunate name to be replaced_6
[7] a very long unfortunate name to be replaced_7 a very long unfortunate name to be replaced_8
[9] a very long unfortunate name to be replaced_9 a very long unfortunate name to be replaced_10
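The forward rename names(df) <- codebook[ ,2] depends on row order, while the reverse step does not. A sketch of an order-independent forward rename that uses match() the same way the reverse step does. Assumptions, not part of the answer: the codebook would normally come from a file such as a hypothetical "codebook.csv" (simulated here with read.csv(text = ...)), and any header not yet in the codebook should keep its current name rather than become NA.

```r
# Stand-in for read.csv("codebook.csv"); rows are deliberately out of order
codebook <- read.csv(text = "long_name,short_name
Descriptive C,C
Descriptive A,A
Descriptive B,B", stringsAsFactors = FALSE)

# "Descriptive D" plays the role of a new column not yet in the codebook
df <- data.frame("Descriptive A" = 12, "Descriptive B" = 34, "Descriptive C" = 25,
                 "Descriptive D" = 7, check.names = FALSE)

# Long -> short: find each header in long_name; unmatched headers keep their name
idx <- match(names(df), codebook$long_name)
names(df) <- ifelse(is.na(idx), names(df), codebook$short_name[idx])
after_forward <- names(df)

# Short -> long, as in the answer, again leaving unmatched names alone
idx <- match(names(df), codebook$short_name)
names(df) <- ifelse(is.na(idx), names(df), codebook$long_name[idx])
```

The is.na() guard matters for a table that keeps growing: without it, a bare match() lookup turns every header missing from the codebook into NA.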
answered Nov 27 '18 at 16:01 by Milan Valášek
Thank you, that worked perfectly!
– thejuanald
Nov 27 '18 at 16:20
As I commented, I don't think there's a way to do aliasing in place, but for calculations you can do something like:
df1 <- data.frame(
"Descriptive A" = 12,
"Descriptive B" = 34,
"Descriptive C" = 25,
check.names = FALSE
)
The "aliasing" object can be a frame, but since all you're doing is mapping one name to another, a named character vector handles it efficiently:
df1_aliases <- c(
"B" = "Descriptive B",
"A" = "Descriptive A",
"C" = "Descriptive C"
)
Your aliasing steps would then be an intentional pre-/post-translation of names:
names(df1) <- names(df1_aliases)[ match(names(df1), df1_aliases) ]
df1
# A B C
# 1 12 34 25
### do stuff here ###
names(df1) <- df1_aliases[ match(names(df1), names(df1_aliases)) ]
df1
# Descriptive A Descriptive B Descriptive C
# 1 12 34 25
It might be feasible to overwrite $.data.frame and $<-.data.frame for basic dollar-sign operations, but you'd also need to overwrite [.data.frame, [[.data.frame, and perhaps even with (depending on your frame-access habits) ... and those rewritten functions might not be picked up by all the other functions you call (depending on their function/namespace search path).
Because of the complexities of tracking down everything that touches the frame, I strongly suggest you make it as explicit as possible: have only one set of names each column is known by (whether the original or your aliases), never both simultaneously. This means the translate/untranslate steps are explicit and anything that works on the frame will work unambiguously.
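The advice to keep the translate/untranslate steps explicit can be wrapped in a small helper, so the two renames always happen as a pair and the original frame is never left carrying the short names. This is a sketch, not part of the answer: with_aliases() is a hypothetical name, f is assumed to be any function that takes and returns a data frame, and columns without an alias (for example ones created inside f) keep whatever name they have.

```r
# Hypothetical helper: translate to the short aliases, run f(), translate back.
with_aliases <- function(df, aliases, f) {
  idx <- match(names(df), aliases)                    # long -> short
  names(df) <- ifelse(is.na(idx), names(df), names(aliases)[idx])
  out <- f(df)
  idx <- match(names(out), names(aliases))            # short -> long
  names(out) <- ifelse(is.na(idx), names(out), aliases[idx])
  out
}

df1 <- data.frame("Descriptive A" = 12, "Descriptive B" = 34, "Descriptive C" = 25,
                  check.names = FALSE)
df1_aliases <- c(B = "Descriptive B", A = "Descriptive A", C = "Descriptive C")

# Inside the function the columns are simply A, B and C
res <- with_aliases(df1, df1_aliases, function(d) transform(d, D = A + B))
names(res)
```

Because R copies on modification, df1 itself keeps its descriptive names; only the returned frame is renamed and then restored.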
answered Nov 27 '18 at 16:01 by r2evans
You could give the names themselves names, and then subset the names before subsetting the data.frame.
For example, using the iris data:
short_names <- names(iris)
names(short_names) <- c("sl","sw","pl","pw","sp")
attributes(iris)$names <- short_names
head(iris[names(iris)[c("sl","sp")]])
Sepal.Length Species
1 5.1 setosa
2 4.9 setosa
3 4.7 setosa
4 4.6 setosa
5 5.0 setosa
6 5.4 setosa
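The same trick applied to the question's own table, with the aliases taken from the dictionary instead of typed in by hand, might look like the sketch below. DF and dict are illustrative names, and the dictionary is assumed to have the long_name and short_name columns shown in the question. The aliases live only inside the names attribute, so it is worth checking whether they survive whatever else you do to the frame.

```r
DF <- data.frame("Descriptive A" = 12, "Descriptive B" = 34, "Descriptive C" = 25,
                 check.names = FALSE)
dict <- data.frame(long_name  = c("Descriptive C", "Descriptive A", "Descriptive B"),
                   short_name = c("C", "A", "B"),
                   stringsAsFactors = FALSE)

# Name each column name by its alias, looked up by long name so the
# dictionary rows can be in any order
long <- names(DF)
names(long) <- dict$short_name[match(long, dict$long_name)]

# Store the named vector via attributes(), as the answer does with iris
attributes(DF)$names <- long

names(names(DF))             # the aliases
DF[names(DF)[c("C", "A")]]   # columns selected by alias
```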
answered Nov 27 '18 at 16:22 by James
This is really elegant. I thought of using a named vector and match as illustrated by another answer, but this is elegant hacking into the structure of an object. I'd give 10 upvotes if I could. It's motivating me to investigate how the bounty system works.
– 42-
Nov 27 '18 at 16:30
I like this answer a lot as well. Thank you!
– thejuanald
Nov 28 '18 at 16:48
Using dict and DF, defined reproducibly in the Note at the end, run the for loop shown, and then we can use A, B and C without quotes as column names.
# create variables A, B and C, each holding the corresponding long name
for(i in 1:nrow(dict)) assign(dict$short_name[i], dict$long_name[i])
# test - use DF[B] in place of DF["Descriptive B"]
DF[B]
##   Descriptive B
## 1            34
As shown in the above test, this is straightforward when using conventional subscripting. If you want to use nonstandard evaluation, such as in dplyr, then you will need to use rlang in the usual way:
library(dplyr)
DF %>% mutate(D = !!sym(B))
##   Descriptive A Descriptive B Descriptive C  D
## 1            12            34            25 34
Note
We assume this input:
Lines1 <- "
long_name | short_name
Descriptive A | A
Descriptive B | B
Descriptive C | C"
dict <- read.table(text = Lines1, header = TRUE, sep = "|", as.is = TRUE,
strip.white = TRUE)
Lines2 <- "
Descriptive A | Descriptive B | Descriptive C
12 | 34 | 25"
DF <- read.table(text = Lines2, header = TRUE, sep = "|", as.is = TRUE,
strip.white = TRUE, check.names = FALSE)
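If the loop feels heavy, the same variables can be created in one call. This is a minimal sketch, assuming the dict and DF shown in the Note (rebuilt here with data.frame() so the block runs on its own): list2env() assigns each element of a named list as a variable, so setNames(as.list(dict$long_name), dict$short_name) produces A, B and C in the global environment, as the for loop does when run at top level.

```r
# Same dictionary and data as in the Note, built directly
dict <- data.frame(long_name  = c("Descriptive A", "Descriptive B", "Descriptive C"),
                   short_name = c("A", "B", "C"),
                   stringsAsFactors = FALSE)
DF <- data.frame(`Descriptive A` = 12L, `Descriptive B` = 34L,
                 `Descriptive C` = 25L, check.names = FALSE)

# One call instead of the for loop: each short name becomes a variable
# holding its long name (A <- "Descriptive A", and so on)
invisible(list2env(setNames(as.list(dict$long_name), dict$short_name),
                   envir = globalenv()))

# Use the variables exactly as before
DF[B]
```

Either way the variables live in the workspace, so a short name that matches an existing object will overwrite it; keeping the mapping in a single named vector (the named vector and match idea mentioned in a comment above) avoids that.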
edited Nov 27 '18 at 16:18
answered Nov 27 '18 at 16:04
G. Grothendieck
I do not believe there is functionality in R to automatically allow aliases of column names ... in place, that is. You can always rename the columns before the calculation and then rename them back later. The default $ operator does allow partial matches, but I believe they always match (unambiguously) from the left, not from the right as your example portrays. You might try to rewrite $.data.frame so that either one would work, but you risk lots of corner cases and unintended consequences when messing with that.
– r2evans
Nov 27 '18 at 15:49
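The rename-then-rename-back approach from this comment can be driven by the dictionary table from the question. A minimal sketch: rename_with_dict() is a hypothetical helper name, and dict is assumed to have the long_name and short_name columns shown in the question. Names not found in the dictionary are left unchanged, so columns added later do not break the renaming.

```r
# Hypothetical helper: map column names through a dictionary.
# `from` and `to` name the dictionary columns to translate between.
rename_with_dict <- function(df, dict, from, to) {
  idx <- match(names(df), dict[[from]])
  hit <- !is.na(idx)
  # Only rename columns that appear in the dictionary
  names(df)[hit] <- dict[[to]][idx[hit]]
  df
}

dict <- data.frame(long_name  = c("Descriptive A", "Descriptive B", "Descriptive C"),
                   short_name = c("A", "B", "C"),
                   stringsAsFactors = FALSE)
DF <- data.frame(`Descriptive A` = 12, `Descriptive B` = 34,
                 `Descriptive C` = 25, Extra = 1, check.names = FALSE)

# Long -> short before the calculation
short_df <- rename_with_dict(DF, dict, "long_name", "short_name")
short_df$D <- short_df$B * 2  # work with convenient names

# Short -> long afterwards; D has no dictionary entry, so it stays "D"
long_df <- rename_with_dict(short_df, dict, "short_name", "long_name")
names(long_df)
```

The sketch does not guard against collisions: if a column outside the dictionary already carries a name equal to some short name, the renamed data frame will contain duplicate names.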