I want to split a column of character strings so that all fully uppercase words end up in their own column. Each string starts with one or two uppercase words. Here is an example of the data frame:
mydataframe <- data.frame(
  species = c("ACTINIDIACEAE Actinidia arguta",
              "ANACARDIACEAE Attilaea abalak E.Martínez & Ramos",
              "LEGUMINOSAE CAESALPINIOIDEAE Biancaea decapetala (Roth) O.Deg."),
  trait = c(1, 2, 4))
I tried `separate` with the following regular expression: `"\\s+(?=[A-Z]+)"`. This is not working. For strings with more than two capitalized words, it separates the first and the second capitalized words and removes the rest of the string. Here is the code:
library(dplyr)  # provides %>%
library(tidyr)  # provides separate()

mydataframe <- mydataframe %>%
  separate(species, into = c("family", "sp"), sep = "\\s+(?=[A-Z]+)")
This is the result of the code:
family | sp | trait |
---|---|---|
ACTINIDIACEAE | Actinidia arguta | 1 |
ANACARDIACEAE | Attilaea abalak | 2 |
LEGUMINOSAE | CAESALPINIOIDEAE | 4 |
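
This result seems consistent with how the lookahead behaves: `(?=[A-Z]+)` only requires one uppercase letter after the whitespace, so the pattern matches before every word that starts with a capital letter, not only before the genus. A minimal sketch of this in base R, using `strsplit()` with `perl = TRUE` as a stand-in for the splitting step (an assumption: `separate` may use a different regex engine internally, but the pattern logic should be the same):

```r
# Base R only; perl = TRUE is needed for the lookahead syntax.
x <- "LEGUMINOSAE CAESALPINIOIDEAE Biancaea decapetala (Roth) O.Deg."

# Split on whitespace that is followed by at least one uppercase letter.
pieces <- strsplit(x, "\\s+(?=[A-Z]+)", perl = TRUE)[[1]]
print(pieces)
# [1] "LEGUMINOSAE"                "CAESALPINIOIDEAE"
# [3] "Biancaea decapetala (Roth)" "O.Deg."
```

With four pieces and only two target columns, the first two pieces are kept and the rest is dropped, which would explain the third row above.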
I want the following format:
family | sp | trait |
---|---|---|
ACTINIDIACEAE | Actinidia arguta | 1 |
ANACARDIACEAE | Attilaea abalak | 2 |
LEGUMINOSAE CAESALPINIOIDEAE | Biancaea decapetala | 4 |
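
One possible way to get this format is to capture the two parts with a single pattern instead of splitting. The sketch below uses base R `sub()` only. It rests on assumptions that are not stated in the question: family names consist solely of ASCII uppercase letters, and the species is always a capitalized genus followed by one all-lowercase epithet (so hyphenated epithets or infraspecific ranks would not be handled). The `pattern` and `result` names are only illustrative.

```r
mydataframe <- data.frame(
  species = c("ACTINIDIACEAE Actinidia arguta",
              "ANACARDIACEAE Attilaea abalak E.Martínez & Ramos",
              "LEGUMINOSAE CAESALPINIOIDEAE Biancaea decapetala (Roth) O.Deg."),
  trait = c(1, 2, 4),
  stringsAsFactors = FALSE)

# Group 1: one or more all-uppercase words (the family part).
# Group 2: a capitalized genus plus a lowercase epithet (assumed shape).
# The trailing .*$ swallows author names so they are dropped.
pattern <- "^((?:[A-Z]+\\s+)*[A-Z]+)\\s+([A-Z][a-z]+\\s+[a-z]+).*$"

# Note: sub() returns the input unchanged when the pattern does not match,
# so rows that break the assumptions above would need separate handling.
result <- data.frame(
  family = sub(pattern, "\\1", mydataframe$species, perl = TRUE),
  sp     = sub(pattern, "\\2", mydataframe$species, perl = TRUE),
  trait  = mydataframe$trait,
  stringsAsFactors = FALSE)

print(result)
```

The greedy family group first tries to take the leading capital of the genus as well, then backtracks because no whitespace follows it, which is what keeps `Actinidia`, `Attilaea`, and `Biancaea` in the `sp` column.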
`Error: arguments imply differing number of rows: 3, 4`. Likely solved easily by reducing the length of `trait=`, but in general reprex questions should not error (outside of the context of the question).