I have an R data frame with two columns of strings. One of the columns (say, Column1) contains duplicate values. I need to relabel that column so that the duplicated strings get ordered suffixes, as in Column1.new:
Column1 Column2 Column1.new
1 A 1_1
1 B 1_2
2 C 2_1
2 D 2_2
3 E 3
4 F 4
Any ideas of how to do this would be appreciated.
Cheers,
Antti
@Cão's answer, using only base R:
This may be more of a workaround, but parts of it may be simpler and more useful for someone whose needs are not quite the same.
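The code that originally followed here is not preserved. As a minimal sketch of one base R way to get the requested result, assuming the example data from the question stored in a data frame called tab (the helper names idx and n are made up for illustration, and this is not necessarily what the original answer used):

```r
# Example data from the question
tab <- data.frame(
  Column1 = c("1", "1", "2", "2", "3", "4"),
  Column2 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Position of each row within its Column1 group
idx <- ave(seq_along(tab$Column1), tab$Column1, FUN = seq_along)
# Size of the Column1 group each row belongs to
n <- ave(seq_along(tab$Column1), tab$Column1, FUN = length)

# Add a suffix only where the value occurs more than once
tab$Column1.new <- ifelse(n > 1,
                          paste(tab$Column1, idx, sep = "_"),
                          tab$Column1)
tab$Column1.new
# "1_1" "1_2" "2_1" "2_2" "3" "4"
```

Because ave groups by value rather than by position, this sketch does not need the data to be sorted by Column1.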
make.names with the unique=TRUE argument appends a dot and a number to names that are repeated. This might be enough for some folks.

From there you can pick out the first occurrence of each repeated element (but not the elements that occur only once) and append .0 to it.

Then replace the dots with underscores and remove the leading X.
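The steps above can be sketched as follows. This is an illustration under a few assumptions: the values are the ones from the question, they contain no dots of their own, and the variable names (x, nm, first_of_repeated, new) are made up for the example.

```r
x <- c("1", "1", "2", "2", "3", "4")

# make.names() prefixes an X (syntactic names may not start with a digit)
# and, with unique = TRUE, appends .1, .2, ... to repeated values
nm <- make.names(x, unique = TRUE)
# "X1" "X1.1" "X2" "X2.1" "X3" "X4"

# First occurrences of values that are repeated later get a .0 suffix;
# values that occur only once are left alone
first_of_repeated <- !duplicated(x) & x %in% x[duplicated(x)]
nm[first_of_repeated] <- paste0(nm[first_of_repeated], ".0")

# Replace the dot with an underscore and drop the leading X
new <- sub("^X", "", sub(".", "_", nm, fixed = TRUE))
new
# "1_0" "1_1" "2_0" "2_1" "3" "4"
```

Note that make.names also rewrites any characters that are not valid in R names, so this route is only safe when the original strings are simple.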
That might be good enough for you. If you want the indexing to start at 1, extract the numeric suffixes, add one, and put them back.
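A small sketch of that re-indexing step, assuming the zero-based labels look like "1_0", "1_1" and so on, and that an underscore followed by digits at the end of a string is always a suffix (the variable names are illustrative):

```r
new <- c("1_0", "1_1", "2_0", "2_1", "3", "4")

# Only entries that end in _<digits> carry a suffix
has_suffix <- grepl("_[0-9]+$", new)

# Split into the part before the suffix and the numeric suffix itself
base   <- sub("_[0-9]+$", "", new[has_suffix])
suffix <- as.integer(sub("^.*_", "", new[has_suffix]))

# Add one to every suffix and glue the pieces back together
new[has_suffix] <- paste(base, suffix + 1L, sep = "_")
new
# "1_1" "1_2" "2_1" "2_2" "3" "4"
```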
Like I said, more of a workaround here, but gives some options.
Let's say your data (ordered by Column1) is in an object called tab. First create a run-length object (rle in base R). That gives you the values of Column1 and the number of appearances of each one. Then use that information to create the new column with unique identifiers.

I'm not sure whether this is appropriate in your situation, but you could also just paste together Column1 and Column2 to create a unique identifier...
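Both ideas can be sketched in a few lines. This is a hedged reconstruction, not the original code: it assumes tab holds the question's example data already sorted by Column1, and the names r, idx, n and id are made up for the example.

```r
# Example data from the question, already ordered by Column1
tab <- data.frame(
  Column1 = c("1", "1", "2", "2", "3", "4"),
  Column2 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Run-length object: the values of Column1 and the length of each run
r <- rle(tab$Column1)

# Counter within each run (1..k for a run of length k)
idx <- sequence(r$lengths)
# Length of the run that each row belongs to
n <- rep(r$lengths, times = r$lengths)

# Suffix only the values that appear more than once
tab$Column1.new <- ifelse(n > 1,
                          paste(tab$Column1, idx, sep = "_"),
                          tab$Column1)

# Alternative: combine both columns into one identifier
# (unique only if every Column1/Column2 pair is unique)
tab$id <- paste(tab$Column1, tab$Column2, sep = "_")
```

Since rle counts consecutive runs, the first variant relies on the data being sorted by Column1; on unsorted data the same value would start a new run each time it reappears.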