Use string as random seed in R
Abstract
A quick and easy way to use a string as the random generator seed in the R language.
A little bit of background
Using a random seed is an essential part of reproducible research. Because results fluctuate with the seed, some people go seed hunting for the "best" results (e.g. accuracy in machine learning). For this reason, and to make it easier to tell who is doing honest research and who has gone down the seed-hunting path, people usually use random seeds like 1 or 12345. The latter has been my random seed of choice in all my projects and research since 2016. I was once told that I had tried 1, 12, 123, 1234, and 12345 and then picked the best one. To that person I said that all my papers and my PhD thesis use 12345 and are all publicly available, so they can go and check. But wouldn't it be more convenient, and easier to trust, if we were able to use a string (e.g. our first name or full name) as the random seed? Unfortunately, and yet logically, R only accepts an integer as the random seed. This was my motivation to start looking for a way to use my name as the random seed.
So the first thing we need to do is find a way to convert a string into an integer. There are many ways we can achieve this. Two ways come to mind:
- convert each character to its index. For example, "a" would be 1, "b" would be 2, and so on.
- convert the string into a number.
The first method is suboptimal if we limit ourselves to c(letters, LETTERS), because the user might use a space to separate their first name and last name, or include digits and so on. Yes, we can expand that vector, but ... isn't that a waste of time?
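To make the limitation concrete, here is a minimal sketch of the first method. The helper name char_to_index is hypothetical, and the alphabet is assumed to be exactly c(letters, LETTERS).

```r
# Hypothetical helper: map each character to its position in c(letters, LETTERS)
char_to_index <- function(x) {
  alphabet <- c(letters, LETTERS)        # "a" is 1, ..., "z" is 26, "A" is 27, ...
  chars <- strsplit(x, split = "")[[1]]  # split the string into single characters
  match(chars, alphabet)                 # NA for anything outside the alphabet
}

char_to_index("ab")
char_to_index("Mehrad Mahmoudian")  # the space is not in the alphabet
```

Any character outside the alphabet, such as the space in the second call, comes back as NA, so the vector would have to be extended for every new kind of character.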
So let's get to the second method. There are many ways to do this. You can perhaps use an instance of CyberChef to play around and find some ways.
For instance, I quickly put together this recipe: "mehrad" -> to decimal -> sum, which you can see here. This is a very simple method and it definitely does not create a unique number for every string (i.e. different strings can collide, as in a hash collision), but it is good enough. If you want something much closer to unique, which would be definite overkill in all aspects for such a use case, you can use one of the hashing algorithms, but you would need third-party packages (e.g. the [openssl package](https://cran.r-project.org/web/packages/openssl/index.html)).
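A small sketch of why the sum is not unique, assuming the recipe is reproduced in base R with utf8ToInt(), which returns the code point of each character. The helper name string_to_number is hypothetical.

```r
# Hypothetical helper: sum of the code points of a single string
string_to_number <- function(x) sum(utf8ToInt(x))

# Addition ignores order, so any rearrangement of the same letters collides
string_to_number("mehrad") == string_to_number("darhem")

# Different letters can collide too: 97 + 100 and 98 + 99 are both 197
string_to_number("ad") == string_to_number("bc")
```

Both comparisons should be TRUE. For a name used as a personal seed this hardly matters, which is why the simple recipe is good enough here.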
So here I will just go with the plan above and use only things in the base package.
Note: I will use the base pipe (|>, available since R 4.1.0) in the following code blocks because it makes it very easy to add steps to the code without using Home and End multiple times, but you don't have to use the pipe and you can use the "normal" syntax of R. If you want to learn more about pipes, watch John Mount's video or read David Selby's blog post.
The first step is to convert our string to integers:
utf8ToInt("mehrad")
[1] 109 101 104 114 97 100
utf8ToInt("mehrad") |> length()
[1] 6
We now have one integer per character, but set.seed() only uses one integer, so we should somehow turn them into one single number. There are many things we can do, but perhaps the simplest is to add them up :) So let's do that:
utf8ToInt("mehrad") |> sum()
[1] 625
Now that we have it, we can use it for the seed:
utf8ToInt("mehrad") |> sum() |> set.seed()
# or in classic R syntax
set.seed(sum(utf8ToInt("mehrad")))
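If you use this in several scripts, the one-liner can be wrapped in a small function. This is only a sketch: the name set_seed_from_string and the input checks are my own assumptions, not part of base R.

```r
# Hypothetical convenience wrapper around the one-liner above
set_seed_from_string <- function(x) {
  # utf8ToInt() works on a single string, so reject anything else early
  stopifnot(is.character(x), length(x) == 1L, !is.na(x), nzchar(x))
  seed <- sum(utf8ToInt(x))  # one integer per character, added up
  set.seed(seed)
  invisible(seed)            # return the seed so it can be logged
}

seed <- set_seed_from_string("mehrad")
seed
```

Returning the integer invisibly keeps the call quiet while still letting you record the actual seed next to your results.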
And as always it is good to test if our method really works:
set.seed(sum(utf8ToInt("mehrad"))) ; rnorm(10)
set.seed(sum(utf8ToInt("mehrad"))) ; rnorm(10)
set.seed(sum(utf8ToInt("mehrad"))) ; rnorm(10)
[1] -0.3949023 0.6945254 -0.2651758 -0.4293754 0.2215511 -1.0237239 0.4103700 0.6291080 0.4894505 -1.7841721
[1] -0.3949023 0.6945254 -0.2651758 -0.4293754 0.2215511 -1.0237239 0.4103700 0.6291080 0.4894505 -1.7841721
[1] -0.3949023 0.6945254 -0.2651758 -0.4293754 0.2215511 -1.0237239 0.4103700 0.6291080 0.4894505 -1.7841721
And if we change the string we get another set of random numbers:
# capitalize my name
set.seed(sum(utf8ToInt("Mehrad"))) ; rnorm(10)
# using full name
set.seed(sum(utf8ToInt("Mehrad Mahmoudian"))) ; rnorm(10)
[1] 1.3306590 1.0458094 -0.9870942 0.6997714 -0.7041404 -0.5791050 -2.1145910 1.3888766 0.8057116 0.1516002
[1] -0.54792314 -0.01842653 -0.28491016 1.15766660 0.60054169 -0.45916757 0.66566250 -0.11515557 0.64156640 -0.20554542
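The random numbers change because the underlying integer seeds change. A quick way to see this is to print the seed for each spelling. The variable name seeds is just for illustration.

```r
# Compute the integer seed for each spelling of the name
seeds <- vapply(
  c("mehrad", "Mehrad", "Mehrad Mahmoudian"),
  function(x) sum(utf8ToInt(x)),
  integer(1)
)
seeds

# No two of these spellings share a seed
anyDuplicated(seeds) == 0
```

Upper-case "M" has code point 77 while lower-case "m" has 109, so capitalizing the first letter alone lowers the seed by 32.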