Python 2 is quite a bit faster at startup than Python 3, so add a Python 2 test case.
The classic first program ("Hello, World!") in various languages, plus a wrapper to benchmark them. This is a nice way to measure the startup overhead of language runtimes.
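
As a rough illustration of how such a wrapper could measure startup overhead, here is a minimal sketch. It is not the repository's actual script: the function name `time_startup`, the default run count, and the example command are all assumptions made for this example. It runs a command several times, discards its output, and reports the wall-clock time of each run.

```python
import statistics
import subprocess
import sys
import time


def time_startup(cmd, runs=10):
    """Run cmd `runs` times and return the wall-clock seconds of each run.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not skew the measurement.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Assumed test case: the current Python interpreter printing a greeting.
    cmd = [sys.executable, "-c", "print('Hello, World!')"]
    timings = time_startup(cmd)
    print(
        f"min {min(timings) * 1000:.1f} ms, "
        f"median {statistics.median(timings) * 1000:.1f} ms"
    )
```

Because the program itself does almost no work, nearly all of the measured time is process creation plus runtime initialization. The minimum over several runs is usually the least noisy figure to compare between runtimes, since it is the run least disturbed by other system activity.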