@arj one thing is that the performance of a run can depend on subdependencies. For example, I found that ssb-friends was a bottleneck, but I didn't need to change the API to fix it, so it was just a patch. But npm@5 has package locks, so we can hash the lockfile, and that identifies the exact code we'll be running. Maybe we can use those to override some deps and run forks of various modules if necessary? To get good results, we probably need to be able to rerun old versions, because then we can make a fair comparison when we run the same benchmark on different hardware. (Also, there are multiple factors that determine performance: memory, CPU, disk, network.)
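On hashing the lockfile, a rough sketch of what I mean, using only Node's built-in `crypto` and `fs`. The helper name `lockHash` and the choice of SHA-256 are just assumptions for illustration, nothing in the benchmark repo defines them:

```javascript
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Hypothetical helper (name and hash algorithm are assumptions):
// returns a hex SHA-256 digest of the package-lock.json found in `dir`.
// Two runs with the same digest resolved the same dependency tree.
function lockHash (dir) {
  const lock = fs.readFileSync(path.join(dir, 'package-lock.json'))
  return crypto.createHash('sha256').update(lock).digest('hex')
}
```

Each benchmark result could then be stored alongside `lockHash(process.cwd())`, so results from different machines are only compared when the hashes match.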
Hmm, so currently the modules being benchmarked are just dependencies of the benchmark... If we instead make the benchmarks into a module and pass in the deps, then we could run new benchmarks against old code...
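Something like this is what I have in mind for passing the deps in. It's only a sketch: `createBenchmark` and the `deps.work` function are made-up names, and a real benchmark would take whatever modules it actually exercises:

```javascript
// Hypothetical benchmark module: the code under test is passed in as `deps`
// instead of being required here, so the caller decides which version runs.
function createBenchmark (deps) {
  return function run (iterations) {
    const start = process.hrtime.bigint()
    // `deps.work` is an assumed function standing in for the module under test.
    for (let i = 0; i < iterations; i++) deps.work(i)
    const elapsedNs = process.hrtime.bigint() - start
    return { iterations, ms: Number(elapsedNs) / 1e6 }
  }
}
```

The caller would then require an old release or a fork from wherever it is installed and hand it to `createBenchmark`, so the benchmark code itself never pins a version.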