Better Source Metrics
Since we only have so much time in our day, we tend to use heuristics to determine the value of a project - the heuristics save us time, but they can be misleading.
If you are anything like me, the first few things you look at when analyzing a library’s usefulness are the following:
- The number of stars and likes on the project
- The most recent commit date
- The number of commits
- Sponsoring organizations
In this post, I will explain why I think some of these metrics are bad signals of a project and highlight a few more ways of analyzing a project’s health for the long term.
By spending a few more minutes (or hours) analyzing a repository, you can save multiples of that time down the line.
Low Signal Metrics
Stars and Popularity
Looking at the number of stars on a project should tell you if it’s a good project or not, right? It turns out that stars are more a sign of how good a project’s marketing is than of the code’s actual usefulness. I know several projects that have hundreds of stars but no active users. How does this happen?
I think that the answer is simple: marketing a project is a separate skill from writing a project that is useful and feature complete. Most people have one skill or the other, and it is relatively rare to find a person with both.
In fact, stars often encourage a marketing-first mentality: we get a boost of dopamine every time someone likes our project, but there is less glory in fixing bugs or writing useful features.
I’ve also found that the projects with the most stars tend to be the simplest concepts: a project that is complex and high value will often have fewer stars than a simple but neat project, because it takes effort to understand the value of the complex project.
Commit Recency and Frequency
I often fool myself into thinking a project with active development is more useful than a project that has not had a commit in weeks or months, but in truth, a good project will not necessarily need frequent commits. In fact, frequent commits often mean more bugs due to churn, less stable APIs and worse performance because the hot spots keep changing.
If a project has lots of commits that are noisy with bad commit messages, is that better than a project that has a smaller number of dense but more impactful commits?
Corporate Sponsorship
Corporate sponsorship can actually be a useful metric, but not always. When a company sponsors a project, we assume that there are people being paid to use and maintain the project. If so, that is a really valuable signal! But sometimes, companies will just throw a project over the wall and we have no idea if it is used internally or how useful it is. In this case, it makes sense to audit the commit log and try to get a sense for whether the project is actually used in the company’s day-to-day work or if it is a side project of a single developer.
High Signal Metrics
Issues and PRs
A healthy project will have a fast turnaround time on issues and PRs even if the commit frequency is low. If you look at a project and see many open PRs with no comment from the core contributors, this is likely a bad sign. Of course, you need to look at the closed PRs and issues as well to get the full picture. Look at the dialogue and attitude of the owners of the repo when they are handling issues: do they respond quickly, or do they let issues hang open for days or weeks without comment?
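If you want a number rather than a gut feeling, turnaround is easy to estimate once you have the timestamps. Here is a minimal sketch in Python; the record layout (`opened`, `first_reply`) and the sample data are made up for illustration, and you would fill them in from whatever issue tracker the project uses.

```python
from datetime import datetime
from statistics import median


def median_response_hours(issues):
    """Median hours between an issue being opened and its first maintainer reply.

    Issues that have no reply yet are skipped here; see unanswered_ratio.
    """
    waits = [
        (issue["first_reply"] - issue["opened"]).total_seconds() / 3600
        for issue in issues
        if issue["first_reply"] is not None
    ]
    return median(waits) if waits else None


def unanswered_ratio(issues):
    """Fraction of issues that never received a maintainer reply."""
    if not issues:
        return 0.0
    return sum(1 for issue in issues if issue["first_reply"] is None) / len(issues)


# Hypothetical sample data, not taken from a real project.
issues = [
    {"opened": datetime(2024, 1, 1, 9), "first_reply": datetime(2024, 1, 1, 15)},
    {"opened": datetime(2024, 1, 3, 9), "first_reply": datetime(2024, 1, 5, 9)},
    {"opened": datetime(2024, 1, 10, 9), "first_reply": datetime(2024, 1, 11, 9)},
    {"opened": datetime(2024, 2, 1, 9), "first_reply": None},
]

print(median_response_hours(issues))  # hours
print(unanswered_ratio(issues))
```

I use the median instead of the mean so that one issue that sat for a year does not hide an otherwise responsive maintainer. The unanswered ratio is tracked separately for the same reason.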
Commit History
Read the commit log and see how well features are implemented. Are there lots of small commits that are relatively unimportant, like one commit per style change? Do the commits contain test plans and well-thought-out messages describing the feature and motivation? It’s easy to generate a project with thousands of useless commits that looks active but is really just busy work.
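Nothing replaces actually reading the log, but a rough first pass can be automated. The sketch below flags commit subjects that are very short and reports what fraction of the history they make up. The three-word threshold is my own assumption, not an established rule, and the sample subjects are invented. In practice you could feed it the output of `git log --pretty=format:%s`.

```python
def is_low_signal(subject, min_words=3):
    """Treat a commit subject as low signal if it has fewer than min_words words.

    The threshold is an arbitrary assumption; tune it for the project at hand.
    """
    return len(subject.split()) < min_words


def low_signal_ratio(subjects):
    """Fraction of commit subjects that look like noise."""
    if not subjects:
        return 0.0
    return sum(1 for s in subjects if is_low_signal(s)) / len(subjects)


# Hypothetical commit subjects, one per commit.
subjects = [
    "wip",
    "fix style",
    "Add retry logic to the HTTP client with exponential backoff",
    "Document the cache eviction policy",
    "typo",
]

print(low_signal_ratio(subjects))
```

A high ratio does not prove a project is unhealthy, and a low one does not prove the opposite. It only tells you where to start reading.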
Test Drive the Project
Determining which library to use often takes me multiple attempts. Consider graphing libraries: there are dozens of them, but how do you know which one will work for you? Perhaps you need line charts, bar charts and interactive popovers.
Previously, I would try out the most promising-looking library and implement the features I needed one by one, starting with the easiest. Often I would discover, somewhere down the list, that the library was missing an important feature. Instead, I recommend listing the features you need up front and checking whether the library supports all of them. This means starting with the hardest-to-implement feature and seeing how well it is supported, instead of starting with the easiest.
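The hardest-first ordering can be written down as a tiny checklist helper. This is only a sketch: the difficulty scores are my own rough guesses for the graphing example, and the function name is hypothetical.

```python
def evaluation_order(required):
    """Return feature names sorted from hardest to easiest.

    required is a list of (feature, difficulty) pairs, where a higher
    difficulty means the feature is harder to build or less likely to
    be supported. Features with equal difficulty keep their input order.
    """
    ranked = sorted(required, key=lambda item: item[1], reverse=True)
    return [feature for feature, _difficulty in ranked]


# Hypothetical requirements for the graphing example above.
required = [
    ("line chart", 1),
    ("bar chart", 1),
    ("interactive popovers", 5),
]

for feature in evaluation_order(required):
    print("prototype next:", feature)
```

Prototype the features in that order and stop at the first one the library cannot handle. You will have lost an hour on the hard feature instead of a week on the easy ones.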
Documentation
If the repository has good documentation, that is a likely signal that thought and effort have gone into making a stable project. Additionally, you’ll have to refer to the documentation over and over again while using the project down the line: if you make sure the docs are good, you can save yourself a lot of headaches later.
Project Longevity
Has the project been maintained for years? I often find projects that only last for a year or two before going into ‘unmaintained mode’. How can you figure out if the project you are about to use will still be around in a few years? I think it makes sense to look at the author’s other projects and try to determine if they are the type to commit long term to their projects or if they like to make a new project and jump ship every few years.
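One crude way to compare an author’s track record is to measure how long each of their projects stayed active, from first commit to most recent commit. The sketch below assumes you have already collected those two dates per project; the project names and dates are invented.

```python
from datetime import date


def maintained_years(first_commit, last_commit):
    """Approximate number of years between the first and the latest commit."""
    return (last_commit - first_commit).days / 365.25


# Hypothetical projects by the same author: (first commit, latest commit).
projects = {
    "project-a": (date(2015, 3, 1), date(2023, 3, 1)),
    "project-b": (date(2021, 1, 1), date(2022, 1, 1)),
}

spans = {
    name: round(maintained_years(first, last), 1)
    for name, (first, last) in projects.items()
}

for name, years in spans.items():
    print(name, years, "years")
```

An author with a few long-lived projects is usually a safer bet than one with a long trail of one-year experiments, though a short span can also mean a project is simply new.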
Final Thoughts
Take some time to audit the projects you will use: instead of relying on the easy-to-observe metrics, dig a little. Look at the code hygiene, look at the developers’ responsiveness, and make sure that the code will be around for a while. Otherwise, you might find yourself building on quicksand.