How I algorithmically donated $5000+ to Open Source via GitHub Sponsors and PyPI data

December 1, 2024

We all indirectly depend on open source software — a public good with a ~$9 trillion value, mainly developed by unpaid volunteers. But without maintenance, it can become dysfunctional or even harmful, and this meme brilliantly outlines the fragility of modern infrastructure.

That's why I find it crucial to fund OSS maintainers in a systemic way to efficiently mitigate risks in the software supply chain our world runs on. However, the key current financing solutions do not seem sufficient for this:

Large open-source foundations follow the joint interests of their corporate donors and mainly focus on major projects like Kubernetes, Postgres, Linux, etc. Together with corporations, they often overlook the long tail of small but crucial OSS (e.g. Log4J). Also, many such non-profits seem opaque to me as a private donor.
Tools like Thanks.dev, Open Collective, and GitHub Sponsors are great for funding one's own supply chain or specific projects one likes, including small OSS. However, such donations gravitate toward the most popular OSS rather than the most important, and these two dimensions barely correlate (see the evidence for Python below).

So a "random person from Nebraska" without publicity rarely gets funded, which creates substantial risks for everyone. That's not cool! And what if I want to donate money to global OSS at large to efficiently reduce such risks?

The solution can be an algorithm-based index for OSS funders — similar to investing in public indexes via ETFs instead of manual stock picking. It would highlight the most crucial but underfunded OSS, serving as an open-source analogue of the "S&P 500."

I would love to donate to such an open-source-at-large index, but there is none, and even niche ecosystems do not have such large-scale structures for donors. So I have built a simple MVP for Python and personally donated ~$5000 through it, mainly using GitHub Sponsors and PyPI data 🙂

GitHub Sponsors

GitHub introduced its sponsorship program in 2019 and has since facilitated $40M+ in donations to its users. However, only about 44,000 accounts are sponsorable on GitHub now — a tiny fraction (0.03%) of its vast ~150 million user base.

Looking at historical cohorts of sponsorable users (grouped by the quarter in which the account was created), most cohorts have between 600 and 1,000 accounts. Unsurprisingly, the earliest and most recent cohorts tend to have fewer users.

Interestingly, the proportion of sponsorable users grows exponentially with the "age" of their accounts on GitHub. Yet, only 17% of these eligible users (~7,600) have any sponsors at all. The distribution of sponsors is highly uneven, resembling a power-law distribution—a pattern commonly seen in tech markets.

Although GitHub does not support donations via its API, it offers bulk sponsorship via CSV upload (up to 100 users per file). So, fortunately, it can be done at some scale.
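Because of the 100-users-per-file limit, a longer list of grantees has to be split into several uploads. Here is a minimal sketch of that batching step. The header names `username` and `amount_usd` are placeholders I made up for illustration, not GitHub's official template, so the real column names should be taken from the template GitHub provides.

```python
import csv
import io

MAX_ROWS_PER_FILE = 100  # per-file limit mentioned above


def split_into_csv_batches(grants, max_rows=MAX_ROWS_PER_FILE):
    """Return a list of CSV strings, each holding at most max_rows grants.

    grants: list of (github_login, amount_usd) tuples.
    """
    batches = []
    for start in range(0, len(grants), max_rows):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        # Hypothetical header: replace with the columns GitHub expects.
        writer.writerow(["username", "amount_usd"])
        writer.writerows(grants[start:start + max_rows])
        batches.append(buffer.getvalue())
    return batches
```

Each returned string can be written to its own file and uploaded separately.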

But how should one decide which users to sponsor and how much to donate to each one? It requires data on their importance, and I used PyPI to roughly estimate it for Python packages.

PyPI: Popularity and Importance are disconnected

I began by analyzing a dataset of all projects on the Python Package Index (PyPI) which had over 100,000 downloads in the past 12 months (LTM) and then:

automatically identified linked GitHub users where possible (83%),
narrowed the list down to packages with sponsorable users (16%),
grouped these by user, resulting in 946 potential grantees.
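The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the exact pipeline used: the record fields (`name`, `ltm_downloads`, `urls`) and the pre-built set of sponsorable logins are hypothetical, and the URL matching is deliberately naive (it takes the first path segment of any github.com link as the owner).

```python
import re
from collections import defaultdict

# Captures the first path segment of a github.com URL as the owner login.
GITHUB_OWNER = re.compile(
    r"https?://(?:www\.)?github\.com/([A-Za-z0-9-]+)(?:/|$)", re.IGNORECASE
)


def github_owner(urls):
    """Return the lowercased owner of the first GitHub URL, or None."""
    for url in urls:
        match = GITHUB_OWNER.match(url)
        if match:
            return match.group(1).lower()
    return None


def group_by_sponsorable_owner(packages, sponsorable, min_downloads=100_000):
    """Map each sponsorable GitHub login to the names of its packages.

    packages: dicts with hypothetical keys "name", "ltm_downloads", "urls".
    sponsorable: set of lowercased logins known to accept sponsorships.
    """
    grouped = defaultdict(list)
    for pkg in packages:
        if pkg["ltm_downloads"] <= min_downloads:
            continue  # keep only projects with over 100,000 LTM downloads
        owner = github_owner(pkg["urls"])
        if owner in sponsorable:
            grouped[owner].append(pkg["name"])
    return dict(grouped)
```

The number of keys in the returned dictionary corresponds to the count of potential grantees.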

LTM downloads also follow a power-law-like distribution. However, when comparing it with the number of GitHub Sponsors, the two metrics appear entirely disconnected!

In other words, there is almost no link between a project's significance (as measured by LTM downloads) and its popularity among GitHub Sponsors (reflected in the number of sponsors) for Python packages.
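One way to put a number on such a (dis)connection is a rank correlation, which suits heavy-tailed data because it ignores the magnitudes and compares only the orderings. The sketch below computes Spearman's coefficient in plain Python. It is my choice of measure for illustration; the post does not say which statistic, if any, was used.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Calling `spearman(ltm_downloads, sponsor_counts)` on two equally long lists gives a value between -1 and 1, where values near 0 indicate little monotonic relationship.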

Algorithmic donation

Selecting open-source projects for financial support remains an imperfect process, with no widely accepted approach or consensus within the OSS community. For the sake of experiment, I decided to start with something relatively simple:

Microgrants ranging from $1 to $200, with a total budget of ~$5,000.
Larger grants were assigned to users with greater average "value" or higher "risk".
Value increases with the number of total downloads and LTM downloads on PyPI.
Risk increases with the project size and OpenSSF score (security risk).
Risk decreases with the number of sponsors (a proxy for funding).
Metrics were normalized or log-normalized to account for power-law distributions.
Score = Value × Risk
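The scoring rules above can be sketched as follows. The exact weights and normalization are not spelled out in this post, so everything here is an assumption for illustration: every metric is log-scaled and min-max normalized to [0, 1], Value and Risk are plain averages of their components, and the field names (including `security_risk`, assumed to be already oriented so that higher means riskier) are hypothetical.

```python
import math


def log_minmax(values):
    """Map values to [0, 1] after log1p, which tames power-law tails."""
    logs = [math.log1p(v) for v in values]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [0.5] * len(values)  # no spread: treat everyone as average
    return [(x - lo) / (hi - lo) for x in logs]


def score_users(users):
    """Return {login: score} with score = value * risk (assumed weighting)."""
    total = log_minmax([u["total_downloads"] for u in users])
    ltm = log_minmax([u["ltm_downloads"] for u in users])
    size = log_minmax([u["project_size"] for u in users])
    security = log_minmax([u["security_risk"] for u in users])
    funded = log_minmax([u["sponsors"] for u in users])

    scores = {}
    for i, user in enumerate(users):
        value = (total[i] + ltm[i]) / 2
        # More sponsors means lower risk, hence the (1 - funded) term.
        risk = (size[i] + security[i] + (1 - funded[i])) / 3
        scores[user["login"]] = value * risk
    return scores
```

Multiplying rather than adding means a user scores high only when both Value and Risk are high, so a heavily downloaded but well-funded project does not dominate the ranking.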

After gathering and normalizing the data, I allocated the budget proportionally based on each project's total score. Grant amounts were rounded, and any grants falling below $1 were removed. Also, some GitHub users had custom minimum thresholds for one-time donations that exceeded my calculated grant amounts. To address this, some grants were increased in cases where the difference was no more than $25.
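A minimal sketch of this allocation step, under stated assumptions: the $200 ceiling is enforced by simple capping, and a grant is dropped when the user's minimum exceeds it by more than $25. The post gives the $1 to $200 range and the $25 top-up rule but not these mechanics, so both are my guesses. The function and parameter names are hypothetical.

```python
def allocate(scores, minimums, budget=5000, cap=200, floor=1, max_top_up=25):
    """Split a budget proportionally to scores and return {login: dollars}.

    scores: {login: score}; minimums: {login: minimum one-time donation}.
    """
    total_score = sum(scores.values())
    grants = {}
    for login, score in scores.items():
        amount = min(cap, round(budget * score / total_score))
        if amount < floor:
            continue  # grants below $1 are removed
        required = minimums.get(login, 0)
        if amount < required:
            if required - amount > max_top_up:
                continue  # assumption: gap too large, so no grant is made
            amount = required  # top up to the user's minimum
        grants[login] = amount
    return grants
```

Because of rounding, capping, removals, and top-ups, the sum of the grants only approximates the budget, which is consistent with a final total that differs slightly from the planned amount.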

The final outcome was a list of 866 GitHub users, to whom I donated a total of $5,037 via GitHub Sponsors on November 29, 2024. The largest microgrant was awarded to scikit-learn.

Considerations for the future

Open source maintenance could secure more funding from individuals (~150M GitHub users) if there were more transparent, scalable and systemic tools to efficiently support OSS-at-large. A few highly relevant components seem to be missing for now:

OSS-at-large index algorithmically identifying the most crucial and underfunded OSS from the global software supply chain's perspective. It should include projects across all ecosystems (Python, JavaScript, etc.) using a common approach. Although challenging, most required inputs for it are already online.
More open funding data telling potential donors how well-funded a project is and how to support it. Standardization efforts like GitHub's FUNDING.yml and FLOSS' funding.json are reasonable initiatives in this area.
Funding links in package managers. I was surprised how unstructured the data for PyPI packages is regarding links to GitHub repositories and the associated maintainers who should receive funding. A standardized "funding link" would help resolve this issue and create a more cohesive system.
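To illustrate what a structured funding link would make possible, here is a small sketch that looks for funding-related entries among a package's project URLs (the label-to-URL mapping that package metadata can carry). The set of labels treated as "funding" is an assumption for this example, and the helper names are hypothetical.

```python
import string

# Assumed set of labels that indicate a funding link.
FUNDING_LABELS = {"funding", "sponsor", "donate", "donation"}


def normalize_label(label):
    """Lowercase a label and strip punctuation and whitespace."""
    drop = string.punctuation + string.whitespace
    return "".join(ch for ch in label if ch not in drop).lower()


def funding_links(project_urls):
    """Return the URLs whose label looks funding-related."""
    return [
        url
        for label, url in project_urls.items()
        if normalize_label(label) in FUNDING_LABELS
    ]
```

With a consistently used label, a donor tool could call `funding_links` on every dependency instead of guessing maintainers from repository URLs.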

Last but not least: I highly recommend the fantastic initiative Open Source Pledge, which requires companies to donate $2,000 per developer annually to OSS maintainers. Unfortunately, it does not accept individuals yet, and if you believe this should change (as I do), please join the ongoing discussion.