The Case for Data-Driven Open Source Development

December 7, 2018


The lack of standardized metrics, datasets, methodologies and tools for extracting insights from Open Source projects is a real problem.

Open Source Metrics That Actually Matter

Let’s take a look at the first part of the problem: the metrics. OSS project stakeholders simply don’t have the data to make informed decisions, identify trends and forecast issues before they arise. Providing standardized, language-agnostic data to all stakeholders is essential for the success of the Open Source industry as a whole. Anyone can query the APIs of large code hosting platforms such as GitHub and GitLab and find interesting metrics, but this approach has limitations: the data points you want are not always available, complete or properly structured. There are also public datasets such as GH Archive, but they are not optimized for exhaustive querying of commits, issues, PRs, reviews and comments across a large set of distributed git repositories.
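As a rough illustration of the API approach and one of its gaps, here is a minimal Python sketch that counts commits per author from the GitHub REST API's commit listing. It assumes unauthenticated access (which is rate limited) and reads only a single page of results. The helper names `fetch_commits` and `commits_per_author` and the `SAMPLE` records are made up for this example. The fallback logic handles a known quirk: the top-level `author` field is null when a commit's email is not linked to a GitHub account, so the same person can show up under different keys.

```python
import json
import urllib.request
from collections import Counter

API_ROOT = "https://api.github.com"


def fetch_commits(owner, repo, per_page=100):
    """Fetch one page of commits for a repository (hypothetical helper, no auth)."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/commits?per_page={per_page}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def commits_per_author(commits):
    """Count commits per author, falling back to the git author name."""
    counts = Counter()
    for item in commits:
        # Null when the commit email matches no GitHub account.
        account = item.get("author")
        if account and account.get("login"):
            key = account["login"]
        else:
            git_author = (item.get("commit") or {}).get("author") or {}
            key = git_author.get("name") or "unknown"
        counts[key] += 1
    return counts


# Hand-written records that mimic the response shape; not real project data.
SAMPLE = [
    {"author": {"login": "alice"}, "commit": {"author": {"name": "Alice A."}}},
    {"author": None, "commit": {"author": {"name": "Bob Example"}}},
    {"author": {"login": "alice"}, "commit": {"author": {"name": "Alice A."}}},
    {"author": None, "commit": {"author": {}}},
]

if __name__ == "__main__":
    for name, total in commits_per_author(SAMPLE).most_common():
        print(name, total)
```

Against a live repository, the call would look like `commits_per_author(fetch_commits("owner", "repo"))`. Even this small metric needs a fallback for incomplete records, and a complete count would also require following pagination across every repository of interest.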

Retrieving source code from a mono-repository is an easier task, but code retrieval at scale is a pain point for researchers, Open Source maintainers or managers who want to track individual or team contributions.

