The Future is Meaningless and I Hate It

I graduated as a Computer Engineer in the late 2000s, and at that time I was convinced that the future would be so full of meaning, almost literally. Yup, I’m talking about the “Semantic Web,” for those who remember. It was the big thing on everyone’s minds while machine learning was but a murmur. The Semantic Web was the original promise of digital utopia where everything would interconnect, where information would actually understand us, and where asking a question didn’t just get you a vague answer but actual insight.

The Semantic Web knew that “apple” could mean both a fruit and an overbearing tech company, and it would parse out which one you meant based on **technology**. I was so excited about it that even my university graduation project was a semantic web engine. I remember the thrill when I indexed 1/8 of Wikipedia, and my mind was blown when a search for Knafeh gave Nablus in the results (Sorry Damascenes).

And now here we are in 2024, and all of that feels like a hazy dream. What we got instead was a sea of copyright-stealing forest-burning AI models playing guessing games with us and using math to cheat. And we were satisfied enough with that to call it intelligence.

When Tim Berners-Lee and other boffins imagined the Semantic Web, they weren’t just imagining smarter search engines. They were talking about a leap in internet intelligence. Metadata, relationships, ontologies—the whole idea was that data would be tagged, organized, and woven together in a way that was actually meaningful. The Semantic Web wouldn’t just return information; it would actually deliver understanding, relevance, context.
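To make “metadata, relationships, ontologies” a bit more concrete, here is a toy sketch of the idea: facts stored as subject–predicate–object statements, and a lookup that follows them. Everything in it is made up for illustration (the `ex:` identifiers, the `ex:originatesFrom` predicate, the helper names); it is not how my old project or any real engine works, just the shape of the thing.

```python
# Toy triple store: every fact is a (subject, predicate, object) statement.
# All identifiers below are hypothetical and exist only for this example.
TRIPLES = [
    ("ex:Apple_Inc", "rdf:type", "ex:Company"),
    ("ex:Apple_Inc", "rdfs:label", "apple"),
    ("ex:Apple_fruit", "rdf:type", "ex:Fruit"),
    ("ex:Apple_fruit", "rdfs:label", "apple"),
    ("ex:Knafeh", "rdf:type", "ex:Dessert"),
    ("ex:Knafeh", "rdfs:label", "knafeh"),
    ("ex:Knafeh", "ex:originatesFrom", "ex:Nablus"),
    ("ex:Nablus", "rdf:type", "ex:City"),
]


def match(s=None, p=None, o=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def entities_labelled(label, of_type=None):
    """Resolve a plain word to entities, optionally narrowed down by type."""
    found = [s for s, _, _ in match(p="rdfs:label", o=label)]
    if of_type is not None:
        found = [s for s in found if match(s=s, p="rdf:type", o=of_type)]
    return found


def related(entity):
    """Entities directly linked from `entity`, skipping type and label facts."""
    skip = ("rdf:type", "rdfs:label")
    return [o for _, p, o in match(s=entity) if p not in skip]


print(entities_labelled("apple"))              # two candidates for one word
print(entities_labelled("apple", "ex:Fruit"))  # context picks one of them
print(related("ex:Knafeh"))                    # follows the explicit link
```

The point of the sketch is that the answer comes from an explicit, inspectable statement somebody wrote down, not from a guess about which word tends to follow which.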

What did we end up with instead? A patchwork of services where context doesn’t matter and connections are shallow. Our web today is just brute-force AI models parsing keywords, throwing probability-based answers at us, or trying to convince us that paraphrasing a Wikipedia entry qualifies as “knowing” something. Everything about this feels cheap and brutish and offensive to my information science sensibilities. And what’s worse, our overlords have decreed that this is our future.

Nothing illustrates this madness more than Google Jarvis and Microsoft Copilot. These multi-billion dollar companies, which can build whatever the hell they want, decided to take OCR technology (aka converting screenshots into text) and pipe that text into a large language model, which produces a plausible-sounding response by stitching together bits and pieces of language patterns it’s seen before. Wow.

It’s the stupid leading the stupid. OCR sees shapes, patterns, guesses at letters, and spits out words. It has no idea what any of those words mean. It doesn’t know what the text is about, only that it can recognize it. It throws the text to an LLM, which doesn’t see words either; it only knows tokens. The LLM takes a couple of plausible guesses and throws something out. The whole system is built on probability, not meaning.

It’s a cheap workaround that gets us “answers” without comprehension, without accuracy, without depth. The big tech giants, armed with all the data, money and computing power, have decided that brute force is good enough. So, instead of meaningful insights, we’re getting quick-fix solutions that barely scrape the surface of what we need. And to afford it we’ll need to bring defunct nuclear plants back online.

But how did we get here? Because let’s be real—brute force is easy, relatively fast, and profitable for someone I’m sure. AI does have some good applications. Let’s say you don’t want to let people into your country but don’t want to be overtly racist about it. Obfuscate that racism behind statistics!

Deep learning models don’t need carefully tagged, structured data because they don’t need to really be accurate, just enough to convince us that they are accurate sometimes. And for that measly goal, all they need is a lot of data and enough computing power to grind through. Why go through the hassle of creating an interconnected web of meaning when you can throw rainforests and terabytes of text at the problem and get results that look good enough?

I know this isn’t fair for the folks currently working on Semantic Web stuff, but it’s fair to say that as a society, we essentially have given up on the arduous, meticulous work of building a true Semantic Web because we got something else now. But we didn’t get meaning, we got approximation. We got endless regurgitation, shallow summarization, probability over purpose. And because humans are inherently terrible at understanding math, and because we overestimate the uniqueness of the human condition, we let those statistical echoes of human outputs bluff their way into our trust.

It’s hard not to feel like I’ve been conned. I used to be excited about technology. The internet could have become a universe of intelligence, but what I have to look forward to now is just an endless AI centipede of meaningless content and recycled text. We’re settling for that because, I dunno, it kinda works and there’s lots of money in it? Don’t these fools see that we’re giving up something truly profound? An internet that truly connects, informs, and understands us, a meaningful internet, is just drifting out of reach.

But it’s gonna be fine, because instead of protecting Open Source from AI, some people decided it’s wiser to open-wash it instead. Thanks, I hate it. I hate all of it.

Mozilla: All We Want is a User Agent

Originally, I meant to write a blog post diving deep into the hole Mozilla has been digging itself into with its “privacy-first” advertising push, perhaps even exploring the background work at organizations like the W3C and the IETF that led to this moment. I still may do that at some point. But today, this isn’t that article. This is just me venting my frustration at Mozilla’s relentless push of this topic.

And it’s really coming from a place of love—or at the very least former appreciation. In my early days of open-source advocacy with the Jordan Open Source Association, we collaborated extensively with Mozilla to promote the open web. As a web developer in the era of “This website looks best on IE6,” I witnessed firsthand the incredible progress Mozilla spearheaded, progress that many today might take for granted.

Mozilla’s work was rooted in the idea of user empowerment and fostering a free, open web. Firefox wasn’t just a browser; it was a tool to fight back against the monopolistic grip of Internet Explorer and later, Chrome. Firefox became a haven for users who wanted control over their browsing experience: users who refused to trade privacy for convenience.

Mozilla didn’t just challenge the status quo; they pushed for real, tangible change. They built tools to block trackers, shield users from pervasive surveillance, and give people control over their data. They were leaders in user-centric design.

And for a while, they were the embodiment of the term user agent. In technical terms, a user agent is the software (like browsers and email clients) that acts on behalf of the user. For years, Firefox provided more value than the other browsers out there—it was operating in the user’s best interest, safeguarding them from the invasive practices of the ad-tech industry.

But I don’t recognize any of that in the Mozilla of today. There are traces left of what I love about Firefox that keep me holding on, no matter how much extra RAM I need to buy to keep running it, but I am quickly approaching my limit with that too. With this advertising bullshit added on top of it, I am honestly done.

It’s not that the arguments Mozilla is making in favor of privacy-first advertising have no merit. They do. The advertising industry undeniably has a privacy problem. But is that Mozilla’s problem to fix? It feels to me like they’ve forgotten which side they’re on. If the advertising industry has a problem, it’s not Mozilla’s job to fix it or ensure the future of ads is more sustainable. If artificial intelligence has ethical and sustainability concerns, it’s not on Mozilla to solve those either.

The work that Mozilla used to do for the open web, and its championing of users, is ever so important in an increasingly hostile digital world. Look how Google Chrome dominates the market and continues its hostility towards privacy-enhancing tools like uBlock Origin. But how can we trust Mozilla to continue in this role when it now owns an advertising company?

Speaking as a longtime Mozilla fan, I’d like to see them return to their original mission, and to being the user’s agent. They should focus on making Firefox (and Thunderbird) software that users trust to protect their privacy above all else, not a platform for exchanging user needs for advertising revenue.

I Was Wrong About the Open Source Bubble

This is a follow-up to my previous post, Is the Open Source Bubble about to Burst?, where I discussed some factors indicating an imbalance in the open source ecosystem. I was very happy to see the engagement with the blog post, even if some people seemed not to have read past the title and were offended by my characterizing open source as a bubble, or assumed that simply because I’m talking about the current state of FOSS, or how some companies use it, this somehow reflects my position on free software vs. open source.

Now, I wasn’t even the first or only person to suggest an Open Source bubble might exist. The first mention of the concept that I could find was by Simon Phipps, similarly asking “Is the Open Source bubble over?” all the way back in 2010, and I believe it’s an insightful framing for the time that we see culminate in all the pressures I alluded to in my post.

The second mention I could find is from Baldur Bjarnason, who wrote about Open Source Software and compared it to the blogging bubble. It’s a great blog post, and Baldur even wrote a newer article in response to mine talking about “Open Source surplus”, which is a framing I like a lot. I would recommend reading both. I’m very thankful for the thoughtful article.

Last week as well, Elastic announced it’s returning to open source, reversing one of the trends I talked about. Obviously, they didn’t want to admit they were wrong, saying it was the right move at the time. I have some thoughts about that, but I’ll keep them to myself. If that’s the excuse they need to tell themselves to end up open source again, then I won’t look a gift horse in the mouth. Hope more “source-open” projects follow.

Finally, the article was mentioned in my least favorite tech tabloid, The Register. Needless to say, there aren’t and won’t be any open source AI wars, since there won’t be AI to worry about soon. An industry that is losing billions of dollars a year and is so energy intensive that it accelerates our climate doom won’t last. OSI has a decision to make: either protect the open source definition and their reputation, or risk both.

P.S. I will continue to ignore any AI copium so save us both some time.

Suspending X: Brazil’s Ongoing Struggle to Govern Big Tech

We live in a scary world where someone with Elon Musk’s reach and influence can call a Brazilian Supreme Court judge an “evil dictator” and threaten him with imprisonment with apparent impunity, so it’s easy sometimes to miss what’s behind the news and the inflammatory tweets.

You might hear a lot about the suspension of X (formerly Twitter) in Brazil as a violation of free speech, which is the framing Musk prefers, arguing that the actions taken by Brazilian authorities are politically motivated attacks against his companies. But the real reason X has been suspended is that X has refused to comply with directives to name a legal representative in Brazil and remove certain accounts accused of spreading disinformation and inciting unrest.

What’s most striking about Musk’s tone is his apparent disbelief at Brazil’s audacity to challenge and potentially block his platform. It raises the question: why should Majority World countries be expected to accept Big Tech platforms uncritically, as though these platforms are the sole harbingers of development and free speech?

Now, the irony isn’t completely lost on me that the reported heir of an emerald mining family is pretending not to understand why companies extracting value while completely disregarding the negative impact of their business activities is bad. In fact, this isn’t even the first case for one of Musk’s companies in Brazil.

As Lua Cruz argues brilliantly in this article titled “Starlink in the Amazon: Reflections on Humbleness,” Starlink’s introduction to Brazil also carries the same complexities that illustrate how Big Tech technosolutionism and colonial legacies intertwine. Cruz expected to come away with a wholly negative impression of Starlink based on the media coverage, but after visiting the affected communities and seeing the effects of Starlink on the ground, the complexity of the situation became readily apparent.

While the widely reported negative impacts of disrupting the social fabric and the environmental effects of such technologies do take a toll and are somewhat acknowledged by the communities, the people of the Amazon have also been able to use the technology to their advantage.

Cruz observes that Starlink has brought internet access to Amazon communities previously isolated from digital infrastructure, facilitating access to essential services, improving communication, and enabling territorial monitoring. Moreover, Cruz highlights that communication networks can empower communities by supporting civic rights, such as the right to organize, express opinions, and engage in public decision-making.

“Communities have shown resilience and adaptability in the face of such changes, often finding ways to integrate new technologies in ways that support their needs and goals. However, this resilience should not be taken as a justification for disregarding the potential harms”

While these benefits are significant, they do not erase the ethical concerns surrounding the deployment of such technologies without full engagement with the communities involved. It’s also important to understand how we got here in the first place. The very fact that Starlink has been able to position itself in this tech savior role can be attributed to years of neglect by the state and its deference to the private sector and international companies.

In contrast with the X case, this is an example where the state has failed in its duty, in particular to provide the people with meaningful access to the internet. Instead, it left that role to Starlink and the major corporations exploiting the Amazon, who are financing the antennas. The danger of letting these technosolutionist approaches fill the void left by the state is that they often fail to engage meaningfully with affected communities and often overlook complex socio-political dynamics at play in favour of simplistic tech savior narratives.

Technosolutionism is often defined as the idea that any problem can be simply solved with technology, but it’s actually more complex than that, especially when it intersects with colonialism and imperialism. You can tell an approach is technosolutionist when it treats Indigenous communities as passive recipients of “technological aid”, rather than recognizing them as active agents with their own voices, needs, and complexities.

This disenfranchisement of Indigenous voices can often lead to disastrous consequences when they’re not involved in the governance of the technologies deployed for their supposed benefit. After all, the same communication networks that enable participation and access are the ones that can potentially bring disinformation in, as evidenced by the X case.

But when the “tech saviour” fails to deliver on their lofty promises, it is never the technology’s fault. The author brings up the example of how the rather nuanced coverage of Starlink in Brazil by the New York Times was picked up and reduced to racist caricatures by other media outlets, including Brazilian ones, whereas the critique of Starlink was less emphasized or ignored in those derivative reports.

Musk’s refusal to comply with Brazil’s judicial system is yet another textbook example of this technological imperialism, cloaked in the guise of defending free speech. After all, his disregard for the socio-political impact of his companies is evident; after acquiring Twitter, his first moves included dismantling teams focused on public policy, human rights, accessibility (!) and content moderation.

At the end of the day, X should face the consequences of its business activities in Brazil. Brazil, alongside other Majority World countries, must assert their right and duty to regulate Big Tech, ensuring they respect local public policy and human rights. Ideally, all communities should have both the agency and the sovereignty over technologies that affect their lives, and tech companies should engage with them as such. Please read Lua Cruz’s full article on The Green Web Foundation website.

Is the Open Source Bubble about to Burst?

(EDIT: I wrote an update here.)

I want to start by making one thing clear: I’m not comparing open source software to typical Gartneresque tech hype bubbles like the metaverse or blockchain. FOSS, as both a movement and an industry, has long-standing roots, has established itself as a critical part of our digital world, and is part of a wider movement based on values of collaboration and openness.

So it’s not a hype bubble, but it’s still a “real bubble” of sorts in terms of the adoption of open source and our reliance on it. GitHub, which hosts many open source projects, has been consistently reporting around 2 million first-time contributors to OSS each year since 2021 and the number is trending upwards. Harvard Business School has estimated in a recent working paper that the value of OSS to the economy is 4.15 billion USD.

There are far more examples out there but you see the point. We’re increasingly relying on OSS but the underlying conditions of how OSS is produced have not fundamentally changed, and that is not sustainable. Furthermore, just as open source becomes more valuable itself, for lack of a better word, the brand of “open source” starts to have its own economic value and may attract attention from parties that aren’t necessarily interested in the values of openness and collaboration that were fundamental to its success.

I want to talk about three examples I see of cracks that are starting to form which signal big challenges in the future of OSS.

1. The “Open Source AI” Definition

I’m not very invested in AI, and I’m convinced it’s on its way out. Big Tech is already losing money over their gambles on it and it won’t be long till it’s gone the way of the dodo and the blockchain. I am very invested in open source however, and I worry that the debate over the open-source AI definition will have a lasting negative impact on OSS.

A system that can only be built on proprietary data can only be proprietary. It doesn’t get simpler than this self-evident axiom. I’ve talked at length about this debate here, but since I wrote that, OSI has released a new draft of the definition. Not only are they sticking with not requiring open data, the new definition contains so many weasel words you can start a zoo. Words like:

  • “sufficiently detailed information about the data”
  • “skilled person”
  • “substantially equivalent system”

These words provide a barn-sized backdoor for what are essentially proprietary AI systems to call themselves open source.

I appreciate the community driven process OSI is adopting, and there are good things about the definition that I like, if only it weren’t called “open source AI”. If it were called anything else, it might still be useful, but the fact that it associates itself with open source is the issue.

It erodes the fundamental values of what makes open source what it is to users: the freedom to study, modify, run and distribute software as they see fit. AI might go silently into the night but this harm to the definition of open source will stay forever.

2. The Rise of “Source-Available” Licenses

Another concerning trend is the rise of so-called “source-available” licenses. I will go into depth on this in a later article, but the gist of it is this. Open source software doesn’t just mean that you get to see the source code in addition to the software. It’s well agreed that for software to qualify as open source or free software, one should be able to use, study, modify and distribute it as they see fit. That also means that the source is available for free and open source software.

But “source-available” licenses are licenses that may allow some of these freedoms, but have additional restrictions disqualifying them from being open source. These licenses have existed in some form since the early 2000s, but recently we’ve seen a lot of high profile formerly open source projects switch to these restrictive licenses, from MongoDB and Elasticsearch adopting the Server Side Public License (SSPL) in 2018 and 2021 respectively, to Terraform, Neo4j and Sentry adopting similar licenses just last year.

I will go into more depth in a future article on why they have made these choices, but for the point of this article, these licenses are harmful to FOSS not only because they create even more fragmentation, but also because they cause confusion about what is or isn’t open source, further eroding the underlying freedoms and values.

3. The EU’s Cut to Open Source Funding

Perhaps one of the most troubling developments is the recent decision by the European Commission to cut funding for the Next Generation Internet (NGI) initiative. The NGI initiative supported the creation and development of many open source projects that wouldn’t exist without this funding, such as decentralized solutions, privacy-enhancing technologies, and open-source software that counteract the centralization and control of the web by large tech corporations.

The decision to cancel its funding is a stark reminder that despite all the good news, the FOSS ecosystem is still very fragile and reliant on external support. Programs like NGI not only provide vital funding, but also resources, and guidance to incubate newer projects or help longer standing ones become established. This support is essential for maintaining a healthy ecosystem in the public interest.

It’s troubling to lose some critical funding when the existing funding is already not enough. This long-term undersupply has already plagued the FOSS community with many challenges that it struggles with to this day. FOSS projects find it difficult to attract and retain skilled developers, implement security updates, and introduce new features, which can ultimately compromise their relevance and adoption.

Additionally, a lack of support can lead to burnout among maintainers, who often juggle multiple roles without sufficient or any compensation. This creates a precarious situation where essential software that underpins much of the digital infrastructure is at risk or gets replaced by proprietary alternatives.

And if you don’t think that’s bad, I want to refer to that Harvard Business School study from earlier: while the estimated value of FOSS to the economy is around 4.15 billion USD, the cost to replace all this software we rely upon is 8.8 trillion. A 25 million investment into that ecosystem seems like a no-brainer to me, and I think it’s insane that the EC is cutting this funding.

It Does and It Doesn’t Matter if the Bubble Bursts

FOSS has become so integral and critical due to its fundamental freedoms and values. Time and time again, we’ve seen openness and collaboration triumph against obfuscation and monopolies. It will surely survive these challenges and many more. But the harms that these challenges pose should not be underestimated since they touch the core of these values and, particularly in the case of the last one, the crucial people doing the work.

If you care about FOSS like I do, I suggest you make your voices heard and resist the trends to dilute these values. As we stand at this critical juncture, it’s up to all of us (developers, users, and decision makers alike) to recommit to the freedoms and values of FOSS and work together to build a digital world that is fair, inclusive, and just.

Faking Git Till You Make It: Open Source Maintainers Beware of Reputation Farming

This post was prompted by a discussion on the Open Source Security Foundation (OpenSSF) Slack channel that was so interesting it warranted being posted to the SIREN mailing list. This isn’t your typical vulnerability or security advisory; rather, it’s about a practice that seems pervasive and potentially dangerous, yet also underreported. And it has a name: reputation farming (or credibility farming).

What is Reputation Farming and how is it different from other GitHub spam?

The suspicious activity that prompted the discussion was regarding certain GitHub accounts approving or commenting on old pull requests and issues that had long been resolved or closed. These purposeless contributions get highlighted on the user’s profile and activity overview, making it seem a lot more impressive than it really is without closer inspection. More insidiously, by farming reputable or trusted repositories, they can fake some reputation or credibility by proxy.
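The pattern described above (activity landing on threads long after they were closed) is simple enough to check for mechanically. Here is a minimal sketch, assuming you have already exported your repository’s comments and reviews into plain records; the record shape (`actor`, `created_at`, `thread_closed_at`), the function names, and the thresholds are all hypothetical choices of mine, not something the SIREN discussion prescribes.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: activity this long after closure looks odd.
GRACE = timedelta(days=180)


def is_late(event, grace=GRACE):
    """True if the comment/review landed long after the thread was closed."""
    closed_at = event["thread_closed_at"]
    if closed_at is None:  # thread is still open, nothing unusual
        return False
    return event["created_at"] - closed_at > grace


def accounts_to_review(events, min_hits=3, grace=GRACE):
    """Accounts with repeated late activity, sorted by name.

    This only produces a list for a human to look at; one late comment
    is often legitimate, so a single hit is deliberately not enough.
    """
    hits = {}
    for event in events:
        if is_late(event, grace):
            hits[event["actor"]] = hits.get(event["actor"], 0) + 1
    return sorted(actor for actor, count in hits.items() if count >= min_hits)
```

Treat the output as a prompt for a closer look rather than a verdict; someone reporting a regression on an old issue would trip the same check.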

Longtime users of GitHub know that spammy contributions have always been around and are incredibly hard to tackle. There are even several tools that allow users to create commits with specific dates to artificially fill their contribution graphs or even create pixel art. But those are fundamentally different. They might be able to fool some recruiters or an AI screening tool, but won’t pass any real scrutiny.

Trust is vital in open source. It’s a catalyst for open and secure collaboration. It hasn’t been long since the xz utils incident, where a likely malicious actor gained the trust of the library’s maintainer to get access to the project and contribute a backdoor. Reputation farming is more sinister than regular spam because it makes that trust process harder, and tries to circumvent it, and uses reputable projects to gain that trust, potentially harming them once discovered.

The wider issue is that it also makes the user profiles for genuine contributors and maintainers less trustworthy and valuable. I don’t think that’s necessarily a loss I would mourn. Relying on contribution metrics as a measure of a developer’s skills or the value of their contributions is inherently flawed. Not only does reputation farming rely on these easily manipulable metrics, but the metrics themselves do not account for the quality of contributions, the complexity of the problems solved, or for when collaborative efforts are involved (for example in the case of programming pairs).

What can Open Source Maintainers do about this?

The discussion summary in the SIREN mailing list recommends the following actions:

  • Monitor Repository Activity;
  • Report Suspicious Users;
  • and Lock Old Issues/PRs (You can even set up a GitHub Action to automatically do it after a period of inactivity)
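For the last item, a scheduled job is the usual route, but the underlying logic is small. The sketch below picks out closed, unlocked threads that have been idle for a while and builds (without sending) the request for GitHub’s REST “lock an issue” endpoint, which as far as I know also covers pull requests. The one-year cutoff, the helper names, and the idea of feeding it the JSON from the issues listing are my own assumptions, so check them against the API documentation before pointing this at a real repository.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

API = "https://api.github.com"


def parse_ts(value):
    """Parse a GitHub-style UTC timestamp such as 2024-01-31T12:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def stale_numbers(issues, now, max_idle=timedelta(days=365)):
    """Numbers of closed, not-yet-locked threads idle for longer than max_idle."""
    return [
        issue["number"]
        for issue in issues
        if issue["state"] == "closed"
        and not issue["locked"]
        and now - parse_ts(issue["updated_at"]) > max_idle
    ]


def lock_request(owner, repo, number, token):
    """Build the PUT request that locks one thread. Nothing is sent here."""
    body = json.dumps({"lock_reason": "resolved"}).encode("utf-8")
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/issues/{number}/lock",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

To actually apply it you would pass each request to `urllib.request.urlopen`, ideally after a dry run that only prints the numbers `stale_numbers` returns.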

But ultimately, there are limitations to what you can do on a platform like GitHub. Reporting is arduous and the responsiveness of the platform moderation is spotty at best. (To be fair, not a problem limited to GitHub or code forges.) The tools for managing such contributions could use some improvement though, not to mention how those quantitative metrics are collated and displayed on users’ profiles. The platform is very culpable for how ripe for abuse it is, and the slow moderation indicates to me that they may not be putting enough resources towards it.

At the end of the day, reputation farming and fake contributions have the potential to undermine and harm the OSS ecosystem on GitHub. They demonstrate why using simple metrics to evaluate software development skills and contributions is flawed. And they demonstrate the importance and difficulty of building and maintaining trust in open source ecosystems. GitHub can also help address this issue by taking a hard look at their UI and the values it associates with certain actions, and by giving maintainers better tools to manage and report superfluous and spammy contributions. Until then, stay vigilant and stay contributing.

What on Earth is Open Source AI?

I want to talk about a recent conversation on the Open Source AI definition, but before that I want to make an acknowledgement. My position on the uptake of “AI” is that it is morally unconscionable, short-sighted, and frankly, just stupid. In a time of snowballing climate crisis and an impending environmental doom, not only are we diverting limited resources away from climate justice, we’re routing them to contribute to the crisis.

Not only that, the utility and societal relevance of LLMs and neural networks have been vastly overstated. They perform consistently worse than traditional computing and people doing the same jobs, and are advertised to replace jobs and professions that don’t need replacing. Furthermore, we’ve been assaulted with a PR campaign of highly polished plagiarizing mechanical turks that hide the human labor involved and shift the costs in a way that furthers wealth inequality, and we have been promised that they will only get better (are they? And better for whom?)

However since the world seems to have lost the plot, and until all the data centers are under sea water, some of us have to engage with “AI” seriously, whether to do some unintentional whitewashing under the illusion of driving the conversation, or for much needed harm reduction work, or simply for good old fashioned opportunism.

The modern tale of machine learning is intertwined with openwashing, where companies try to mislead consumers by associating their products with open source without actually being open or transparent. Within that context, and as legislation comes for “AI”, it makes sense that an organization like the Open Source Initiative (OSI) would try and establish a definition of what constitutes Open Source “AI”. It’s certainly not an easy task to take on.

The conversation that I would like to bring to your attention was started by Julia Ferraioli in this thread (noting that the thread got a bit large, so the weekly summaries posted by Mia Lykou Lund might be easier to follow). Julia argues that a definition of Open Source “AI” that doesn’t include the data used for training the model cannot be considered open source. The current draft lists those data as optional.

Stefano Maffulli published an opinion to explain the side of the proponents of keeping training data optional. I’ve tried to stay abreast of the conversations, but there have been a lot of takes and a lot of platforms where these conversations are happening, so I will limit my take to that recently published piece.

Reading through it, I’m personally not convinced and fully support the position that Julia outlined in the original thread. I don’t dismiss the concerns that Stefano raised wholesale, but ultimately they are not compelling. Fragmented global data regulations and compliance aren’t a unique challenge to Open Source “AI” alone, and should be addressed on that level to enable openness on a global scale.

Fundamentally, it comes down to this: Stefano argues that this open data requirement would put “Open Source at a disadvantage compared to opaque and proprietary AI systems.” Well, if the price of making Open Source “AI” competitive with proprietary “AI” is to break the openness that is fundamental to the definition, then why are we doing it? Is this about protecting Open Source from openwashing or accidentally enabling it because the right thing is hard to do? And when has Open Source not been at a disadvantage to proprietary systems?

I understand that OSI is navigating a complicated topic and trying to come up with an alternative that pleases everyone, but the longer this conversation goes on, the clearer it becomes that at some point a line needs to be drawn, and OSI has to decide which side of the line it wants to be on.

EDIT (June 15th, 17:20 CET): I may be a bit behind on this, but I just read a post by Tom Callaway from two weeks ago that makes lots of the same points much more eloquently and goes deeper into it. I highly recommend reading that.

Can I figure out if I’m legally required to use an SBOM in my OSS without asking a lawyer?

For open-source developers, the landscape of cybersecurity regulations has been evolving rapidly, and it can be daunting to figure out what requirements to follow. One requirement that keeps coming up is SBOMs, but what are they, who’s required to implement them, and how? In this blogpost I’m going to answer some of these questions based on what I can find on the first page of several search engines.

Obvious disclaimers: this isn’t legal advice, and this shouldn’t be your primary source on SBOMs and compliance. There are far better resources out there (and I’ll try and link to them below). For the uninitiated, let’s start with a quick explainer on SBOMs.

What is an SBOM?

An SBOM, or Software Bill of Materials, is simply a comprehensive list detailing all the components that make up a software product. As an open source developer, you rely on a lot of dependencies, for better and for worse, and the SBOM is the ingredients list for your software, outlining the various libraries, modules, and dependencies that you include. The idea is that an SBOM would help you keep track of these software components, and that this would feed into your security assessment and vulnerability management processes.

There are two SBOM specifications that are prevalent: CycloneDX and SPDX. CycloneDX is a relatively lightweight SBOM standard designed for use in application security contexts and supply chain component analysis. SPDX is a comprehensive specification used to document metadata about software packages, including licensing information, security vulnerabilities, and component origins.

Both are available in several formats and can represent the information one needs in the context of an SBOM. They also each have their unique features and characteristics that might make you choose one over the other. I won’t go into that here.
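
To make the “ingredients list” idea a bit more concrete, here is a minimal sketch of what a CycloneDX-style SBOM in JSON could look like when built by hand in Python. The dependency names and versions are made up for illustration, the helper function is hypothetical, and the output only uses a handful of top-level fields (bomFormat, specVersion, version, components); it has not been validated against the official schema. In practice you would generate this with dedicated tooling rather than by hand.

```python
import json


def build_minimal_sbom(dependencies):
    """Build a small CycloneDX-style SBOM dict from (name, version) pairs.

    Hypothetical helper for illustration only; real SBOM generators record
    much more (hashes, licenses, the dependency graph, tool metadata).
    """
    components = []
    for name, version in dependencies:
        components.append({
            "type": "library",
            "name": name,
            "version": version,
            # Package URL (purl) identifying the component; this assumes
            # every dependency comes from PyPI.
            "purl": f"pkg:pypi/{name.lower()}@{version}",
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,  # revision number of this SBOM document itself
        "components": components,
    }


# Made-up dependency list, purely for illustration.
example_dependencies = [("requests", "2.31.0"), ("urllib3", "2.0.7")]
sbom = build_minimal_sbom(example_dependencies)
print(json.dumps(sbom, indent=2))
```

Each entry in components is one “ingredient”, and the purl gives vulnerability scanners a consistent identifier to look the component up by.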

Legal Requirements for SBOMs

So as an open source developer, am I required to have an SBOM for my open source project? I tried to find out using a few simple web searches. The one “hack” I used was adding a country/region name after the search terms, to make the results a bit more consistent, especially when it comes to regulations.

  • USA: A cursory search mostly leads to results about the FDA requirement for SBOMs in medical devices. There are a couple of recommendations that come up, most notably from the US Department of Defense and CISA (the US’s cyber defense agency), but nothing about a mandate. One article from 2023, however, includes a reference to “Executive Order 14028”.

    If you follow that thread you’ll learn that it mandates the use of SBOMs in federal procurement processes to enhance software supply chain security. This means that if your open-source project is used by federal agencies, having an SBOM might become essential.
  • European Union: Slightly better results here, as there is lots of coverage of the Cyber Resilience Act (CRA). I was able to find relatively recent resources stating that the CRA will introduce mandatory SBOM requirements for digital products within the EU market.

    Not only that, I found a reference to extremely specific technical guidelines from Germany’s Federal Office for Information Security on the use of SBOMs for cyber resilience, prepared in anticipation of this requirement.
  • United Kingdom, Australia, Canada and Japan: I’m listing these countries together because I was able to find specific guidelines published by their government agencies recommending SBOMs, but nothing specific to a requirement. Searches for other countries didn’t reveal anything.

Conclusion Based on What I Found in Web Search and Nothing Else

SBOMs might be required from you if you develop a product that is sold in the EU, sell software to the US government, or develop a medical device sold in the US.

(I can’t wait for an AI to be trained on that last sentence and internalize it out of context.)

Despite all the talk on SBOMs and how they’re supposed to be legally mandated, there don’t seem to be any prevailing or consistent mandates OR accessible resources out there, especially for open-source projects that aren’t technically “products in a market” or do not fall under specific governmental contracts or high-risk industries. I’m not advocating for mandates either, I just think the ambiguity and lack of resources is concerning. Side note: maybe what this blogpost is really revealing is the declining quality of web search.

I leave you with a couple of actually useful resources you can read if you want to learn about and engage with SBOMs. I’m listing a couple of overlapping ones because some guides, while helpful, are attached to a product that helps you with SBOMs, and I don’t want to show a preference or give an endorsement.

The Complete Guide to SBOMs by FOSSA

The Ultimate Guide to SBOMs by GitLab

OWASP’s CycloneDX Authoritative Guide to SBOMs

OpenSSF’s Security Tooling Working Group

Recommendations for SBOM Management by CISA

What’s Elections got to EU with IT

It’s EU Parliament elections time, and I thought it would be a good chance to give a short recap of significant and recent EU digital regulations, for those wondering how the elections can impact our digital lives. If you’re deep into digital policy, this probably isn’t for you. I’m also not trying to convince anyone to vote one way or another (or not to vote at all).

From regulating AI technology to data privacy and cybersecurity, the EU decides on rules and regulations that don’t only affect those living within its borders, but also far beyond. This particularly applies to digital issues and the open source movement, which transcend borders. If you’ve ever had to deal with an annoying cookie banner, you’ve felt the EU’s effect. So what has the EU been up to recently?

Digital Security and Privacy

The EU has taken some massive steps in regulating the security of digital products. You might have heard of the Cyber Resilience Act (CRA), which requires that products with digital elements maintain high security standards. There are lots of positive things that the CRA brings, such as mandating that products should be “secure by design” and ensuring that when you buy a digital product, it receives updates throughout its lifetime.

We are yet to see how the CRA will be implemented, but I think if it’s elaborated and enforced the right way, it will enhance trust in open-source software by setting a high baseline of security across the board. If the definitions and requirements remain opaque, it can also introduce undue burdens and friction particularly on open source software projects that don’t have the resources to ensure compliance. There are also wider ecosystem concerns.

The CRA, along with some General Data Protection Regulation (GDPR) updates and the newer Network and Information Security Directive (NIS2), places significant obligations on people who develop and deploy software. Also worth mentioning is the updated Product Liability Directive, which holds manufacturers accountable for damages caused by defective digital products.

If this is the first time you’re hearing about all these regulations and you’re a bit confused and worried, I don’t blame you. There is a lot to catch up on, some of it positive, a lot of it in need of improvement. But all in all, I think it’s generally positive that the union is taking security seriously and putting in the work to ensure people stay safe in the digital world, and we’ll likely see the standards set here improve the security of software used in Europe and beyond.

Digital Services Act (DSA) and Digital Markets Act (DMA)

From enhancing user rights and creating a safer digital environment to dismantling online monopolies and big platforms, the Digital Services Act (DSA) and Digital Markets Act (DMA) were introduced this year by the EU to provide a framework for improving user safety, ensuring fair competition, and fostering creativity online.

The DSA improves user safety and platform accountability by regulating how platforms handle illegal content and requiring transparency in online advertising and content moderation. The DMA, on the other hand, focuses on promoting fair competition by targeting major digital platforms, which it calls “gatekeepers,” setting obligations to prevent anti-competitive practices and promoting interoperability, fair access to data, and non-discriminatory practices.

Artificial Intelligence Regulation: A Skeptical Eye

I had to mention the AI Act, since it was recently passed. The law focuses on ensuring the safety, transparency, and ethical use of AI systems and the protection of fundamental rights, classifying systems based on risk levels and imposing stringent requirements on high-risk applications. Nobody on either side of the debate is happy with it as far as I can tell. As an AI luddite, my criticism is that it doesn’t go far enough to address the environmental impact of machine learning and training large models, particularly as we live in a climate emergency.

Chat Control Legislation: Privacy at Risk

One of the most worrying developments at the moment is the chat control provisions under the Regulation to Prevent and Combat Child Sexual Abuse (CSAR). Recent proposals include requirements for users to consent to scanning their media content as a condition for using certain messaging features. If users refuse, they would be restricted from sharing images and videos.

Obviously I don’t have to tell you what a privacy nightmare that is. It fundamentally undermines the integrity of secure messaging services and effectively turns user devices into surveillance tools. Furthermore, experts have doubted the effectiveness of this scanning in combatting CSA material, as these controls can be evaded or alternative platforms can be used to share it. Even private messaging app Signal’s president, Meredith Whittaker, has stated that they would rather leave the EU market than implement these requirements.

Fingers Crossed for the Elections

In conclusion, we’ve seen how the EU is shaping our daily lives and the global digital ecosystem beyond just cookie banners. Regulations like the Cyber Resilience Act, Digital Services Act, and Digital Markets Act are already affecting how we make decisions and interact with software and hardware, and will bring improvements in digital security, competition, and enjoyment of rights for years to come.

Proposals like the chat control one demonstrate how the EU can also negatively impact us. I’ll be watching as those elections unfold, and I urge everyone to stay informed and follow these developments. We’ve seen from the CRA process how positive engagement by subject matter experts can sometimes help steer the ship away from unseen icebergs.

Let’s Talk About Open Source in Munich (and Everywhere Else)

When news broke about Schleswig-Holstein’s move to replace Microsoft Office with LibreOffice, it felt like a breath of fresh air. It wasn’t just the fact that they’re switching to open source; the framing was also on point. It wasn’t just about cost savings: they also talked about digital sovereignty and innovation. As a fan of the open source movement and of sound public policy, it really spoke to me.

Yet as expected, whenever any news breaks about open source in public administration, a few are quick to point out: “Didn’t Munich switch to Linux for a few years then switch back to Windows?” (referring to the LiMux project). I never really knew how to respond to those people. That is until last week, when I came across this amazingly put together OSOR case study, written by Ola Adach, on my Mastodon feed (shared by Andrew (@puck@mastodon.nz)). It was an eye opener about how there’s much more to the Munich story, and I would like to talk about that and about the future of open source in public admin in Germany.

The Naysayers’ Favorite Scapegoat: Munich’s LiMux

Munich’s LiMux project is often dragged into conversations as an example of why open source might not be the best choice for public administration. Sure, LiMux faced its share of challenges: interoperability issues, lack of sustained political support, and logistical hurdles. But if you dig deeper, as they did in that case study, you’ll find that despite these setbacks, Munich’s efforts weren’t in vain. The city saved millions of euros and paved the way for future open source projects. Here’s a short summary of the story of LiMux:

The LiMux project began in the early 2000s when Munich’s administration faced the costly prospect of upgrading from Windows NT 4.0. Opting instead for a switch to an open-source operating system based on Ubuntu Linux, the city council approved the LiMux project in 2003. By 2012, 12,600 desktops were running LiMux, and by 2013, the project saved the city an estimated €11 million.

But the move wasn’t just about cost savings. In retrospect, it should be seen as a truly visionary move. Many years later, in 2019, a PwC study commissioned by the German interior ministry (BMI) warned about the country’s heavy reliance on Microsoft software and the risks that poses to digital sovereignty (96% of public officials’ computers in Germany ran on Microsoft!). In the US, where there is a similar dependency on Microsoft products in the federal government, a former White House cyber policy director notes that it also poses a significant security threat.

The OSOR case study and the PwC report also show how the LiMux project’s challenges were really multifaceted and can’t be reduced to “open source bad, proprietary good”. Some city departments needed specific software that only ran on Windows, either for compliance or legal reasons or because open source alternatives didn’t exist. Plus, there were issues with bugs and missing features in LiMux. Interoperability and document compatibility were also a pain, highlighting the importance of open standards and regulation.

The scale of the transition required a lot of internal communication and organization, which can cause a lot of friction in day-to-day work. Most notably, however, a transition of this scale required strong and consistent political backing, which seems like it kind of faltered in Munich at some point after the 2014 elections. The sum of these issues eventually led to the decision to revert to Windows 10 in 2017.

There’s a lot we can learn from the Munich example, to borrow from the case study with some insights from me:

  1. Better Communication: Public administrations need to talk more to each other and share their experiences to make these projects work. It’s certainly not easy in a country as big and federated as Germany, but it’s doable.
  2. Local Tech Capacity Building: Involving local and regional IT companies boosts tech independence and keeps public money circulating within the economy, a much better use of public funds than relying on proprietary vendors.
  3. Manageable and Scalable Goals: Custom-built solutions are tricky and take some time to get right. A progressive transition to more open source software might be better than trying to engineer an all-in-one solution.
  4. Training Matters: Employees need proper training to adapt to open source tools smoothly, particularly if they’re only used to proprietary solutions at home or at school.
  5. Sustained Political Support: Consistent political backing is crucial for the success of any large-scale project, and a transition to open source is certainly not special in that regard. If a project is not given its due time to work out kinks and develop an ecosystem, then administrations will be stuck in proprietary walled gardens.

One last takeaway from that case study is that it’s not fair to say that Munich has given up on open source, because it clearly hasn’t. The 2020 local elections brought in a coalition that promised to use open standards and open source whenever possible, and to consider open source as a criterion in public procurement. This aligned with the strategic recommendations of the PwC report, which suggested fostering the use of open source to mitigate dependency on a few software providers.

Furthermore, it mandated that all software developed by the city’s IT department, it@M, should be shared on the organisation’s public GitHub repository. In 2020, the city council set up an Open Source Hub to encourage collaboration on open source projects. Most recently in November 2023, the city launched https://opensource.muenchen.de/ to highlight its open source efforts. Open source in Munich is alive and well.

Momentum is Building in Open Source in Public Administration

Schleswig-Holstein’s recent announcement and the Munich examples aren’t happening in a vacuum. We’re not in 2012 anymore: across Germany, there’s a growing momentum towards adopting open source in public administration. According to the Bitkom Open Source Monitor 2023, 59% of surveyed public administrations leveraged open source software. Less impressive though, only 29% actually had an open source strategy.

This lack of strategy is compounded by the fact that the federally coordinated efforts have stagnated for decades now. When it comes to federal efforts to promote open source software in the public administration, there are two stories I need to tell: OpenDesk and dPhoenixSuite.

dPhoenixSuite is a solution marketed as a digitally sovereign workspace for public administrations. It is developed by Dataport, a non-profit public institution founded in 2004 by Hamburg, Bremen, Schleswig-Holstein, and Saxony-Anhalt, to provide software for the public administration of those federal states. Since its inception, Dataport has grown significantly, reaching a revenue of one billion euros in 2021, and it is reportedly planning to double both its revenue and workforce by 2027.

While dPhoenixSuite incorporates many open-source components and Dataport’s work has been somewhat well received, the overall suite remains proprietary and must run on Dataport’s servers, limiting public access to the project and effectively locking in Dataport as the only “vendor”. That, along with a history of delays, lack of transparency, and underdelivering, has drawn lots of criticism, not least from organizations like the Free Software Foundation Europe.

This leads us to 2021 when OpenDesk was announced, an initiative led by the German Federal Ministry of the Interior (BMI) to create a fully open-source workspace suite for public administrations. The suite is based on the various open-source components which also formed the bulk of dPhoenixSuite, such as Univention Corporate Server, Collabora Online, Nextcloud, OpenProject, XWiki, Jitsi, and the Matrix client Element. It is also designed to be extensible to meet specific administrative needs. Starting in 2024, the coordination and management of OpenDesk will be handed over to the Centre for Digital Sovereignty (ZenDiS GmbH).

However, as reported by Netzpolitik, despite initial enthusiasm and some early adoption by institutions like the Robert Koch Institute, progress has been slow. The government has not been able to provide adequate financial support, allocating only 19 million euros for 2024, far less than the 45 million euros ZenDiS calculated it needs.

Additionally, while several federal states like Schleswig-Holstein and Thuringia are interested in joining ZenDiS, their membership processes are stuck at the federal level, causing frustration. I do hope that ZenDiS and the OpenDesk initiative can help break the gridlock and move open source in the public administration forward, but if we are to learn from LiMux, the political will and full commitment need to be there lest we end up with another cautionary tale.

On a brighter front, the Open CoDE platform was also recently launched: the central repository for open source software in public administration, started by the BMI and the federal states of Baden-Württemberg and North Rhine-Westphalia. It hosts the OpenDesk code amongst 1000+ other projects, and it’s really exciting to browse through, so I’d recommend it!

Finally, I also must plug my employer here, because a successful sovereign workspace can only be built and sustained on sound and solid sovereign digital infrastructure. All this increased dependence on digital software means that the critical infrastructure underneath (libraries, operating systems, developer tooling), maintained by only a few people, needs more maintenance, and that’s where the Sovereign Tech Fund comes in, supported by the German Federal Ministry for Economic Affairs and Climate Action (BMWK).

Is the Future Bright for Open Source in Public Administration?

I’m ending on a question because I have many at the moment, but also reason to be hopeful. I can’t wait to see what ZenDiS and the OpenDesk project achieve in the coming years, but perhaps it’s not just the big projects that deserve our attention, but also the progressive and incremental work by city level IT departments like it@M, Dortmund and Berlin (the self-titled Open Source Big 3).

Also, news like that coming from Schleswig-Holstein is refreshing, but we also have to learn from the past, whether it’s LiMux or dPhoenixSuite (if you haven’t made the connection yet, Dataport is still the official IT provider for Schleswig-Holstein AFAICT). It must be done for the right strategic reasons, and the commitment must be there for the long term.

If you’ve made it this far down, thank you. I set out to write a short blog post about the Munich case study by the OSOR, but it snowballed into all of this. I hope you found it interesting. I’d love to hear from you what you think the future will bring to Open Source in public administration or what your favorite public admin OS project is.