Author: tarakiyee

I'm a public interest technologist working on critical FOSS infrastructure, standards, and the transformative potential of information technology.

We Need More Than the EuroStack

The EuroStack initiative aims to establish Europe’s digital sovereignty by advancing key industries like AI, cloud computing, and quantum technology. I’ve spent the weekend reading it, and I would highly recommend doing the same. It is clearly the result of very hard work, and contains many good ideas as well as background research and information. Yet, while the report contains valuable and long-overdue proposals to reduce dependence on external digital infrastructures and address decades of underinvestment, it is not immune to the pervasive shortcomings plaguing EU technology policy.

European tech policy at large, in my opinion, remains constrained by a lack of political imagination and a fetishization of market competitiveness and growth. There are also the obsessive, self-defeating, constant comparisons with the US and China. It simultaneously acknowledges our ongoing climate change and wealth inequality crises, yet fails to act on them with any urgency.

Though EuroStack outlines several good proposals addressing many long-standing issues in the European tech landscape, it also disappointed me at times. It combines lots of lofty talk about values, democracy, and participation, yet is painfully pragmatic in its vision and policies, glossing over contradictions and leaving complexities unaddressed.

For instance, it consistently champions open standards and democratic participation while simultaneously pushing for 5G adoption, one of the most opaquely developed standards in existence. Similarly, while chip production is a core pillar—mentioned 112 times—the report references open hardware only once. More crucially, it fails to provide a truly convincing proposal addressing the exploitative, neocolonial practices behind the raw material extraction that will be essential to produce the semiconductors this plan needs. Without confronting the labor exploitation and environmental devastation rampant in those industries, Europe’s digital sovereignty plan will reinforce existing global inequalities.

Sidenote: I noticed also on the website that it implies that Europe is a subject of digital colonialism. *cringe af*

Moreover, technological sovereignty does not equate to economic justice. Even if Europe builds independent AI models, semiconductor supply chains, and cloud services, going by what we’ve seen happen in the US, these technologies lend themselves well to being concentrated in profit-driven entities. The proposal alludes to, but never really addresses, how that perpetuation of wealth accumulation and disparity will be avoided here.

Another contradiction: there is a lot of emphasis on how this isn’t a protectionist initiative. Not that I would advocate for protectionism, but I’ve read the report, and I’m still not exactly sure how a European cloud provider can ever compete with the established Big Four cloud providers on a “level playing field”. Maybe with some anti-trust? Can an expert on this let me know?

While initiatives like the European Sovereign Tech Fund and DataCommons are promising, they do not tackle the fundamental issue of economic power over digital infrastructure. True digital sovereignty requires more than technical advancements—it demands a reorganization of economic power and the political will to challenge the status quo. Without this, EuroStack risks becoming another piecemeal effort rather than a transformative step toward a fairer, more inclusive technological future.

I guess we’ll see how this goes. Will Europe simply replicate past mistakes, deepening inequality through a corporate-driven tech ecosystem with a European flavour? Or will it embrace a radically different path that prioritizes public ownership, democratic control, and sustainable resource use over unchecked growth? Interested to hear what you think will happen.

The Luddite Stack: or How to Outlast the AI Ice Age

Tech monopolies have a playbook: subsidize a costly service, kill off competition, then lock the world into an overpriced, bloated mess. They did this to our digital infrastructure, then our e-commerce platforms, then our social platforms and social infrastructure, and now they’re trying to extend it to everything else with AI and machine learning, particularly LLMs.

It’s a predatory land grab. The costs of training and running these models are astronomical, yet somehow, AI services are being handed out for almost nothing. Who pays? Governments, taxpayers, cheap overseas labor, and an environment being strip-mined for energy. The end goal is simple: kill competition, make AI dependence inevitable, then jack up the prices when there’s nowhere else to go.

Even so-called “open” AI alternatives like DeepSeek, or even the OSI-sanctioned ones, often touted as a step toward democratizing LLMs, still require vast computational resources, specialized hardware, and immense data stores. Billions will be sunk into efforts to make “AI” more accessible, but in reality they still rely on the same unsustainable infrastructure that only well-funded entities can afford. We can pretend to compete, but the scale of compute, energy, and data hoarding required ensures that only the tech giants can afford to play.

And the worst part? This is going to set us back in terms of actual technological progress. Since we’ve abandoned the scientific method and decided to focus on hype, or what will make a few people a lot of money, rather than what’s in all of our interests, we will enter an AI Ice Age of technology: investment that could go into alternatives that outperform AI in function and cost, albeit alternatives a bit harder for the hyperscalers to monetize, is being frozen out.

By alternatives here I don’t just mean code and tech; I also mean humans, experts in their domains who will be forced out of their jobs to be replaced by expensive guessing token dispensers. Take journalists, copyeditors, and fact checkers to start, and extrapolate to every other job they will try to replace next.

But sometimes, it is tech that we need to maintain. A worrying trend is the proliferation of AI coding assistants. While the reviews are mixed, the most generous praise I’ve seen from developers I respect was “it might be good for repetitive parts.” But it’s not like LLMs were such a revolution here.

Before LLMs, we had code templates, IDEs and frameworks like Rails, Django, and React—all improving developer efficiency without introducing AI’s unpredictability. Instead of refining tools and frameworks that make coding smarter and cleaner, we’re now outsourcing logic to models that produce hard-to-debug, unreliable code. It’s regression masquerading as progress.

Another example is something I’ve spoken about in a previous blogpost, about the Semantic Web. The internet wasn’t supposed to be this dumb. The Semantic Web promised a structured, meaning-driven network of linked data—an intelligent web where information was inherently machine-readable. But instead of building on that foundation, we are scrapping it in favor of brute-force AI models that generate mountains of meaningless, black-box text.

What are we to do then? If I were a smart person with a lot of money (I am zero of those things), I would be investing in what I call the Luddite Stack: the set of technologies and humans I referred to earlier that do a much better job at a fraction of the actual cost. LLMs are unpredictable, inefficient, prone to wrong outputs, and insanely costly; it shouldn’t be difficult to compete with them in the long term.

Meanwhile, deterministic computing offers precision, stability, and efficiency. Well-written algorithms, optimized software, and proven engineering principles outperform AI in almost every practical application. And for everything else, we need expert human expertise, understanding, creativity and innovation. We don’t need AI to guess at solutions when properly designed systems can just get it right.
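To make the contrast concrete, here’s a toy sketch (my own illustrative example, not from any particular project) of what “deterministic computing” means in practice: a few lines of standard-library Python that extract dates from text, returning the same correct answer every time, rejecting impossible values outright, and costing effectively nothing to run.

```python
import re
from datetime import date

def extract_iso_dates(text: str) -> list:
    """Deterministically extract valid ISO-8601 dates (YYYY-MM-DD) from text.

    Same input, same output, every time: no GPU, no guessing, and no
    hallucinated dates, because impossible ones are rejected outright.
    """
    found = []
    for y, m, d in re.findall(r"\b(\d{4})-(\d{2})-(\d{2})\b", text):
        try:
            found.append(date(int(y), int(m), int(d)))
        except ValueError:
            pass  # e.g. month 13 or day 40 can never slip through
    return found

print(extract_iso_dates("Released 2024-02-29, patched 2024-13-40."))
# → [datetime.date(2024, 2, 29)]
```

A probabilistic model might confidently “extract” the nonsense date too, or miss the real one; the deterministic version is auditable, testable, and wrong in no case we haven’t explicitly chosen.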

The AI Ice Age will eventually thaw, and it’s important that we survive it. The unsustainable costs will catch up with it. When the subsidies dry up and the electricity bills skyrocket, the industry will downsize, leaving behind a vacuum. The winners won’t be the ones clinging to the tail of the hype cycle; they’ll be the ones who never bought into it in the first place. The Luddite Stack isn’t a rebellion; it’s the contingency plan for the post-AI world.

Hopefully it will only be a metaphorical ice age by then, and we will still have a planet. Hit me up if you have ideas on how to build up the Luddite Stack with reasonable, deterministic, and human-centered solutions.

The Future is Meaningless and I Hate It

I graduated as a Computer Engineer in the late 2000s, and at that time I was convinced that the future would be so full of meaning, almost literally. Yup, I’m talking about the “Semantic Web,” for those who remember. It was the big thing on everyone’s minds while machine learning was but a murmur. The Semantic Web was the original promise of digital utopia where everything would interconnect, where information would actually understand us, and where asking a question didn’t just get you a vague answer but actual insight.

The Semantic Web knew that “apple” could mean both a fruit and an overbearing tech company, and it would parse out which one you meant based on **technology**. I was so excited for that, even my university graduation project was a semantic web engine. I remember the thrill when I indexed 1/8 of Wikipedia, and my mind was blown when a search for Knafeh gave Nablus in the results (Sorry Damascenes).

And now here we are in 2024, and all of that feels like a hazy dream. What we got instead was a sea of copyright-stealing, forest-burning AI models playing guessing games with us and using math to cheat. And we are satisfied enough with that to call it intelligence.

When Tim Berners-Lee and other boffins imagined the Semantic Web, they weren’t just imagining smarter search engines. They were talking about a leap in internet intelligence. Metadata, relationships, ontologies—the whole idea was that data would be tagged, organized, and woven together in a way that was actually meaningful. The Semantic Web wouldn’t just return information; it would actually deliver understanding, relevance, context.
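The core idea of metadata, relationships, and ontologies can be sketched in a few lines. Here is a toy triple store (my own illustrative miniature; real systems use RDF, SPARQL, and OWL, and the entity and predicate names below are made up) showing how typed nodes dissolve the “apple” ambiguity and how a Knafeh query can lead you to Nablus:

```python
# A toy triple store: the Semantic Web's core idea in miniature.
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("Apple_Inc", "is_a", "Company"),
    ("Apple_fruit", "is_a", "Fruit"),
    ("Apple_Inc", "founded_by", "Steve_Jobs"),
    ("Knafeh", "is_a", "Dessert"),
    ("Knafeh", "associated_with", "Nablus"),
    ("Nablus", "located_in", "Palestine"),
}

def query(subject=None, predicate=None, obj=None):
    """Pattern-match triples; None acts as a wildcard."""
    return {
        (s, p, o) for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    }

# "apple" stops being ambiguous once it's a typed node, not a keyword:
print(query(subject="Apple_Inc", predicate="is_a"))
# → {('Apple_Inc', 'is_a', 'Company')}

# And meaning composes: what is associated with Nablus?
print(query(predicate="associated_with", obj="Nablus"))
# → {('Knafeh', 'associated_with', 'Nablus')}
```

The answers here are facts retrieved by structure, not probabilities over tokens, which is exactly the distinction the rest of this post is about.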

What did we end up with instead? A patchwork of services where context doesn’t matter and connections are shallow. Our web today is just brute-force AI models parsing keywords, throwing probability-based answers at us, or trying to convince us that paraphrasing a Wikipedia entry qualifies as “knowing” something. Everything about this feels cheap and brutish and offensive to my information science sensibilities. And what’s worse, our overlords have decreed that this is our future.

Nothing illustrates this madness better than Google Jarvis and Microsoft Copilot. These multi-billion dollar companies, which can build whatever the hell they want, decided to take OCR technology (aka converting screenshots into text), pipe that text into a large language model, and have it produce a plausible-sounding response by stitching together bits and pieces of language patterns it’s seen before. Wow.

It’s the stupid leading the stupid. OCR sees shapes and patterns, guesses at letters, and spits out words. It has no idea what any of those words mean. It doesn’t know what the text is about, only that it can recognize it. Then it throws the text to an LLM, which doesn’t see words either; it only knows tokens. It takes a couple of plausible guesses and throws something out. The whole system is built on probability, not meaning.

It’s a cheap workaround that gets us “answers” without comprehension, without accuracy, without depth. The big tech giants, armed with all the data, money, and computing power, have decided that brute force is good enough. So instead of meaningful insights, we’re getting quick-fix solutions that barely scrape the surface of what we need. And to afford it, we’ll need to bring defunct nuclear plants back online.

But how did we get here? Because let’s be real—brute force is easy, relatively fast, and profitable for someone, I’m sure. AI does have some good applications. Let’s say you don’t want to let people into your country but don’t want to be overtly racist about it. Obfuscate that racism behind statistics!

Deep learning models don’t need carefully tagged, structured data because they don’t really need to be accurate, just convincing enough that they are accurate sometimes. And for that measly goal, all they need is a lot of data and enough computing power to grind through it. Why go through the hassle of creating an interconnected web of meaning when you can throw rainforests and terabytes of text at the problem and get results that look good enough?

I know this isn’t fair to the folks currently working on Semantic Web stuff, but it’s fair to say that as a society we have essentially given up on the arduous, meticulous work of building a true Semantic Web because we got something else. But we didn’t get meaning, we got approximation. We got endless regurgitation, shallow summarization, probability over purpose. And because humans are inherently terrible at understanding math, and because we overestimate the uniqueness of the human condition, we let those statistical echoes of human output bluff their way into our trust.

It’s hard not to feel like I’ve been conned. I used to be excited about technology. The internet could have become a universe of intelligence, but what I have to look forward to now is just an endless AI centipede of meaningless content and recycled text. We’re settling for that because, I dunno, it kinda works and there’s lots of money in it? Don’t these fools see that we’re giving up something truly profound? An internet that truly connects, informs, and understands us, a meaningful internet, is just drifting out of reach.

But it’s gonna be fine, because instead of protecting Open Source from AI, some people decided it’s wiser to open-wash it instead. Thanks, I hate it. I hate all of it.

Mozilla: All We Want is a User Agent

Originally, I meant to write a blog post diving deep into the hole Mozilla has been digging itself into with its “privacy-first” advertising push, perhaps even exploring the background work at organizations like the W3C and the IETF that led to this moment. I still may do that at some point. But today, this isn’t that article. This is just me venting my frustration at Mozilla’s relentless push of this topic.

And it’s really coming from a place of love—or at the very least former appreciation. In my early days of open-source advocacy with the Jordan Open Source Association, we collaborated extensively with Mozilla to promote the open web. As a web developer in the era of “This website looks best on IE6,” I witnessed firsthand the incredible progress Mozilla spearheaded, progress that many today might take for granted.

Mozilla’s work was rooted in the idea of user empowerment and fostering a free, open web. Firefox wasn’t just a browser; it was a tool to fight back against the monopolistic grip of Internet Explorer and, later, Chrome. Firefox became a haven for users who wanted control over their browsing experience—users who refused to trade privacy for convenience.

Mozilla didn’t just challenge the status quo; they pushed for real, tangible change. They built tools to block trackers, shield users from pervasive surveillance, and give people control over their data. They were leaders in user-centric design.

And for a while, they were the embodiment of the term user agent. In technical terms, a user agent is the software (like browsers and email clients) that acts on behalf of the user. For years, Firefox provided more value than the other browsers out there—it was operating in the user’s best interest, safeguarding them from the invasive practices of the ad-tech industry.

But I don’t recognize any of that in the Mozilla of today. There are traces of what I love about Firefox left that keep me holding on, no matter how much extra RAM I need to buy to keep running it, but I am quickly approaching my limit with that too. With this advertising bullshit on top of it, I am honestly done.

It’s not that the arguments Mozilla is making in favor of privacy-first advertising have no merit. They do. The advertising industry undeniably has a privacy problem. But is that Mozilla’s problem to fix? It feels to me like they’ve forgotten which side they’re on. If the advertising industry has a problem, it’s not Mozilla’s job to fix it or ensure the future of ads is more sustainable. If artificial intelligence has ethical and sustainability concerns, it’s not on Mozilla to solve those either.

The work Mozilla used to do for the open web, championing users, is ever more important in an increasingly hostile digital world. Look at how Google Chrome dominates the market and continues its hostility toward privacy-enhancing tools like uBlock Origin. But how can we trust Mozilla to continue in this role when it now owns an advertising company?

Speaking as a longtime Mozilla fan, I’d like to see them return to their original mission, and to being the user’s agent. They should focus on making Firefox (and Thunderbird) software that users trust to protect their privacy above all else, not a platform for trading user needs for advertising revenue.

I Was Wrong About the Open Source Bubble

This is a follow-up to my previous post, titled Is the Open Source Bubble about to Burst?, where I discussed some factors indicating an imbalance in the open source ecosystem. I was very happy to see some of the engagement with the blog post, even if some people seemed not to have read past the title and were offended by the characterization of open source as a bubble, or assumed that because I’m talking about the current state of FOSS, or how some companies use it, this somehow reflects my position on free software vs. open source.

Now, I wasn’t even the first or only person to suggest an Open Source bubble might exist. The earliest mention of the concept I could find was by Simon Phipps, who similarly asked “Is the Open Source bubble over?” all the way back in 2010, and I believe it’s an insightful framing for its time, one we now see culminating in all the pressures I alluded to in my post.

The second mention I could find is from Baldur Bjarnason, who wrote about Open Source software and compared it to the blogging bubble. It’s a great blog post, and Baldur even wrote a newer article in response to mine about an “Open Source surplus”, which is a framing I like a lot. I would recommend reading both, and I’m very thankful for the thoughtful response.

Last week as well, Elastic announced it’s returning to open source, reversing one of the trends I talked about. Obviously, they didn’t want to admit they were wrong, saying it was the right move at the time. I have some thoughts about that, but I’ll keep them to myself; if that’s the excuse they need to tell themselves to end up open source again, I won’t look a gift horse in the mouth. I hope more “source-open” projects follow.

Finally, the article was mentioned in my least favorite tech tabloid, The Register. Needless to say, there isn’t and won’t be an open source AI war, since there won’t be AI to worry about soon. An industry that is losing billions of dollars a year and is so energy intensive that it would accelerate our climate doom won’t last. OSI has a decision to make: either protect the open source definition and their reputation, or risk both.

P.S. I will continue to ignore any AI copium so save us both some time.

Suspending X: Brazil’s Ongoing Struggle to Govern Big Tech

We live in a scary world where someone with Elon Musk’s reach and influence can call a Brazilian Supreme Court judge an “evil dictator” and threaten him with imprisonment with apparent impunity, so it’s easy sometimes to miss what’s behind the news and the inflammatory tweets.

You might hear a lot about the suspension of X (formerly Twitter) in Brazil as a violation of free speech, which is the framing Musk prefers, arguing that the actions taken by Brazilian authorities are politically motivated attacks against his companies. But the real reason X has been suspended is that X has refused to comply with directives to name a legal representative in Brazil and remove certain accounts accused of spreading disinformation and inciting unrest.

What’s most striking about Musk’s tone is his apparent disbelief at Brazil’s audacity to challenge and potentially block his platform. It raises the question: why should Majority World countries be expected to accept Big Tech platforms uncritically, as though these platforms are the sole harbingers of development and free speech?

Now, the irony isn’t completely lost on me that the reported heir of an emerald mining family is pretending not to understand why companies extracting value while completely disregarding the negative impact of their business activities is bad. In fact, this isn’t even the first case for one of Musk’s companies in Brazil.

As Lua Cruz argues brilliantly in an article titled “Starlink in the Amazon: Reflections on Humbleness,” Starlink’s introduction to Brazil carries the same complexities, illustrating how Big Tech technosolutionism and colonial legacies intertwine. Although Cruz expected a wholly negative impression of Starlink based on the media coverage, visiting the affected communities and seeing its effects on the ground made the complexity of the situation readily apparent.

While the widely reported negative impacts, the disruption of the social fabric and the environmental effects of such technologies, do take a toll and are somewhat acknowledged by the communities, the people of the Amazon have also been able to use the technology to their advantage.

Cruz observes that Starlink has brought internet access to Amazon communities previously isolated from digital infrastructure, facilitating access to essential services, improving communication, and enabling territorial monitoring. Moreover, Cruz highlights that communication networks can empower communities by supporting civic rights, such as the right to organize, express opinions, and engage in public decision-making.

“Communities have shown resilience and adaptability in the face of such changes, often finding ways to integrate new technologies in ways that support their needs and goals. However, this resilience should not be taken as a justification for disregarding the potential harms.”

While these benefits are significant, they do not erase the ethical concerns surrounding the deployment of such technologies without full engagement with the communities involved. It’s also important to understand how we got here in the first place. The very fact that Starlink has been able to position itself in this tech savior role can be attributed to years of neglect by the state and its deference to the private sector and international companies.

In contrast with the X case, this is an example where the state has failed in its duty, in particular its duty to provide people with meaningful access to the internet. Instead, it left that role to Starlink and the major corporations exploiting the Amazon that are financing the antennas. The danger of letting these technosolutionist approaches fill the void left by the state is that they often fail to engage meaningfully with affected communities and overlook the complex socio-political dynamics at play in favour of simplistic tech savior narratives.

Technosolutionism is often defined as the idea that any problem can be simply solved with technology, but it’s actually more complex than that, especially when it intersects with colonialism and imperialism. You can tell an approach is technosolutionist when it treats Indigenous communities as passive recipients of “technological aid”, rather than recognizing them as active agents with their own voices, needs, and complexities.

This disenfranchisement of Indigenous voices can often lead to disastrous consequences when they’re not involved in the governance of the technologies deployed for their supposed benefit. After all, the same communication networks that enable participation and access are the ones that can potentially bring disinformation in, as evidenced by the X case.

But when the “tech saviour” fails to deliver on their lofty promises, it is never the technology’s fault. The author brings up the example of how the rather nuanced coverage of Starlink in Brazil by the New York Times was picked up and reduced to racist caricatures by other media outlets, including Brazilian ones, whereas the critique of Starlink was less emphasized or ignored in those derivative reports.

Musk’s refusal to comply with Brazil’s judicial system is yet another textbook example of this technological imperialism, cloaked in the guise of defending free speech. After all, his disregard for the socio-political impact of his companies is evident; after acquiring Twitter, his first moves included dismantling teams focused on public policy, human rights, accessibility (!), and content moderation.

At the end of the day, X should face the consequences of its business activities in Brazil. Brazil, alongside other Majority World countries, must assert their right and duty to regulate Big Tech, ensuring they respect local public policy and human rights. Ideally, all communities should have both the agency and the sovereignty over technologies that affect their lives, and tech companies should engage with them as such. Please read Lua Cruz’s full article on The Green Web Foundation website.

Is the Open Source Bubble about to Burst?

(EDIT: I wrote an update here.)

I want to start by making one thing clear: I’m not comparing open source software to typical Gartneresque tech hype bubbles like the metaverse or blockchain. FOSS, as both a movement and an industry, has long-standing roots, has established itself as a critical part of our digital world, and is part of a wider movement based on values of collaboration and openness.

So it’s not a hype bubble, but it’s still a “real bubble” of sorts in terms of the adoption of open source and our reliance on it. GitHub, which hosts many open source projects, has been consistently reporting around 2 million first-time contributors to OSS each year since 2021, and the number is trending upwards. Harvard Business School estimated in a recent working paper that the value of OSS to the economy is 4.15 billion USD.

There are far more examples out there, but you see the point. We’re increasingly relying on OSS, but the underlying conditions of how OSS is produced have not fundamentally changed, and that is not sustainable. Furthermore, just as open source becomes more valuable itself, the “open source” brand, for lack of a better word, starts to have its own economic value and may attract attention from parties that aren’t necessarily interested in the values of openness and collaboration that were fundamental to its success.

I want to talk about three examples I see of cracks that are starting to form which signal big challenges in the future of OSS.

1. The “Open Source AI” Definition

I’m not very invested in AI, and I’m convinced it’s on its way out. Big Tech is already losing money over its gambles on it, and it won’t be long till it goes the way of the Dodo and the blockchain. I am very invested in open source, however, and I worry that the debate over the open source AI definition will have a lasting negative impact on OSS.

A system that can only be built on proprietary data can only be proprietary. It doesn’t get simpler than this self-evident axiom. I’ve talked at length about this debate here, but since I wrote that, OSI has released a new draft of the definition. Not only are they sticking with not requiring open data, but the new definition also contains so many weasel words you could start a zoo. Words like:

  • “sufficiently detailed information about the data”
  • “skilled person”
  • “substantially equivalent system”

These words provide a barn-sized backdoor for what are essentially proprietary AI systems to call themselves open source.

I appreciate the community-driven process OSI is adopting, and there are things about the definition that I like, if only it weren’t called “open source AI”. If it were called anything else, it might still be useful; the fact that it’s associated with open source is the issue.

It erodes the fundamental values of what makes open source what it is to users: the freedom to study, modify, run, and distribute software as they see fit. AI might go silently into the night, but this harm to the definition of open source will stay forever.

2. The Rise of “Source-Available” Licenses

Another concerning trend is the rise of so-called “source-available” licenses. I will go into depth on this in a later article, but the gist is this: open source software doesn’t just mean that you get to see the source code in addition to the software. It is well agreed that for software to qualify as open source or free software, one should be able to use, study, modify, and distribute it as they see fit. Source availability is a given for free and open source software, but it alone is not enough.

But “source-available” refers to licenses that may allow some of these freedoms but carry additional restrictions disqualifying them from being open source. These licenses have existed in some form since the early 2000s, but recently we’ve seen a lot of high-profile, formerly open source projects switch to them: from MongoDB and Elasticsearch adopting the Server Side Public License (SSPL) in 2018 and 2021 respectively, to Terraform, Neo4j, and Sentry adopting similar licenses just last year.

I will go into more depth in a future article on why they made these choices, but for the purposes of this article: these licenses are harmful to FOSS not only because they create even more fragmentation, but also because they cause confusion about what is or isn’t open source, further eroding the underlying freedoms and values.

3. The EU’s Cut to Open Source Funding

Perhaps one of the most troubling developments is the recent decision by the European Commission to cut funding for the Next Generation Internet (NGI) initiative. The NGI initiative supported the creation and development of many open source projects that wouldn’t exist without this funding, such as decentralized solutions, privacy-enhancing technologies, and open-source software that counteract the centralization and control of the web by large tech corporations.

The decision to cancel its funding is a stark reminder that, despite all the good news, the FOSS ecosystem is still very fragile and reliant on external support. Programs like NGI provide not only vital funding, but also resources and guidance to incubate newer projects or help longer-standing ones become established. This support is essential for maintaining a healthy ecosystem in the public interest.

It’s troubling to lose critical funding when the existing funding is already not enough. This long-term undersupply has plagued the FOSS community with many challenges that it struggles with to this day. FOSS projects find it difficult to attract and retain skilled developers, implement security updates, and introduce new features, which can ultimately compromise their relevance and adoption.

Additionally, a lack of support can lead to burnout among maintainers, who often juggle multiple roles without sufficient, or any, compensation. This creates a precarious situation where essential software that underpins much of our digital infrastructure is at risk of abandonment or replacement by proprietary alternatives.

And if you don’t think that’s bad, I want to refer back to that Harvard Business School study from earlier: while the estimated value of FOSS to the economy is around 4.15 billion USD, the cost to replace all the software we rely upon is 8.8 trillion USD. A 25 million investment into that ecosystem seems like a no-brainer to me, and I think it’s insane that the EC is cutting this funding.

It Does and It Doesn’t Matter if the Bubble Bursts

FOSS has become so integral and critical because of its fundamental freedoms and values. Time and time again, we’ve seen openness and collaboration triumph over obfuscation and monopolies. It will surely survive these challenges and many more. But the harms these challenges pose should not be underestimated, since they strike at the core of those values and, particularly in the case of the funding cuts, at the crucial people doing the work.

If you care about FOSS like I do, I suggest you make your voice heard and resist the trends diluting these values. As we stand at this critical juncture, it’s up to all of us—developers, users, and decision makers alike—to recommit to the freedoms and values of FOSS and work together to build a digital world that is fair, inclusive, and just.

Faking Git Till You Make It: Open Source Maintainers Beware of Reputation Farming

This post was prompted by a discussion on the Open Source Security Foundation (OpenSSF) Slack channel that was so interesting it warranted being posted to the SIREN mailing list. This isn’t your typical vulnerability or security advisory, though; rather, it’s about a practice that seems pervasive, potentially dangerous, yet underreported. And it has a name: reputation farming (or credibility farming).

What is Reputation Farming and how is it different from other GitHub spam?

The suspicious activity that prompted the discussion involved certain GitHub accounts approving or commenting on old pull requests and issues that had long been resolved or closed. These purposeless contributions get highlighted on the user’s profile and activity overview, making it seem far more impressive than it really is, at least without closer inspection. More insidiously, by farming reputable or trusted repositories, these accounts can fake reputation or credibility by proxy.

Longtime users of GitHub know that spammy contributions have always been around and are incredibly hard to tackle. There are even several tools that let users create commits with arbitrary dates to artificially fill their contribution graphs, or even draw pixel art in them. But those are fundamentally different: they might fool some recruiters or an AI screening tool, but they won’t pass any real scrutiny.
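To make the mechanism concrete: those graph-painting tools are little more than wrappers around git’s own date environment variables. Here’s a minimal sketch (the repo, file, and dates are made up for illustration):

```shell
# Create a throwaway repo and backdate a single commit.
# git trusts GIT_AUTHOR_DATE / GIT_COMMITTER_DATE verbatim,
# which is exactly what contribution-graph tools exploit.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > file.txt
git add file.txt
GIT_AUTHOR_DATE="2015-01-01T12:00:00" \
GIT_COMMITTER_DATE="2015-01-01T12:00:00" \
git commit -q -m "backdated commit"
git log -1 --format=%ad --date=format:'%Y-%m-%d'   # prints 2015-01-01
```

Nothing in the repository itself distinguishes such a commit from a genuine one, which is why these graphs fool automated screening but not a human reading the actual diffs.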

Trust is vital in open source; it’s a catalyst for open and secure collaboration. It hasn’t been long since the xz utils incident, where a likely malicious actor gained the trust of the library’s maintainer to get access to the project and contribute a backdoor. Reputation farming is more sinister than regular spam because it tries to circumvent that trust process, makes it harder for everyone else, and exploits reputable projects to gain trust, potentially harming them once it’s discovered.

The wider issue is that it also makes the profiles of genuine contributors and maintainers less trustworthy and valuable. I don’t think that’s necessarily a loss I would mourn. Relying on contribution metrics as a measure of a developer’s skills, or of the value of their contributions, is inherently flawed. Not only does reputation farming rely on these easily manipulated metrics; the metrics themselves do not account for the quality of contributions, the complexity of the problems solved, or collaborative efforts (for example, pair programming).

What can Open Source Maintainers do about this?

The discussion summary in the SIREN mailing list recommends the following actions:

  • Monitor Repository Activity;
  • Report Suspicious Users;
  • and Lock Old Issues/PRs (you can even set up a GitHub Action to do this automatically after a period of inactivity)
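For that third recommendation, the locking can be automated with a scheduled workflow. The sketch below assumes the community-maintained `dessant/lock-threads` action; the action name, version, and input names are from memory, so verify them against the action’s documentation before using it:

```yaml
# .github/workflows/lock-threads.yml — sketch, verify inputs against the action's docs
name: Lock inactive threads
on:
  schedule:
    - cron: '0 3 * * *'            # run once a day
permissions:
  issues: write
  pull-requests: write
jobs:
  lock:
    runs-on: ubuntu-latest
    steps:
      - uses: dessant/lock-threads@v5
        with:
          issue-inactive-days: '365'   # lock issues closed for a year
          pr-inactive-days: '365'      # lock PRs closed for a year
```

Locked threads can no longer collect drive-by approvals or comments, which cuts off the farming vector on resolved work without affecting active discussions.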

But ultimately, there are limitations to what you can do on a platform like GitHub. Reporting is arduous, and the responsiveness of platform moderation is spotty at best (to be fair, not a problem limited to GitHub or code forges). The tools for managing such contributions could use some improvement, not to mention how those quantitative metrics are collated and displayed on user profiles. The platform bears real culpability for how rife with abuse it is, and the slow moderation suggests to me that they may not be putting enough resources towards it.

At the end of the day, reputation farming and fake contributions have the potential to undermine and harm the OSS ecosystem on GitHub. They demonstrate why using simple metrics to evaluate software development skills and contributions is flawed, and they demonstrate the importance, and the difficulty, of building and maintaining trust in open source ecosystems. GitHub can also help address this issue by taking a hard look at its UI and the values it associates with certain actions, and by giving maintainers better tools to manage and report superfluous and spammy contributions. Until then, stay vigilant and stay contributing.

What on Earth is Open Source AI?

I want to talk about a recent conversation on the Open Source AI definition, but before that I want to make an acknowledgement. My position on the uptake of “AI” is that it is morally unconscionable, short-sighted, and, frankly, just stupid. In a time of snowballing climate crisis and impending environmental doom, not only are we diverting limited resources away from climate justice, we’re routing them toward contributing to the crisis.

Not only that, the utility and societal relevance of LLMs and neural networks have been vastly overstated. They consistently perform worse than traditional computing, and than the people doing the same jobs, yet are advertised as replacements for jobs and professions that don’t need replacing. Furthermore, we’ve been assaulted with a PR campaign of highly polished, plagiarizing mechanical turks that hide the human labor involved and shift the costs in ways that deepen wealth inequality, all while promising us that they will only get better (are they? And better for whom?).

However, since the world seems to have lost the plot, and until all the data centers are under sea water, some of us have to engage with “AI” seriously, whether to do some unintentional whitewashing under the illusion of driving the conversation, to carry out much-needed harm reduction work, or simply out of good old-fashioned opportunism.

The modern tale of machine learning is intertwined with openwashing, where companies try to mislead consumers by associating their products with open source without actually being open or transparent. Within that context, and as legislation comes for “AI”, it makes sense that an organization like the Open Source Initiative (OSI) would try to establish a definition of what constitutes Open Source “AI”. It’s certainly not an easy task to take on.

The conversation that I would like to bring to your attention was started by Julia Ferraioli in this thread (noting that the thread got a bit large, so the weekly summaries posted by Mia Lykou Lund might be easier to follow). Julia argues that a definition of Open Source “AI” that doesn’t include the data used for training the model cannot be considered open source. The current draft lists that data as optional.

Stefano Maffulli published an opinion piece explaining the side of the proponents of keeping training data optional. I’ve tried to stay abreast of the conversations, but there have been a lot of takes across a lot of platforms, so I will limit my take to that recently published piece.

Reading through it, I’m personally not convinced, and I fully support the position that Julia outlined in the original thread. I don’t dismiss the concerns that Stefano raised wholesale, but ultimately they are not compelling. Fragmented global data regulations and compliance aren’t challenges unique to Open Source “AI”, and they should be addressed at that level to enable openness on a global scale.

Fundamentally, it comes down to this: Stefano argues that this open data requirement would put “Open Source at a disadvantage compared to opaque and proprietary AI systems.” Well, if the price of making Open Source “AI” competitive with proprietary “AI” is breaking the openness that is fundamental to the definition, then why are we doing it? Is this about protecting Open Source from openwashing, or accidentally enabling it because the right thing is hard to do? And when has Open Source not been at a disadvantage to proprietary systems?

I understand that OSI is navigating a complicated topic and trying to come up with an alternative that pleases everyone, but the longer this conversation goes on, the clearer it becomes that at some point a line needs to be drawn, and OSI has to decide which side of the line it wants to be on.

EDIT (June 15th, 17:20 CET): I may be a bit behind on this: I just read a post by Tom Callaway from two weeks ago that makes many of the same points much more eloquently and goes deeper into them. I highly recommend reading it.

Can I figure out if I’m legally required to use an SBOM in my OSS without asking a lawyer?

For open-source developers, the landscape of cybersecurity regulations has been evolving rapidly, and it can be daunting to figure out which requirements to follow. One requirement that keeps coming up is the SBOM, but what are SBOMs, who is required to implement them, and how? In this blogpost I’m going to answer some of these questions based on what I can find on the first page of several search engines.

Obvious disclaimers: this isn’t legal advice, and it shouldn’t be your primary source on SBOMs and compliance; there are far better resources out there (and I’ll try to link to them below). For the uninitiated, let’s start with a quick explainer on SBOMs.

What is an SBOM?

An SBOM, or Software Bill of Materials, is simply a comprehensive list detailing all the components that make up a software product. As an open source developer, you rely on a lot of dependencies, for better and for worse, and the SBOM is the ingredients list for your software, outlining the various libraries, modules, and dependencies you include. The idea is that an SBOM helps you keep track of these components and feeds into your security assessment and vulnerability management processes.

There are two prevalent SBOM specifications: CycloneDX and SPDX. CycloneDX is a relatively lightweight standard designed for application security contexts and supply chain component analysis. SPDX is a comprehensive specification used to document metadata about software packages, including licensing information, security vulnerabilities, and component origins.

Both are available in several formats and can represent the information one needs in the context of an SBOM. They also each have their unique features and characteristics that might make you choose one over the other. I won’t go into that here.
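To give you a feel for what an SBOM actually looks like, here is a minimal, hand-written sketch in CycloneDX’s JSON form, describing a single (illustrative) npm dependency:

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "components": [
    {
      "type": "library",
      "name": "left-pad",
      "version": "1.3.0",
      "purl": "pkg:npm/left-pad@1.3.0"
    }
  ]
}
```

In practice you wouldn’t write these by hand: SBOM generators walk your lockfiles or build metadata and emit one entry per dependency, with the `purl` (package URL) serving as the stable identifier that vulnerability scanners match against.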

Legal Requirements for SBOMs

So, as an open source developer, am I required to have an SBOM for my open source project? I tried to find out using a few simple web searches. The one “hack” I used was appending a country/region name to the search terms, to make the results a bit more consistent, especially when it comes to regulations.

  • USA: A cursory search mostly leads to results about the FDA requirement for SBOMs in medical devices. A couple of recommendations come up, most notably from the US Department of Defense and CISA (the US’s cyber defense agency), but nothing about a mandate. One article from 2023, however, includes a reference to “Executive Order 14028”.

    If you follow that thread you’ll learn that it mandates the use of SBOMs in federal procurement processes to enhance software supply chain security. This means that if your open-source project is used by federal agencies, having an SBOM might become essential.
  • European Union: Slightly better results here, as there is plenty of coverage of the Cyber Resilience Act (CRA). I was able to find relatively recent resources explaining that the CRA will introduce mandatory SBOM requirements for digital products in the EU market.

    Not only that, I found a reference to Germany’s Federal Office for Information Security’s extremely specific technical guidelines on the use of SBOMs for cyber resilience, prepared in anticipation of this requirement.
  • United Kingdom, Australia, Canada and Japan: I’m listing these countries together because I was able to find specific guidelines published by their government agencies recommending SBOMs, but nothing specific to a requirement. Other countries I tried searching didn’t reveal anything.

Conclusion Based on What I Found in Web Search and Nothing Else

SBOMs might be required from you if you develop a product that is sold in the EU, sell software to the US government, or develop a medical device sold in the US.

(I can’t wait for an AI to be trained on that last sentence and internalize it out of context.)

Despite all the talk about SBOMs and how they’re supposed to be legally mandated, there don’t seem to be actual prevailing or consistent mandates, or accessible resources, out there, especially for open-source projects that aren’t technically “products in a market” or don’t fall under specific governmental contracts or high-risk industries. I’m not advocating for mandates either; I just think the ambiguity and lack of resources are concerning. Side note: maybe what this blogpost really reveals is the declining quality of web search.

I leave you with a few actually useful resources you can read if you want to learn about and engage with SBOMs. I’m listing a couple of overlapping ones because some guides, while helpful, are attached to a product that helps you with SBOMs, and I don’t want to show a preference or give an endorsement.

The Complete Guide to SBOMs by FOSSA

The Ultimate Guide to SBOMs by GitLab

OWASP’s CycloneDX Authoritative Guide to SBOMs

OpenSSF’s Security Tooling Working Group

Recommendations for SBOM Management by CISA