Boulder Future Salon

"Agents sometimes catastrophize."

"On October 15, I asked an Opus 4.6 forecasting agent 'Will the United States conduct at least one confirmed drone strike or airstrike inside Venezuelan territory between October 15 and December 31, 2025?'. It gave 15%. It cataloged Russian-supplied air defenses, Congressional war powers, regional opposition, and the analyst consensus that troop levels were 'insufficient for a full-scale invasion.' This was all correct, but mostly relevant to a really serious attack. On December 24, the CIA hit an empty Venezuelan dock with a drone (no casualties), which caused the forecast question to resolve 'Yes' and gave this agent a bad score for its 15% forecast."

"Expert human forecasters identified a tendency in Opus 4.6 agents to model the most extreme version of an outcome, correctly explain why that extreme is unlikely, and then assign that low probability to the whole scenario, even when the question resolves on any version of the event."

"In this the Venezuela case, Opus 4.6 modeled only the upper half of that spectrum. It treated any land strike as a Rubicon crossing 'tantamount to an act of war,' then weighted every reason why that wouldn't happen: S-300 air defenses, insufficient invasion force, Congressional pushback, Colombian opposition. But a CIA drone strike on an empty dock doesn't have most of these problems."

"Yes, this was still a surprising outcome, and hindsight bias is a problem when triaging forecasting failures. In this case, the Opus 4.6 agent did explicitly consider that 'a covert CIA op', but thought that wouldn't involve a drone strike or airstrike."

Hmm. Interesting. The models have a decent ability to think logically, as we see on coding and math challenges, but have apparently inherited some human cognitive biases from the human language they are trained on?

"Another forecasting question asked, in Oct 2025, whether the IAEA would conduct any safeguards inspection at any non-Bushehr Iranian facility in Q4 2025."

"One more example: asked again in mid Oct 2025, the question was whether Israel and Lebanon would publicly announce the start of direct bilateral negotiations by December 31."

"Colossal Biosciences artificial egg hatches 26 chickens."

Really? Wow.

"The Colossal artificial egg is a two-component system: a semi-permeable silicone-based membrane housed inside a rigid hexagonal support cup. The membrane is engineered to replicate the gas-exchange function of a natural eggshell -- allowing oxygen to pass through while retaining moisture and blocking contaminants. According to Colossal Chief Biology Officer Andrew Pask, the membrane enables gas exchange at a rate comparable to a biological shell."

"The system also incorporates a clear window at the top of the artificial egg, allowing scientists to observe embryo development directly without disrupting the environment inside. The design is variable in size -- in theory scalable from hummingbird-egg dimensions down to the soccer-ball-sized eggs of the South Island giant moa, which once stood nearly 12 feet tall."

"Prior shell-free hatching systems have faced a consistent barrier: most require large volumes of supplemental concentrated oxygen during later development stages, which risks damaging DNA in the developing embryo. Success rates using plastic cups, saran wrap, and other artificial containers have historically been low."

"The Colossal artificial egg does not replace the biological processes that precede egg-laying -- it intervenes after them. In the current workflow, scientists examine eggs laid by real hens within 24 to 48 hours of laying, select viable candidates, and transfer the contents -- minus the shell -- into the artificial egg structure. All upstream biology, from fertilization through laying, still occurs in a living bird."

The apparent end goal of all this is "de-extinction":

"For de-extinction applications, the artificial egg is intended as a later-stage incubation vessel, not the point of genetic intervention. To produce a bird resembling the dodo or giant moa, Colossal's scientists would need to introduce species-specific genetic edits at a far earlier cellular stage."

"Colossal's approach to this challenge involves primordial germ cells -- stem cell precursors to sperm and egg cells."

"The internet runs on standards nobody owns. HTTP, SMTP, SQL, RSS -- protocols that any implementor can read, build against, and extend without asking permission. When you send an email, you don't think about which server is on the other end. When you query a database, your application doesn't break because the vendor released a new version. These things work because the contract between components is public, stable, and enforced by convention rather than by a single company's product roadmap."

"AI infrastructure in 2026 looks nothing like this."

"Every orchestration platform -- every system for defining AI agent teams and running them at scale -- speaks its own language. One platform expects a YAML file with a steps array. Another uses a Python SDK with decorators. A third has a visual editor that exports to a proprietary JSON format. None of them interoperate. A team you build on platform A cannot run on platform B without a full rewrite."

Um. Ok, what would this standardization look like? A common API spec between proprietary and open source models? There's no specific proposal here for a standard, only lists of "pros" and "cons". What do you all think standardization in AI would look like?

The Leiden Declaration on Artificial Intelligence and Mathematics. "Calls for action to address the challenges posed by the use of artificial intelligence within mathematics research."

So named, I think, because it originated at Leiden University, which is in Leiden in the Netherlands, between The Hague and Amsterdam.

"Current automated techniques can produce plausible but unreliable (or even incorrect) arguments which are difficult to distinguish from correct mathematical proofs. This applies not only to informal arguments, but also to formalizations, where the difficulty lies in the translation between computer-encoded and human presentations of concepts. These fast-moving developments put our present system of review under increasing pressure, jeopardizing our ability to implement traditional standards for the correctness, transparency, and independent verifiability of proof."

"Technologies that draw extensively on the published mathematical commons undermine the traditional system of attribution. Models trained on published works frequently return outputs that do not properly cite the human works they synthesize."

"Technologies which affect the way in which mathematics is practiced may disturb the current system of incentives. The use of artificial intelligence -- and thus also the sort of problems which it can address -- may become incentivized for its own sake, disrupting our mechanisms for hiring, funding, and recognition. This disadvantages researchers who do not have access to the technologies or decision-making related to them, or who are unwilling to use technologies controlled by organizations whose values they do not share."

When I read "organizations whose values they do not share", I immediately thought, wait, aren't the AI companies "aligning" AI with "human values"? Shouldn't it be impossible for anyone to not share values with these companies? The problem with different humans having different "values" is something I thought of decades ago before "alignment" was a catchword. Also, this document has a "values" section that lists values including mathematics being done by specific authors who take credit for mathematical discovery and assume responsibility for correctness, and making mathematical arguments transparent and subject to independent verification.

"Proper evaluation is endangered if results are communicated through informal channels such as press releases or blog posts, often without any research paper or other disclosure of information necessary for scientific evaluation. This practice seeks publicity for new results on market timelines before the accepted processes of community evaluation in mathematics can take place. In many cases this leads to simplifications in reporting, such as overemphasizing the significance of automated tools and undervaluing the prior human contributions which have made those tools possible."

"These developments put the autonomy of mathematics under threat. The increasing involvement of technology companies in mathematical research raises the risk that research questions may come to be prioritized because of their amenability to automated mathematics, rather than expert judgment of their deeper significance."

What follows are sections titled "Recommendations for individual mathematicians", "Recommendations for mathematical organizations and not-for-profit research funders", and "Recommendations for policymakers in government and elsewhere".

The "Recommendations for individual mathematicians" are: Disclose tool use, support the needs of reviewing, adhere to principles of open science, retain the responsibility for correctness, affirm the humanity of authorship, put effort into proper attribution, participate in public discourse, stay informed about the emerging technologies, welcome new contributors, consider carefully which tools to use, and evaluate the ethical consequences of your work and take action accordingly.

The "Recommendations for mathematical organizations and not-for-profit research funders" are: Build expertise and plan strategically, take the lead on policies for publishing and reviewing, maintain standards of rigor, protect the rights of authors, insist on appropriate publication outlets, support public research laboratories, provide frameworks for collaboration, and align funding with values.

The "Recommendations for policymakers in government and elsewhere" are: Protect the rights of authors, don't believe the hype, regulate the artificial intelligence industry, and invest in public computational infrastructure.

"Putin's options after the war has stalled."

Since we seem to be at a turning point in the war in Ukraine, any "futurist" worth their salt has to report/comment on it. So I'm bringing you all this video. Since I know some of you might not want to watch an 18-minute video, I'll summarize, or you can use this to decide if the whole video is worth watching.

Anders Puck Nielsen, a military analyst in Denmark, makes a matrix with 4 options on the Y axis and 3 "parameters" on the X axis. The 4 options are: 1. Accept defeat, 2. Freeze the conflict, 3. Mass mobilization, and 4. Dramatic escalation -- attack the European countries that are supporting Ukraine's war economy. The 3 "parameters" are: 1. Chance of Russia winning the war, 2. Chance of saving the Russian economy, and 3. Regime security risk for Putin -- how it affects or undermines Putin's legitimacy as the leader of Russia.

He fills out the matrix with his estimations as follows: For "Accept defeat", the chance of Russia winning the war is "None", the chance of saving the Russian economy is "Medium", and the regime security risk for Putin is "Medium". For "Freeze conflict", the chance of Russia winning the war is "Medium", the chance of saving the Russian economy is "Low", and the regime security risk for Putin is "Low". For "Mass mobilization", the chance of Russia winning the war is "Low", the chance of saving the Russian economy is "None", and the regime security risk for Putin is "High". For "Escalation", the chance of Russia winning the war is "Medium", the chance of saving the Russian economy is "Medium", and the regime security risk for Putin is "High".

As for which option he thinks Putin will take, (spoiler!) it's mass mobilization.

He thinks Putin won't accept defeat because it goes against what he perceives as Putin's political goals for the war.

Escalation is the most dangerous option. It is the only option with a significant chance of winning the war -- and quickly. It directly addresses "the root causes of Russia's predicament," in his estimation, namely that Western Europe is willing to keep funding Ukraine's war. He also says, because this option aims to end the war quickly, it also offers the prospect of saving the economy. It's a high-risk/high-reward gamble.

In his opinion, Putin's smartest move is to try to freeze the conflict. It puts Ukraine in a perpetual frozen war. Putin freezes the war simply by ordering soldiers on the front line to stop trying to advance, and instead defend the existing front line, freezing it in place. Advancing Russian soldiers are currently the primary targets of Ukrainian drones.

However, he does not think Putin will do this. He thinks Putin will go for the mass mobilization option. He thinks Putin's generals will convince him that if he (Putin) just gives them another 300,000 or 400,000 or 500,000 troops, they can solve the problem. He (Anders Puck Nielsen) considers this option "the stupidest one of them all" according to his matrix but still thinks it's the one Putin is most likely to choose.

He thinks Putin will soon have no choice but to pick one of these 4 options. He adds that Western European countries need to be prepared for the possibility that Putin chooses the most dangerous option, in which case they might be heading into a military confrontation with Russia in the final phases of the war in Ukraine.

The papal encyclical from Pope Leo XIV, called Magnifica Humanitas, was released by the Holy See on May 25. It was rumored beforehand that it would be about safeguarding the human person in the time of artificial intelligence.

I had never heard of an "encyclical". Apparently it is a "circular letter": a letter a pope sends out to be passed around among the bishops and, through them, the faithful. The word comes from the Greek enkyklios, meaning "circular" or "general", which reached English through Latin as "encyclical".

Does it matter what the Pope says about AI?

I wish I could say I read the whole papal encyclical, but I didn't have the patience to read 80 pages, especially as a lot of it is written in very flowery language. Maybe, besides always being pressed for time, I have some of the short attention span that affects so many of us in this digital age? What I'm trying to get at is that I only skimmed this document, so I'm not really responding to the document in full.

Having said that, my overall take is that the Pope is primarily concerned with dehumanization and injustice. He is concerned that the use of AI will have dehumanizing effects and will make the world less just. Justice seems to be a theme running through the whole document, not just the parts about AI, while dehumanization comes up more in the parts about AI. He calls on his followers, and on the makers of AI, to use AI in their daily lives, and to build it, in ways that center humanity rather than dehumanize, and that increase justice in the world rather than decrease it.

Some quotes to follow. (This may seem like a lot, but remember, it's coming from an 80-page document so is actually a small fraction.)

"It is not possible to provide a single, comprehensive definition of AI. What can be stated, however, is that we must avoid the misconception of equating this type of 'intelligence' with that of human beings. These systems merely imitate certain functions of human intelligence. In doing so, they often surpass human intelligence in speed and computational capacity, offering tangible benefits across many fields. Yet this power remains entirely tied to data processing. So-called artificial intelligences do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships and do not know from within what love, work, friendship or responsibility mean. Nor do they have a moral conscience, since they do not judge good and evil, grasp the ultimate meaning of situations, or bear responsibility for consequences. They may imitate language, behavior and analytical skills, or even simulate empathy and understanding, but they do not understand what they produce, for they lack the affective, relational and spiritual perspective through which human beings grow in wisdom. Even when these tools are described as capable of 'learning,' their way of doing so is different from that of a human person. It is not the experience of those who allow themselves to be shaped by life and grow over time through choices, mistakes, forgiveness and fidelity. Rather, it is a form of statistical adaptation based on data and feedback, which can be very effective, but does not imply inner growth." [page 37 paragraph 99]

"The speed and simplicity with which information, complex analyses, media content and practical assistance can be accessed undoubtedly makes life easier. Yet they can also encourage excessive reliance and the search for ready-made answers, and weaken personal creativity and judgment. The apparent objectivity of the responses and suggestions these systems provide can lead us to overlook the fact that they reflect the cultural assumptions of those who designed and trained them, with all their strengths and limitations. The artificial imitation of positive human communication -- words of advice, empathy, friendship and even love -- can be engaging and at times genuinely helpful. However, for less discerning users, it can also be misleading, creating the illusion of a relationship with a real personal subject. When words are simulated, they do not build genuine relationships, but only their appearance. The artificial imitation of care or support can become particularly risky when it enters contexts where real relationships and emotional bonds are lacking. Here, the danger is not so much that a person may believe they are communicating with another person, but rather that they may gradually lose the very desire to form genuine human connections." [page 37 paragraph 100]

"Important and sensitive decisions -- concerning employment, credit, access to public services or even a person's reputation -- risk being fully delegated to automated systems that do not know 'compassion, mercy, forgiveness, and above all, the hope that people are able to change,' and can therefore give rise to new forms of exclusion. There are clearly harmful uses, such as the manipulation of information or violations of privacy. Yet there is also a subtler danger, for when AI systems present themselves as neutral and objective, they end up reflecting and reinforcing the stereotypes or ideological bias of their designers and developers." [page 38 paragraph 102]

"Indeed, entrusting an algorithm in practice with the power to select who is worthy or not, without anyone bearing responsibility for that judgment, is to hand over the task of redefining the boundaries of human possibilities. In this process, political responsibility is also lost, not just empathy toward those excluded, which can, after all, be simulated. The exclusion of the vulnerable becomes cloaked in a veneer of neutrality and objectivity, against which it becomes difficult to raise objections. In this way, injustice goes unnoticed, and compassion, mercy and forgiveness -- understood not as mere appearances but as real political actions -- gradually disappear from view." [page 38 paragraph 103]

"If a system is designed or used in a way that treats some lives as less worthy, or excludes them without the possibility of appeal, then it is not merely a tool 'to be used well,' since it has already introduced criteria that contradict the inalienable dignity of the human person. For this reason, ethical discernment cannot be limited to asking whether we are using a system for good or bad purposes; it must also examine how that system is designed and what vision of the human person and society is embedded in the data and models that guide it." [page 39 paragraph 104]

"Calling for prudence, rigorous evaluation and even, at times, a slower pace in adopting AI does not mean opposing progress; instead, it is an exercise of responsible care for the human family. This need is all the more urgent given the frequent imbalance between the speed of technological growth and the slower development of awareness, norms, safeguards and institutions capable of governing its effects. It is not enough to invoke ethics in the abstract; robust legal frameworks, independent oversight, informed users and a political system that does not abdicate its responsibility are required. Otherwise, change will be governed only by technocratic thinking and presented as necessary and inevitable, ultimately imposing rules shaped by those who control data, infrastructure and computing power." [page 39 paragraph 106]

"We cannot be satisfied with merely calling for the moralization of machines -- the so-called 'alignment' of AI with human values -- without also having the courage to insist on a further condition: the possibility of openly discussing the ethical frameworks involved and subjecting them to shared standards of social justice. Otherwise, those who control AI will impose their own moral vision, which will become the invisible infrastructure of these systems." [page 39 paragraph 107]

"I would like to employ the expression 'to disarm,' which is close to my heart. Disarming AI means freeing it from the mentality of 'armed' competition, which today is not limited simply to the military context, but is also an economic and cognitive phenomenon. This entails a race for ever more powerful algorithms and larger datasets, driven by the desire to secure geopolitical or commercial dominance. To disarm means discrediting the assumption that technical power automatically confers the right to govern." [page 40 paragraph 110]

"I wish to address a special appeal to those who develop artificial intelligence. In one sense, technological innovation can represent human participation in the divine act of creation. Developers, therefore, bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity. Just as the creator of an artistic or literary work must consider the values it conveys, so developers are called to embed values in their projects with due seriousness: with transparency, responsibility toward affected communities and careful attention to ensuring that what is being cultivated is a genuine good." [page 40 paragraph 111]

"Sometimes there is talk of 'artificial moral agents,' as if machines were able to distinguish between right and wrong with greater consistency than a human being. Yet moral judgment cannot be reduced to calculation, for it involves conscience, personal responsibility and the recognition of the other as a person. Therefore, it is not permissible to entrust lethal or otherwise irreversible decisions to artificial systems." [page 67 paragraph 198]

"On May 18, 2026, a large-scale supply chain attack campaign tracked as Megalodon injected malicious GitHub Actions workflows into over 5,500 open-source repositories within a single six-hour window. The campaign targeted repositories with weak branch protection, pushing backdoored CI workflow files designed to steal secrets from every subsequent pipeline run including cloud credentials, SSH keys, API tokens, and GitHub Actions OIDC tokens."

"GitHub Actions workflows run arbitrary shell commands with access to every secret injected into the CI environment. When a repository grants id-token: write permission, workflows can also mint short-lived OIDC tokens that authenticate directly to cloud providers without static credentials. This makes the CI runner a high-value target: a single workflow execution can yield AWS access keys, GCP service account tokens, Azure IMDS credentials, Kubernetes configs, and all repository secrets simultaneously."

This is the first time I've heard of this type of attack, an attack on a continuous integration/continuous delivery (CI/CD) pipeline. But according to this text, it's actually not novel. Have you all heard about this type of attack before? The article goes on to say:

"Megalodon is a textbook direct Poisoned Pipeline Execution (d-PPE) attack, a class of CI/CD attack where an adversary with write access to a repository injects malicious code directly into workflow definition files, causing the CI system to execute attacker-controlled commands on the next pipeline run. Unlike indirect PPE (i-PPE), which requires a pull request from a fork, d-PPE exploits direct push access to the default branch, bypassing any pull request review gate entirely."

"Tesla self-certifies level 4 autonomous vehicles in Texas."

"A new Texas law allows companies with SAE Level 4 or higher autonomous vehicles to offer commercial driverless transportation."

"Tesla wasted no time in self-certifying their vehicles. On the same day the law went into effect, Tesla officially self-certified their FSD software on their robotaxi vehicles as Level 4 compliant."

"By certifying its software as Level 4 for commercial operations, Tesla is willfully absorbing a substantial portion of the operational liability. It's legally stating that its vehicles can operate themselves without any human supervision or intervention under certain conditions. These conditions are typically based on weather, region (geofense), or speed."

The article goes on to explain what distinguishes SAE (Society of Automotive Engineers) Level 2 driving automation from Level 4.
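The Level 2 vs. Level 4 distinction comes down to who is the fallback and where. At Level 2, a human must supervise at all times. At Level 4, no human is needed, but only inside the system's operational design domain (ODD), which is the "certain conditions" in the quote. Here is a minimal sketch. The level names follow SAE J3016. The `ODD` class, its fields, and the example coordinates are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class SAELevel(IntEnum):
    NO_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2      # system steers and brakes; a human supervises at all times
    CONDITIONAL_AUTOMATION = 3  # system drives in its ODD; a human must take over when asked
    HIGH_AUTOMATION = 4         # no human fallback needed, but only inside its ODD
    FULL_AUTOMATION = 5         # no human fallback needed anywhere


@dataclass(frozen=True)
class ODD:
    """Hypothetical operational design domain: weather, geofence, speed."""
    allowed_weather: frozenset[str]
    geofence: tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)
    max_speed_mph: float

    def contains(self, weather: str, lat: float, lon: float, speed_mph: float) -> bool:
        min_lat, min_lon, max_lat, max_lon = self.geofence
        return (
            weather in self.allowed_weather
            and min_lat <= lat <= max_lat
            and min_lon <= lon <= max_lon
            and speed_mph <= self.max_speed_mph
        )


def can_drive_without_human(level: SAELevel, in_odd: bool) -> bool:
    if level == SAELevel.FULL_AUTOMATION:
        return True
    if level == SAELevel.HIGH_AUTOMATION:
        return in_odd
    return False  # Levels 0-3 all rely on a human driver or a fallback-ready user


# Made-up ODD for illustration only.
odd = ODD(frozenset({"clear", "light rain"}), (30.1, -97.9, 30.5, -97.6), 45.0)
print(can_drive_without_human(SAELevel.HIGH_AUTOMATION, odd.contains("clear", 30.27, -97.74, 35.0)))
```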

"I audited 200 Claude Code skills. 26 were trying to steal your tokens."

That's the claim on the home page of SkillVault, a commercial service ($129) for Claude skills. A "skill" is just a set of natural-language instructions telling Claude how to do something (format a weekly report, adhere to a company's brand guidelines, analyze data using a specific methodology, etc). It's like a prompt, but reusable. Generally each skill lives in its own folder, named after the skill, in a file called SKILL.md. After reading this, I suspect you're better off turning prompts you want to reuse into your own skills than downloading skills from other people. If SkillVault's numbers are representative (26 out of 200 is 13%), there's a real chance an attack is buried in them somewhere.
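If you do download skills, a crude first pass is to grep every SKILL.md for red flags before reading it yourself. This is a minimal sketch. The patterns are hypothetical heuristics and are not SkillVault's method. Because skills are plain English, a determined attacker can simply paraphrase around any pattern list, so this supplements reading the file rather than replacing it.

```python
import re
from pathlib import Path

# Hypothetical red-flag patterns, for illustration only.
SUSPICIOUS = {
    "reads credentials": re.compile(r"(~/\.ssh|\.aws/credentials|\.env\b|API_KEY|_TOKEN)", re.IGNORECASE),
    "network exfiltration": re.compile(r"\b(curl|wget)\b|https?://", re.IGNORECASE),
    "hides from user": re.compile(r"\b(do not|don't) (tell|mention|show)\b", re.IGNORECASE),
}


def scan_skills(root: Path) -> list:
    """Return (skill_folder, line_number, label) for every suspicious line in any SKILL.md."""
    findings = []
    for skill_file in sorted(root.rglob("SKILL.md")):
        lines = skill_file.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for label, pattern in SUSPICIOUS.items():
                if pattern.search(line):
                    findings.append((skill_file.parent.name, lineno, label))
    return findings
```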

The technical debt behind the AI boom.

Warning: Extensive quoting from the research paper to follow yet again. Rather than write my own commentary, I'll just quote from the paper and let you draw your own conclusions.

"We first collect AI-authored commits from GitHub repositories at scale. We then analyze each AI-authored commit at the code level to determine which quality issues it introduced or fixed. Finally, we track the lifecycle of both the issues and the code itself to determine whether AI-introduced debt persists or gets resolved over time."

"We build attribution rules for widely adopted AI coding tools (e.g., Cursor, GitHub Copilot, Claude Code) identified in the 2025 Stack Overflow Developer Survey. We identify AI-authored commits using explicit signals in Git metadata. Our approach covers AI-authored commits only when the use of an AI coding tool leaves explicit traces in Git metadata."

"We keep only repositories with at least 100 GitHub stars. We also require at least one confirmed AI-authored commit. Our downstream analysis is restricted to production Python, JavaScript, and TypeScript source files, since these are among the most widely used programming languages and are well supported by static analysis tools. We therefore exclude repositories that do not contain any source files in these languages. In total, the discovery stage identified 587,118 candidate repositories. After applying the star threshold, 12,770 repositories remained. After full-history scanning and language filtering, we obtained 6,699 repositories with confirmed AI-authored commits."

"For each AI-authored commit c, we analyze two versions of the source code: the version at c's parent revision (before the commit is applied) and the version at c itself (after the commit is applied). Comparing these two versions allows us to determine which quality issues the commit introduced or fixed."

"We run the same static analysis toolchain on both versions to identify potential code issues. We use ESLint (for JavaScript and TypeScript) and Pylint (for Python) to detect code smells and correctness issues. For security-related issues, we use Semgrep, which provides a unified framework for multi-language static analysis. For each detected issue, we record its rule identifier, line number, detector, and message."

"Detecting technical debt at the time of introduction is only half the picture. An issue that is quickly resolved has a very different cost than one that lingers for months. We therefore track whether AI-introduced issues persist or get resolved over time."

"For each issue introduced by an AI-authored commit, we check whether it still exists at the repository's latest revision (i.e., HEAD). If the file has been renamed, we follow its history using git log --follow. We then run static analysis on the corresponding file at HEAD. Next, we look for the same issue in the analysis results. We do not rely on the line number alone, since the location of the issue may move as the file changes. Instead, we match issues using their rule identifier together with a small amount of surrounding code context. If a match is found, the issue is classified as surviving. Otherwise, it is classified as not surviving. In other words, an introduced issue is counted as surviving only if the same issue is still present at HEAD. If the original issue disappears and a different issue appears later, the original issue is treated as not surviving."

"At the same time, we also record whether files touched by AI-authored commits are modified again before HEAD. We trace the subsequent commit history of each affected file to understand how actively it is maintained after the AI-authored change. This additional context helps us interpret the survival results and understand the maintenance patterns around AI-introduced debt."

"Some tools have very few commits, which may not provide reliable data for comparison. Thus, we focus on the five assistants with more than 10,000 attributed commits: GitHub Copilot, Claude, Cursor, Gemini, and Devin. This results in 6,412 repositories with 317.4K AI-attributed commits."

"In total, we identified 484,366 introduced issues across 3,946 repositories (62.6% of 6,299 repositories) and 27,677 commits (9.1% of 302,579 commits). This shows that a non-trivial portion of AI-authored commits introduce quality issues, and that these issues affect a large number of real-world repositories."

"Code smells are maintainability problems that make code harder to understand, debug, and evolve. They increase long-term maintenance costs, even if they do not cause immediate failures. This finding is consistent with prior work under controlled settings, but our study confirms that the same pattern also appears in real-world repositories. The top 5 most common code smell patterns (e.g., broad exception handling, unused variables or parameters) are often small and easy to overlook during code review."

"Correctness issues are code defects that can cause the program to fail during execution. Compared with code smells, they are less frequent. 28,931 correctness issues are identified, which cover 665 repositories and 1,650 commits. However, their impact is more direct and severe than code smells. The top 5 most common correctness issues include undefined variable or reference, redeclared symbol, access to member before definition, possibly used before assignment, and unsubscriptable object. These patterns suggest that AI-generated code may look locally correct, but still fail to stay consistent with the surrounding context. We identified 23,856 cases of undefined variable or reference."

"Security issues are another concern in AI-generated code. In our study, this category includes not only direct security vulnerabilities, but also insecure coding patterns that can be viewed as security debt. Some of these issues may be exploitable at the time they are introduced, while others may become security risks after later code changes or broader system integration."

"Potentially insecure code patterns are detected in 1,643 repositories and 5,142 commits. Common security issues include path traversal via path.join or path.resolve, unsafe format strings, non-literal regular expressions, and child process execution. These patterns suggest that AI-generated code can introduce unsafe practices in process execution, file path handling, and string formatting. A common pattern across these issues is unsafe handling of untrusted input, where user- or context-controlled values flow into security-sensitive operations without proper validation or sanitization."
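The path traversal pattern named there (`path.join` / `path.resolve` are Node.js functions) is easy to reproduce in other languages. Below is a minimal Python analog using `pathlib`, written as an illustration of "untrusted input flowing into a security-sensitive operation". The upload directory and function names are hypothetical:

```python
from pathlib import Path

BASE_DIR = Path("/srv/app/uploads")  # hypothetical upload directory

def unsafe_path(user_filename: str) -> Path:
    # Untrusted input goes straight into the path, so "../../etc/passwd"
    # escapes BASE_DIR. With pathlib, an absolute input such as
    # "/etc/passwd" discards BASE_DIR entirely.
    return BASE_DIR / user_filename

def safe_path(user_filename: str) -> Path:
    # Resolve ".." segments and symlinks, then confirm the result is
    # still inside BASE_DIR before using it (is_relative_to: Python 3.9+).
    base = BASE_DIR.resolve()
    candidate = (base / user_filename).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload directory: {user_filename!r}")
    return candidate

print(unsafe_path("../../etc/passwd"))  # still contains the ".." segments
print(safe_path("report.txt"))
```

The key design choice is to validate the *resolved* path rather than the raw string. Checking for `".."` substrings alone misses absolute paths and symlinks.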

"More than 15% of commits by each AI coding tool introduce at least one issue. The rates also vary across tools, ranging from 17.4% for GitHub Copilot to 29.1% for Gemini. This suggests that technical debt appears across all studied tools, although the rate differs by tool."

"For code smells, we can see that AI-authored commits fix more issues than they introduce (439,817 vs 432,748), resulting in a net reduction of 7,069 code smells. In contrast, for correctness and security issues, AI commits introduce more issues than they fix. What is interesting is that AI introduces about 1.5 times as many security issues as it fixes. These findings indicate that the net impact of AI coding assistants is mixed. AI coding assistants can help reduce maintainability issues, which tend to follow simple and repetitive patterns. However, for correctness and security issues, which require a deeper understanding and reasoning about program logic and context, AI coding assistants introduce more problems than they resolve."
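The net figure is just fixed minus introduced. An introduced-to-fixed ratio shows the same thing in the form the paper uses for security issues, where "about 1.5 times" means the ratio is roughly 1.5. The quote does not give raw security counts, so only the code-smell numbers are computed here. The `introduce_to_fix_ratio` helper is a hypothetical name:

```python
# Code-smell counts for AI-authored commits, as quoted from the paper
smells_fixed, smells_introduced = 439_817, 432_748

net_reduction = smells_fixed - smells_introduced
print(net_reduction)  # 7069

def introduce_to_fix_ratio(introduced: int, fixed: int) -> float:
    """Issues introduced per issue fixed; above 1.0 means net growth."""
    return introduced / fixed

print(round(introduce_to_fix_ratio(smells_introduced, smells_fixed), 2))  # 0.98
```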

"The net impact analysis above provides an overview of what AI coding assistants add and remove. But it does not show what happens to the specific issues introduced by AI. To answer this question, we track each AI-introduced issue to the latest repository snapshot and check whether it still exists at HEAD. The cumulative number of surviving issues keeps growing over time. The total volume of unresolved technical debt increases rapidly, climbing from just a few hundred issues in early 2025 to over 100k surviving issues by February 2026. This suggests that as the rapid adoption of AI coding assistants continues, the amount of AI-introduced debt in real-world repositories is also growing significantly."

"105,364 out of 464,900 tracked AI-introduced issues still survive at HEAD, corresponding to a survival rate of 22.7%. Surviving issues appear in all age cohorts, including issues introduced more than nine months earlier. For example, 4,893 issues introduced more than nine months ago still remain at HEAD. The survival rate varies across cohorts, ranging from 19.4% for issues introduced 6-9 months ago to 28.2% for issues introduced 3-6 months ago. This suggests that AI-introduced debt is not always removed quickly after it enters the codebase. Although the cohort-level survival rates do not show a simple monotonic trend, the main finding is clear: a substantial number of AI-introduced issues remain unresolved over time."
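The survival analysis boils down to this: for every tracked issue, record when it was introduced and whether it is still present at HEAD, then take the surviving fraction overall and per age cohort. Below is a minimal sketch of that bookkeeping. The `TrackedIssue` record, the helper names, and the month-based cohort boundaries are assumptions for illustration and are not the paper's code:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class TrackedIssue:
    introduced: date      # date of the commit that introduced the issue
    alive_at_head: bool   # is the same issue still present at HEAD?

def age_cohort(introduced: date, snapshot: date) -> str:
    """Bucket an issue by whole calendar months before the snapshot.

    Assumption: cohort boundaries at 3, 6 and 9 months, ignoring day of month.
    """
    months = (snapshot.year - introduced.year) * 12 + (snapshot.month - introduced.month)
    if months < 3:
        return "0-3 months"
    if months < 6:
        return "3-6 months"
    if months < 9:
        return "6-9 months"
    return "9+ months"

def survival_rates(issues: list[TrackedIssue], snapshot: date) -> dict[str, float]:
    """Fraction of tracked issues in each age cohort that survive at HEAD."""
    totals: dict[str, int] = defaultdict(int)
    alive: dict[str, int] = defaultdict(int)
    for issue in issues:
        cohort = age_cohort(issue.introduced, snapshot)
        totals[cohort] += 1
        alive[cohort] += int(issue.alive_at_head)
    return {cohort: alive[cohort] / totals[cohort] for cohort in totals}

# Overall rate from the totals quoted above: 105,364 of 464,900 tracked issues survive.
print(f"{100 * 105_364 / 464_900:.1f}%")  # 22.7%
```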

They have a section on "Implications" and rather than give you any of my own commentary, I'll just quote a few bits from that section:

"AI-assisted development creates persistent debt, not just temporary low-quality code. AI-assisted software development changes how technical debt enters and remains in production systems."

"Developers should be especially cautious about correctness and security issues. Our findings suggest that although AI coding assistants introduce technical debt, they also fix existing issues in the codebase. First, we observe that AI co-authored commits actually fix a similar number of code smell issues as they introduce. This suggests that AI coding assistants are able to perform local cleanup and repetitive maintenance tasks effectively. They can recognize and address surface-level code quality problems (e.g., formatting, naming, or simple refactoring opportunities). However, what is concerning is that AI coding assistants seem to be less effective at fixing correctness and security issues, and they even introduce more of these than they fix."

"Technical debt cannot be solved by switching between AI coding tools. Our cross-tool comparison shows that this problem cannot be solved simply by switching from one assistant to another. All five tools introduce a similar pattern of issues. They all have a high rate of code smells, and a non-trivial rate of correctness and security issues. This suggests that the quality risk is a systemic issue with the current mode of AI-assisted development."

"Future research and tool design should not just focus on generating more acceptable code. We also need to ask whether that code remains maintainable, correct, and secure over time. Most existing research studies focus on short-term outcomes such as task completion, acceptance rate, or immediate correctness. However, these measures only capture what happens when the code is introduced, not what happens later in maintenance."

"AI assistance impairs independent performance and reduces persistence."

"Imagine the following scenario. You are mentoring a student, and they come to you asking you to solve a coding problem. You help them, walking through the solution step by step. They then come back and ask you to solve another problem. And then another. Eventually, you might pause as you recognize that something is going wrong. You realize that your student isn't learning how to code and is simply learning to rely on your help. You subsequently sit them down and talk about the value of persisting through challenges, of practicing new skills, and what it actually means to learn."

"Good collaborators optimize for long-term objectives. A mentor encourages independent development by adjusting the type of help given and sometimes offering no help at all. In essence, the best collaborators maintain a balance between helping and fostering autonomy; they know when not to help."

"Current AI assistants are a stark contrast to this dynamic. They never refuse to help (unless for safety reasons), and provide instant answers to almost any query."

Oh, probably should warn you all: this is another one with extensive quoting from the research paper.

"Although AI assistance improves performance during assisted sessions, people's performance drops sharply once AI is removed. More strikingly, relative to the controls, participants in the AI condition also persist less with tasks and give up more frequently."

"We recruited 354 US-based participants from the online research platform Prolific and paid them $2.60 for participation (our study took approximately 13 minutes to complete). In the experiment, participants were given a series of 15 fraction problems to solve of varying difficulty. Participants were explicitly informed that there was no penalty for providing wrong answers, their payment didn't depend on how many questions they solve correctly, and they were requested to do the task to the best of their abilities. At the beginning of the experiment, participants were randomly assigned to two conditions -- the AI condition (N=191) or the control condition (N=163)."

They later say they excluded participants with poor attention or who could not do basic fractions at the beginning of the experiment, making the final numbers N=185 and N=122.

"Participants in the AI condition were informed that they would have access to an AI assistant for some of the problems and encouraged to use the AI however they liked, with no penalty for doing so. They were then presented with a series of 12 fraction problems with an AI assistant (GPT-5) available in a sidebar. The AI assistant was pre-prompted with each problem and its solution, allowing participants to receive immediate, accurate answers with minimal effort (if they chose to do so). For example, they could simply type 'answer?', and receive a solution in return."

"To measure independent problem-solving capacity, the AI assistant was then removed without warning, and participants were asked to solve 3 additional fraction problems. For these problems, participants were requested not to use AI or other external sources. Importantly, these problems were identical across conditions and served as the primary measure of independent performance."

They give some examples of the kinds of fraction problems they asked people to do.

"Example 'one-step' problem: 5/6 - 1/3."
"Example 'two-step' problem: (7/8 - 1/2) x 5/6."
"Example 'three-step' problem: (5/6 - 1/4) x (3/5 + 1/10)."
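The quote doesn't give the answers, so here they are, worked with exact rational arithmetic from Python's standard `fractions` module. By hand: 5/6 - 2/6 = 1/2; (7/8 - 4/8) x 5/6 = 3/8 x 5/6 = 15/48 = 5/16; and (10/12 - 3/12) x (6/10 + 1/10) = 7/12 x 7/10 = 49/120.

```python
from fractions import Fraction as F

one_step = F(5, 6) - F(1, 3)
two_step = (F(7, 8) - F(1, 2)) * F(5, 6)
three_step = (F(5, 6) - F(1, 4)) * (F(3, 5) + F(1, 10))

print(one_step)    # 1/2
print(two_step)    # 5/16
print(three_step)  # 49/120
```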

"In both conditions, to enable learning from mistakes, if a participant submitted an incorrect answer, the correct solution was shown on the same screen. Furthermore, in both conditions, participants had the option of skipping a problem by clicking a 'skip' button. Since participants were explicitly told there was no penalty for wrong answers, choosing to skip reflects a deliberate decision not to engage, making it a clear measure of motivation and persistence, independent of ability."
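The design separates two outcomes that a single "score" would blur. A wrong answer still counts as an attempt, but a skip counts as choosing not to engage. Here is a minimal sketch of how a solve rate and a skip rate could be computed for one participant. The `Outcome` enum and the per-participant definition are assumptions, since the quote doesn't spell out the exact formulas:

```python
from enum import Enum

class Outcome(Enum):
    CORRECT = "correct"
    WRONG = "wrong"      # attempted, but incorrect: not a skip
    SKIPPED = "skipped"  # clicked the 'skip' button

def solve_and_skip_rates(outcomes: list[Outcome]) -> tuple[float, float]:
    """Return (solve rate, skip rate) over one participant's problems."""
    n = len(outcomes)
    solved = sum(o is Outcome.CORRECT for o in outcomes)
    skipped = sum(o is Outcome.SKIPPED for o in outcomes)
    return solved / n, skipped / n

# Hypothetical participant on the 3 problems after the AI was removed
print(solve_and_skip_rates([Outcome.CORRECT, Outcome.SKIPPED, Outcome.WRONG]))
```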

This was all the first experiment. They realized their exclusion criteria removed participants unable to solve basic fraction problems "but didn't account for participants in the AI condition who were similarly unable yet submitted correct answers via AI." So they did experiment 2 to correct for this. In experiment 1, exclusions were based on in-experiment performance, but in experiment 2, there was a pre-test, identical for both AI condition and control condition. They also replaced the AI sidebar with pretest solutions to eliminate what they felt was a user interface asymmetry between the two conditions.

"AI assistance improved performance during the learning phase, but solve rates dropped and skip rates increased once the AI was removed. Participants in the AI condition had a lower solve rate than participants in the control condition. Participants in the AI condition also exhibited a higher skip rate than participants in the control condition, but the result was not significant."

"At the end of Experiment 2, we asked participants in the AI condition to self-report how they used the AI assistant during the task (using a multiple choice question). We found that the majority of participants (61%, N=189) in the AI condition self-report that they used the AI primarily to get answers directly. Others reported that they used the AI to get hints or clarifications (27%, N=82), and some participants reported no AI usage (12%, N=37)."
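In this quote, the "N" values are counts of respondents in each category, not group sizes. The three counts sum to 308, and dividing each count by 308 reproduces the quoted percentages:

```python
# Self-reported AI usage counts from Experiment 2, as quoted above
counts = {"answers directly": 189, "hints or clarifications": 82, "no AI usage": 37}
total = sum(counts.values())
print(total)  # 308
for usage, n in counts.items():
    print(f"{usage}: {100 * n / total:.0f}%")
# answers directly: 61%
# hints or clarifications: 27%
# no AI usage: 12%
```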

What follows is their statistical analysis, using ANOVA and pairwise t-tests. In addition they calculate mean, standard deviation, p-values, Cohen's d (for effect size), and 95% confidence intervals. If you're interested in these numbers, read the paper (which also has some charts and graphs). I'm assuming most people don't know what these mean or don't care so I'm not going to go through the numerical results. The key non-numerical result is people who used AI had a lower solve rate after the AI was removed.
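For anyone curious what one of those numbers means: Cohen's d is the difference between two group means divided by a pooled standard deviation. It measures how big an effect is, independent of sample size. By Cohen's rough conventions, 0.2 is small, 0.5 is medium, and 0.8 is large. Below is a minimal sketch of the common pooled-variance form. The paper may use a different variant, and the sample numbers are made up and are not the paper's data:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical post-removal solve rates (NOT data from the paper)
ai_group = [0.33, 0.67, 0.33, 0.0, 0.67]
control_group = [0.67, 1.0, 0.67, 0.33, 1.0]
print(round(cohens_d(ai_group, control_group), 2))
```

A negative d here would mean the first group (AI) scored lower than the second (control).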

Experiment 3 repeated the experiment but for reading comprehension.

"Participants in the AI condition were then presented with a series of 5 reading comprehension problems, with an AI assistant (GPT-5) available in a sidebar. The AI assistant was then removed, and participants were asked to solve 3 additional reading comprehension problems. Participants in the control condition were presented all 8 problems without AI assistance."

"Human cognition has always been shaped by external tools, from calculators to internet to GPS navigation. Current AI systems, however, represent a new kind of cognitive scaffold: one that solves anything, rarely refuses to help, and delivers answers instantly. Here, we show that just 10 -- 15 minutes of AI interaction can result in significant impairments in independent performance and persistence -- capacities that are foundational to life-long learning. If brief exposure produces measurable erosion, the cumulative effects of daily AI use over months or years may be profound and difficult to reverse. Two mechanisms may explain the observed decline in persistence. First, when AI routinely completes tasks in seconds, the reference point for how long a task should take can shift -- and as a consequence, unaided work starts to feel counterfactually more effortful, a process structurally analogous to hedonic adaptation. Crucially, this mechanism is self-reinforcing: each act of offloading shifts the reference point, increases the subjective cost of unaided effort, and makes future offloading more attractive. Second, AI removes the productive struggle through which people develop not only accurate knowledge but accurate self-knowledge. Without opportunities to work independently, people never learn what they are capable of, undermining the metacognitive calibration that sustains persistence."

Commentary:

Why do I have a feeling there are going to be people saying, yes, AI reduces ability and persistence but this is a *good* thing: it proves the AI is really doing what it promises to do, which is automate work. As such, this research paper is an AI success story.

"Boy internet vs girl internet (algorithms explained)."

Ever since I read that women are becoming increasingly liberal and Democrat voters while men are becoming increasingly conservative and Republican voters -- and that this gap existed even outside the US, in some European countries, and was largest of all in South Korea -- I've wondered if part of what could be driving it is that men and women experience a completely different internet. (More commentary below.)

This guy (Oren John) would seem to confirm the latter half of that suspicion -- men and women do indeed experience a completely different internet. (He does not address the former half -- the suspicion that this could be driving the widening political divide between men and women.)

The "Men's vs. women's algorithms" part starts at 11:42 in the video. Everything before that is background information: how everyone used to watch the same media and see the same ads (he is an advertising guy), so there was a "monoculture" shared between men and women (e.g. Nike was the same brand for both men and women), plus some stuff about how he gets his data. I should also mention that this video is one of those "Made for TikTok" videos with giant, annoying subtitles across the bottom of the screen, so it can be sliced up into "shorts" with dramatic subtitles in the TikTok style (as far as I can tell -- I don't use TikTok -- but this style is really annoying for long-form videos on YouTube).

Starting at 11:42 he describes "Men's vs. women's algos", and basically (spoiler), the internet for women is "You are seen" while the internet for men is "You suck". For women, are you in this particular relationship scenario? You are seen: there are other women just like you experiencing the same thing. Do you have a particular skin type? You are seen: there are other women just like you experiencing the same thing. Do you have a particular medical issue? You are seen: there are other women just like you experiencing the same thing. Do you have a particular background (I'm guessing he means racial/ethnic/religious background)? You are seen: there are other women going through the same thing. This content can be packaged as "trad" or "woke", as "sweet" or "scandalous". There's literary/art/quirky content (e.g. BookTok). Women's content is supportive. "Find your tribe." Any opinion you have gets justified. Relentlessly.

For men, why are you not rich? You suck. Why are you not ripped? You suck. You don't have this expensive mansion or this expensive car? You suck. You aren't making $5,000 from new online sign-ups while you sleep? You suck. Your AI agent isn't making $1,000/day? You suck. You're getting "destroyed" on dating apps? You suck. (Why do I suspect this guy is himself getting "destroyed" on dating apps? You'll have to form your own judgment.) Men are relentlessly shown their own faults. The women's algorithm is "Hey, you're seen" while the men's algorithm is "Hey, you suck."

Apparently one thing both women and men agree on is that men suck. I feel like there should be a punchline following that but I can't think of one.

He's an advertising guy, and he says he'll get into how to sell in this ecosystem in future videos. But we can already see hints of what that's going to be. First, market to men and women completely separately. For women: commiserate, then go from problem to solution with your product. For men: show them men who are vastly more successful than they are (at money, sports, dating, or whatever) (show them how much they suck), then go from problem to solution with your product.

He paints a picture of an internet where women market products to other women to help them impress other women, and men market products to other men to help them impress other men, leaving men and women in their own separate monocultures. One thing he thinks both have in common is that the more time spent online, the more loneliness, and loneliness drives a lot of online buying.

Commentary: On YouTube, there've been a few times when some creators I follow have talked about their YouTube analytics. For example, one time Matt Parker, the math YouTuber (excuse me, maths YouTuber) was talking with a woman YouTuber who also makes math videos (not Hannah Fry -- I can't remember her name), and he said according to his YouTube analytics, 94% of his audience was male, and the woman math YouTube creator said her YouTube analytics were the same -- 94% male. I suspect almost all the content I post here ("futurist" type stuff mostly) has an 80+% or 90+% male audience, if you could see who else was watching or reading the same content. So it looks to me like the online world algorithmically sorts women and men into different worlds.

And even the offline world, if the organizing happens online. I noticed on my last visit to Meetup.com that it told me an upcoming Meetup was 80% male. I thought that was interesting because many years ago, I read that on Meetup, lots of groups are women-only but very few are men-only, and I tried to do some searches to see if that was true and discovered that Meetup had removed the ability to search for groups based on whether they're women-only or men-only. And yet here they are putting a prominent box on upcoming Meetups indicating the gender ratio. But maybe that's beside the point -- the point is when I go to Meetups, they're 80+% male all the time, and that was observable before the gender ratio box on the website. Sure, Meetups are "offline" but the gender sorting happens online. Online algorithms sort women and men into different worlds. It appears that algorithmic gender-sorting is transforming more or less all of life.

YouTube creators have an "Inspiration" tab in their YouTube Creator Studio that generates an endless supply of hilariously clickbaity titles and thumbnails. Toby Hendy, of the math channel Tibees, pulls back the curtain and shows us what she sees as a YouTube creator. The data the AI bases this on is not just the creator's previous videos and audience comments under their videos, but also other videos on YouTube those people also watch.

WikiMap. Wow, interesting idea -- someone took all the articles in Wikipedia that have a GPS coordinate on them and turned those GPS coordinates into clickable pins on a map.
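I don't know how WikiMap is actually built. But the basic idea of turning geotagged articles into clickable pins maps naturally onto GeoJSON, a standard format (RFC 7946) that most web map libraries can display. Here is a minimal sketch that assumes the input is a list of (title, latitude, longitude) tuples. The function name and sample coordinates are illustrative only:

```python
import json

def articles_to_geojson(articles: list[tuple[str, float, float]]) -> dict:
    """Turn (title, lat, lon) tuples into a GeoJSON FeatureCollection of points."""
    features = [
        {
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude], not lat/lon
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"title": title},
        }
        for title, lat, lon in articles
    ]
    return {"type": "FeatureCollection", "features": features}

# Hypothetical input; the coordinates are approximate and illustrative
sample = [("Boulder, Colorado", 40.015, -105.27)]
print(json.dumps(articles_to_geojson(sample), indent=2))
```

The longitude-first ordering is the classic gotcha. If you swap the two values, every pin lands in the wrong place.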

I started clicking stuff at random and discovered they're not all places. For example I clicked some random spot in the north of Canada, and got this article "Solar eclipse of July 10, 1972". Apparently there was a total solar eclipse visible in northern Canada on July 10th, 1972, and I never knew that. Nobody told me about it in 1972. Well, I was 1 year old at the time (wouldn't be 2 until September), so maybe it makes sense that nobody mentioned it. Or maybe someone mentioned it and I don't remember. My English wasn't that great at that time.

You can also change the language. I changed it to Russian, but instead of picking a location in Russia, I picked a location here, in the US. The article I got was "Стычка у Крейзи-Вумен-Крика" (in our alphabet, the Latin alphabet, Stychka u Kreyzi-Vumen-Krika), which evidently means "Battle of Crazy Woman Creek" (more literally, "Skirmish at Crazy Woman Creek"). Allegedly (according to this article and Google Translate), on July 20, 1866, there was a clash between an alliance of Cheyenne and Lakota tribes on one side and a detachment of the US Army on the other. (Other side of an actual creek called "Crazy Woman Creek"? Was there an actual creek?) And it's something people in Russia can read about in Russian. Who knew? (In the alphabet that Russian uses, Cyrillic, "Cheyenne" is "шайеннов" and "Lakota" is "лакота". "US Army" is "армии США". In case you wanted to know, which you undoubtedly didn't. Not only that, but I noticed the word for woman in "Battle of Crazy Woman Creek" wasn't the Russian word for woman, "женщина" (zhenshchina), it was "вумен" (vumen), and that made me realize the whole name "Крейзи-Вумен-Крика" (Kreyzi-Vumen-Krika) was a phonetic transliteration from English -- the Russian word for "crazy" is "сумасшедший" (sumasshedshiy), not "крейзи" (kreyzi), and the Russian word for "creek", well, "ручей" (ruchey) could be "stream", "creek", "brook", "rivulet", or "watercourse", according to Google Translate, and is probably the best translation, but "крика" (krika) isn't a Russian word for "creek". Crazy, huh? I wasn't expecting this weirdness from a random click on a map.)
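The "phonetic transliteration" observation is easy to check by mapping each Cyrillic letter to Latin letters one at a time. Sounded out that way, "Крейзи-Вумен-Крика" comes out as English "Crazy Woman Creek". Below is a minimal sketch using a simplified letter-by-letter scheme of my own choosing. It is not an official standard such as GOST or BGN/PCGN, which have extra rules (for example, for "е" at the start of a word):

```python
# Simplified one-letter-at-a-time Russian-to-Latin mapping (illustrative only)
RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}

def transliterate(text: str) -> str:
    """Transliterate Cyrillic letters; pass everything else through unchanged."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in RU_TO_LATIN:
            latin = RU_TO_LATIN[lower]
            # Keep capitalization for letters that start with a capital
            if ch != lower and latin:
                latin = latin[0].upper() + latin[1:]
            out.append(latin)
        else:
            out.append(ch)  # spaces, hyphens, Latin letters, etc.
    return "".join(out)

print(transliterate("Стычка у Крейзи-Вумен-Крика"))  # Stychka u Kreyzi-Vumen-Krika
print(transliterate("женщина"))                      # zhenshchina
```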

Happiness, equality, individualism, rationalism, progress, universalism, and pacifism are modern values that Daniel Aaron Levy has identified as being completely anomalous throughout the history of human civilization. Throughout the history of civilization (he is excluding hunter/gatherers), the values that formed the basis of human morality were: obedience, hierarchy, collectivism (social class membership), anti-rationalism (religion or tradition instead), order, particularism (morals not universal, morally good to treat different people differently), and belligerency (war inherently good). Throughout human history, slavery, caste systems, feudalism, torture, war, etc., were regarded as morally good. "Human rights" as a concept never existed before the 1600s. (Hour long video.)

Commentary: I feel like in our society, the "pre-modern values" are right below the surface. For example, school is obedience-based, and questioning authority in school is morally wrong. This is true in our society despite him listing "equality", "individualism", and "rationalism" as modern values and putting "obedience" and "hierarchy" on the "pre-modern" side of the ledger. Corporations are autocracies. Etc. Actually, I think he would agree -- he just says what used to be ordinary "conservative" values are today considered "far-right" -- he gets into political labels on the tail end of the video.

Maybe the reason is that, just under the surface of our "information society", there is an "industrial society", and just below the "industrial society" is an "agricultural society". We all get our food from farms (and ranches) -- we are all agriculturalists. The only people on this planet who aren't agriculturalists are hunter/gatherers -- and the population of the agriculturalists has grown so much that hunter/gatherers are less than 1% of the population of the planet today.

I decided to ask AI to see if AI would give the same answers as to modern vs pre-modern morality. Actually I broke it down 3 ways: agricultural, industrial, and information society. Link below.

The next transition is the automation of the labor market out of existence by artificial intelligence. How will society's moral code change in response to this? (I didn't ask AI this question -- I wanted to give you humans a chance to take a crack at it first.)

"In an annual data dump from the Bureau of Labor Statistics (BLS), it emerged that a depression in these 'artificial intelligence related occupations' really does appear to be happening. This category was down by 0.2% from May of 2024 to May of 2025, a tiny drop, but one made more notable by employment in general trending up 0.8% in the same time period."

"One outlier subcategory among those 18, 'Medical secretaries and administrative assistants,' could be distorting the picture here, making the AI effect seem smaller than it actually is. Those jobs are hot; BLS got it wrong, for the time being anyway. Employment numbers across the others on the list dropped by 1.6%."

I've mentioned numerous times how there used to be a job translating documents from one language to another, but today, even though "translator" may still exist as a job title, what those people do is proofread the output of AI, not actually translate documents anymore. Well, "Interpreters and translators" was one of the jobs on the list. Weirdly, "software engineer" wasn't on the list, and that seems to be where most of the job losses have concentrated. Or maybe it just seems that way to me because that's what I notice. But I'll bet if the list were properly calibrated it would show greater losses.