2026)

On June 7, 2026 / June 11, 2026 | By Jon Ureña | In Writing, Artificial intelligence, Programming

These past six months or so, I’ve been working on three different projects, or more accurately, on three distinct ideas for projects:

A structured way of creating stories that doesn’t rely on an act structure, by which I mean that whole stories are driven by causality, through tracking entities, entity statuses, cast members, facts, beliefs, secrets, locations, objects, visible affordances, events, intentions, plans, clocks, obligations, consequences, open threads, relationships, and emotions. As part of this project idea, I also wanted to structure worldbuilding and the propagation of canon facts.
A platform to encode any card/board game relying on a game-agnostic engine, which would help prototype card/board games.
A maximally-emergent, causality-first simulation that would be able to produce the kind of adventures you get through Dwarf Fortress, Rimworld, Skyrim, etc., without relying on any scripting or game director.

The story-related projects, which ended up being two repositories in Node/TypeScript, were the easiest, although they were complicated in their own right. But the other two projects deserve their own paragraphs.

That platform to encode any card/board game started as an extremely ambitious notion: any user would be able to encode every aspect of any game in YAML files. YAML that basically works as programming is a terrible idea, which I realized deeper into the project. I started programming it in Node/TypeScript, which was also a terrible idea: it’s far too slow a combination when AST parsing of the YAML is involved; I had to spend lots of time bytecoding or Rust WASM-fying parts of the TypeScript engine to get decent performance, which was never good enough for me, and a lot of possible wins were denied by marshalling issues. But by far, my biggest mistake was proving the project’s ability to accommodate multiple card/board games by implementing two games: Texas Hold ’Em and Fire in the Lake COIN. Texas Hold ’Em was mostly fine despite the YAML base: it took like three days.

Merely thinking of Fire in the Lake makes my skin crawl now. Fire in the Lake is an extraordinarily complex board game representing the Vietnam War. Not exactly a traditional wargame, although it’s hard to explain. The point is that back in 2020, before large language models, I attempted to program it in Rust to learn the language. Huge mistake. Having to abandon that project a third of the way in left a splinter stuck in me from which I never fully recovered. So months ago, I decided I would rely on large language models to take the splinter out and properly implement Fire in the Lake. Huge fucking mistake.

The Fire in the Lake map is extremely complex. The adjacencies between provinces and cities are confusing and involve country borders as well, but much worse, the map also features LoCs (lines of communication) like roads and the Mekong River. Merely representing it visually without just pasting the map on screen was a fucking nightmare. As if the map graph wasn’t enough of a mess, what the factions can do is much worse: operations and optional special activities, each of which can be as complex as entire regular games. I’m talking sweeping, assaulting, subverting, governing, air striking. Then you have the event deck, which causes stuff to happen on the map and on the markers and tracks, which is easy enough to depict, but the game designers also went out of their way, for which I commend them except for the fact that I tried to program this cursed game, to make almost every one of the 130 cards its own little exception. Some cards offered grants for actions which may or may not be optional, some of which were taken by factions other than the one currently playing. Again, makes my skin crawl. I had whole GitHub workflows dedicated to ensuring that the granting system didn’t shit itself.

After I managed to encode the entire game rules and content, which I did manage to do (a fleeting win), I got myself into the issue of encoding competent AI bots. I didn’t want to just copy the flowcharts, given that a computer would run the bots, so I had ChatGPT-Pro deep research what would make a competent human player for each of the factions. My intention was to encode that. I got really deep into AI architecture to figure out what kinds of bots would work well for my requirements, and I finally reached a point in which it seemed like I just had to push through, and keep iteratively improving the bot encoding, to finally get them running. But a few days ago, I found myself not even reading the tickets, not even checking out what was being committed. My subconscious had already understood that this project was a bust. It was crippled from the beginning, and I had done little else than push against entropy. I would never be happy enough with how Fire in the Lake would be shown on the web app, I would never feel comfortable enough with how it was implemented, and worse yet, the engine had taken in so much of Fire in the Lake’s specific shape that disentangling it to implement other games would have forced me to fix Fire in the Lake at every step. So I gave up.

The third project was the maximally-emergent, causality-first simulation. From the beginning I had a very clear idea of what I wanted; basically my ideal game, which nobody has built yet to my knowledge:

A simulation able to simulate dozens or hundreds (or thousands) of agents in an area the size of Skyrim, with settlements, roads, points of interest, etc.
The simulation should be maximally emergent and causality first. By which I mean that there would be no scripting and no game director. The simulation should be able to produce complex chains like this: a big monster gets displaced from a wilderness area due to the activity of some bandits -> the monster ends up hunting around some nearby road -> the monster attacks a caravan -> the caravan survivors reach a town and speak with the authorities about the attack -> the authorities decide to set up a bounty for the monster -> someone creates the bounty and posts it at some board -> some passing adventurers see the bounty and decide to take it -> they gear up and gather supplies for the hunt -> they travel to the area where the monster is supposed to be -> they track it to its den -> they find out some other adventurers casually killed the creature already. This all should happen due to causal chains, no scripting at all.
You as the player are indistinguishable from every other AI agent in the simulation. The simulation isn’t even aware that a human player exists. That means that the world continues regardless of what you do, but you can disrupt it.
Every agent, including the human player if any, acts on their beliefs, which may be truthful, stale, or based on straight lies. People gain beliefs by talking to each other, reading notices, observing the world. This allows chains like someone going home to take money, only to discover that the money is missing because someone has stolen it.
I should be able to chat through LLMs with any agent in the game, and the LLM would feed into the game which beliefs were actually shared or seeded, which would have gameplay effects.
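
The stale-belief chain above (someone going home for money that has already been stolen) comes down to keeping ground truth and per-agent beliefs in separate stores. What follows is only a minimal sketch under stated assumptions, not code from the actual repository: `WorldTruth`, `Belief`, `SimAgent`, `observe`, and `tryTakeMoney` are all hypothetical names made up for illustration.

```typescript
// Hypothetical sketch: ground truth and per-agent beliefs live in separate stores.
type Belief = {
  value: number;                        // what the agent thinks is true
  learnedAtTick: number;                // when the belief was acquired
  source: "observed" | "told" | "read"; // how the belief was acquired
};

type SimAgent = { id: string; beliefs: Record<string, Belief> };

// The world's actual state; agents never consult it when deciding what to do.
type WorldTruth = Record<string, number>;

// First-hand perception overwrites whatever the agent believed before.
function observe(agent: SimAgent, truth: WorldTruth, fact: string, tick: number): number {
  const actual = truth[fact] ?? 0;
  agent.beliefs[fact] = { value: actual, learnedAtTick: tick, source: "observed" };
  return actual;
}

// The decision uses the belief; the truth is only discovered on arrival.
function tryTakeMoney(agent: SimAgent, truth: WorldTruth, tick: number): string {
  const belief = agent.beliefs["money-at-home"];
  const believed = belief ? belief.value : 0;
  if (believed <= 0) return "stays put: believes there is no money at home";
  const actual = observe(agent, truth, "money-at-home", tick);
  return actual > 0 ? `takes ${actual} coins` : "discovers the money is missing";
}

const truthNow: WorldTruth = { "money-at-home": 50 };
const owner: SimAgent = { id: "owner", beliefs: {} };
observe(owner, truthNow, "money-at-home", 1);     // tick 1: the owner sees the stash

truthNow["money-at-home"] = 0;                    // tick 5: a thief empties it, unobserved
const outcome = tryTakeMoney(owner, truthNow, 9); // tick 9: the owner acts on a stale belief
```

Here `outcome` ends up as the "discovers the money is missing" branch, and the owner's belief is corrected only at that moment. Lies and hearsay would enter through the same store with a different `source`, which is why the sketch records where each belief came from.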

That’s the main core. Such a simulation would allow any flavor and any kind of gameplay you could imagine. You would just have to build it incrementally. But I underestimated the complexity of such a project, which should be built upon an extremely robust set of foundational documents of various categories. Mistakes I made during its implementation:

The foundational docs, while more comprehensive than those of any other repo, just weren’t comprehensive enough. They would have required several iterations of deep research with ChatGPT-Pro.
I approached it almost as a research project instead of something that would allow me to play it at every stage, even on the terminal.
I went breadth first: I wanted a significant set of gameplay mechanics in order to prove the emergent nature of the simulation, and found myself programming takeovers of institutions before I had properly proved that people could eat and shit regularly without dying.
By the time I figured out that even though I had banditry in, AI agents in simulations weren’t able to sustain their basic needs, the project was basically crippled; I just hadn’t accepted it yet. I should have built up step by step, being able to play it to the fullest along the way, and only add features that I missed while playing.
The AI architecture, while extremely competent in some regards, was crippled as well: I relied too much on GOAP (goal-oriented action planning) because HTN (hierarchical task networks) felt like scripting at first. Then I realized that you could use HTN in a fully emergent way that respected the foundational principles, while relying on GOAP only for the leaf goals.
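
One way to read that last point, HTN on top with GOAP only at the leaves, is sketched below. It is an illustration under assumptions rather than the architecture actually used in the project: HTN methods are picked from the agent's current facts (so nothing is scripted), each method only lists ordered leaf goals, and a tiny breadth-first GOAP search works out the concrete actions. Every name here (`Facts`, `GoapAction`, `HtnMethod`, `goapPlan`, `planTask`, and the village actions) is hypothetical.

```typescript
// Hypothetical sketch: HTN decides WHICH leaf goals to pursue; GOAP finds HOW.
type Facts = Record<string, boolean>;
type GoapAction = { label: string; pre: Facts; effect: Facts };
type HtnMethod = { label: string; applies: (f: Facts) => boolean; leafGoals: Facts[] };

// True when every condition matches; a missing fact counts as false.
const holds = (facts: Facts, cond: Facts): boolean =>
  Object.keys(cond).every((k) => (facts[k] ?? false) === cond[k]);

// Breadth-first GOAP: shortest action sequence that makes `goal` hold.
function goapPlan(start: Facts, goal: Facts, actions: GoapAction[]): GoapAction[] | null {
  const queue: { facts: Facts; steps: GoapAction[] }[] = [{ facts: start, steps: [] }];
  const seen: Record<string, boolean> = {};
  while (queue.length > 0) {
    const { facts, steps } = queue.shift()!;
    if (holds(facts, goal)) return steps;
    const key = Object.keys(facts).sort().map((k) => k + "=" + facts[k]).join(",");
    if (seen[key]) continue; // already expanded this state
    seen[key] = true;
    for (const a of actions) {
      if (holds(facts, a.pre)) queue.push({ facts: { ...facts, ...a.effect }, steps: [...steps, a] });
    }
  }
  return null;
}

// HTN layer: try methods in order, solving each leaf goal with GOAP and carrying facts forward.
function planTask(methods: HtnMethod[], start: Facts, actions: GoapAction[]): string[] | null {
  for (const m of methods) {
    if (!m.applies(start)) continue;
    let facts = start;
    const labels: string[] = [];
    let solved = true;
    for (const goal of m.leafGoals) {
      const steps = goapPlan(facts, goal, actions);
      if (steps === null) { solved = false; break; }
      for (const s of steps) { facts = { ...facts, ...s.effect }; labels.push(s.label); }
    }
    if (solved) return labels;
  }
  return null;
}

const villageActions: GoapAction[] = [
  { label: "walk-to-market", pre: {}, effect: { atMarket: true } },
  { label: "buy-bread", pre: { atMarket: true, hasCoin: true }, effect: { hasFood: true, hasCoin: false } },
  { label: "eat", pre: { hasFood: true }, effect: { hungry: false, hasFood: false } },
];

const hungerMethods: HtnMethod[] = [
  { label: "eat what I carry", applies: (f) => f.hasFood === true, leafGoals: [{ hungry: false }] },
  { label: "buy, then eat", applies: (f) => f.hasCoin === true, leafGoals: [{ hasFood: true }, { hungry: false }] },
];

const hungerPlan = planTask(hungerMethods, { hungry: true, hasCoin: true }, villageActions);
```

For a hungry agent with coin and no food, `hungerPlan` comes out as walk-to-market, buy-bread, eat. The design point is that the methods never name actions, only goals, so adding a new action (foraging, begging, stealing) changes the plans without touching the HTN layer.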

With this third project, I found myself also disheartened, and feeling that I was fixing mistakes in a crippled project, when likely I would need to build it from the ground up.

So, two days ago I decided to restart both of those complex projects from zero (a Re:Zero, you could say): the card/board game one, and the emergent simulation one. It’s amazing how quickly projects progress once you’ve taken the lessons to heart and you’re on the right track. For the card/board game, I’m building it up following a ladder of game complexity. I already have Race to N, Tic-tac-toe, Connect Four, and Othello in. I’ve checked legality issues online, and found out that even if you don’t take money for it, publishing a page that simulates a game with an IP behind it is actually illegal (oops), so I couldn’t have uploaded the project with Fire in the Lake in regardless. While I intend to program in the kinds of games I’m most interested in playing (basically those on my gaming shelves), I won’t be able to upload them to the public repository.
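
To give an idea of what the bottom rung of that ladder can look like, here is a minimal sketch of a game-agnostic contract with Race to N plugged into it. It assumes Race to N is the classic counting game (players alternately add 1 to `maxStep` to a running total, and whoever lands exactly on the target wins); the `GameDef` interface, `playOut`, and `raceToN` are hypothetical names for illustration, not the rebuilt engine's actual API.

```typescript
// Hypothetical contract: the driver below knows nothing about any specific game.
interface GameDef<S, M> {
  initial(): S;
  currentPlayer(state: S): number;
  legalMoves(state: S): M[];
  apply(state: S, move: M): S;     // returns a new state instead of mutating
  winner(state: S): number | null; // null while the game is still running
}

// Generic driver: plays any GameDef to completion with one move picker per player.
function playOut<S, M>(
  game: GameDef<S, M>,
  pickers: ((state: S, legal: M[]) => M)[],
): { final: S; winner: number | null } {
  let state = game.initial();
  while (game.winner(state) === null) {
    const legal = game.legalMoves(state);
    if (legal.length === 0) break; // no moves left: treat as a draw
    state = game.apply(state, pickers[game.currentPlayer(state)](state, legal));
  }
  return { final: state, winner: game.winner(state) };
}

type RaceState = { total: number; toMove: number };

// Race to N under the assumed rules: add 1..maxStep, never overshoot, exact hit wins.
function raceToN(target: number, maxStep: number): GameDef<RaceState, number> {
  return {
    initial: () => ({ total: 0, toMove: 0 }),
    currentPlayer: (s) => s.toMove,
    legalMoves: (s) => {
      const moves: number[] = [];
      for (let step = 1; step <= maxStep && s.total + step <= target; step++) moves.push(step);
      return moves;
    },
    apply: (s, step) => ({ total: s.total + step, toMove: 1 - s.toMove }),
    // The player who just moved (not the one about to move) reached the target.
    winner: (s) => (s.total === target ? 1 - s.toMove : null),
  };
}

// Two trivial pickers: player 0 always takes the largest step, player 1 the smallest.
const takeLargest = (_s: RaceState, legal: number[]): number => legal[legal.length - 1];
const takeSmallest = (_s: RaceState, legal: number[]): number => legal[0];

const raceResult = playOut(raceToN(10, 3), [takeLargest, takeSmallest]);
```

Tic-tac-toe, Connect Four, and Othello would each be another `GameDef` with a richer state type, while `playOut` stays untouched. That is the property the ladder is meant to test before anything more complex goes in.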

Playing Othello and losing terribly

Regarding the simulation, that third project, it’s shaping up far better. I can play the current state of the simulation at any point through a terminal user interface, which at the moment is mostly rudimentary. The foundational documents number in the dozens and are extremely comprehensive. I will no longer push for breadth, nor implement mechanics that I don’t miss while playing the simulation.

The important thing is that I’m feeling energized again to work on these projects. I woke up at six in the morning today eager to get back to it. And that’s the best state of being for a wrecked creature such as myself.

Personal breakthrough in worldbuilding

On April 20, 2026 | By Jon Ureña | In Writing, Art, Artificial intelligence, Programming

A few days ago I was watching a YouTube video on how terrible the writing and worldbuilding are in the latest Bethesda games, particularly in Fallout 4. Obviously I agree because they’ve gone steadily down since Morrowind, and good old Skyrim was likely the last thing they’ll ever do right. The YouTuber was going laboriously over all the incongruencies and canon breaks in the Fallout DLCs, particularly the Mechanist one; he argued that the introduction of raider-reprogrammed robots was a disaster canon-wise, because that means that regular settlements should have tons of robots tending to most menial tasks.

That got me thinking: why is worldbuilding so hard? Is there a way to ensure that any canon addition never breaks existing canon? I told ChatGPT Pro to research in depth what narrative theory says about how to ensure worldbuilding is as robust as possible, and that every canon addition is evaluated against invariants and previously-established canon facts. Basically, every canon addition should propagate throughout the different aspects of the story world, and the invariants should either be preserved if they’re hard, or modified if they’re soft. Existing canon facts may need to be modified or reframed.
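
The hard/soft rule in that last step can be sketched as a small gate. This is a deliberately crude illustration, not the procedure from the research report: real consistency checking needs judgment, so a plain tag match stands in for it here, and the `Invariant`, `CanonFact`, `Verdict`, and `evaluateAddition` names, along with the sample invariants, are all invented for the example.

```typescript
// Hypothetical sketch of gating a canon addition against invariants.
type Invariant = {
  id: string;
  kind: "hard" | "soft";
  forbiddenTags: string[]; // crude stand-in for a real consistency check
};

type CanonFact = { id: string; statement: string; tags: string[] };

type Verdict =
  | { decision: "accept" }
  | { decision: "accept-with-revision"; reviseInvariants: string[] }
  | { decision: "reject"; brokenInvariants: string[] };

function evaluateAddition(fact: CanonFact, invariants: Invariant[]): Verdict {
  // An invariant clashes when the new fact carries any tag it forbids.
  const clashes = invariants.filter((inv) =>
    inv.forbiddenTags.some((tag) => fact.tags.indexOf(tag) !== -1),
  );
  // Hard invariants are preserved: the addition is refused as proposed.
  const hard = clashes.filter((inv) => inv.kind === "hard").map((inv) => inv.id);
  if (hard.length > 0) return { decision: "reject", brokenInvariants: hard };
  // Soft invariants may bend: the addition goes in, and they are flagged for revision.
  const soft = clashes.map((inv) => inv.id);
  if (soft.length > 0) return { decision: "accept-with-revision", reviseInvariants: soft };
  return { decision: "accept" };
}

// Illustrative invariants only; the ids and tags are made up.
const sampleInvariants: Invariant[] = [
  { id: "HARD-1", kind: "hard", forbiddenTags: ["learnable-magic"] },
  { id: "SOFT-1", kind: "soft", forbiddenTags: ["free-artifact-trade"] },
];

const verdictHard = evaluateAddition(
  { id: "CF-A", statement: "A hermit teaches spells to apprentices.", tags: ["learnable-magic"] },
  sampleInvariants,
);
const verdictSoft = evaluateAddition(
  { id: "CF-B", statement: "One border town tolerates open artifact stalls.", tags: ["free-artifact-trade"] },
  sampleInvariants,
);
const verdictClean = evaluateAddition(
  { id: "CF-C", statement: "The canal freezes most winters.", tags: ["climate"] },
  sampleInvariants,
);
```

The three sample facts land on reject, accept-with-revision, and accept respectively. Propagation, meaning reworking the existing facts an accepted addition touches, would be a further pass that this sketch leaves out.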

As I was analyzing ChatGPT’s report on the specific procedures and questions that world creation and every canon addition should ideally entail, which would take a human being way too long and too much brainpower, I thought that surely this could be formalized through AI. So I put Claude Code to the task.

In summary: worldbuilding and canon additions can absolutely be formalized, and the results are exceptional. I’ve ended up with several base files for the story world I’m working on (a fantasy one with sentient animal humanoids and in which magic is solely artifact-based, and the artifacts are like radioactive hazards). The AI now maintains the following base files: a canon ledger, economy and resources, everyday life, geography, institutions, invariants, magic or tech systems, mystery reserve, ontology, open questions, peoples and species, timeline, and world kernel.

Here’s the entire current content of the invariants markdown file, one of the shortest ones. Every CF-XXXX entry is a canon fact that I added and that Claude evaluated against the entire corpus.

# INVARIANTS — Animalia (note by me now: nevermind the name; it’s just an identifier)

World-level truths. New canon must not violate these without explicit user-approved revision.

—

## Ontological Invariants

### ONT-1 — Sentience requires biological embodiment

– **Statement**: All sentient peoples are biological, embodied beings with species-typical bodies, drives, lifespans, and senses. Sentience does not exist disembodied in this world. There are no ghosts who think, no walking gods, no machine minds.

– **Rationale**: Premise asks for low-magic, lived-in texture. Disembodied minds shatter the gritty frame and the species-as-civic-fact rule.

– **Examples**: a beaver-folk canal-master, an aurochs-folk wagon-driver, a human magistrate.

– **Non-examples**: ghosts who give orders, an AI ruling a city, a god walking the streets.

– **Break conditions**: only by explicit user-approved cosmological revision.

– **Revision difficulty**: high.

– <!– added by CF-0022 –> **Clarification (CH-0002)**: Modern crafted artifacts (CF-0021) sometimes behave as if “possessed” by entities with apparent agency. At the world level, these vessel-hosted agencies are NOT sentient; ONT-1 is preserved without exception. The in-world dispute about their nature (instructions / beastly / sentient) is observationally unresolvable and is tracked under Mystery Reserve M-6. The dispute does NOT cross-apply to animal-folk sentience (M-5 firewall).

– <!– added by CF-0029 –> **Extended Clarification (CH-0004)**: Magically-animated Maker-Age guardian constructions encountered in enterable ruins (CF-0029) similarly exhibit apparent autonomy without world-level sentience. ONT-1 is preserved without exception. The phenomenon is distinct in origin from modern-crafted vessel-hosted agencies (CF-0022) — Maker-Age guardian ≠ modern crafted apparent-agency — and the two are NOT to be conflated. Guardian-construction mechanism is tracked under Mystery Reserve M-8; presence-heterogeneity (why some ruins have guardians and some don’t) under M-10. The M-5 firewall against animal-folk sentience cross-application applies equally here.

– <!– added by CF-0035 –> **Fourth Clarification (CH-0009)**: Artifact-contaminated mutated non-sentient beasts (CF-0035) in wilderness-distal sites remain categorically NON-SENTIENT. Artifact exposure does NOT produce sentience, speech, tool-use, or proto-folk status in a non-sentient host — morphological and capability alteration is not a step toward sentience. ONT-1 is preserved without exception; the M-5 firewall against cross-application to animal-folk sentience holds for the fourth time (after CF-0022, CF-0029, CF-0031). Mutated-beast phenomenon is DISTINCT in origin from both modern-crafted vessel-hosted agencies (CF-0022) and Maker-Age guardian constructions (CF-0029) — three surface-similar Maker-origin phenomena must NOT be conflated; each has its own bounded-unknown mechanism surface (M-6 / M-8 / new M-15 for the gargantuan-underground scale-tail).

### ONT-2 — Magic exists only as artifact, not as learnable art

– **Statement**: Living people cannot cast spells from will alone. Magical effects are produced by physical artifacts — ceramic, wood, metal vessels and devices — predominantly made by lost makers, with a marginal modern stream produced by leaked-grimoire crafter attempts (CF-0021). Some can be used; some can only be contained; many do nothing recognizable until they fail.

– **Rationale**: Premise. Forces the magic-as-hazard texture and prevents the world drifting into hereditary-mage high fantasy.

– **Examples**: ceramic containment vessel for extracting corruption; an artifact that mind-controls animals into wading water; a modern crafter’s wood-vessel that hums and once destroyed three village hens before going inert.

– **Non-examples**: a wizard throwing fire from his hands; a hereditary mage bloodline; a learned spell.

– **Break conditions**: only by explicit user revision.

– **Revision difficulty**: high.

– <!– added by CF-0021 –> **Annotation (CH-0002)**: The “made by lost makers” clause now includes a marginal modern reverse-engineering. The artifact-as-mediator constraint holds — what is learned by crafters is the craft of producing vessels, not the channeling of magic through the practitioner. The magic still lives in the artifact, not in the maker.

– <!– added by CF-0039 –> **Destruction-Physics Clarification (CH-0014)**: The artifact-as-locus clause carries material-physical consequence: the binding of magical effect to the vessel in Maker-Age artifacts (CF-0039) confers destruction-resistance on the vessel itself — destruction-attempts BIND the effect more deeply rather than unmaking it, and the vessel’s material resistance is elevated accordingly. This is a property OF the artifact-as-locus, NOT a new channel through the crafter. The destruction-physics is SCOPED to Maker-Age artifacts; modern crafter outputs (CF-0021) remain destructible by ordinary means, which preserves CF-0021’s “inferior” stabilizer register. ONT-2 holds without exception.

### ONT-3 — Species do not interbreed

– **Statement**: A hyena-folk and a human cannot produce a child. Cross-species sexual relationships exist socially but are reproductively sterile. Family across species is built through marriage, fostering, and adoption — not blood.

– **Rationale**: Without this rule, speciation collapses, embodiment loses meaning, and “halfbreed” plot conveniences erode the social texture. (Logged for OPEN_QUESTIONS as user-revisable.)

– **Examples**: cross-species marriage with adopted heirs; a human noble and a fox-folk consort raising a fostered hare-folk child.

– **Non-examples**: hybrid offspring, “half-cat half-human” characters.

– **Break conditions**: requires user approval; would force redesign of kinship, succession law, and species cohesion.

– **Revision difficulty**: high.

– <!– added by CF-0035 –> **Clarification (CH-0009)**: “Chimeric” morphology observed in artifact-contaminated mutated non-sentient beasts (CF-0035) — fused features, parallel limbs, merged apparent anatomy — is ARTIFACT-EFFECT on an already-contaminated individual organism, NOT cross-species reproductive hybridization. Species reproductive boundaries are preserved: no mutated-beast lineage arises from sexual interbreeding across species. The chimeric register applies to the shaped-by-artifact body, not to the reproductive act. ONT-3 holds without exception.

—

## Causal Invariants

### CAU-1 — Artifact effects always cost

– **Statement**: Every effect drawn from a magical artifact costs the user, the host, the environment, or all three. Costs include fatigue, host trauma, ward attrition, environmental “bleed-through” (dimming light, gloom, animal compulsion), and slow contamination of place.

– **Rationale**: Brief specifies the ceramic containment procedure is brutal to the host and that artifacts can corrupt water and mind-control beasts.

– **Examples**: extraction leaving the host bedridden; ward attrition near a contained artifact; canal water gone “wrong” downriver of a buried device.

– **Non-examples**: a free, clean magical effect; a costless ward.

– **Break conditions**: forbidden — would invalidate the artifact-as-hazard frame.

– **Revision difficulty**: high.

– <!– added by CF-0035 –> **Cost-Taxonomy Extension (CH-0009)**: The cost taxonomy includes a **rare survival-as-mutation outcome** observed in non-sentient fauna (CF-0035) exposed to uncontained artifacts over time in wilderness-distal sites. Most beasts die of artifact exposure (ordinary CAU-1 lethality); a rare subset survive and exhibit morphological and capability alteration. Survival does NOT exempt the environment, the local ecology, or subsequent secondary hosts from cost; the rare “beneficial for the beast in its niche” outcome (dangerous predator status) is an individual-level windfall embedded in a population-level cost catastrophe (most died). Cost universality holds without exception.

– <!– added by CF-0039 –> **Cost-Taxonomy Extension (CH-0014) — destruction-attempt cost-transfer**: The cost taxonomy includes **destruction-attempt cost-transfer** as a variant form observed on Maker-Age artifacts (CF-0039). Attempts to destroy a Maker-Age artifact — by smithing, fire, crushing, dissolution, or other means available to current-age capability — do NOT annihilate the magical effect; the attempt instead BINDS the effect more deeply into the vessel, elevating the material resistance of the substrate (wood hardens to behave like strong metal; ceramic sets harder; metal refuses the hammer). The cost manifests partially as a material-elevation cost on the vessel (transformation into higher-order resistance) and partially as an attempt-cost on the attempting party (injury, exhaustion, equipment loss, ward-attrition radius). The cost-universality principle holds without exception — destruction-attempts do not escape cost; they redirect it. Cost universality holds without exception. Also see CF-0041 sealed-inert extension: opening a sealed Maker container ACTIVATES the artifact within with CAU-1 cost, making unknown-container unsealing a feared cultural moment.

– <!– added by CF-0040 –> **Cost-Taxonomy Reaffirmation (CH-0014) — mundane-tier per-artifact contamination**: CF-0040 commits that mundane-tier Maker-Age artifacts (breeze-dolls, tick-mirrors, weak coin-sorters, scar-rubbers) are commonly found in ordinary-life contexts; the CAU-1 principle still holds without exception for this tier. Every mundane-tier artifact carries a cost surface — slow-bleed on user, room-misfeel, animal unease, household water off-flavor, vessel fatigue across months of proximity. No “harmless charm” reading is world-truth; the low-tier artifact is low-tier COST not no-cost. AES-3 contamination-clause compliance is mandatory for every mundane-tier CF-record annotation.

### CAU-2 — Corruption produces diagnostic environmental and behavioral signals

– **Statement**: Bleed-through gloom, dimming light, objects feeling “wrong,” compulsions in animals or people are diagnostic of nearby uncontained artifact activity. Practitioners (and seasoned canal-folk) read these signals.

– **Rationale**: Brief specifies bleed-through symptoms used as diagnostic by practitioners.

– **Examples**: a canal-side beaver-folk noticing fish behaving strangely and calling for an inspector; a ward-wright walking the perimeter and feeling the air thin.

– **Non-examples**: invisible, undetectable magical contamination.

– **Break conditions**: forbidden.

– **Revision difficulty**: high.

### CAU-3 — Wards are a public-but-restricted social-technical system

– **Statement**: Wards are publicly understood as a containment system. Their existence, locations, and the concept of “ward breach” are common knowledge. Their specifics — the inscriptions, the materials, the maintenance schedules — are restricted speech, regulated by guild charter.

– **Rationale**: Brief specifies ward-breach as known, and that public discussion of wards carries social risk.

– **Examples**: a child knows what a ward marker looks like; an apprentice cannot legally describe the inscription pattern aloud in a tavern.

– **Non-examples**: wards as secret unknowable magic; wards as fully open public technology.

– **Break conditions**: forbidden.

– **Revision difficulty**: medium.

—

## Distribution Invariants

<!– added by CF-0040 –> ### DIS-1-EXT (CH-0014) — Mundane-tier is explicit subset of “most inert junk” band, not a new distribution tier

CF-0040 commits the mundane-tier (near-imperceptible-effect artifacts: breeze-dolls, tick-mirrors, weak coin-sorters, scar-rubbers) as an EXPLICIT naming of the LARGEST share of the DIS-1 “most inert junk” band, reframed: most of what DIS-1 described as “inert” is in fact LOW-TIER ACTIVE, carrying per-artifact CAU-1 contamination clauses. Inverse strength-rarity relationship holds across the artifact corpus: mundane-tier is the most common; catastrophic-class is the rarest. The mundane-tier IS a subset within DIS-1, NOT a new distribution tier at the distribution-invariant level.

### DIS-1 — Artifacts are routinely turned up underground

– **Statement**: Foundations, canal-digging, mining, and ruin-clearance regularly turn up artifacts. Most are inert junk. A small fraction are dangerous. A tiny fraction are catastrophic.

– **Rationale**: Brief specifies routine construction can unearth them.

– **Examples**: a beaver-folk canal-crew finding a ceramic shard that registers as inert; a foundation dig that uncovers a humming metal disc and triggers a quarter-town evacuation.

– **Non-examples**: artifacts as so rare no one has ever seen one; artifacts as so common they fill marketplaces.

– **Break conditions**: would change the entire artifact economy.

– **Revision difficulty**: medium.

– <!– added by CF-0027 –> **Cross-reference (CH-0004)**: Enterable Maker-Age ruin sites (CF-0027) are a recognized subset of Maker substructures that admits organized multi-professional entry; piecemeal unearthing per DIS-1 remains the dominant modality, but enterable-ruin expeditions (CF-0028) are the less-common, high-risk, high-yield variant. The DIS-1 distribution pattern (most inert, small fraction dangerous, tiny fraction catastrophic) applies equally to enterable-ruin recoveries; catastrophic-class finds trigger resealing rather than recovery (preserves Mystery Reserve M-2).

– <!– added by CF-0035 –> **Wilderness-Distal Subset (CH-0009)**: A subset of DIS-1 unearthings occurs in wilderness beyond civic / guild / chartered-watch reach (earthquake-exposed, flood-exposed, burrowing-animal-exposed, slow-soil-creep-exposed). These exposures are not detected by inspectors and not contained by chartered response; the artifacts remain active in place. Over time this subset produces the CF-0035 contaminated-fauna phenomenon. Inspector-dispatch to wilderness-distal sites is DISCRETIONARY (gated by patron-funder or estate-commission or declared civic-watch predicate), not mandatory — cost, escort requirement, and wilderness reach preclude routine extension. DIS-1 distribution pattern holds within the wilderness-distal subset as everywhere else.

### DIS-2 — Literacy and occult-fragment access is partial, not aristocratic monopoly

– **Statement**: Literacy is not universal but is not gated to nobility. Occult text fragments can be purchased by anyone with coin and a willing seller. Specialist knowledge is gated by guild and apprenticeship more than by class.

– **Rationale**: Brief specifies fragment purchase is possible.

– **Examples**: a tavern bard who can read trade-tongue; a journeyman extractor who owns three fragments; a literate canal-master.

– **Non-examples**: only nobles can read; only priests own books.

– **Break conditions**: would change the structure of knowledge access.

– **Revision difficulty**: medium.

### DIS-3 — Mythical-species sentients are population-rare and locally clustered

– **Statement**: Mythic-species sentients (basilisk-folk, chimera-folk, manticore-folk, gryphon-folk, naga-folk, etc.) exist among animal-folk but are population-rare per region. You do not see a chimera in every market.

– **Rationale**: Prevents specialness inflation. Preserves wonder. Matches the brief’s note that exotic-species bodies attract attention and commentary in public.

– **Examples**: a single naga-folk bargemaster known by reputation across a canal corridor; a basilisk-folk physician practicing in only one city.

– **Non-examples**: mythic species as common as cat-folk.

– **Break conditions**: would inflate mythic-species into background-noise.

– **Revision difficulty**: medium.

– <!– added by CF-0035 –> **Firewall (CH-0009)**: Artifact-contaminated mutated non-sentient beasts (CF-0035) are CATEGORICALLY DISTINCT from Cluster D mythic-species sentient folk. The surface similarity (unusual morphology; chimeric features; wonder-adjacent register) must NOT be conflated: mutated beasts are non-sentient fauna with artifact-shaped bodies; mythic-species sentient folk are civic-participant peoples with species-typical embodiment (ONT-1/SOC-1). The firewall holds across all encounter contexts — no in-world institution may process a mutated beast as a mythic-species individual, and no mythic-species individual may be treated as a mutated-beast specimen.

—

## Social Invariants

### SOC-1 — Animal-folk can occupy any class

– **Statement**: Class mobility is not species-coded. Animal-folk can hold landed nobility, civic authority, guild mastery, and craft livelihoods. A beaver-folk magistrate and a human laborer are both ordinary.

– **Rationale**: Brief explicit.

– **Examples**: an aurochs-folk landowning estate that employs human and otter-folk laborers; a hyena-folk magistrate presiding over a mixed-species court.

– **Non-examples**: caste systems where humans rule and animal-folk serve; species-coded slavery.

– **Break conditions**: forbidden.

– **Revision difficulty**: high.

– <!– added by CF-0036 –> **Clarification (CH-0010)**: individual and sectarian species-prejudice exist as ordinary sociological phenomena (tavern slurs, labor-prejudice distancing, crew-composition sorting preferences, marriage-broker catechism extensions). Isolated single-species exclusionary settlements exist in weak-charter / demographically-fragile regions. Supremacist sectarian doctrine asserting species entitlement to violence against other sentient peoples exists as CONTESTED-CANON sectarian belief held by a fringe. An interspecies-cannibal sub-subset exists within CF-0034 outlaw bands. **None of these constitute world-level class-coded species hierarchy.** Civic charters continue to suppress hierarchy-speech at world level; no chartered polity adopts supremacist doctrine as civic policy; no guild charter gates intake by species; no magistrate-court recognizes a “lesser sentient” legal category. SOC-1 holds without exception. Individual-and-sectarian friction ≠ civic hierarchy; the former is ordinary social phenomenon, the latter is the forbidden break-condition.

### SOC-2 — Public adult barter is legal and visible in many regions

– **Statement**: Sexual services can be publicly posted, framed as pragmatic exchange, and treated by many locals as mundane. Scandal attaches to the character of those involved, not to the act itself. Regional and class variation exists.

– **Rationale**: Brief explicit.

– **Examples**: a posted price-list at a tavern; a respectable courtesan whose patrons include guild-masters; quiet disapproval from a stricter sectarian household.

– **Non-examples**: blanket criminalization; blanket destigmatization.

– **Break conditions**: would shift the texture significantly; medium revision difficulty.

– **Revision difficulty**: medium.

### SOC-3 — Coin contract is sacred by custom

– **Statement**: A payment contracted in coin or in-kind must be honored, or restitution made. Breach is a recognized civil and customary wrong; reputation damage compounds the legal cost. Trade across species depends on this norm.

– **Rationale**: Derived from the brief’s coin-and-livestock framing; necessary for stable trade across species and regions.

– **Examples**: a tavernkeeper hounding a defaulted patron through three towns; a guild withholding work from a known oath-breaker.

– **Non-examples**: payments routinely renegotiated after delivery without consequence.

– **Break conditions**: low — story tension often arises from breach.

– **Revision difficulty**: low.

### SOC-4 — Artifact extraction and traffic is guild-licensed

– **Statement**: Extractors, containment-wrights, and artifact-brokers operate under chartered guilds in most polities. Unlicensed possession of magical artifacts carries criminal or civil penalties. Black markets exist but operate at risk.

– **Rationale**: Derived from brief’s “professions and guilds exist to extract … research … sell … to bidders” plus the social risk around ward discussion.

– **Examples**: a guild-stamped artifact in a noble’s collection; a smuggler executed for trafficking an unregistered device.

– **Non-examples**: anyone can dig and sell freely.

– **Break conditions**: medium.

– **Revision difficulty**: medium.

---

## Aesthetic / Thematic Invariants

### AES-1 — Heroism is paid in coin and scars, not glory

– **Statement**: Risk-taking is treated as labor. Veterans are quietly proud and dryly mocking of glory-talk. Songs that romanticize battle exist but are sung mostly by those who never fought.

– **Rationale**: Tonal contract. The “scarred veteran with longsword” framing of the brief.

– **Examples**: a pension-list at a guildhall; a tavern song that ends with an unpaid widow.

– **Non-examples**: chosen-one narratives treated as world-truth.

– **Break conditions**: would betray tone.

– **Revision difficulty**: high.

### AES-2 — The ordinary keeps the world honest

– **Statement**: Daily life — canal traffic, livestock, tavern songs, winter dread, child-rearing — is always present in the texture. The world cannot be experienced solely from the perspective of heroes, nobles, or cosmologists.

– **Rationale**: Skill discipline; brief emphasis on lived realism.

– **Examples**: a story whose stakes are framed in terms of who eats this winter; a battle scene whose aftermath shows the field being looted by tenants.

– **Non-examples**: a world rendered only as cosmology and ruling families.

– **Break conditions**: would betray world identity.

– **Revision difficulty**: high.

### AES-3 — The magical and the contaminated are aesthetically allied

– **Statement**: Wonder and dread are inseparable in this world. Magic is never simply pretty. Every magical experience carries contamination — physical, social, or moral.

– **Rationale**: Brief.

– **Examples**: an artifact that is beautiful and lethal in the same breath; a containment-wright respected and quietly avoided. <!– added by CF-0035 –> A six-limbed boar carcass hauled to a ward-inspector’s door — the hunters dryly proud, the villagers quietly moving the children indoors; the trophy hung in a guildhall smoke-room, not paraded.

– **Non-examples**: charming, harmless magical decor.

– **Break conditions**: forbidden.

– **Revision difficulty**: high.

I used to hate worldbuilding, but this method makes it fun, as well as a solved problem. This is basically programming-as-prose. It’s already revolutionizing many industries.

LudoForge #4

On January 31, 2026, by Jon Ureña, in gaming, artificial intelligence, programming

Now that the evolutionary process to grow game definitions is progressing at a steady pace in my app, named LudoForge, I fed the architectural docs to ChatGPT so that it would write a good explanation on how it works. It may be interesting to those curious about how complex systems can grow organically through an evolutionary algorithm that mimics biological evolution.

Teaching a computer to invent tabletop games (and occasionally rediscover the classics)

If you squint, evolving game designs is a lot like evolving creatures: you start with a messy ecosystem of “mostly viable” little organisms, you test which ones can survive in their environment, you keep the best survivors—but you also keep a diverse set of survivors so the whole population doesn’t collapse into one boring species.

In our system, the “organisms” are game definitions written in a small, strict game DSL (a structured way to describe rules: players, state variables, actions, effects, win/lose conditions, turn order, and so on). Each candidate game definition is wrapped in a genome: basically an ID plus the full definition.

From there, the evolutionary loop repeats: seed → simulate → score → place into niches → mutate elites → repeat.

1) Seeding: where the first games come from

Evolution needs a starting population. We can generate seeds automatically, or import them from disk, or mix both approaches. The important bit isn’t “are the seeds good?”—it’s “are they valid and diverse enough to start exploring?”

So seeds must pass two kinds of checks before they’re even allowed into the ecosystem:

Schema validation: does the JSON structure match the DSL’s required shape?
Semantic validation: does it make sense as a playable ruleset (no broken references, impossible requirements, etc.)?

And there’s a third, subtle filter: when we place games into “niches” (more on that below), seeds that land only in junk bins like unknown/under/over are rejected during seed generation, because they don’t help us cover the design space.

Think of this as: we don’t just want “a bunch of seeds,” we want seeds scattered across different climates so evolution has many directions to run in.
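As a rough illustration of the semantic check described above, here is a minimal TypeScript sketch. The `SeedDefinition` shape, its field names, and the `semanticErrors` function are hypothetical stand-ins, since the actual DSL schema is not shown here; schema validation is assumed to have passed already.

```typescript
// Hypothetical, heavily simplified DSL shape (assumption: not the real LudoForge schema).
interface SeedVariable { min: number; max: number; initial: number }
interface SeedEffect { variable: string; delta: number }
interface SeedAction { id: string; effects: SeedEffect[] }
interface SeedDefinition {
  variables: Record<string, SeedVariable>;
  actions: SeedAction[];
}

// Returns human-readable problems; an empty array means the seed passed this check.
function semanticErrors(def: SeedDefinition): string[] {
  const errors: string[] = [];
  if (def.actions.length === 0) {
    errors.push("no actions defined");
  }
  for (const name of Object.keys(def.variables)) {
    const v = def.variables[name];
    if (v.min > v.max || v.initial < v.min || v.initial > v.max) {
      errors.push(`variable ${name}: impossible bounds or initial value`);
    }
  }
  for (const action of def.actions) {
    for (const effect of action.effects) {
      // Broken reference: the effect points at a variable that was never declared.
      if (!Object.prototype.hasOwnProperty.call(def.variables, effect.variable)) {
        errors.push(`action ${action.id}: unknown variable ${effect.variable}`);
      }
    }
  }
  return errors;
}
```

Returning a list of problems instead of throwing on the first one makes it easier to log why a seed was refused.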

2) Playtesting at machine speed: simulation as the “environment”

A human can’t playtest 10,000 games a day. A simulation engine can.

For every candidate game, we run automated playthroughs using AI agents (simple ones like random and greedy are enough to expose lots of structural problems). The engine repeatedly:

Lists the legal moves available right now
Checks termination (win/lose/draw conditions, cutoffs, loop detection)
Lets an agent pick an action (with concrete targets if needed)
Applies costs and effects, recording what happened
Advances the turn/phase according to the game’s turn scheduler

Crucially: when a complex game definition tries to do something illegal (like decrementing below a minimum, or targeting something that doesn’t exist), the engine records skipped effects/triggers instead of crashing, so the system can observe “this design is broken in these ways” rather than just failing outright.

This is the equivalent of an organism interacting with the world and leaving tracks: “it tried to fly, but its wings didn’t work.”
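The "record skipped effects instead of crashing" idea could be sketched like this in TypeScript. The types and the `applyEffects` function are illustrative assumptions (numeric counters only), not the engine's real API.

```typescript
// Hypothetical types: state is a bag of numeric counters with per-variable bounds.
interface SimEffect { variable: string; delta: number }
interface SimBounds { min: number; max: number }
interface SkippedEffect { effect: SimEffect; reason: string }

function applyEffects(
  state: Record<string, number>,
  bounds: Record<string, SimBounds>,
  effects: SimEffect[],
): { state: Record<string, number>; skipped: SkippedEffect[] } {
  const next: Record<string, number> = { ...state };
  const skipped: SkippedEffect[] = [];
  for (const effect of effects) {
    const current = next[effect.variable];
    const limit = bounds[effect.variable];
    if (current === undefined || limit === undefined) {
      // Targeting something that does not exist: record it and move on.
      skipped.push({ effect, reason: "unknown-variable" });
      continue;
    }
    const value = current + effect.delta;
    if (value < limit.min || value > limit.max) {
      // Would leave the legal range (for example, decrementing below the minimum).
      skipped.push({ effect, reason: "out-of-bounds" });
      continue;
    }
    next[effect.variable] = value;
  }
  return { state: next, skipped };
}
```

The `skipped` list is what later analysis can count, so a design that constantly attempts illegal effects becomes visible rather than silently failing.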

3) Turning playthroughs into numbers: metrics, degeneracy, and fitness

After simulation, we compute a set of analytics that act like proxies for design quality—things like:

Agency: did players have meaningful choices, or were they railroaded?
Strategic depth (proxy): how large is the typical decision space?
Variety: do many different actions get used, or does one dominate?
Interaction rate: are players affecting each other or only themselves?
Structural complexity: is this a tiny toy, or something richer?

These are not “fun detectors.” They’re sensors.

Then we run degeneracy detection: filters that catch the classic failure modes of randomly-generated rulesets:

infinite loops / repeated states
non-terminating games (hits max turns/steps)
games with no real choices
trivial wins, dominant actions, excessive skipped effects, etc.

Some degeneracy flags cause an outright reject (hard gate), others apply a penalty (soft pressure), and too many penalties at once can also trigger rejection.

Finally, all of this becomes a feature vector, and we compute an overall fitness score—the number evolution tries to increase.
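A small sketch of how hard gates and soft penalties might combine, in TypeScript. The flag shape, the `maxSoftFlags` threshold, and the subtraction-based penalty are assumptions made for illustration; the real scoring formula is not given here.

```typescript
// Hypothetical flag shape: "hard" flags reject outright, "soft" flags subtract a penalty.
interface DegeneracyFlag { id: string; severity: "hard" | "soft"; penalty: number }

// Returns null when the candidate is rejected, otherwise the penalized fitness.
function gateFitness(
  baseFitness: number,
  flags: DegeneracyFlag[],
  maxSoftFlags: number,
): number | null {
  let softCount = 0;
  let totalPenalty = 0;
  for (const flag of flags) {
    if (flag.severity === "hard") {
      return null; // hard gate
    }
    softCount += 1;
    totalPenalty += flag.penalty;
  }
  if (softCount > maxSoftFlags) {
    return null; // too many penalties at once also triggers rejection
  }
  return Math.max(0, baseFitness - totalPenalty);
}
```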

4) “Growing in niches”: why we don’t keep only the top 1%

If you only keep the single highest-scoring game each generation, you get premature convergence: the population collapses into one design family and stops surprising you.

Instead, we use MAP-Elites, which you can picture as a big grid of “design neighborhoods.” Each neighborhood is defined by a few chosen descriptors (think: agency bucket, variety bucket, etc.). Each candidate game gets “binned” into a niche based on its descriptor values, and then it competes only with others in that same niche.

Each niche keeps its best resident (the “elite”).
Over time, the map fills with many different elites: fast games, slow games, chaotic games, skillful games, high-interaction games, and so on.

This is how you get a museum of interesting survivors, not one monoculture.
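The core bookkeeping of MAP-Elites is small enough to sketch in a few lines of TypeScript. The `NicheCandidate` type and `placeInArchive` function are hypothetical names, and the niche id is assumed to have been computed elsewhere.

```typescript
// Hypothetical candidate record; nicheId is assumed to be precomputed from descriptors.
interface NicheCandidate { id: string; fitness: number; nicheId: string }
type EliteArchive = Record<string, NicheCandidate>;

// Keeps the candidate only if its niche is empty or it beats the current elite.
// Returns true when the archive changed.
function placeInArchive(archive: EliteArchive, candidate: NicheCandidate): boolean {
  const incumbent = archive[candidate.nicheId];
  if (incumbent !== undefined && incumbent.fitness >= candidate.fitness) {
    return false;
  }
  archive[candidate.nicheId] = candidate;
  return true;
}
```

Because a candidate only competes inside its own niche, a mediocre but unusual game can survive next to a high-scoring conventional one.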

5) Reproduction: mutation (and why mutation is structured, not random noise)

Once we have elites across niches, we generate the next generation by mutating them.

Mutation operators aren’t “flip random bits.” They are rule-aware edits that make plausible changes to game structure, such as:

tweak a number (thresholds, magnitudes)
add/remove/duplicate actions (with safety guards so you can’t delete the last action)
add/remove variables (while rewriting dangling references)
change turn schedulers (round-robin ↔ simultaneous ↔ priority-based, etc.)
add triggers and conditional effects
modify termination rules (win/lose/draw conditions)

The key is: the operator library is rich enough to explore mechanics, not just parameters.

Mutation retries (because many mutations are duds)

Some mutations do nothing (“no-op”), or produce something that can’t be repaired. The runner will retry with a different operator a few times; if it still can’t produce a productive mutation, it falls back to keeping the parent for that offspring slot.

This keeps evolution moving without pretending every random change is meaningful.
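The retry-then-fallback behavior can be sketched as follows in TypeScript. The operator signature (returning `null` for a no-op or an unrepairable result), the `pick` callback, and the attempt limit are assumptions for illustration.

```typescript
// Hypothetical operator signature: null means "no-op" or "could not be repaired".
type MutationOperator<G> = (parent: G) => G | null;

function mutateWithRetries<G>(
  parent: G,
  operators: MutationOperator<G>[],
  maxAttempts: number,
  pick: (count: number) => number, // returns an index in [0, count)
): { child: G; mutated: boolean } {
  const remaining = operators.slice(); // each retry uses a different operator
  for (let attempt = 0; attempt < maxAttempts && remaining.length > 0; attempt++) {
    const operator = remaining.splice(pick(remaining.length), 1)[0];
    const child = operator(parent);
    if (child !== null) {
      return { child, mutated: true };
    }
  }
  // Fallback: keep the parent for this offspring slot.
  return { child: parent, mutated: false };
}
```

Passing `pick` in from outside keeps the function deterministic when it is driven by a seeded random number generator.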

6) Repair, rejection reasons, and staying honest about failure

After mutation, we may run repair (optional), then validation and safety gates. If a candidate fails, it’s not just dropped silently—the system classifies why it failed:

repair failure
validation failure
safety failure
evaluation error
evaluation returned null fitness/descriptors

And it persists these outcomes for observability and debugging.

This matters because “evolution” is only useful if you can tell whether the ecosystem is healthy—or if you’ve started breeding nonsense.

7) Adaptive evolution: learning which mutations are actually useful

Not all mutation operators are created equal. Some will be reliably productive; others will mostly create broken genomes.

So the runner tracks per-operator telemetry every generation: attempts, no-ops, repair failures, rejection counts, and how often an operator actually helped fill a new niche or improve an elite.

Those stats feed into adaptive operator weighting, so the system gradually shifts its mutation choices toward what’s working in the current region of the design space—without hardcoding that “operator X is always good.”
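One simple way to turn such telemetry into sampling weights is a smoothed success rate with a floor, sketched below in TypeScript. This specific formula is an assumption chosen for illustration, not necessarily what the runner does; `OperatorStats`, `operatorWeights`, and `sampleOperator` are hypothetical names.

```typescript
interface OperatorStats { attempts: number; improvements: number }

// Smoothed success rate per operator, floored so no operator is starved, then normalized.
function operatorWeights(
  stats: Record<string, OperatorStats>,
  floor: number,
): Record<string, number> {
  const raw: Record<string, number> = {};
  let total = 0;
  for (const id of Object.keys(stats)) {
    const s = stats[id];
    const rate = (s.improvements + 1) / (s.attempts + 2); // add-one smoothing
    raw[id] = Math.max(floor, rate);
    total += raw[id];
  }
  const weights: Record<string, number> = {};
  for (const id of Object.keys(raw)) {
    weights[id] = raw[id] / total;
  }
  return weights;
}

// Picks an operator id given u in [0, 1); assumes at least one operator exists.
function sampleOperator(weights: Record<string, number>, u: number): string {
  const ids = Object.keys(weights);
  let cumulative = 0;
  for (const id of ids) {
    cumulative += weights[id];
    if (u < cumulative) return id;
  }
  return ids[ids.length - 1];
}
```

The floor keeps rarely successful operators in play, which matters because an operator that is useless in one region of the design space may become useful in another.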

8) Optional superpower: motif mining (stealing good patterns from winners)

Sometimes an evolved game contains a little “mechanical phrase” that’s doing real work—like a neat resource exchange loop, or a repeating pattern of effects that creates tension.

When motif mining is enabled, we:

select elites (top per niche + global best)
re-simulate them to extract trajectories
mine repeated effect sequences (“motifs”)
convert those motifs back into DSL effects
feed them into a special mutation operator that can inject those motifs into new games

That’s evolution discovering a useful mechanic, then turning it into reusable genetic material.
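The "mine repeated effect sequences" step can be approximated by counting fixed-length runs (n-grams) of effect ids across trajectories. The TypeScript sketch below assumes each trajectory has already been reduced to a list of effect ids; `mineMotifs` is a hypothetical name, and the real miner may work quite differently.

```typescript
// Counts every run of `length` consecutive effect ids and keeps the frequent ones.
// Assumption: effect ids never contain the ">" separator used for the lookup key.
function mineMotifs(
  trajectories: string[][],
  length: number,
  minCount: number,
): { motif: string[]; count: number }[] {
  const counts: Record<string, number> = Object.create(null);
  for (const trajectory of trajectories) {
    for (let i = 0; i + length <= trajectory.length; i++) {
      const key = trajectory.slice(i, i + length).join(">");
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .filter((key) => counts[key] >= minCount)
    .map((key) => ({ motif: key.split(">"), count: counts[key] }))
    .sort((a, b) => b.count - a.count);
}
```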

9) Human taste enters the loop (without turning it into manual curation)

Metrics are helpful, but “fun” is subjective. So we can add human feedback:

ratings (“how good is this?”)
pairwise comparisons (“A or B?”)

Rather than asking the human to judge random games, the system uses active learning to choose the most informative comparisons—especially cases where its preference model is uncertain, and it tries to include underrepresented niches so taste is learned across the map.

Under the hood, the preference model is an ensemble trained online (so it can update continuously) and its uncertainty controls how much feedback to request per generation (adaptive budget).

Fitness can then blend:

“objective-ish” signals (metrics/degeneracy)
“human preference” signals

So the system doesn’t just breed games that are non-broken—it breeds games that align with what you actually enjoy.

10) Why this can rediscover known games and invent new ones

If your DSL is expressive enough to describe the rules of an existing game, then in principle there exists a genome that encodes it. Evolution doesn’t need to “know” the game—it only needs:

Search operators that can reach that region of the rulespace (structural mutations, not just numeric tweaks)
Selection pressure that rewards the behaviors that make that game work (choice, balance, interaction, clean termination, etc.)
Diversity preservation so the system keeps exploring many styles instead of collapsing early

Once those are in place, rediscovery becomes a side-effect of searching a huge space under the right constraints. And invention is what happens when evolution stumbles into combinations nobody tried on purpose—then keeps them because the ecosystem rewards them.

The simplest mental model

If you want the non-technical version in one breath:

We generate lots of small rulesets, machine-play them thousands of times, score them for “does this behave like a real game?”, sort the survivors into many different “design neighborhoods,” keep the best in each neighborhood, then make mutated children from those survivors—occasionally learning from human taste and reusing discovered mechanics—until the map fills with strong and varied games.

That’s the evolutionary process: not magic, not random, but a relentless loop of variation + playtesting + selection + diversity.

LudoForge #3

On January 30, 2026, by Jon Ureña, in gaming, artificial intelligence, programming

If you don’t know about LudoForge, you should probably read the previous posts. In summary: I’m developing an app to evolve tabletop game prototypes according to a “fun” factor made up of plenty of fun proxies like player agency, strategic depth, etc. The code is now mature enough to run X evolutionary generations on demand, produce shortlists of the best games, and finish. That’s great because it means the code works well, although there’s always room for improvement. But the shortlist of four winners is absolutely terrible in a way that has had me giggling for a while. I’ll let ChatGPT explain these “games.”

Game 1: “The Shared Counter Chicken”

Two players take turns. There’s a shared number on the table that starts at 0 and can be increased up to 15. The “win condition” is simple: whoever makes it hit 15 wins instantly.

The game also includes a couple of other buttons that change other numbers… but those numbers can never actually produce a win. They’re basically decorative dials. One of them always gets set to 12, but the victory threshold for it is 20, so it’s like having a “Launch Rocket” button that always fuels you to 60% and then stops forever.

So what do players really do? They increment the shared counter… and then start playing a tiny psychological game of chicken: “I could push the counter closer to 15… but then you might get the final move.” So the game hands them some useless actions that function as “stalling.” If both players are stubborn, they can just keep stalling until the match times out.

It’s not strategy so much as who blinks first.

Game 2: “The Game Where You Can’t Move”

This one is my favorite kind of failure because it’s so clean.

The game defines two counters. The only action in the entire game says: “Decrease both counters.”

But the action is only allowed if both counters are greater than zero.

And the game starts with both counters at zero.

So on turn one, a player looks at the rules and… there is literally nothing they’re allowed to do. No move exists. The game doesn’t even fail dramatically. It just sits there like a vending machine with the power unplugged.

In other words: it’s a tabletop game prototype that begins in a stalemate.

Game 3: “First Player Loses: The Speedrun”

This one actually runs, which makes it even funnier.

There are two counters. One counter is the only one that can possibly lead to a win — the other one only goes downward forever, so its “win condition” is a permanent lie.

The “real” counter starts at 0, and there’s an action that increases it by 2. The victory threshold is 4.

Here’s what happens:

Player 1 goes first. Their only sensible move is: 0 → 2.
Player 2 goes next and does the same move: 2 → 4.
Player 2 instantly wins.

So the entire “game” is basically:

“Player 1 sets up the win for Player 2.”

It’s like a sport where the first player is forced to place the ball on the penalty spot and then politely step aside.

Game 4: “The Unwinnable Ritual”

Three players this time. There’s one shared counter. Winning requires the counter to reach 4.

But the rules only let you do two things:

Set it to 2.
Decrease it by 2 (if it’s above zero).

Notice what’s missing: any way to make it bigger than 2.

So the win condition is a castle in the sky. The game is a ritual where players take turns setting the number to 2, or knocking it back down toward 0. It can never, ever reach 4. No amount of cleverness changes that.

It’s essentially a machine that cycles between “2” and “less than 2” until you get bored or the turn limit ends it.

ChatGPT finished its report with this note: “The first evolutionary run didn’t produce brilliant board games. It produced life, in the same way early evolution produced algae and sea foam. These were rule-sets that technically existed… but didn’t yet deserve to.”

These idiotic games already provided fun by existing, so as far as I care, this app has already been a success. The good thing is that I have code to handle degeneracy, fitness, etc., so I simply have to tighten that code so some of these nonsensical points would either kill a genome or penalize its fitness.

By the way, earlier tonight I was playing tennis in VR with people across Europe while I waited for Claude Code to work through the tickets of one of the app’s new features. And yet the world is terrible. We live in the lamest cyberpunk dystopia imaginable.

LudoForge #2

On January 28, 2026, by Jon Ureña, in gaming, artificial intelligence, programming

I fed the previous post to the Deep Dive team so it would produce a podcast. They did a great job.

LudoForge #1

On January 28, 2026, by Jon Ureña, in artificial intelligence, programming

Two nights ago I was rolling around in bed trying to sleep when a notion came into my head, one that has returned from time to time: some of the most flow-like fun I’ve ever had was playing tabletop games. I’m a systems builder by nature, and I love to solve problems with a variety of tools. Tabletop games are complex problems to solve with specific series of tools. My favorite tabletop game is Arkham Horror LCG, although I’ve loved many more like Terraforming Mars, Ark Nova, Baseball Highlights: 2045, Core Worlds, Imperium, Labyrinth, Renegade… But none of them fully captured me. Like some potential game exists that has exactly every feature my brain yearns for, but that game doesn’t exist. I’ve cyclically thought that I should create that game, but I never know where to start. I don’t even know what exactly I want, other than knowing that what I’ve experienced isn’t enough.

These past few weeks I’ve been implementing extremely complex analytics report generators for my repository Living Narrative Engine. I was surprised to find out that it’s feasible to mathematically find gaps in extremely complex spaces (dozens of dimensions) as long as they’re mathematically defined. I guess Alicia was justified in being obsessed with math. So I started wondering: what makes a tabletop game good? Surely, the fun you have with it. Can “fun” be mathematically defined? Is it the agency you have? The strategic depth? The variety? If such metrics could be mathematically defined, then “fun” is a fitness score that combines them.

And what if you didn’t need to design the game yourself? If you can map a simulated game’s activity to metrics such as the agency per player, the strategic depth, the variety… then you can evolve a population of game definitions in a way that, generation after generation, the “fun” score improves. If you can turn all game mechanics into primitives, the primitives will mutate in and prove their worth throughout the generations, composing coherent mechanics or even inventing new ones. Initially, a human may need to score game definition variants according to how “fun” the playthroughs of those games were, but in the end that could be automated as well.

Because this is the era of Claude Code and Codex, I’ve already implemented the first version of the app. I’ve fed ChatGPT the architectural docs and told it to write a report. You can read it down below.

LudoForge: evolving tabletop games with a deterministic “taste loop”

I’m building LudoForge, a system that tries to answer a pretty blunt question:

What if we treated tabletop game design like search—simulate thousands of candidates, kill the broken ones fast, and let a human “taste model” steer evolution toward what’s actually fun?

Under the hood, it’s a seeded-population evolution loop: you start with a set of game definitions (genomes), run simulations, extract metrics, filter degeneracy, blend in learned human preferences, and then evolve the population using MAP-Elites and genetic operators. Then you repeat.

The big picture: the loop

LudoForge is structured as a pipeline with clean seams so each layer can be tested and swapped without turning the whole thing into spaghetti. The stages look like this: seed → evaluate → simulate → analytics → (optional) human feedback → fitness → MAP-Elites → (optional) mutate/crossover/repair → next generation.

A key design choice: the core expects a seeded population. There’s no “magic generator” hidden inside that invents games from scratch. If you want a generator, you build it outside and feed it in. That keeps the engine honest and debuggable. Note by me after rereading this part of the report: this will change soon enough.

Games as genomes: a DSL that can be validated and repaired

Each candidate game is a genome: { id, definition }, where definition is a DSL game definition. Before any evaluation happens, the definition goes through schema + semantic validation—and optionally a repair pass if you enable repair operators. Invalid DSL gets rejected before it can contaminate simulation or preference learning.

Repair is deliberately conservative: it’s mostly “DSL safety” (e.g., clamp invalid variable initial values to bounds). Anything that’s “this game is technically valid but dumb/unplayable” is handled by simulation + degeneracy detection, not by sweeping edits that hide the real problem.

The simulation engine: deterministic playthroughs with real termination reasons

The simulation layer runs a single playthrough via runSimulation(config) (or wrapped via createSimulationEngine). It builds initial state from the definition, picks the active agent, lists legal actions, applies costs/effects/triggers, advances turns/phases, and records a trajectory of step snapshots and events.

It’s also built to fail safely:

No legal actions → terminates as a draw with terminationReason = "stalemate".
Max turns exceeded → terminationReason = "max-turns" with an outcome computed in that cutoff mode.
Loop detection (optional hashing + repetition threshold) → terminationReason = "loop-detected".

Most importantly: runs are reproducible. The RNG is a seeded 32-bit LCG, so identical seeds give identical behavior.
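For readers unfamiliar with linear congruential generators, here is a minimal seeded 32-bit LCG in TypeScript. The multiplier and increment are the widely published Numerical Recipes constants, used here as an assumption; the constants LudoForge actually uses are not stated, and `createLcg` is a hypothetical name.

```typescript
// Numerical Recipes constants (assumption: not necessarily the ones LudoForge uses).
const LCG_MULTIPLIER = 1664525;
const LCG_INCREMENT = 1013904223;
const LCG_MODULUS = 4294967296; // 2^32

// Returns a generator of floats in [0, 1). The same seed always yields the same sequence.
function createLcg(seed: number): () => number {
  let state = seed >>> 0; // force into the unsigned 32-bit range
  return () => {
    // The product stays below 2^53, so plain double arithmetic is exact here.
    state = (LCG_MULTIPLIER * state + LCG_INCREMENT) % LCG_MODULUS;
    return state / LCG_MODULUS;
  };
}
```

Two generators created with the same seed produce identical sequences, which is what makes a recorded playthrough replayable step for step.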

Metrics: cheap proxies first, expensive rollouts only when you ask

After simulation, LudoForge summarizes trajectories into analytics: step/turn counts, action frequencies, unique state counts, termination reasons, and sampled “key steps” that include legalActionCount.

From there it computes core metrics like:

Agency (fraction of steps with >1 legal action)
Strategic depth (average legal actions per step)
Variety (action entropy proxy)
Pacing tension (steps per turn)
Interaction rate (turn-taking proxy)
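The first three metrics in the list above are simple enough to sketch directly in TypeScript. The `StepSample` shape is hypothetical, and normalizing the entropy by the log of the number of distinct actions is an assumption; the report only says "action entropy proxy".

```typescript
// Hypothetical per-step record extracted from a trajectory.
interface StepSample { legalActionCount: number; chosenAction: string }

// Agency: fraction of steps with more than one legal action.
function agency(steps: StepSample[]): number {
  if (steps.length === 0) return 0;
  return steps.filter((s) => s.legalActionCount > 1).length / steps.length;
}

// Strategic depth: average number of legal actions per step.
function strategicDepth(steps: StepSample[]): number {
  if (steps.length === 0) return 0;
  return steps.reduce((sum, s) => sum + s.legalActionCount, 0) / steps.length;
}

// Variety: Shannon entropy of chosen actions, normalized to [0, 1].
function variety(steps: StepSample[]): number {
  const counts: Record<string, number> = Object.create(null);
  for (const s of steps) {
    counts[s.chosenAction] = (counts[s.chosenAction] || 0) + 1;
  }
  const ids = Object.keys(counts);
  if (ids.length < 2) return 0; // zero or one distinct action means no variety
  let entropy = 0;
  for (const id of ids) {
    const p = counts[id] / steps.length;
    entropy -= p * Math.log(p);
  }
  return entropy / Math.log(ids.length);
}
```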

Extended metrics exist too, and some are intentionally opt-in because they’re expensive:

Meaningful choice spread via per-action rollouts at sampled decision points
Comeback potential via correlation between early advantage and final outcome

Here’s the honest stance: these metrics are not “fun”. They’re proxies. They become powerful when you combine them with learned human preference.

Degeneracy detection: kill the boring and the broken early

This is one of the parts I’m most stubborn about. Evolution will happily optimize garbage if you let it.

So LudoForge explicitly detects degeneracy patterns like:

loops / non-termination
stalemates
forced-move and no-choice games
dominant-action spam
trivial wins

By default, those flags can reject candidates outright, and degeneracy flags also become part of the feature vector so the system can learn to avoid them even when they slip through.

Human feedback: turning taste into a model

Metrics get you a feature vector. Humans supply the missing ingredient: taste.

LudoForge supports two feedback modes:

Ratings (1–5) with optional tags and rationale
Pairwise comparisons (A/B/Tie) with optional tags and rationale

Pairwise comparisons are the main signal: they’re cleaner than ratings and train a preference model using a logistic/Bradley–Terry style update. Ratings still matter, but they’re weighted lower by default.
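A logistic, Bradley–Terry style update over a linear score can be sketched as below in TypeScript. The linear scoring function, the learning rate, and all names are assumptions for illustration; the real model is described elsewhere as an ensemble, which this sketch does not attempt to reproduce.

```typescript
type FeatureVector = Record<string, number>; // keyed by metric id

function linearScore(weights: FeatureVector, features: FeatureVector): number {
  let total = 0;
  for (const id of Object.keys(features)) {
    total += (weights[id] || 0) * features[id];
  }
  return total;
}

// Probability that A is preferred over B: sigmoid of the score difference.
function probabilityAPreferred(weights: FeatureVector, a: FeatureVector, b: FeatureVector): number {
  return 1 / (1 + Math.exp(-(linearScore(weights, a) - linearScore(weights, b))));
}

// outcome: 1 = A preferred, 0 = B preferred, 0.5 = tie. Returns new weights.
function updateOnComparison(
  weights: FeatureVector,
  a: FeatureVector,
  b: FeatureVector,
  outcome: number,
  learningRate: number,
): FeatureVector {
  const error = outcome - probabilityAPreferred(weights, a, b);
  const updated: FeatureVector = { ...weights };
  const seen: Record<string, boolean> = {};
  for (const id of Object.keys(a).concat(Object.keys(b))) {
    if (seen[id]) continue;
    seen[id] = true;
    const difference = (a[id] || 0) - (b[id] || 0);
    updated[id] = (updated[id] || 0) + learningRate * error * difference;
  }
  return updated;
}
```

Only features on which the two games differ receive an update, so a comparison teaches the model about exactly the dimensions that distinguish the pair.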

There’s also active learning: it selects comparison pairs where the model is most uncertain (predicted preference closest to 0.5), while reserving slots to ensure underrepresented MAP-Elites niches get surfaced. That keeps your feedback from collapsing into “I only ever see one genre of game.”
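The uncertainty part of that selection can be sketched like this in TypeScript. It assumes each game already has a scalar predicted score and it omits the slots reserved for underrepresented niches; `ScoredGame`, `ComparisonPair`, and `mostUncertainPairs` are hypothetical names.

```typescript
interface ScoredGame { id: string; predictedScore: number }
interface ComparisonPair { a: string; b: string; probabilityAWins: number }

// Ranks all pairs by how close the predicted preference is to 0.5 and keeps `budget` of them.
function mostUncertainPairs(games: ScoredGame[], budget: number): ComparisonPair[] {
  const pairs: ComparisonPair[] = [];
  for (let i = 0; i < games.length; i++) {
    for (let j = i + 1; j < games.length; j++) {
      const diff = games[i].predictedScore - games[j].predictedScore;
      pairs.push({
        a: games[i].id,
        b: games[j].id,
        probabilityAWins: 1 / (1 + Math.exp(-diff)),
      });
    }
  }
  pairs.sort(
    (x, y) => Math.abs(x.probabilityAWins - 0.5) - Math.abs(y.probabilityAWins - 0.5),
  );
  return pairs.slice(0, budget);
}
```

Enumerating every pair is quadratic in the number of games, which is acceptable for a shortlist of elites but would need sampling for a large population.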

Fitness: blending objective proxies, diversity pressure, and learned preference

Fitness isn’t a single magic number pulled from the void. It’s a blend:

Base composite score from metrics (weighted sum/objectives)
Diversity contribution (pressure toward exploring niches)
Preference contribution from the learned preference model (centered/capped, with bootstrap limits early on)

Feature vectors are keyed by metric id (not positional arrays), which matters a lot: adding a new metric doesn’t silently scramble your model weights. Renaming metrics, though, becomes a migration event (and that’s correct—you should feel that pain explicitly).
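A toy version of that blend, in TypeScript, might look like the following. The additive combination, the way the preference term is centered on 0.5 and capped, and the name `blendFitness` are all assumptions; the report does not give the exact formula.

```typescript
// metrics and weights are keyed by metric id, so unknown or extra metrics are simply ignored.
function blendFitness(
  metrics: Record<string, number>,
  weights: Record<string, number>,
  diversityBonus: number,
  preferenceScore: number, // predicted human preference in [0, 1]
  preferenceCap: number,
): number {
  let base = 0;
  for (const id of Object.keys(weights)) {
    const value = metrics[id];
    base += weights[id] * (value === undefined ? 0 : value);
  }
  // Center the preference on 0.5 and cap it so it cannot swamp the objective proxies.
  const centered = preferenceScore - 0.5;
  const preference = Math.max(-preferenceCap, Math.min(preferenceCap, centered));
  return base + diversityBonus + preference;
}
```

Because the lookup is by metric id, adding a new metric to `metrics` changes nothing until a weight with the same id is introduced.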

Evolution: MAP-Elites + mutation/crossover that respect DSL validity

Instead of selecting “top N” and converging into a monoculture, LudoForge uses MAP-Elites: it bins candidates into descriptor niches and keeps the best elite per niche.

Descriptor binning is explicit and deterministic (normalize → floor into bin count; clamp to range), and niche ids serialize coordinates like descriptorId:bin|....
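The binning rule (normalize, floor into the bin count, clamp) and the niche id format can be sketched as follows in TypeScript. The `DescriptorSpec` shape and function names are hypothetical, and joining descriptors in spec order is an assumption about the `descriptorId:bin|...` format.

```typescript
interface DescriptorSpec { id: string; min: number; max: number; bins: number }

// normalize -> floor into bin count -> clamp to the valid bin range
function binIndex(value: number, spec: DescriptorSpec): number {
  const normalized = (value - spec.min) / (spec.max - spec.min);
  const raw = Math.floor(normalized * spec.bins);
  return Math.max(0, Math.min(spec.bins - 1, raw));
}

// Serializes coordinates as "descriptorId:bin|descriptorId:bin|...".
function nicheId(values: Record<string, number>, specs: DescriptorSpec[]): string {
  return specs.map((spec) => `${spec.id}:${binIndex(values[spec.id], spec)}`).join("|");
}
```

The clamp matters at the upper edge: a value exactly equal to `max` would otherwise floor into a bin one past the end.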

Then you can evolve elites with genetic operators:

Mutations like numeric tweaks, boolean toggles, enum cycling, duplicating/removing actions, nudging effect magnitudes, adding/removing phases, rewriting token/zone references safely, etc.
Crossover via subtree swaps of state.variables or actions, followed by DSL re-validation.

Optional “shortlisting” exists too: it picks a diversified subset of elites for human review using a max-min distance heuristic over descriptor coordinates.
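A common way to implement a max-min distance heuristic is greedy farthest-point selection, sketched below in TypeScript. Starting from the first elite in the list and using Euclidean distance are assumptions, as are the names `EliteSummary` and `maxMinShortlist`.

```typescript
interface EliteSummary { id: string; coords: number[] } // coords: descriptor bin coordinates

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return Math.sqrt(sum);
}

// Greedily adds the elite whose nearest already-chosen neighbor is farthest away.
function maxMinShortlist(elites: EliteSummary[], size: number): EliteSummary[] {
  if (elites.length === 0 || size <= 0) return [];
  const chosen: EliteSummary[] = [elites[0]];
  const remaining = elites.slice(1);
  while (chosen.length < size && remaining.length > 0) {
    let bestIndex = 0;
    let bestDistance = -1;
    for (let i = 0; i < remaining.length; i++) {
      let nearest = Infinity;
      for (const c of chosen) {
        nearest = Math.min(nearest, euclidean(remaining[i].coords, c.coords));
      }
      if (nearest > bestDistance) {
        bestDistance = nearest;
        bestIndex = i;
      }
    }
    chosen.push(remaining.splice(bestIndex, 1)[0]);
  }
  return chosen;
}
```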

What’s already proven (and what isn’t yet)

This isn’t vaporware; the end-to-end tests already prove key behaviors like:

the ordered phases of the pipeline
invalid DSL rejection before evaluation
safety cutoffs (max-turns) and deterministic seeded outputs
human prompt loops and legality enforcement
deterministic state transitions
MAP-Elites producing stable ids
active learning selection behavior
mutation + repair at scale, including crossover

And there are explicitly documented gaps—like extended metrics aggregation and worker-thread batch simulations.

The point of LudoForge

I’m not trying to build a “game designer replacement.” I’m building a design pressure cooker:

Simulate hard
Reject degeneracy ruthlessly
Measure what you can
Ask humans the right questions
Let evolution explore breadth, not just a single hill

If you’re into procedural design, evolutionary search, or just enjoy the idea of treating “fun” as something you can iteratively approximate with a human-in-the-loop model, that’s what this project is for.

Living Narrative Engine #19

On January 26, 2026, by Jon Ureña, in art, writing, gaming, artificial intelligence, programming

I have quite the treat for you fuckers. I’ve recorded myself playing through my test scenario involving Alicia Western. More than an hour of me speaking in my accented English even though I rarely speak in real life, and showing off a fun, frustrating playthrough that made me hungry.

This is, of course, related to my beloved Living Narrative Engine. Repo here.

Living Narrative Engine #18

On January 25, 2026, by Jon Ureña, in artificial intelligence, programming

I’m building a browser-based app to play immersive sims, RPGs and the like. In practice, I use it to set up short story scenarios or elaborate gooning sessions. I dared myself to build the most comprehensive psychological system imaginable, so that Sibylle Brunne, a 34-year-old orphan living in her parents’ rustic home somewhere in the Swiss mountains, while controlled by a large language model, would realistically bring her blue-eyed, blonde-hair-braided, full-breasted self to seduce my teenage avatar who is backpacking through the country, eventually convincing me to stay in her house so she can asphyxiate me with her mommy milkers.

Here’s a visual glimpse of the current complexity:

Alicia has become my test subject, as if she didn’t have enough with freezing to death. The system works like this: at the base you have mood axes (like pleasant <-> unpleasant), which change throughout a scene. Actors also have permanent biological or personality-based traits like aversion to harm. Together, mood axes and affect traits serve as weights and gates to specific emotion prototypes like disappointment, suspicion, grief. Delta changes to those polar mood axes naturally intensify or lessen the emotions. I also have sexual state prototypes, which work the same as the emotional states.

These emotional and sexual states serve as the prerequisites for certain expressions to trigger during play. An expression is a definition that tells you “when disappointment is very high and suspicion is high, but despair is relatively low, trigger this narrative beat.” Then, the program would output some text like “{actor} seems suspicious but at the same time as if they had been let down.” The descriptions are far better than that, though. The actors themselves receive in their internal log a first-person version of the narrative beat, which serves as an internal emotional reaction they need to process.
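To make the weights-and-gates idea concrete, here is a deliberately tiny TypeScript sketch. Every name, the [-1, 1] value range, and the weighted-sum formula are assumptions for illustration; the actual Living Narrative Engine data model is far richer and is not reproduced here.

```typescript
type MoodState = Record<string, number>; // mood axis or trait id -> value, assumed in [-1, 1]
interface RangeGate { key: string; min?: number; max?: number }
interface EmotionPrototype { id: string; weights: Record<string, number>; gates: RangeGate[] }
interface ExpressionRule { id: string; requires: RangeGate[] } // keys are emotion ids here

function inRange(value: number, gate: { min?: number; max?: number }): boolean {
  if (gate.min !== undefined && value < gate.min) return false;
  if (gate.max !== undefined && value > gate.max) return false;
  return true;
}

// Gates decide whether the emotion can fire at all; weights decide how strongly.
function emotionIntensity(proto: EmotionPrototype, mood: MoodState): number {
  for (const gate of proto.gates) {
    if (!inRange(mood[gate.key] || 0, gate)) return 0;
  }
  let sum = 0;
  for (const key of Object.keys(proto.weights)) {
    sum += proto.weights[key] * (mood[key] || 0);
  }
  return Math.max(0, Math.min(1, sum));
}

// An expression triggers only when every emotion prerequisite is satisfied.
function expressionTriggers(rule: ExpressionRule, emotions: Record<string, number>): boolean {
  return rule.requires.every((req) => inRange(emotions[req.key] || 0, req));
}
```

With a mood of low valence and low trust, a rule requiring high disappointment, high suspicion, and low despair would fire; pushing valence low enough to open the despair gate would suppress it.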

It all works amazingly well. However, to determine if I was truly missing mood axes, affect traits or prototypes, I had to create extremely complex analytics tools. I’ve learned far too much about statistical analysis recently, and I don’t really care about it other than for telling a system, “hey, here are my prototype sets. Please figure out if we have genuine gaps to cover.” Turns out that to answer such a request, some complex calculations need to map 20-dimensional spaces and find out diagonal vectors that run through them.

Anyway, I guess at some point I’ll run my good ol’ test scenario involving Alicia, with her now showing far more emotion than she used to before I implemented this system. That’s a win in my book.

manga2cbz: read manga in VR

On January 23, 2026, by Jon Ureña. In Art, Artificial intelligence, Programming.

I’ve been reading manga for a long time, usually relying on my old-ass tablet and scanlations (or whatever they’re called). I came across the Livro app for the Meta Quest 3, and I intended to read manga on it. However, I found out you can’t move the manga folders like you would on a tablet.

So, this morning I had Claude Code create a Go app that can be compiled into a Linux/Windows executable and that compresses manga chapters into corresponding cbz files. Those files can just be copied onto the Quest 3 through a link cable and opened in Livro. Because Livro has trouble rendering WebP files, my app also converts WebP to PNG. As you can see in the video, it works.
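For anyone curious about the core idea: a cbz file is just a ZIP archive of page images with a different extension. Here is a minimal sketch of packing one chapter folder, written in Python rather than Go, so it is not the code from the repo. The function name and the list of image suffixes are assumptions, and the WebP-to-PNG conversion is left out because it would need a third-party image library.

```python
import zipfile
from pathlib import Path

# Assumed set of page image types; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def chapter_to_cbz(chapter_dir, out_dir):
    """Pack the images of one chapter folder into <out_dir>/<chapter>.cbz."""
    chapter_dir, out_dir = Path(chapter_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cbz_path = out_dir / f"{chapter_dir.name}.cbz"

    # Sort by file name so pages end up in reading order inside the archive.
    pages = sorted(
        p for p in chapter_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

    # ZIP_STORED skips recompression: image files are already compressed.
    with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(page, arcname=page.name)
    return cbz_path


# Hypothetical usage (paths are placeholders):
# chapter_to_cbz("manga/series/chapter_001", "cbz_out")
```

Page order relies on file names sorting correctly, so zero-padded names like 001.jpg and 002.jpg work best.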

Here’s the repo for the app:

https://github.com/joeloverbeck/manga2cbz

Living Narrative Engine #16

On January 6, 2026 / January 7, 2026, by Jon Ureña. In gaming, Artificial intelligence, Programming.

A couple of nights ago, at two in the morning, I was rolling around in bed thinking about my current obsessions: the browser-based app Living Narrative Engine as well as Alicia Western, the tragic character from Cormac McCarthy’s last two novels. Recently I mixed them both by playing through a scenario in LNE that featured Alicia. I “novelized” that little bit of theater in the short story You Will Spend the Rest of Your Life.

Well, I wasn’t entirely happy with Alicia’s acting. Yes, she’s an analytical gal, but she’s in a deep hole there. I wanted to feel the despair from her. The relief. I wanted to see her cry. I wanted to cause a beautiful, blonde woman at the end of her rope to cry. And she didn’t.

As I thought about whether this was a solvable issue, my dear subconscious had a spark of genius: LLM-based characters in LNE already create thoughts, speech, notes, and choose actions. Why not task them with tracking mood changes?

Some deep research and several iterations later, ChatGPT and I came up with the following notions, which are displayed below in a lovely manner, as they appear on the game page of LNE.

The simulation relies on seven base mood axes: valence, arousal, agency control, threat, engagement, future expectancy, and self-evaluation. Apparently that basic breakdown is psychologically sound, but I’m trusting ChatGPT on that. The sexual variables are apparently also well known: excitation is the accelerator and inhibition is the brake. Combined with a baseline libido that depends on the individual, those two determine sexual arousal. As seen in the picture, Alicia right now is dry as sandpaper.
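The accelerator-and-brake idea can be sketched in a few lines. The exact formula and value ranges used in LNE aren't stated in this post, so the version below simply assumes that all three inputs live roughly in [0, 1] and that the result is clamped to that range. The function names are hypothetical.

```python
def clamp(value, low=0.0, high=1.0):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def sexual_arousal(excitation, inhibition, baseline_libido):
    """Accelerator minus brake, offset by the individual's baseline (assumed formula)."""
    return clamp(excitation - inhibition + baseline_libido)


# Strong inhibition swamps weak excitation and a modest baseline.
print(sexual_arousal(excitation=0.1, inhibition=0.7, baseline_libido=0.2))  # 0.0
```

Under these assumptions, an actor with high inhibition stays at zero arousal no matter how the scene nudges excitation, until the brake comes off.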

The most interesting part for me is that the mood axes and basic sexual variables are ingredients to form emotions and sexual “moods”. I have dozens of them defined, as I’ve been working with ChatGPT in order to depict the whole breadth of emotions that are distinct and meaningful. Here are the current listings of emotions and sexual “moods” that my app calculates:

Emotions: calm, contentment, relief, confidence, joy, euphoria, enthusiasm, amusement, awe, inspiration, aesthetic appreciation, interest, curiosity, fascination, flow, entrancement, hope, optimism, determination, anticipation, sadness, grief, disappointment, despair, numbness, fatigue, loneliness, nostalgia, boredom, apathy, unease, stress, anxiety, craving, thrill, fear, terror, dread, hypervigilance, courage, alarm, suspicion, irritation, frustration, anger, rage, resentment, contempt, disgust, cynicism, pride, triumph, shame, embarrassment, awkwardness, guilt, regret, humiliation, submission, envy, trusting surrender, jealousy, trust, admiration, adoration, gratitude, affection, love attachment, compassion, empathic distress, hatred, surprise startle, confusion
Sexual moods: sexual lust, passionate love, sexual sensual pleasure, sexual submissive pleasure, sexual playfulness, romantic yearning, sexual confidence, aroused but ashamed, aroused but threatened, sexual craving, erotic thrill, sexual performance anxiety, sexual frustration, afterglow, sexual disgust conflict, sexual indifference, sexual repulsion

Emotions are calculated based on detailed prototypes. Here’s one:

"anxiety": {
      "weights": {
        "threat": 0.8,
        "future_expectancy": -0.6,
        "agency_control": -0.6,
        "arousal": 0.4,
        "valence": -0.4
      },
      "gates": [
        "threat >= 0.20",
        "agency_control <= 0.20"
      ]
    }
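As a rough sketch of how a prototype like this could be turned into an intensity: check every gate first, then take the weighted sum of the mood axes and normalize it. The real calculation in LNE may well differ. The normalization by the sum of absolute weights, the clamping to [0, 1], the assumption that axes range over [-1, 1], and the function names are all guesses for illustration.

```python
import operator

# Comparison operators that may appear in a gate string (assumed set).
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

ANXIETY = {
    "weights": {
        "threat": 0.8,
        "future_expectancy": -0.6,
        "agency_control": -0.6,
        "arousal": 0.4,
        "valence": -0.4,
    },
    "gates": ["threat >= 0.20", "agency_control <= 0.20"],
}


def gate_passes(gate, mood):
    """Evaluate a gate such as 'threat >= 0.20' against the current mood."""
    axis, op, threshold = gate.split()
    return OPS[op](mood.get(axis, 0.0), float(threshold))


def intensity(prototype, mood):
    """Return 0.0 if any gate fails, else a normalized weighted sum in [0, 1]."""
    if not all(gate_passes(gate, mood) for gate in prototype["gates"]):
        return 0.0
    weights = prototype["weights"]
    raw = sum(w * mood.get(axis, 0.0) for axis, w in weights.items())
    scale = sum(abs(w) for w in weights.values())
    return max(0.0, min(1.0, raw / scale))


frightened = {"threat": 0.9, "future_expectancy": -0.7,
              "agency_control": -0.5, "arousal": 0.6, "valence": -0.6}
calm = {"threat": 0.0, "agency_control": 0.5, "valence": 0.4}

print(round(intensity(ANXIETY, frightened), 2))  # 0.69
print(intensity(ANXIETY, calm))                  # 0.0, the threat gate fails
```

The gates act as hard switches, so an actor who feels in control never reads as anxious regardless of the weighted sum.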

Those emotions and sexual moods are fed to LLM-based actors. They figure out “hmm, I’m intensely disappointed, strongly cynical, strongly sad, etc., so that needs to color my thoughts, speech, notes, and the actions I take.” I haven’t tested the system much in practice, but in the little I did test, the difference in the LLM’s roleplaying was like night and day.

In real life, we not only do things, but our bodies do things to us. We are aware of how our emotional states change us, and those changes turn into “tells” for the other people present. In addition, when you think in terms of stories, you add “reaction beats” when the emotional state of an actor changes, so I did exactly that: if the LLM has returned changes to the previous mood axes and sexual variables, the expressions in the library have a chance to trigger (one at a time), based on whether their prerequisites are met. The following example should be self-explanatory:

{
    "$schema": "schema://living-narrative-engine/expression.schema.json",
    "id": "emotions:lingering_guilt",
    "description": "Moderate, non-spiking guilt—an apologetic, sheepish self-consciousness after a minor mistake",
    "priority": 57,
    "prerequisites": [
        {
            "logic": {
                "and": [
                    {
                        ">=": [
                            {
                                "var": "emotions.guilt"
                            },
                            0.35
                        ]
                    },
                    {
                        "<=": [
                            {
                                "var": "emotions.guilt"
                            },
                            0.70
                        ]
                    },
                    {
                        "<=": [
                            {
                                "-": [
                                    {
                                        "var": "emotions.guilt"
                                    },
                                    {
                                        "var": "previousEmotions.guilt"
                                    }
                                ]
                            },
                            0.12
                        ]
                    },
                    {
                        "<=": [
                            {
                                "var": "emotions.humiliation"
                            },
                            0.25
                        ]
                    },
                    {
                        "<=": [
                            {
                                "var": "emotions.terror"
                            },
                            0.20
                        ]
                    },
                    {
                        "<=": [
                            {
                                "var": "emotions.fear"
                            },
                            0.35
                        ]
                    }
                ]
            }
        }
    ],
    "actor_description": "It wasn't a catastrophe, but it still sits wrong. I replay the moment and feel that small, sour twist—like I owe someone a cleaner version of myself. I want to smooth it over without making a scene.",
    "description_text": "{actor} looks faintly apologetic—eyes dipping, a tight little wince crossing their face as if they're privately conceding a mistake.",
    "alternate_descriptions": {
        "auditory": "I catch a small, hesitant exhale—like a half-swallowed \"sorry.\""
    },
    "tags": [
        "guilt",
        "sheepish",
        "apology",
        "minor_mistake",
        "lingering"
    ]
}
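The prerequisites use JSON Logic-style rules. Below is a minimal sketch of how such a rule could be evaluated and how one expression might be chosen per turn. It only handles the operators that appear in the example ("and", ">=", "<=", "-", "var"), it treats missing variables as 0.0, and it assumes the highest "priority" wins among matching expressions. None of that is confirmed to be how LNE does it, and the function names are hypothetical.

```python
def evaluate(rule, context):
    """Evaluate a tiny subset of JSON Logic against a nested context dict."""
    if not isinstance(rule, dict):
        return rule  # a literal such as 0.35
    (op, args), = rule.items()  # each rule node has exactly one operator key
    if op == "var":
        value = context
        for key in args.split("."):
            # Assumption: anything missing counts as 0.0.
            value = value.get(key, 0.0) if isinstance(value, dict) else 0.0
        return value
    values = [evaluate(arg, context) for arg in args]
    if op == "and":
        return all(values)
    if op == ">=":
        return values[0] >= values[1]
    if op == "<=":
        return values[0] <= values[1]
    if op == "-":
        return values[0] - values[1]
    raise ValueError(f"unsupported operator: {op}")


def pick_expression(expressions, context):
    """Return the highest-priority expression whose prerequisites all pass."""
    matching = [
        e for e in expressions
        if all(evaluate(p["logic"], context) for p in e["prerequisites"])
    ]
    return max(matching, key=lambda e: e["priority"], default=None)


# Trimmed-down version of the expression above (first three conditions only).
LINGERING_GUILT = {
    "id": "emotions:lingering_guilt",
    "priority": 57,
    "prerequisites": [{"logic": {"and": [
        {">=": [{"var": "emotions.guilt"}, 0.35]},
        {"<=": [{"var": "emotions.guilt"}, 0.70]},
        {"<=": [{"-": [{"var": "emotions.guilt"},
                       {"var": "previousEmotions.guilt"}]}, 0.12]},
    ]}}],
}

steady = {"emotions": {"guilt": 0.50}, "previousEmotions": {"guilt": 0.45}}
spike = {"emotions": {"guilt": 0.60}, "previousEmotions": {"guilt": 0.20}}

print(pick_expression([LINGERING_GUILT], steady)["id"])  # emotions:lingering_guilt
print(pick_expression([LINGERING_GUILT], spike))         # None
```

The third condition is what makes the guilt "non-spiking": a jump of more than 0.12 since the previous turn disqualifies this expression, presumably leaving room for a sharper one.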

I’ve created about 53 expressions already, but they’re surprisingly hard to trigger, as they require very specific (and psychologically logical) conditions.

Because testing this new system by playing through scenarios would be a nightmare, I’ve created a dev page that makes testing the combinations trivial. In fact, I’ve recorded a video and uploaded it to YouTube. So if you want to hear me struggle through my accent that never goes away, and the fact that I very, very rarely speak in real life, here’s ten minutes of it.

I think that’s all. Now I’m going to play through the same Alicia Western scenario as in the short story I posted. If the result is different enough, I will upload it as a short story.

