How can I help?#

Whether you operate systems, build them, research them, or help coordinate the response — here is where to plug in, and what to do first.

Reduce your exposure#

How should leaders and engineers reason about exposure?#

A practical way to compare very different systems is to ask four questions:

Dimension      Question
Impact         If this fails due to time, how bad is it?
Uncertainty    Do we know how it behaves past the boundary, or are we guessing?
Difficulty     Can we actually fix it in time?
Blast radius   Does failure stay local, or cascade?

These translate directly into governance decisions: what must be fixed first, and which assumptions are currently indefensible.

A second lens is useful once exposure is mapped: triage by failure-tempo. Not all failures arrive at the same speed, and scarce response capacity is better spent along that axis than evenly. Life-safety systems sit at the fastest tempo and the highest priority — medical devices, emergency communications, air-traffic systems. Critical infrastructure has a slower onset but broad propagation — power, water, transport, telecommunications. Economic systems run at the longest tempo with the most response flexibility — settlement, billing, insurance, supply chains.

A related trap is the failure that does not announce itself. Some systems cross the boundary and keep running, generating wrong outputs with no crash, no alarm, and no log entry. The timing is also less tidy than a single date suggests. Because much time arithmetic is forward-looking, consequences can surface before 2038: certificate notAfter dates, thirty-year mortgages, pensions, and bonds all reach across the boundary today. They can also surface well after it, when a dormant code path next executes; for an annual batch job, that may be up to a year later.
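The forward-looking case can be illustrated with a minimal Python sketch. It assumes a hypothetical thirty-year loan issued in 2010 whose maturity date is written into a signed 32-bit seconds field; the helper names `store_as_int32` and `read_back` are invented for the illustration and do not come from any particular system.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def store_as_int32(seconds: int) -> int:
    """Mimic writing a second count into a signed 32-bit field (silent wrap)."""
    return struct.unpack("<i", struct.pack("<I", seconds & 0xFFFFFFFF))[0]


def read_back(seconds: int) -> datetime:
    """Interpret a stored second count as a UTC date."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


# Hypothetical loan: issued in 2010, matures thirty years later.
issued = datetime(2010, 6, 1, tzinfo=timezone.utc)
maturity = issued.replace(year=issued.year + 30)

true_seconds = int((maturity - UNIX_EPOCH).total_seconds())
stored_seconds = store_as_int32(true_seconds)
stored_maturity = read_back(stored_seconds)

print("intended maturity:", maturity.isoformat())
print("stored maturity:  ", stored_maturity.isoformat())
# The comparison below gives a wrong answer without raising any exception.
print("already matured?  ", stored_maturity < issued)
```

Nothing here crashes: the stored maturity lands early in the twentieth century, and any downstream comparison quietly treats the loan as long overdue. This happens years before the 2038 boundary itself.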

What should I do next if I suspect exposure?#

Start simple. Identify where timestamps are stored (databases, logs, schemas); where time is imported (NTP, GPS, upstream APIs); and where time is compared (authentication, certificate validation, schedulers). Ask vendors and integrators direct questions about 2036/2038 testing. And treat this as an end-to-end dependency issue, not a single patch.

If you operate critical infrastructure, begin with the integration layers — monitoring, analytics, logging, scheduling, identity.

How should we test for rollover behaviour?#

Carefully, because the obvious approaches have limits. Where it is safe, test on operationally representative systems under realistic load rather than relying on laboratory replicas; the Ariane 5 loss is the standing reminder that software validated in one environment can still fail in another. Advance clocks across the whole 2036–2038 window, not just the January 2038 boundary, since the 2036 NTP rollover often surfaces fragility first. Check both the platform and the application layer, because a modern 64-bit host can still run 32-bit application code where the real time-handling lives. Include deliberate clock manipulation, backward as well as forward, to expose assumptions that time only ever moves steadily ahead. And treat remote scanning as triage: it can suggest where exposure concentrates, but it can never prove an individual system safe. A useful maxim is that you cannot be responsible for what you cannot test.
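One way to make clock manipulation cheap is to pass the current time into the code under test instead of reading the system clock. The sketch below is a minimal Python illustration under stated assumptions: `expired_narrow` is a hypothetical legacy expiry check that keeps its deadline in a signed 32-bit field, `expired_wide` is the reference behaviour, and `sweep` is an invented helper that tries clock values on both sides of a boundary, including cases where the clock reads earlier than the issue time.

```python
# Boundary instants expressed as Unix seconds.
NTP_ERA_ROLLOVER = 2**32 - 2_208_988_800  # 2036-02-07 06:28:16 UTC
TIME_T_MAX = 2**31 - 1                    # 2038-01-19 03:14:07 UTC


def wrap32(value: int) -> int:
    """Reduce an integer to the range of a signed 32-bit field."""
    return (value + 2**31) % 2**32 - 2**31


def expired_narrow(issued_at: int, now: int, ttl: int) -> bool:
    """Hypothetical legacy check: the deadline is held in 32 bits."""
    deadline = wrap32(issued_at + ttl)
    return wrap32(now) >= deadline


def expired_wide(issued_at: int, now: int, ttl: int) -> bool:
    """Reference check using unbounded integers."""
    return now >= issued_at + ttl


def sweep(check, reference, boundary, ttl=3600,
          offsets=(-7200, -3600, -1, 0, 1, 3600, 7200)):
    """Return (issued_offset, now_offset) pairs where check and reference differ."""
    mismatches = []
    for issued_off in offsets:
        for now_off in offsets:  # now_off < issued_off models a backward clock step
            issued_at = boundary + issued_off
            now = boundary + now_off
            if check(issued_at, now, ttl) != reference(issued_at, now, ttl):
                mismatches.append((issued_off, now_off))
    return mismatches


for name, boundary in [("2036 NTP era", NTP_ERA_ROLLOVER),
                       ("2038 time_t", TIME_T_MAX)]:
    found = sweep(expired_narrow, expired_wide, boundary)
    print(name, "mismatches:", len(found))
```

In this sketch only the 2038 boundary trips the hypothetical check, because it does its arithmetic in Unix seconds. A component that decodes NTP timestamps would need the same sweep around the 2036 instant. A clean sweep is evidence about the function tested, not about the system around it.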

What if we can't fix everything in time?#

Then the goal shifts from repair to harm minimisation, which is a real category of action rather than a counsel of despair. Even where a system's eventual failure can't be prevented on the available timeline, its consequences are not fixed in advance. Practical moves include decoupling time-coordination from the control paths that must keep working, so a clock fault doesn't take a safety function down with it; adding human-legible, human-operable overrides where a silent time failure would otherwise leave no way to intervene; designing for staged degradation rather than binary success-or-collapse; and retiring digital subsystems whose exposure no longer earns its keep. None of this is new — it draws on a long tradition in resilience engineering and the loose-coupling ideas in Perrow's Normal Accidents, and exists in mature form in parts of aviation, nuclear, and medical practice — but it is under-used across much of the installed base. Whether an organisation can actually take these steps depends as much on its culture as its engineering: as Westrum's work on organisational culture argues, institutions that surface anomalies fare better than those that bury them.
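Staged degradation can be sketched in a few lines of Python. This is an illustration only, not a reference design: the mode names, the `assess_clock` function, and the tolerance and holdover values are all assumptions. The idea is to compare the wall clock against an estimate built from the last trusted reading plus monotonic elapsed time, and to step down through modes rather than fail outright.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"      # wall clock agrees with the holdover estimate
    DEGRADED = "degraded"  # run on the estimate; suspend non-essential time features
    MANUAL = "manual"      # holdover exhausted; hand time decisions to an operator


def assess_clock(wall_now, last_trusted_wall, monotonic_elapsed,
                 tolerance=5.0, max_holdover=24 * 3600.0):
    """Classify a wall-clock reading and return (mode, time to use).

    wall_now           current wall-clock reading, in seconds
    last_trusted_wall  last wall-clock reading that was accepted
    monotonic_elapsed  seconds since that reading, from a monotonic source
    """
    estimate = last_trusted_wall + monotonic_elapsed
    if abs(wall_now - estimate) <= tolerance:
        return Mode.NORMAL, wall_now
    if monotonic_elapsed <= max_holdover:
        return Mode.DEGRADED, estimate
    return Mode.MANUAL, None


# A wall clock that has wrapped to a large negative value shortly after a good reading.
print(assess_clock(-2147483648.0, 2147483600.0, 48.0))
```

In a real deployment the inputs would plausibly come from `time.time()` and `time.monotonic()`. Keeping the function pure makes it easy to exercise with injected values, and the control path never depends on the wall clock being right.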

What should we ask for when buying or contracting systems?#

Procurement is often the last moment at which several remediation options remain open by default, so it is worth using well. Ask suppliers to disclose how their components handle time: timestamp widths, epoch, and signedness. Specifications have not historically required this disclosure; it is an emerging idea, by analogy to a software bill of materials. Seek attestation about behaviour across the rollover window; it fixes nothing by itself, but it makes later planning possible. Write maintenance and support agreements that explicitly cover time-handling remediation as in-scope work, rather than discovering later that it isn't. And for systems bought with public money, deployed in long-lived critical roles, or carrying safety-of-life implications, consider source-and-build escrow (not the source alone but also the build environment and toolchain) so that someone other than the original vendor can remediate if the vendor is gone, unwilling, or unable. Preserving these options at acquisition costs far less than recovering them afterwards.
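A disclosure of width, epoch, and signedness is enough to compute when a field runs out. The following Python sketch shows one possible shape for such a record. The class name `TimeFieldDisclosure`, its fields, and the example component are hypothetical; no existing disclosure format is implied.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


@dataclass(frozen=True)
class TimeFieldDisclosure:
    """Hypothetical supplier disclosure for one timestamp field."""
    component: str
    field_name: str
    width_bits: int
    signed: bool
    epoch: datetime
    ticks_per_second: int = 1

    def last_representable(self) -> datetime:
        """Latest instant the field can hold, capped at datetime.max."""
        value_bits = self.width_bits - (1 if self.signed else 0)
        max_ticks = 2**value_bits - 1
        try:
            return self.epoch + timedelta(seconds=max_ticks // self.ticks_per_second)
        except OverflowError:
            # Far beyond the calendar range Python can express.
            return datetime.max.replace(tzinfo=UTC)

    def covers(self, end_of_service: datetime) -> bool:
        """True if the field outlasts the planned service life."""
        return self.last_representable() >= end_of_service


# Illustrative entry: a made-up controller with a signed 32-bit log timestamp.
entry = TimeFieldDisclosure("example-controller", "event_log.ts", 32, True,
                            datetime(1970, 1, 1, tzinfo=UTC))
print(entry.last_representable().isoformat())
print(entry.covers(datetime(2045, 1, 1, tzinfo=UTC)))
```

Comparing `last_representable()` with the planned end of service turns a vague attestation into a concrete yes-or-no answer for each field.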

How Epochalypse can help#

Epochalypse can explain the landscape in plain language, help frame risk and prioritise remediation, share public references and known failure patterns, and connect practitioners across sectors where that is useful.

What it cannot do is provide a universal "2038 compliance stamp." This is a systems problem; each system has to be assessed in context.

Who is coordinating this#

Who is working on this?#

Over the past year, named institutional workstreams have appeared, but they remain stovepiped. What has been missing all along is not more institutions but a layer to coordinate between them, and that layer is only now beginning to form.

The most substantial sits at ITU-T Study Group 17, the international standards body for telecommunication security. A Technical Paper, XSTP.epoch, makes the case that three timekeeping events inside a three-year window — the NTP era rollover on 7 February 2036, the 32-bit signed time_t overflow on 19 January 2038, and the GPS week-number rollover on 20 November 2038 — together form a single category of global infrastructure risk requiring a coordinated international response, with the 2106 horizon treated as a long-tail consideration. Revision 1 was agreed at the SG17 plenary of 1–10 June 2026 in Geneva and the agreed text has been transmitted to the ITU Telecommunication Standardization Bureau for posting; it runs to roughly 100 pages. Its central empirical contribution is a cross-sector exposure survey covering thirty sectors, with developed treatments of telecommunications, financial systems, electric and connected vehicles, and implantable and hospital medical devices, and the remainder scaffolded for development in later revisions. The Technical Paper format was a deliberate choice: it permits substantive analysis without the consensus-formation requirements of a formal Recommendation, laying groundwork on which later standards, regulation, and operational planning can rest. A public background page is maintained at propertools.be/commons/xstr-epoch.
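The three dates follow from simple arithmetic on each format's epoch and field width, which the Python sketch below reproduces. It assumes the usual definitions: a 32-bit unsigned NTP seconds field counted from 1900, a signed 32-bit time_t counted from 1970, and the legacy 10-bit GPS week number counted from 6 January 1980. The 18-second GPS-to-UTC offset is the value in force since 2017 and is an assumption for 2038, since future leap seconds are not known.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# NTP: unsigned 32-bit seconds since 1900-01-01; era 1 begins at 2**32.
ntp_era_1 = datetime(1900, 1, 1, tzinfo=UTC) + timedelta(seconds=2**32)

# Signed 32-bit time_t: seconds since 1970-01-01; last value is 2**31 - 1.
time_t_last = datetime(1970, 1, 1, tzinfo=UTC) + timedelta(seconds=2**31 - 1)

# GPS: 10-bit week number wraps every 1024 weeks from 1980-01-06.
# The third wrap is at week 3072. This value is on the GPS timescale (naive datetime).
gps_third_wrap_gps_time = datetime(1980, 1, 6) + timedelta(weeks=3 * 1024)

# Assumption: GPS time stays 18 s ahead of UTC, as it has been since 2017.
GPS_MINUS_UTC = timedelta(seconds=18)
gps_third_wrap_utc_estimate = gps_third_wrap_gps_time - GPS_MINUS_UTC

# Long tail: unsigned 32-bit seconds since 1970 wrap at 2**32.
unsigned_wrap = datetime(1970, 1, 1, tzinfo=UTC) + timedelta(seconds=2**32)

print("NTP era 1 begins:     ", ntp_era_1.isoformat())
print("last signed time_t:   ", time_t_last.isoformat())
print("GPS week wrap (GPS):  ", gps_third_wrap_gps_time.isoformat())
print("GPS week wrap (~UTC): ", gps_third_wrap_utc_estimate.isoformat())
print("unsigned 32-bit wrap: ", unsigned_wrap.isoformat())
```

The GPS wrap falls at midnight on 21 November 2038 on the GPS timescale, which is a few seconds before midnight UTC on 20 November under the stated offset.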

The Paper sits within a wider frame set by an expert report, When digital systems fail, co-published on 5 May 2026 by ITU, the UN Office for Disaster Risk Reduction, and the Paris School of International Affairs at Sciences Po. That report treated non-intentional digital disruption as a distinct risk category that current resilience frameworks address only partially.

In the IEEE, a new Recommended Practice, P4150, entered development in June 2026 and cross-references the ITU-T work.

Within FIRST, the Time Security SIG — co-chaired by the project's founders — has met monthly since January 2026, with around eighty participants drawn from the NTP core, national CERTs, and major infrastructure vendors, and a first sub-working group on supply chain now stood up. It is increasingly the venue where the IETF, ITU-T, and IEEE strands are coordinated — a role the XSTP.epoch paper itself points to — which is precisely the coordinating function the field has lacked (first.org/global/sigs/time).

In the EU, rollover-readiness expectations for products are being taken up within the harmonised-standards work supporting the Cyber Resilience Act.

A great deal of work still happens quietly inside organisations, without a formal programme or mandate. What has changed in a year is not that the problem is owned — it still isn't — but that a place to coordinate the owners is finally taking shape.

Who is responsible — and who pays?#

This is often the hardest question, and it is as much commercial and legal as technical. A single deployed system may involve a component vendor, a software supplier, a systems integrator, a sub-contractor, and an operator — and when exposure surfaces, it is rarely obvious which of them owns the fix or carries the cost. The economics make it worse: remediation is local and immediate, while the benefit is diffuse and deferred, so there is little incentive to move first. Where the parties cannot agree, the question can end up before a court — as it has in a French rail case currently under appeal — but litigation is a slow and uncertain substitute for settling responsibility in advance. The practical step for operators is to make these questions explicit in contracts and maintenance agreements now, rather than discovering the ambiguity after something has failed.

Where expertise is still needed#

Much about 2038-class risk is settled; a good deal is not. These are open problems where the project and the wider community are actively looking for subject-matter input — from practitioners, researchers, and operators who know their own corner of the substrate better than any survey can.

Each question below sits at Unexamined — an acknowledged evidence gap, named so it can be closed.

How much is actually affected? There is no authoritative inventory of where 32-bit time lives, and the methods for estimating it — analysing public code, fingerprinting internet-facing services, observing client behaviour, examining firmware — are each partial and none complete. The measurement discipline itself is still forming. This calls for embedded and firmware engineers, measurement researchers, and operators with visibility into their own estates.

How do most sectors actually fail? Exposure has been surveyed broadly but characterised in depth for only a handful of sectors; the rest are sketches waiting for specialists to fill in the real failure modes, constraints, and timelines. Power, water, transport, aviation, rail, automotive, medical, telecoms, and finance each need their own practitioners.

How do you test safely, especially in safety-critical systems? You often can't advance production clocks, there is no universal tooling, and laboratory tests don't fully stand in for operational conditions. Better shared testing methods — and the test, QA, and safety engineers to develop them — would move the field forward.

Can disclosure and coordination scale to this? Most time faults can't be neutralised by a single inline mitigation, so handling them one report at a time may not absorb a large wave — especially in operational technology, and across advisory systems that don't interoperate. This is work for CSIRT, PSIRT, and vulnerability-coordination practitioners.

When does redundancy make things worse? If a time fault is shared across a redundant set, the group can agree on the wrong time rather than catch it — consensus certifying an error instead of containing it — and loss of shared time produces secondary failures unlike a simple outage. Distributed-systems and resilience researchers are well placed here.
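A toy Python model makes the mechanism concrete. It assumes a simple median-with-majority vote across three clock readings, which is an illustrative scheme and not a description of any particular product. The names `vote` and `wrap32` are invented for the sketch.

```python
from statistics import median


def wrap32(value: int) -> int:
    """Reduce an integer to the range of a signed 32-bit field."""
    return (value + 2**31) % 2**32 - 2**31


def vote(readings, tolerance=2):
    """Return the median if a strict majority lies within tolerance of it, else None."""
    mid = median(readings)
    agreeing = [r for r in readings if abs(r - mid) <= tolerance]
    return mid if len(agreeing) > len(readings) / 2 else None


true_time = 2**31 + 10  # ten seconds past the signed 32-bit limit

# Independent fault: one of three units wraps. The vote outvotes it.
independent = [true_time, true_time + 1, wrap32(true_time)]

# Shared fault: all three units run the same 32-bit code and wrap together.
shared = [wrap32(true_time), wrap32(true_time + 1), wrap32(true_time)]

print("independent fault ->", vote(independent))
print("shared fault      ->", vote(shared))
```

With an independent fault the vote returns the true time. With the shared fault it returns the wrapped value with unanimous agreement, so the redundancy certifies the error instead of containing it.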

What replaces "seconds since 1970"? Moving off 32-bit time is not simply "make it 64-bit": a successor representation ideally carries explicit width, signedness, and epoch so that exposure is unambiguous, and no body has yet convened to define one. This is a question for protocol and standards designers.

What works when full remediation isn't achievable? The menu of harm-minimisation options — decoupling, overrides, staged degradation, selective de-digitalisation — is under-developed across much of the installed base. Resilience, safety, and human-factors engineers have a great deal to contribute.

Do we even have the capacity to fix it? Remediation depends on people and artefacts that are eroding: lost build chains and source code, vendors that no longer hold the knowledge to fix what they shipped, a retirement wave, and a thin standards-participant pool. The core of the time ecosystem is fragile in the same way — the protocol much of the internet relies on is maintained by a small, structurally underfunded, largely volunteer community. Whether the capacity to remediate exists at all is its own open question — one that holders of legacy-system knowledge and institutional-continuity specialists are needed to answer.

If one of these is your area, the way in is the FIRST Time Security SIG (see "How to get involved" below).

How to get involved#

Join the FIRST Time Security SIG (https://www.first.org/global/sigs/time/), and share safe observations and patterns — what failed, how it failed, and under what conditions.