Provide a structured tradecraft framework for ingesting commercial-technology signals at scale and translating them into IC-grade analytic products that meet the standards set in ICD 203 and the IC OSINT Strategy 2024-2026.
This framework is a translation methodology. It converts unclassified commercial signal traffic (patents, SEC filings, GitHub releases, conference talks, vendor pitches) into analyst-ready inputs scored for source reliability, corroboration, and estimative confidence. It is not a HUMINT or SIGINT replacement, not a substitute for classified corroboration, and not a tool for inferring geopolitical intent from technical capability alone. Analysts retain full responsibility for the assessment.
Commercial market intelligence and the IC analytic line speak different grammars. Equity research speaks in total addressable market, customer concentration, segment narratives, and forward-looking guidance. The analytic line speaks in Words of Estimative Probability, source reliability codes, Key Assumptions Checks, and confidence levels tied to corroborated evidence. The gap between these registers is structural and persistent. A 10-K disclosure that materially shifts a capability picture rarely surfaces in finished intelligence at the speed the underlying signal warrants.
The community has acknowledged the gap. The IC OSINT Strategy 2024-2026 named OSINT "the INT of first resort" and committed to modernizing collection and tradecraft. DIA consolidated open-source functions under the National Defense Open Source Center (NDOC) in February 2026, per Federal News Network coverage. Stewart Baker argued the case for a dedicated OSINT agency in Lawfare in 2024. Yet the canonical foundational text for technology-surprise OSINT remains the 2005 National Academies report Avoiding Surprise in an Era of Global Technology Advances (NAP 11286). Two decades on, that report is overdue for an update. Its taxonomy predates the modern signal stack: pre-arXiv volume, pre-GitHub release cadence, pre-WIPO Patent Momentum Indicator, pre-AI-vendor pitch flood. This paper proposes a working translation layer that can hold the line until a more comprehensive update lands.
The framework defines five commercial-signal source categories. Each has a characteristic lead time, a characteristic capability surface, and a dominant collection bias. Analysts working across all five gain triangulation; analysts working within one source type produce a tilted picture by construction.
The WIPO Patent Momentum Indicator, launched in 2025, is the first purpose-built early-warning instrument for technology-trend acceleration drawn directly from patent filings. It collapses filing velocity, cross-jurisdictional family expansion, and citation acceleration into a single quantified momentum score per technology cluster. The framework recommends specific adoption: Source 1 collection should treat WIPO PMI scores as a leading indicator and ingest them on the WIPO release cadence rather than waiting for downstream commercial analyst reporting to reinterpret them.
Translation runs as a five-step pipeline. Each step has a defined input, a defined output, and a defined quality gate. Analysts should not advance to the next step until the prior step's gate is satisfied.
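The gating rule can be sketched as a small runner in which every step pairs a transformation with a quality gate, and a failed gate halts the run. This is a minimal illustration, not part of the framework itself: the names `Step`, `GateFailure`, `run_pipeline`, and `demo_steps` are hypothetical, and the two demo gates are placeholders for whatever thresholds the owning cell defines.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]    # turns the step's input into its output
    gate: Callable[[Any], bool]  # quality gate checked against that output


class GateFailure(Exception):
    """Raised when a step's output does not satisfy its gate."""


def run_pipeline(steps: List[Step], payload: Any) -> Any:
    # Each step consumes the previous step's output; a failed gate halts the run.
    for step in steps:
        payload = step.run(payload)
        if not step.gate(payload):
            raise GateFailure(f"gate not satisfied at step: {step.name}")
    return payload


# Hypothetical two-step excerpt; real gates would be set by the owning cell.
demo_steps = [
    Step("capture",
         run=lambda signals: [s for s in signals if s.get("url")],
         gate=lambda out: len(out) > 0),
    Step("corroborate",
         run=lambda signals: signals,
         gate=lambda out: len({s["source_type"] for s in out}) >= 2),
]
```

Raising an exception rather than returning a partial result makes it hard to advance past an unsatisfied gate by accident, which is the behavior the pipeline description asks for.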
The first step is structured ingestion across the five sources. The tooling stack should support scheduled pulls from USPTO, WIPO PATENTSCOPE, SEC EDGAR, arXiv, GitHub, and HuggingFace, plus event-driven capture for conference proceedings and trade press. Capture metadata at ingestion: source URL, retrieval timestamp, document type, and content hash. Provenance is not optional.
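One way to make provenance mandatory is to build the metadata record at the moment of capture and reject any item that lacks it. The sketch below assumes SHA-256 as the hash and ISO 8601 UTC for the timestamp; the text specifies neither, and `CaptureRecord` and `make_record` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CaptureRecord:
    source_url: str
    retrieved_at: str   # ISO 8601 timestamp, UTC (assumed convention)
    document_type: str
    sha256: str         # hex digest of the raw bytes as retrieved


def make_record(source_url: str, document_type: str, content: bytes,
                retrieved_at: Optional[datetime] = None) -> CaptureRecord:
    # Refuse to create a record without the required provenance fields.
    if not source_url or not document_type:
        raise ValueError("source_url and document_type are required")
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    digest = hashlib.sha256(content).hexdigest()
    return CaptureRecord(source_url, retrieved_at.isoformat(), document_type, digest)
```

Hashing the raw bytes as retrieved, before any parsing, lets a later reviewer confirm that the document cited is the document that was captured.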
Adapt the NATO Admiralty Code (A-F for source reliability, 1-6 for information credibility) to commercial signals. Source-reliability scoring keys to the track record of the originating entity: A for SEC-filed primary documents and accepted peer-reviewed preprints with verifiable authorship; B for established trade press with editorial standards; C for vendor white papers and pitch decks; D for unattributed analyst notes; E for promotional content; F for sources with documented credibility failures. Information credibility keys to corroboration status at the moment of capture, independent of reliability.
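The reliability assignments above can be written down as a lookup so that every analyst applies the same letter to the same kind of source. This is a sketch under stated assumptions: the source-kind keys are hypothetical labels, and the credibility digit is taken as an analyst input because the text does not define how corroboration status maps onto 1 through 6.

```python
# Reliability letters as assigned in the text; the dictionary keys are hypothetical labels.
RELIABILITY_BY_SOURCE_KIND = {
    "sec_filing": "A",
    "peer_reviewed_preprint": "A",
    "trade_press": "B",
    "vendor_white_paper": "C",
    "pitch_deck": "C",
    "unattributed_analyst_note": "D",
    "promotional": "E",
}


def admiralty_code(source_kind: str, credibility: int,
                   documented_failures: bool = False) -> str:
    # Credibility is scored separately from reliability, so it is passed in as-is.
    if not isinstance(credibility, int) or not 1 <= credibility <= 6:
        raise ValueError("credibility must be an integer from 1 to 6")
    if documented_failures:
        letter = "F"  # documented credibility failures override the source kind
    else:
        letter = RELIABILITY_BY_SOURCE_KIND[source_kind]
    return f"{letter}-{credibility}"
```

For example, `admiralty_code("sec_filing", 2)` yields `"A-2"`, the same letter-digit form used in the case study.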
At least two source types must align before an assessment proceeds beyond preliminary status. A patent filing alone is direction; a patent filing plus a 10-K capital allocation disclosure plus a GitHub repository release is a capability picture. Single-source assessments are flagged as preliminary and routed for additional collection rather than published. The corroboration requirement is the single most important bias control in the framework.
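The corroboration rule reduces to counting distinct source types rather than distinct documents. A minimal sketch, with `corroboration_status` and the status strings as hypothetical names:

```python
from typing import Iterable

MIN_SOURCE_TYPES = 2  # threshold stated in the text


def corroboration_status(source_types: Iterable[str]) -> str:
    # Count distinct source types, so three patents still count as one type.
    distinct = {s.strip().lower() for s in source_types if s.strip()}
    if len(distinct) >= MIN_SOURCE_TYPES:
        return "corroborated"
    return "preliminary"
```

With this check, `corroboration_status(["patent", "patent"])` stays `"preliminary"`, while `corroboration_status(["patent", "10-K", "github"])` returns `"corroborated"`.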
Map evidence quality to the seven-tier Words of Estimative Probability scale defined in ICD 203: almost no chance, very unlikely, unlikely, roughly even chance, likely, very likely, almost certain. Each tier carries explicit probability ranges. Avoid numeric anchoring outside the WEP scale; analysts who introduce custom percentages outside ICD 203 norms degrade comparability across products.
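The scale can be encoded as a lookup table so that a probability judgment always resolves to one of the seven terms. The percentage bands below follow the ICD 203 table as commonly published and should be checked against the directive before use. Adjacent bands share their endpoints in that table, so assigning a shared endpoint to the higher tier is an assumption of this sketch, and `wep_term` is a hypothetical name.

```python
# (term, lower bound %, upper bound %); verify against ICD 203 before relying on it.
WEP_SCALE = [
    ("almost no chance", 1, 5),
    ("very unlikely", 5, 20),
    ("unlikely", 20, 45),
    ("roughly even chance", 45, 55),
    ("likely", 55, 80),
    ("very likely", 80, 95),
    ("almost certain", 95, 99),
]


def wep_term(percent: float) -> str:
    if not 1 <= percent <= 99:
        raise ValueError("the scale covers 1 to 99 percent")
    for term, _low, high in WEP_SCALE:
        # Assumption: a value on a shared endpoint is assigned to the higher tier.
        if percent < high:
            return term
    return WEP_SCALE[-1][0]  # exactly 99 percent
```

Keeping the table in one place means products cite a term and its band, not a custom percentage.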
Produce a finished product with three required components: an explicit Key Assumptions Check (the assumptions that, if invalidated, would flip the assessment); a source-reliability table listing each cited signal with its A-F and 1-6 codes; and citations formatted per the ODNI IC Standard for Citation of PAI, CAI, and OSINT in Intelligence Products (December 2024). The citation standard is recent, specific, and binding. Analysts who format commercial-signal citations to it from day one avoid downstream rework when products are routed into classified channels.
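The three required components lend themselves to a pre-release completeness check. The sketch below verifies only that each component is present and that the codes fall in the A-F and 1-6 ranges; it does not validate citation formatting against the ODNI standard, whose rules are not reproduced here. `CitedSignal`, `Product`, and `missing_components` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CitedSignal:
    description: str
    reliability: str  # "A" through "F"
    credibility: int  # 1 through 6
    citation: str     # formatted by the analyst per the applicable standard


@dataclass
class Product:
    key_assumptions: List[str] = field(default_factory=list)
    signals: List[CitedSignal] = field(default_factory=list)


def missing_components(product: Product) -> List[str]:
    # Returns a list of problems; an empty list means all three components are present.
    problems = []
    if not product.key_assumptions:
        problems.append("Key Assumptions Check is empty")
    if not product.signals:
        problems.append("source-reliability table is empty")
    for s in product.signals:
        if s.reliability not in set("ABCDEF") or s.credibility not in range(1, 7):
            problems.append(f"invalid code for signal: {s.description}")
        if not s.citation.strip():
            problems.append(f"missing citation for signal: {s.description}")
    return problems
```

Returning the full list of problems, rather than stopping at the first, gives the analyst one consolidated rework pass.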
Each source category carries dominant biases. The framework names six bias categories and assigns each a mitigation tactic. Analysts should run the bias check as a discrete pre-publication step.
DeepSeek-R1 offers a documented case where the five-source framework, if applied at the moment of signal availability, would have produced an IC-grade assessment eight months before the formal U.S. government technical evaluation landed. The case is instructive because each signal was unclassified and publicly available throughout.
Source 3 (Code & Preprints). DeepSeek released the R1 preprint to arXiv on January 22, 2025, with model weights published concurrently on HuggingFace under an open license. Authorship traced to DeepSeek-AI, with contributor affiliations indicating compute access at a scale inconsistent with the company's nominal funding profile.
Source 2 (SEC equivalents). DeepSeek's parent, High-Flyer, operates as a quantitative hedge fund rather than a U.S. registrant. The functional SEC-equivalent signals were Chinese securities-regulator filings and counterparty disclosures from international prime brokers, both publicly accessible. Capital allocation patterns indicated compute-cluster spending in excess of advertised research budgets.
Source 1 (Patents). The Chinese patent corpus from 2023 to 2024 showed accelerating filings in mixture-of-experts architectures, reinforcement-learning-from-verifiable-rewards methods, and inference-optimization techniques traceable to DeepSeek and adjacent High-Flyer entities. WIPO PATENTSCOPE coverage of the filings was complete by Q4 2024.
Source 5 (Trade press). The Information, Semianalysis, and Chinese-language industry coverage discussed High-Flyer's GPU acquisitions and DeepSeek's hiring throughout 2024. Reporting on hardware procurement workarounds (Singapore and Malaysia intermediaries) appeared in industry coverage well before formal export-control enforcement reviews.
Applying the framework: signal capture was achievable in real time. Source-reliability scoring placed the arXiv preprint and Chinese patent corpus at A-2 and B-2 respectively. Corroboration was satisfied across three source categories. The translated WEP-grade assessment would have read: "It is very likely that DeepSeek's training compute was aggregated through PRC-state-adjacent channels by Q3 2024," and "It is likely that U.S. export-control evasion occurred via Singapore and Malaysia intermediaries."
A disciplined application of the five-source framework would have produced this IC-grade product in January 2025, immediately on preprint release. The NIST CAISI evaluation of DeepSeek capability did not land until September 2025. The House Select Committee on the CCP DeepSeek report arrived in April 2025. The eight-month delta between achievable assessment and delivered technical evaluation is the operational point of the framework. Signal was present; translation was absent.
The framework requires an organizational owner. Bolt-on adoption inside existing all-source desks produces inconsistent application; the discipline drifts to whichever analyst happens to remember it. A dedicated home is necessary.
Establish a Commercial Signals Cell co-located with the DIA National Defense Open Source Center, with mirror cells embedded at each service S&T directorate (Army Futures Command, Air Force Research Laboratory, Office of Naval Research, USSF Space Systems Command S&T). The Commercial Signals Cell is the single proponent for the translation framework; the mirror cells apply it to service-specific signal beats. The cell maintains rotational Defense Innovation Base liaison billets (DIU, Defense Innovation Board, AFWERX, NavalX), giving analysts direct industry context without creating capture.
Recommended commercial tooling: CSET Emerging Technology Observatory (ETO) for academic and patent signal aggregation; Govini Decision Science Platform for federal procurement and contract intelligence; the Janes and SOSi exoINSIGHT partnership announced May 2026 for integrated defense-industrial intelligence; and custom scrapers for arXiv, GitHub, HuggingFace, and SEC EDGAR. Each tool covers part of the five-source surface; none covers all of it. The cell's tradecraft is what stitches them together.
The training pipeline is modeled on the IC OSINT Strategy's foundational-to-expert workforce ladder. The framework is intentionally checklist-driven and teachable: the target analyst archetype is a generalist with disciplined application of the pipeline, not a deep specialist in any single signal type. Specialist consultation is available on demand; primary throughput comes from generalists running the standard pipeline. This is by design. Specialist-only models do not scale to the signal volume the modern commercial-tech surface produces.
The framework's value depends on accurate scoping. The following are out of scope and should be assigned to other collection disciplines or analytic methods.
Citations are organized by category. Where a primary document is available online, the URL is provided. Where a document is print-only or behind a paywall, the standard citation is given without URL.
This white paper is available for download as a PDF for offline reading, citation, and circulation. The framework is published under a permissive use license for educational, governmental, and analytic-tradecraft purposes; please cite when applied or extended.
Anna R. Dudley. Industry as Sensor: A Structured Framework for Translating Commercial-Tech Signals into IC-Grade Tradecraft. annardudley.com, May 2026.
Dudley, Anna R. "Industry as Sensor: A Structured Framework for Translating Commercial-Tech Signals into IC-Grade Tradecraft." annardudley.com, May 2026. https://annardudley.com/industry-as-sensor-white-paper.html
For inquiries on applying the framework inside an organization, training cell adoption, or feedback on the methodology, contact via the channels listed on the main site. The framework is intended to be revised; structured critique is welcomed.