Research & Product Pipeline — Active Development

The human genome as a cryptographic engine

DNAcrypt-AI encodes secrets into genomic coordinates, 6-mer dictionaries, and AI-ranked multi-fasta sequences — producing keys and hashes that are biologically grounded and structurally unlike anything in traditional cryptography.

Layer 01: Core Encryption (Production-ready)
Layer 02: Covary AI Ranking (Operational)
Layer 03: CLI / SDK Package (In development)
Layer 04: Entropy Audit (In progress)
Layer 05: Patent Filing (Planned)

Complexity Proof-of-Concept

Entropy at biological scale

Three independent layers of genomic complexity that an attacker must invert simultaneously — no two encryption runs share the same vocabulary.

~51M
Nucleotides in the intermediary genome vocabulary (up to 1,025 sequences × 50,000 nt)
1,025
Multi-fasta sequences ensembled per encryption run
50,000
Max nucleotides per sequence (random length)
3.2B
Base positions across hg19 + hg38 assemblies as hash space
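As a quick check on the figures above, the following Python sketch multiplies out the per-run sequence budget and converts the 3.2B position count into bits. It assumes a locus is chosen uniformly at random, which this page does not state, and the function names are illustrative only.

```python
import math

SEQUENCES_PER_RUN = 1_025             # multi-fasta sequences per run (figure above)
MAX_NT_PER_SEQUENCE = 50_000          # maximum nucleotides per sequence (figure above)
HASH_SPACE_POSITIONS = 3_200_000_000  # base positions quoted as the hash space


def max_nucleotides_per_run() -> int:
    """Upper bound on the nucleotides in one run's vocabulary."""
    return SEQUENCES_PER_RUN * MAX_NT_PER_SEQUENCE


def bits_per_locus(positions: int = HASH_SPACE_POSITIONS) -> float:
    """Bits needed to name one position, assuming a uniform choice."""
    return math.log2(positions)


if __name__ == "__main__":
    print(f"{max_nucleotides_per_run():,} nt per run at most")        # 51,250,000
    print(f"{bits_per_locus():.2f} bits per uniformly chosen locus")  # 31.58
```

The product of 1,025 and 50,000 matches the ~51M figure, and a single coordinate carries a little under 32 bits under the uniform assumption.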

Encryption Pipeline

From plaintext to genomic cipher

Each run samples fresh genomic loci — the same plaintext produces a different genomic address every session. Reversal requires the genome assembly, the kmer dictionary, and the Covary ranking model simultaneously.

Plaintext input
GC% hash
hg-terminal_ref
6-mer dict mapping
Random genomic sampling
FAS2rDNA → DNA seqs
Covary AI ranking
Genomic cipher output
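The 6-mer dictionary step can be pictured with a toy example. The sketch below is an assumption-laden illustration, not the DNAcrypt-AI dictionary: it assigns each byte value one of the 4,096 possible 6-mers using a seeded shuffle, and the names build_kmer_table, encode, and decode are hypothetical.

```python
import itertools
import random

ALPHABET = "ACGT"
K = 6  # 4**6 = 4,096 possible 6-mers, enough to cover all 256 byte values


def build_kmer_table(seed: int) -> dict[int, str]:
    """Assign each byte value a distinct 6-mer, shuffled by `seed`.

    random.Random is NOT cryptographically secure; it is used here only
    so that the toy table is reproducible.
    """
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=K)]
    random.Random(seed).shuffle(kmers)
    return {byte: kmers[byte] for byte in range(256)}


def encode(plaintext: bytes, table: dict[int, str]) -> str:
    """Replace every byte with its 6-mer."""
    return "".join(table[b] for b in plaintext)


def decode(dna: str, table: dict[int, str]) -> bytes:
    """Invert the table and read the sequence back six bases at a time."""
    inverse = {kmer: byte for byte, kmer in table.items()}
    return bytes(inverse[dna[i:i + K]] for i in range(0, len(dna), K))


if __name__ == "__main__":
    table = build_kmer_table(seed=2024)
    dna = encode(b"secret", table)
    print(len(dna))                         # 36 bases for 6 bytes
    print(decode(dna, table) == b"secret")  # True
```

A different seed yields a different table, which is one simple way to picture a per-session vocabulary.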
Full technical walkthrough →

Why It's Hard to Break

Four layers of structural defense

🧬
Genomic entropy
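3.2 billion base positions across the hg19 + hg38 assemblies. Naming one position takes about 32 bits, so the size of the sampling space comes from drawing many loci per run rather than from any single coordinate; the key material is drawn from biology, not mathematics alone.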
🗺️
Coordinate-based keys
Secrets are stored as genome loci, not character strings. Reversing them requires three independent artifacts: the genome assembly, the kmer dictionary, and the Covary ranking model.
🔀
Session vocabulary randomness
Each run samples fresh genomic locations, so the same plaintext produces a different genomic address every session. No two encryption runs share a vocabulary — eliminating pattern correlation attacks.
📐
Multi-layer AI obfuscation
1,025 sequences of up to 50,000 nt each are re-ranked by the Covary model. An attacker must invert the 6-mer dictionary mapping, the FAS2rDNA sequence generation, and the Covary re-ranking to recover the original plaintext.
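The per-session sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the chromosome lengths are placeholders rather than real hg19 or hg38 values, chromosomes are picked uniformly instead of in proportion to their length, and sample_loci is a hypothetical name.

```python
import secrets

# Placeholder chromosome lengths for illustration only; a real run would
# read them from the chosen assembly (hg19 or hg38).
TOY_CHROM_LENGTHS = {"chrA": 1_000_000, "chrB": 750_000}

N_SEQUENCES = 1_025   # sequences per run, as quoted above
MAX_LENGTH = 50_000   # maximum nucleotides per sequence, as quoted above


def sample_loci(chrom_lengths, n=N_SEQUENCES, max_length=MAX_LENGTH):
    """Return n random (chrom, start, end) intervals, 1-based and inclusive."""
    chroms = list(chrom_lengths)
    loci = []
    for _ in range(n):
        chrom = secrets.choice(chroms)
        size = chrom_lengths[chrom]
        # Random length between 1 and max_length (capped at chromosome size).
        length = 1 + secrets.randbelow(min(max_length, size))
        # Random start such that the interval stays inside the chromosome.
        start = 1 + secrets.randbelow(size - length + 1)
        loci.append((chrom, start, start + length - 1))
    return loci


if __name__ == "__main__":
    run_a = sample_loci(TOY_CHROM_LENGTHS)
    run_b = sample_loci(TOY_CHROM_LENGTHS)
    print(len(run_a), run_a[0])
    print(run_a != run_b)  # two sessions almost surely differ
```

The secrets module draws from the operating system's random source, so two runs are not expected to repeat the same set of intervals.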

Market Applications

Use cases across five verticals

DNAcrypt-AI's pipeline is adaptable across healthcare, defense, DevOps, research, and decentralized finance.

1
Genome-seeded password generator
Consumer + Enterprise SaaS
Instead of a pseudo-random string, DNAcrypt-AI derives a password from a specific genomic coordinate set. The password is reproducible given the same genome loci but unpredictable to an attacker who doesn't know the coordinate set. Deployable as a browser extension or CLI replacing traditional password managers' RNG with genomic sampling.
Reproducible No stored plaintext Novel differentiator
2
Recoverable master key via genomic coordinates
Vault backup / Disaster recovery
Users store a set of genomic loci (e.g., 12 chromosome coordinates) as their recovery phrase. The owner re-derives the master key deterministically from those coordinates, which replace the 24-word BIP-39 mnemonic with a biologically grounded set that is meaningless to an eavesdropper who lacks the genome assembly and kmer dictionary.
Human-memorable anchor No mnemonic leakage
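Both use cases above rest on deterministic derivation from a coordinate set. The sketch below shows that idea with a standard technique swapped in (PBKDF2 from Python's hashlib); it is not the DNAcrypt-AI derivation, which also involves the kmer dictionary and Covary ranking. The function names, the salt label, and the iteration count are assumptions.

```python
import base64
import hashlib
import re

# Accepts strings such as "chr7:117,559,590" or "chrX:1000".
COORD_RE = re.compile(r"^(chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)):([0-9][0-9,]*)$")


def parse_coordinate(text: str) -> tuple[str, int]:
    """Parse 'chr7:117,559,590' into ('chr7', 117559590)."""
    match = COORD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a genomic coordinate: {text!r}")
    return match.group(1), int(match.group(2).replace(",", ""))


def derive_master_key(coordinates: list[str], assembly: str,
                      iterations: int = 200_000) -> bytes:
    """Stretch an ordered coordinate list into a 32-byte key with PBKDF2."""
    parsed = [parse_coordinate(c) for c in coordinates]
    canonical = ";".join(f"{chrom}:{pos}" for chrom, pos in parsed)
    # Hypothetical salt: binds the key to the assembly version.
    salt = f"dnacrypt-sketch|{assembly}".encode()
    return hashlib.pbkdf2_hmac("sha256", canonical.encode(), salt, iterations)


def key_to_password(key: bytes, length: int = 20) -> str:
    """Render key bytes as a URL-safe password of at most `length` characters."""
    return base64.urlsafe_b64encode(key).decode("ascii").rstrip("=")[:length]


if __name__ == "__main__":
    coords = ["chr7:117,559,590", "chr11:5,246,696"]
    key = derive_master_key(coords, assembly="hg38")
    print(key_to_password(key))
```

The same coordinates and assembly label always give the same key, while changing either one changes the output.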
1
Genomic envelope for API keys & secrets
DevOps / CI-CD pipelines
Wrap an API key inside a DNAcrypt-AI envelope: the key is mapped to genomic coordinates and stored as a FASTA file in a repo or vault. Without the 6-mer dictionary, FAS2rDNA, and Covary model, the FASTA file is biologically indistinguishable from a real sequence dataset. Integrates as a secrets backend for HashiCorp Vault, AWS Secrets Manager, or GitHub Actions.
Obfuscation at rest CI-CD compatible Defense-in-depth
2
Audit-trail secrets with genomic timestamps
Compliance & Forensics
Each secret rotation generates a new set of genomic loci tied to the current genome assembly version. The coordinate set acts as a tamper-evident timestamp: if the loci are modified, the secret fails to decrypt. Useful for SOC 2, HIPAA, and NIS2 audit trails where secret provenance must be cryptographically verifiable.
Tamper-evident Compliance-friendly
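One conventional way to make a stored coordinate set tamper-evident is an explicit integrity tag. The sketch below uses HMAC-SHA256 from Python's standard library as a stand-in; the page does not say how DNAcrypt-AI detects modified loci, and the record layout and function names are hypothetical.

```python
import hashlib
import hmac
import json


def _payload(assembly: str, loci: list[str]) -> bytes:
    """Serialize the fields covered by the tag in a stable byte form."""
    body = {"assembly": assembly, "loci": loci}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def seal_loci(loci: list[str], assembly: str, mac_key: bytes) -> dict:
    """Return a record holding the loci, the assembly version, and an HMAC tag."""
    tag = hmac.new(mac_key, _payload(assembly, loci), hashlib.sha256).hexdigest()
    return {"assembly": assembly, "loci": list(loci), "tag": tag}


def verify_loci(record: dict, mac_key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        mac_key, _payload(record["assembly"], record["loci"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


if __name__ == "__main__":
    key = b"audit-key-for-illustration"
    record = seal_loci(["chr7:117559590", "chr11:5246696"], "hg38", key)
    print(verify_loci(record, key))    # True
    record["loci"][0] = "chr7:117559591"
    print(verify_loci(record, key))    # False
```

Because the assembly version is part of the tagged payload, swapping hg19 for hg38 invalidates the record just as editing a coordinate does.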
1
Patient-data encryption for EHR systems
Healthcare / HIPAA
Electronic health records are encrypted using genomic coordinates derived from the patient's own reference genome region as a key seed. The key is biologically linked to the patient, and only clinical staff with access to the reference genome assembly and the kmer dictionary can decrypt it. This creates a novel class of patient-specific encryption with HIPAA alignment.
Patient-specific keys HIPAA alignment Novel IP potential
2
Classified comms with genomic one-time pads
Defense / Intelligence
Each message session samples a fresh set of genomic loci, producing a one-time vocabulary that is discarded after use. The approach pairs the single-use principle of one-time pads with the practical reproducibility of genome coordinates. The genome reference acts as a shared secret between authorized parties and stays fully out-of-band from the message channel.
One-time-pad principle Shared-reference security
1
Steganographic data hiding in genomic datasets
Bioinformatics / Academic
A message or dataset is encoded into a multi-FASTA file that looks like a legitimate genomic study output. The hidden content is invisible to anyone lacking the 6-mer dictionary and Covary ranking. This is biological steganography — the carrier medium is indistinguishable from real sequence data uploaded to NCBI or GEO databases.
Steganographic Publishable carrier Novel research direction
2
Provenance watermarking for genomic datasets
IP protection / Data licensing
Encode a lab's identity or dataset version as a genomic watermark embedded in published sequence data. If the dataset is misused or plagiarized, the hidden watermark can be extracted to prove provenance. DNAcrypt-AI's reproducibility ensures the watermark survives format conversions and re-uploads.
IP watermarking Reproducible
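The carrier in both research use cases is a multi-FASTA file. As a small illustration of that container only (not of the hiding or watermarking step), the sketch below writes named sequences as multi-FASTA text and reads them back; the function names and the 60-character line width are assumptions.

```python
def to_multifasta(records: dict[str, str], width: int = 60) -> str:
    """Serialize {name: sequence} as multi-FASTA with wrapped sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_multifasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text back into {name: sequence}."""
    chunks: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            chunks[current] = []
        elif current is None:
            raise ValueError("sequence data before the first header")
        else:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


if __name__ == "__main__":
    records = {"seq_001": "ACGT" * 40, "seq_002": "GGCATT" * 5}
    text = to_multifasta(records)
    print(text.splitlines()[0])              # >seq_001
    print(parse_multifasta(text) == records)  # True
```

A file produced this way has the same shape as any sequence dataset, which is the property the steganography and watermarking ideas depend on.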
1
Genomic seed phrase for wallet key derivation
DeFi / Web3
Replace BIP-39 mnemonic phrases with a set of genomic coordinates. The private key for a crypto wallet is derived deterministically from the coordinate set + kmer dictionary. A user stores chr7:117,559,590 / chr11:5,246,696 rather than "abandon ability able…": coordinates that are meaningful, memorable, and free of the dictionary structure of a word list.
BIP-39 alternative Deterministic derivation Quantum-resistant candidate
2
NFT provenance encoded in DNA sequences
Digital art / Collectibles
The provenance record of an NFT (mint date, creator, transfer history) is encoded into a multi-FASTA file that ships with the asset. The DNA sequence becomes the certificate of authenticity. Verification requires DNAcrypt-AI decryption — making forgery computationally equivalent to breaking genomic cryptography.
Unforgeable provenance On-chain optional
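To put the genomic seed phrase and BIP-39 on the same scale, the following sketch counts bits on both sides. It assumes each coordinate is drawn uniformly from 3.2 billion positions; coordinates picked because they are memorable would carry less. The helper names are illustrative.

```python
import math

BITS_PER_BIP39_WORD = 11                        # 2,048-word list: log2(2048) = 11
BITS_PER_COORDINATE = math.log2(3_200_000_000)  # about 31.58 under a uniform choice


def bip39_entropy_bits(words: int) -> int:
    """Entropy bits in a BIP-39 phrase, excluding its checksum bits."""
    total_bits = words * BITS_PER_BIP39_WORD
    return total_bits * 32 // 33                # 1 checksum bit per 32 entropy bits


def coordinates_needed(target_bits: int) -> int:
    """Smallest number of uniform coordinates reaching `target_bits`."""
    return math.ceil(target_bits / BITS_PER_COORDINATE)


if __name__ == "__main__":
    print(bip39_entropy_bits(12), bip39_entropy_bits(24))  # 128 256
    print(coordinates_needed(128))                         # 5
    print(coordinates_needed(256))                         # 9
    print(round(2 * BITS_PER_COORDINATE, 1))               # 63.2 for two coordinates
```

Under these assumptions, nine coordinates are needed to match a 24-word mnemonic, so a two-coordinate pair is best read as a shortened illustration.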

Competitive Positioning

Genomic cryptography vs. the status quo

Traditional cryptography (AES, RSA)
Keys are numbers — pattern is purely mathematical
Entropy source is PRNG or hardware TRNG
Brute-force is purely computational
No biological anchor — key is self-contained
Mnemonic phrases are dictionary-attackable
DNAcrypt-AI
Keys are genome loci — pattern is biological
Entropy drawn from 3.2B nucleotide positions + AI ranking
Attack requires genome + kmer dict + Covary model
Biological anchor creates out-of-band key material
Coordinate phrases have no linguistic structure to attack

Development Roadmap

Pipeline to production

DNAcrypt-AI is progressing from a working research prototype toward a formally audited, production-deployable cryptographic service.

Phase 1 — Complete
Core encryption pipeline
6-mer dictionary mapping, FAS2rDNA sequence generation, and Covary AI re-ranking implemented and validated on hg19 + hg38 assemblies.
Phase 2 — Complete
Protocol documentation & reproducibility
Protocols published via protocols.io; encryption runs are fully reproducible from documented genomic coordinates and assembly versions.
Phase 3 — Active
Formal entropy analysis & peer review
Quantifying bits of entropy per genomic sampling run. Engaging cryptography researchers to attempt independent reversal — documented attack resistance is the pipeline's most critical validation milestone.
Phase 4 — Active
CLI / pip-installable SDK
Wrapping the notebook-based pipeline into a dnacrypt Python package. One command, dnacrypt generate, returns a genomically derived password or key material.
Phase 5 — Planned
Provisional patent filing
Filing a method patent covering genome-coordinate sampling + kmer dictionary + Covary re-ranking as a novel cryptographic method, prior to public product demo.
Phase 6 — Planned
B2B product & service pilots
Targeted pilots in healthcare (EHR encryption), DevOps (genomic secrets management), and academic bioinformatics (dataset watermarking).
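For the Phase 3 entropy work, a first-order measurement might start from the symbol distribution of an output sequence. The sketch below computes empirical Shannon entropy per symbol; it is a crude statistic rather than a measure of key entropy or attack resistance, and it is offered as an assumption about where such an analysis could begin.

```python
import math
from collections import Counter


def shannon_entropy_bits(sequence: str) -> float:
    """Empirical Shannon entropy of `sequence`, in bits per symbol."""
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    # Sum of p * log2(1/p) over the observed symbols.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy_bits("ACGT" * 10))  # 2.0, the maximum for four bases
    print(shannon_entropy_bits("ACAC"))       # 1.0
    print(shannon_entropy_bits("AAAA"))       # 0.0
```

A value near 2.0 bits per base only shows that the four nucleotides appear in balanced proportions; it says nothing about whether their order is predictable.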

Ready to explore the pipeline?

Dive into the architecture, review the research, or get in touch with the team at ChordexBio.