Discovering Shadow Certificates: A 30-Day Inventory Playbook

June 3, 2026
17 min read

Why Shadow Certificates Exist (It's Not Always Shadow IT)

Every PKI team that has run a serious certificate inventory project arrives at the same number: the count is three to ten times higher than the central CA database suggests.

The instinct is to blame shadow IT, but most shadow certificates are not the product of malicious bypass. They are the residue of normal engineering work performed under time pressure. A senior PKI lead launching a first certificate audit should walk in expecting six recurring sources rather than one.

The first source is the emergency self-signed certificate. A production incident at 2am, a load balancer that will not start without a server certificate, a team that does not have ACME credentials handy. Someone runs openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem -days 365, restarts the service, and forgets to file a ticket. The certificate appears on the network the next morning, carrying an issuer name that does not exist anywhere in the corporate CA hierarchy. Six months later it is still there.

The second source is the developer who forgot to renew. A test cluster receives a one-year DV certificate from a public CA, the developer leaves the company, the certificate expires, someone reissues it through a different procurement path, and the certificate lifecycle record is lost.

The third source is the third-party service that creates certificates as a side effect. SaaS observability platforms, CDN onboarding flows, and ingress controllers issue certificates automatically the moment a domain is added.

The fourth source is the vendor-supplied appliance: storage arrays, firewalls, video conferencing gateways, and HVAC controllers ship with self-signed certificates that operations teams replace inconsistently, or not at all.

The fifth source is Kubernetes ingress auto-issuance via cert-manager, where a single Ingress annotation triggers an ACME order that the PKI team never sees.

The sixth source is the most operationally embarrassing: internal teams that procured external CA certificates on a corporate credit card because the central PKI's turnaround time was too slow.

None of these reflects bad faith. They reflect process friction. A successful 30-day playbook treats them as evidence, not culprits, and uses the inventory to fix the underlying flow described in the shadow certificates and visibility guide.

Days 1-10: Network-Based Discovery

The first ten days are about seeing what an attacker would see. Network-based certificate discovery finds every TLS-terminating endpoint that responds to a probe and records the chain it presents. It does not require any internal credentials, which means it can run in parallel with the political work of getting cloud and Kubernetes access. The goal at day 10 is a deduplicated list of every certificate observable from outside, every certificate observable from inside the corporate network, and a first cut of the gap between the two.

Define the perimeter before you scan. Pull every routable IPv4 and IPv6 range from the IPAM, every public DNS zone from the registrar, every load balancer VIP from the cloud accounts, and every internal /16 used for production, staging, and developer subnets. Write the perimeter down as three CSV files: external_ranges.csv, internal_ranges.csv, dns_zones.csv.

Anything outside these files is out of scope for the first 30 days, and you should resist the urge to expand mid-project. Scope creep is the first reason inventory projects miss their deadline.
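One way to keep the scope honest is to load the perimeter files once and check every candidate target against them before it reaches a scanner. The sketch below is only an illustration: it assumes each range file has a single cidr column, which is a hypothetical layout, since the playbook names the files but not their columns.

```python
import csv
import io
import ipaddress


def load_ranges(csv_text):
    """Parse CSV text with a 'cidr' column (assumed layout) into network objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [ipaddress.ip_network(row["cidr"].strip(), strict=False) for row in reader]


def in_scope(address, networks):
    """Return True if the address falls inside any declared perimeter range."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network and vice versa.
    return any(ip in net for net in networks)


# Illustrative content standing in for internal_ranges.csv.
internal = load_ranges("cidr\n10.20.0.0/16\n2001:db8::/32\n")
```

With real files the call would look like load_ranges(open("internal_ranges.csv").read()); any target for which in_scope returns False is dropped and logged rather than scanned.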

Stand up the scanner stack on day 2. testssl.sh is the workhorse for deep enumeration of a single host: cipher suites, protocol versions, certificate chain, OCSP stapling, HSTS. Run testssl.sh --json-pretty --severity LOW host.example.com:443 and the output drops into a structured file the parser can consume.

sslyze is faster for breadth: sslyze --regular --json_out results.json target1.example.com:443 target2.example.com:8443. sslscan is the lightest, useful for sweeping thousands of hosts in an hour.

Wrap them in a worker pool with a 30-second per-host timeout. Expect 5 to 15 percent of hosts to time out on the first pass; rerun them serially with extended timeouts on day 4. A 10,000-host environment scans in roughly six hours on a single 16-core scanning host with parallelism set to 64.
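The worker pool with a per-host timeout can be sketched with the standard library alone. This is a minimal sketch, not a hardened scanner harness: build_cmd is a hypothetical callback that returns the command line for one host, and the scanner's exit code is recorded as-is rather than interpreted, because the tools differ in what a non-zero code means.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_scan(host, build_cmd, timeout=30):
    """Run one scanner process; a timeout is reported, not raised."""
    try:
        proc = subprocess.run(build_cmd(host), capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"host": host, "status": "timeout", "returncode": None, "stdout": ""}
    return {"host": host, "status": "done",
            "returncode": proc.returncode, "stdout": proc.stdout}


def sweep(hosts, build_cmd, workers=64, timeout=30):
    """Scan hosts in parallel; return (results, hosts_to_rerun_serially)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input hosts.
        results = list(pool.map(lambda h: run_scan(h, build_cmd, timeout), hosts))
    retry = [r["host"] for r in results if r["status"] == "timeout"]
    return results, retry
```

A call such as sweep(hosts, lambda h: ["sslscan", h]) would cover the breadth pass; the retry list is the input to the serial rerun with extended timeouts on day 4.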

Distinguish authenticated from unauthenticated scope explicitly. Unauthenticated scanning catches everything reachable on standard TLS ports: 443, 465, 636, 853, 989, 990, 993, 995, 5061, 8443. Authenticated scanning requires credentials and finds management interfaces hiding behind VPNs and bastions: IPMI, iLO, iDRAC, firewall management on 4443, Kubernetes API on 6443, etcd on 2379, Elasticsearch on 9200.

Build two scan profiles and run both. The differential between the two is the most useful single dataset of the project: it shows where TLS terminates inside the trust boundary and where ownership of those endpoints is undefined.

Days 6 through 10 belong to Certificate Transparency log monitoring and DNS sweeps. crt.sh is the fastest way to enumerate every publicly trusted certificate ever issued for your domains. The query https://crt.sh/?q=%25.example.com&exclude=expired&output=json returns every unexpired certificate covering any subdomain of example.com as JSON. Pipe it through jq -r '.[] | [.name_value, .issuer_name, .not_after] | @csv' and you have a CSV that can be diffed against the central CA database within minutes.
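For teams that would rather keep the whole pipeline in one language, the jq step has a straightforward Python equivalent. The sketch below assumes the crt.sh response has already been downloaded; the sample payload is invented for illustration and only carries the three fields the jq filter uses. It also splits name_value, which can hold several names separated by newlines, into one row per name.

```python
import csv
import io
import json


def crtsh_rows(payload):
    """Flatten crt.sh JSON into unique (name, issuer_name, not_after) tuples."""
    rows = set()
    for entry in json.loads(payload):
        # name_value may list several names, one per line.
        for name in entry["name_value"].splitlines():
            rows.add((name.strip().lower(), entry["issuer_name"], entry["not_after"]))
    return sorted(rows)


def to_csv(rows):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "issuer_name", "not_after"])
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample data, shaped like a crt.sh response.
sample = json.dumps([
    {"name_value": "api.example.com\nwww.example.com",
     "issuer_name": "C=US, O=Example CA", "not_after": "2026-09-01T00:00:00"},
    {"name_value": "www.example.com",
     "issuer_name": "C=US, O=Example CA", "not_after": "2026-09-01T00:00:00"},
])
rows = crtsh_rows(sample)
```

Deduplicating at this stage keeps the diff against the central CA database small, since the same certificate often appears once per logged name.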

Censys offers a richer query language: parsed.names: example.com and parsed.validity.end: [2026-01-01 TO *] returns the same data with subject alternative name expansion and historical visibility. Cross-check both: CT logs are append-only, so nothing is removed from them, but crt.sh can lag behind log ingestion, and Censys sometimes carries certificates that crt.sh has not indexed yet.

The DNS sweep closes the loop. Pull every A, AAAA, CNAME, and TXT record from every authoritative zone. For each FQDN, attempt a TLS handshake on 443 and 8443. Anything that returns a certificate but is not in your testssl.sh sweep is a host you missed. Anything in the CT log database whose subject does not resolve in DNS is either a stale certificate or a misconfigured record.
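The two checks in the DNS sweep reduce to set arithmetic once the names are normalized. The following is a rough sketch under two assumptions that are not in the playbook: "does not resolve" is approximated by "absent from the authoritative zone export", and wildcard CT entries are skipped because they never resolve literally.

```python
def _norm(names):
    """Lowercase names and drop the trailing dot used in zone files."""
    return {n.strip().lower().rstrip(".") for n in names}


def sweep_gaps(dns_fqdns, tls_responders, scanned_hosts, ct_names):
    """Return (hosts the scan missed, CT names with no DNS record)."""
    dns = _norm(dns_fqdns)
    # Names that answered a TLS handshake but were never in the scanner sweep.
    missed = (_norm(tls_responders) & dns) - _norm(scanned_hosts)
    # CT names that are not wildcards and have no record in the zone export.
    unresolved = {n for n in _norm(ct_names) if not n.startswith("*.")} - dns
    return sorted(missed), sorted(unresolved)
```

The first list goes back into the scan queue; the second is the candidate set of stale certificates or misconfigured records.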

By the end of day 10 the inventory should hold three columns at minimum: FQDN, observed SHA-256 fingerprint, issuer DN. A 5,000-employee organization typically lands at 8,000 to 25,000 rows. Do not panic at the count. The next ten days bring structure.

Days 11-20: API-Based Discovery

Network scanning misses everything that does not terminate TLS on a network-reachable socket: certificates inside cloud KMS, keystores on application servers, signing certificates for code, certificates baked into firmware images, certificates issued for client authentication. API-based discovery covers the rest. Days 11 through 20 are about walking every API that can list certificates and pulling them into the same CSV. The transition is also a political shift: where days 1 through 10 needed only a scanner and a network ACL exception, days 11 through 20 need IAM credentials in every account and namespace.

Start with cloud accounts because they are the highest yield.

  • AWS Certificate Manager is the canonical store for ALB and CloudFront certificates: aws acm list-certificates --certificate-statuses ISSUED --region eu-west-1 --output json, repeated across every region and every account, returns ARN, domain, not-after, and key algorithm. By default the call lists only RSA_1024 and RSA_2048 certificates, so add an --includes keyTypes=... filter naming the other key types you expect, or ECDSA and larger RSA certificates will be missing from the inventory. ACM Private CA is a separate API: aws acm-pca list-certificate-authorities. IAM server certificates, used for legacy classic load balancers, hide under aws iam list-server-certificates.
  • Azure Key Vault requires a per-vault loop: az keyvault list followed by az keyvault certificate list --vault-name <vault>, with a separate pass for App Service certificates and Application Gateway listeners.
  • GCP Certificate Manager and Compute Engine SSL certificates need gcloud certificate-manager certificates list and gcloud compute ssl-certificates list, plus a walk through every project.

Write a thin Python wrapper that emits one JSON object per certificate with fields cloud, account_id, region, service, arn_or_id, cn, sans, issuer, not_after, key_algo, key_size. Flatten each object into a row and append it to the master CSV.
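A wrapper of that kind might start from a per-provider mapping function like the one below. It is a sketch for the AWS case only and assumes the input is a dictionary shaped like the certificate detail that ACM's describe-certificate returns; the sample values are invented, and the Azure and GCP mappings would follow the same pattern.

```python
import json

FIELDS = ["cloud", "account_id", "region", "service", "arn_or_id", "cn",
          "sans", "issuer", "not_after", "key_algo", "key_size"]


def normalize_acm(detail, account_id, region):
    """Map one ACM certificate detail dict onto the shared schema."""
    # KeyAlgorithm looks like "RSA_2048" or "EC_prime256v1".
    algo, _, size = detail.get("KeyAlgorithm", "").partition("_")
    return {
        "cloud": "aws", "account_id": account_id, "region": region,
        "service": "acm",
        "arn_or_id": detail["CertificateArn"],
        "cn": detail["DomainName"].lower(),
        "sans": sorted(n.lower() for n in detail.get("SubjectAlternativeNames", [])),
        "issuer": detail.get("Issuer", ""),
        "not_after": str(detail.get("NotAfter", "")),
        "key_algo": algo.lower(),
        # Curve names are not bit sizes, so EC keys get None here.
        "key_size": int(size) if size.isdigit() else None,
    }


# Invented sample, shaped like a describe-certificate result.
record = normalize_acm(
    {"CertificateArn": "arn:aws:acm:eu-west-1:111122223333:certificate/example",
     "DomainName": "WWW.Example.com",
     "SubjectAlternativeNames": ["www.example.com", "api.example.com"],
     "Issuer": "Amazon", "NotAfter": "2026-09-01T00:00:00+00:00",
     "KeyAlgorithm": "RSA_2048"},
    account_id="111122223333", region="eu-west-1")
line = json.dumps(record)  # one JSON object per certificate
```

Keeping the field order identical across providers means the JSON lines can be flattened into the master CSV without a per-cloud column mapping.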

Kubernetes is the second-largest source of shadow certificates in modern stacks. Walk every cluster with kubectl get secrets --all-namespaces -o json | jq -r '.items[] | select(.type=="kubernetes.io/tls") | [.metadata.namespace, .metadata.name] | @tsv', then decode each secret with kubectl get secret <name> -n <ns> -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text.
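At fleet scale it is usually cheaper to dump the secrets once and decode them offline than to call kubectl per secret. The sketch below assumes the output of kubectl get secrets --all-namespaces -o json has been saved and is passed in as text. It fingerprints only the first certificate in tls.crt, on the assumption that the leaf comes first in the bundle, and it does not parse the certificate itself.

```python
import base64
import hashlib
import json
import ssl

PEM_END = "-----END CERTIFICATE-----"


def tls_secret_fingerprints(secrets_json):
    """Return (namespace, name, sha256 of DER) for each kubernetes.io/tls secret."""
    found = []
    for item in json.loads(secrets_json)["items"]:
        if item.get("type") != "kubernetes.io/tls":
            continue
        bundle = base64.b64decode(item["data"]["tls.crt"]).decode("ascii")
        # tls.crt may hold a chain; keep only the first PEM block.
        leaf_pem = bundle[: bundle.index(PEM_END) + len(PEM_END)].strip()
        der = ssl.PEM_cert_to_DER_cert(leaf_pem)
        meta = item["metadata"]
        found.append((meta["namespace"], meta["name"],
                      hashlib.sha256(der).hexdigest()))
    return found
```

The SHA-256 value is computed over the DER bytes, so it lines up with the fingerprint column used for deduplication later in the playbook.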

cert-manager adds its own resources: kubectl get certificates,certificaterequests,orders --all-namespaces surfaces every ACME-driven issuance and the issuer responsible. Track Issuer and ClusterIssuer separately so you know which trust roots are accepted in each cluster.

A 100-cluster fleet running ingress-nginx with cert-manager typically holds 3,000 to 8,000 TLS secrets, most of them autogenerated.

Active Directory Certificate Services keeps everything in its CA database. On each issuing CA, certutil -view -restrict "Disposition=20" -out "RequestID,RequesterName,CertificateTemplate,NotBefore,NotAfter,SerialNumber,CertificateHash" csv > issued.csv exports the full set of currently issued certificates in CSV.

Disposition 20 is issued; switch to 21 for revoked, 9 for pending, 30 for failed, 31 for denied. Repeat for every CA in the forest.

Standalone subordinate CAs operated by individual business units are a common discovery here, and they are by definition shadow infrastructure even though they live in official CA form. Document them in the same CSV with a column flagging them as out-of-band issuing infrastructure.

HashiCorp Vault PKI engines and similar internal CAs need API calls rather than CSV exports. vault list pki/certs returns serial numbers; vault read pki/cert/<serial> returns the PEM.

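ACME issuance through Let's Encrypt, ZeroSSL, Buypass, and Google Trust Services gives you a second window on the same data. If you operate Boulder, Pebble, or step-ca internally, dump the order table directly from the database. If you use a hosted ACME provider, check whether its account dashboard or API exports order history, often limited to a recent window such as the last 12 months. Where no export exists (Let's Encrypt, for example, has no account dashboard), reconstruct the history from ACME client logs and the CT data gathered in days 6 through 10.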

By day 20 the inventory CSV should contain six categorical sources — network scan, CT log, cloud API, Kubernetes, AD CS, Vault/ACME — with at least one row per discovered certificate per source. Expect the same certificate to appear in three or four sources; that is the input to reconciliation.

Days 21-25: Reconciliation and Ownership Mapping

Reconciliation is unglamorous and decides whether the project lands. Five days is enough only because the prior twenty were spent normalizing the schema. The objective is one row per unique certificate, every row tied to an application, every application tied to an owner. Anything that fails either of the last two conditions is an orphan, and the project's real value is in the orphan count.

Deduplicate first. Compute SHA-256 of the DER-encoded certificate for every row and use it as the join key: openssl x509 -in cert.pem -outform DER | openssl dgst -sha256 -hex. Two rows with the same fingerprint are the same certificate, even if they were observed at different IPs, different ports, or under different filenames. Roll up source columns into a multi-valued field: a certificate found in CT logs, in AWS ACM, and in a Kubernetes secret should produce one row with sources=[ct,acm,k8s] rather than three.
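The fingerprint join and the source roll-up can be sketched in a few lines. The example assumes each observation arrives as a pair of DER bytes and a short source tag; the tags follow the ones used in the paragraph above, and the byte strings in the usage note are placeholders, not real certificates.

```python
import hashlib


def deduplicate(observations):
    """Collapse (der_bytes, source) pairs into one row per SHA-256 fingerprint."""
    merged = {}
    for der, source in observations:
        fingerprint = hashlib.sha256(der).hexdigest()
        merged.setdefault(fingerprint, set()).add(source)
    return [{"fingerprint_sha256": fp, "sources": sorted(sources)}
            for fp, sources in sorted(merged.items())]
```

Feeding it [(b"A", "ct"), (b"A", "acm"), (b"A", "k8s"), (b"B", "scan")] yields two rows, the first of which carries all three sources for the shared certificate.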

The deduplication factor in a typical environment is 2.5 to 4x, which is why the post-reconciliation count usually drops from 25,000 observed instances to roughly 6,000 to 10,000 unique certificates.

Normalize the schema in the same pass. Lowercase the CN. Sort the SAN list. Convert notBefore and notAfter to UTC ISO 8601. Compute days_to_expiry as an integer. Parse the public key into key_algo (rsa/ecdsa/ed25519), key_size (bits), and curve (P-256/P-384/etc.). Extract the issuer organization and common name into separate columns so a pivot table groups by trust anchor without regex.
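A minimal sketch of the date and name normalization follows. It assumes the validity dates have already been converted to ISO 8601 strings that carry a UTC offset (a naive timestamp would be interpreted in the local timezone), and that the caller passes a timezone-aware now so the result is reproducible. Key and issuer parsing are left out.

```python
from datetime import datetime, timezone


def normalize(row, now):
    """Return a copy with lowercase CN, sorted SANs, UTC dates and days_to_expiry."""
    not_before = datetime.fromisoformat(row["not_before"]).astimezone(timezone.utc)
    not_after = datetime.fromisoformat(row["not_after"]).astimezone(timezone.utc)
    return {
        **row,
        "cn": row["cn"].strip().lower(),
        "sans": sorted({s.strip().lower() for s in row["sans"]}),
        "not_before": not_before.isoformat(),
        "not_after": not_after.isoformat(),
        # Whole days remaining; negative once the certificate has expired.
        "days_to_expiry": (not_after - now).days,
    }


# Invented sample row and a fixed reference time.
clean = normalize(
    {"cn": " WWW.Example.com", "sans": ["www.example.com", "API.example.com"],
     "not_before": "2026-04-02T02:00:00+02:00",
     "not_after": "2026-07-01T02:00:00+02:00"},
    now=datetime(2026, 6, 3, tzinfo=timezone.utc))
```

Passing now explicitly, rather than reading the clock inside the function, makes weekly reruns comparable and the function easy to test.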

Tag the certificate type from the EKU: serverAuth, clientAuth, codeSigning, emailProtection, timeStamping. The EKU is the field most likely to expose certificates being used for purposes their issuer did not anticipate — a serverAuth certificate showing up on a client authentication endpoint, for instance.

Correlate certificates to applications using two inputs: DNS and the CMDB. The DNS join is mechanical: every SAN maps to an FQDN, every FQDN maps to a service via the DNS-to-app mapping table the platform team already maintains for monitoring. For SANs that do not resolve, fall back to the IP/port pair observed during the network scan.

The CMDB join is political: ask each business unit to claim a list of FQDNs, and treat the unclaimed remainder as the orphan set. A good target for day 25 is 80 percent of certificates mapped to a named application owner, 15 percent mapped to a business unit but not yet to an individual, 5 percent flagged as orphans. The orphan flag is the input to the next conversation, not a failure mode.
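Once the DNS-to-app table and the business-unit claims are in hand, the split into owned, business-unit-only, and orphan rows is mechanical. The sketch below assumes a hypothetical fqdn_to_app dictionary whose values carry application and, when known, owner_email; neither that structure nor the ownership column name comes from the playbook.

```python
from collections import Counter


def classify_ownership(certs, fqdn_to_app):
    """Tag each cert in place and return the percentage per ownership state."""
    counts = Counter()
    for cert in certs:
        # The first SAN with a known mapping decides the application.
        match = next((fqdn_to_app[s] for s in cert["sans"] if s in fqdn_to_app), None)
        if match is None:
            cert["ownership"] = "orphan"
        elif match.get("owner_email"):
            cert["ownership"] = "owned"
        else:
            cert["ownership"] = "business_unit_only"
        cert["application"] = match["application"] if match else None
        counts[cert["ownership"]] += 1
    total = len(certs) or 1
    return {state: round(100 * n / total, 1) for state, n in counts.items()}
```

Comparing the returned percentages with the 80/15/5 target gives a quick read on how far the day-25 mapping still has to go.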

Run a final classification pass to mark certificates that are visible to attackers. Anything whose SAN resolves to a publicly routable IP is internet-exposed. Anything issued by a publicly trusted CA should appear in CT logs, which makes its hostnames public knowledge. Anything used on an authenticated client endpoint requires extra care because revocation cost is higher. The reconciliation output should answer four questions on a single row: who owns this, where does it live, what trust path validates it, and how exposed is it.

Days 26-30: Risk Scoring and Reporting

The reconciled inventory is now the input to a scoring rubric. The rubric is a small, defensible set of weights that turn the table into a prioritized work queue. It is not a predictive model. It is a way to make sure the same certificate gets the same risk score no matter which engineer evaluates it.

Six factors carry the weight in most playbooks.

  • Expiry window: 50 points if days_to_expiry < 30, 30 points if < 60, 10 points if < 90.
  • Key strength: 40 points for RSA < 2048 or ECDSA < 256, 20 points for RSA from 2048 up to but not including 3072, 0 points for RSA ≥ 3072 or modern ECDSA/Ed25519.
  • Algorithm: 50 points for SHA-1 or MD5 signature, 0 for SHA-256 or stronger.
  • Internet exposure: 30 points if the SAN resolves to a public IP, 10 points if it is reachable from a partner network, 0 if internal only.
  • CT log presence: 20 points if the certificate is publicly trusted and absent from CT logs (an indicator of a non-compliant issuer or a deliberately suppressed entry).
  • Unknown issuer: 25 points if the issuer DN does not appear in the approved CA list.

Sum and bin: 0-25 informational, 26-60 medium, 61-100 high, >100 critical. Tune the weights once with the security architect, document them in the playbook, do not change them mid-project. The same weighting logic is discussed in the certificate policy and governance guide for organizations that want to align with their existing policy register.
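The rubric is small enough to express as a pure function, which is the easiest way to guarantee that two engineers get the same score. The sketch below follows the weights and bins listed above with a few assumptions of its own: the input field names signature_hash, exposure, in_ct_log, and issuer_approved are hypothetical, and RSA keys between 2048 and 3071 bits are all scored as 20 points.

```python
def risk_score(cert):
    """Sum the six rubric factors for one reconciled inventory row."""
    score = 0
    days = cert["days_to_expiry"]
    if days < 30:
        score += 50
    elif days < 60:
        score += 30
    elif days < 90:
        score += 10
    algo, size = cert["key_algo"], cert["key_size"]
    if (algo == "rsa" and size < 2048) or (algo == "ecdsa" and size < 256):
        score += 40
    elif algo == "rsa" and size < 3072:  # assumption: covers 2048..3071
        score += 20
    if cert["signature_hash"] in {"sha1", "md5"}:
        score += 50
    score += {"public": 30, "partner": 10, "internal": 0}[cert["exposure"]]
    if cert["is_publicly_trusted"] and not cert["in_ct_log"]:
        score += 20
    if not cert["issuer_approved"]:
        score += 25
    return score


def risk_bin(score):
    """Map a score to the four bins: 0-25, 26-60, 61-100, above 100."""
    if score > 100:
        return "critical"
    if score > 60:
        return "high"
    if score > 25:
        return "medium"
    return "informational"
```

Because the function has no hidden state, the tuned weights can live in version control next to the playbook and be reviewed like any other change.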

Days 28 through 30 are for reporting. Three artifacts come out of the project. The first is the raw inventory CSV, treated as the source of truth and stored in a place that operations can query daily.

The second is the board-ready one-pager: total certificates discovered, breakdown by issuer, percentage in each risk bin, top three remediation themes, projected cost of inaction expressed in expected outage hours per quarter. The one-pager exists to fund the continuous program, not to document discovery.

The third is a remediation tracker: every critical and high finding becomes a ticket with an owner, a due date, and a renewal or replacement decision. Tickets without an owner are returned to the business unit head with a 10-day SLA before they default to the PKI team.

Certificate outage history is the most persuasive material for the board conversation; pull the last 24 months of outages and convert them into hours and revenue impact.

Templates: CSV Schemas and Ownership Matrix

The inventory CSV is the artifact that survives the 30 days. Its schema should be specified before scanning starts, not after, because every downstream report depends on column names that do not change. A workable schema includes 26 fields.

  • Identity: fingerprint_sha256, serial_number, issuer_dn, subject_dn, cn, sans (semicolon-delimited).
  • Validity: not_before, not_after, days_to_expiry.
  • Cryptography: key_algo, key_size, curve, signature_algo.
  • Trust: issuer_org, chain_depth, is_publicly_trusted, eku.
  • Deployment: sources (multi-valued: ct, acm, kv, k8s, adcs, vault, scan), environments (prod/stage/dev), endpoints (IP:port list).
  • Ownership: application, owner_email, business_unit.
  • Risk: risk_score, risk_bin, remediation_ticket.

Keep the file in version control. Diff weekly. The diff is more important than the snapshot.
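Pinning the column list in code is one way to make sure the header never drifts between weekly snapshots. The sketch below takes the column names from the bullet list above and writes rows with a fixed header; joining list values with semicolons mirrors the sans convention, and applying it to the other multi-valued columns is an assumption.

```python
import csv
import io

SCHEMA = {
    "identity": ["fingerprint_sha256", "serial_number", "issuer_dn",
                 "subject_dn", "cn", "sans"],
    "validity": ["not_before", "not_after", "days_to_expiry"],
    "cryptography": ["key_algo", "key_size", "curve", "signature_algo"],
    "trust": ["issuer_org", "chain_depth", "is_publicly_trusted", "eku"],
    "deployment": ["sources", "environments", "endpoints"],
    "ownership": ["application", "owner_email", "business_unit"],
    "risk": ["risk_score", "risk_bin", "remediation_ticket"],
}
COLUMNS = [column for group in SCHEMA.values() for column in group]


def write_inventory(rows):
    """Serialize rows under the fixed header; unknown columns raise ValueError."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Multi-valued fields are stored semicolon-delimited.
        writer.writerow({k: ";".join(v) if isinstance(v, list) else v
                         for k, v in row.items()})
    return buf.getvalue()
```

Sorting rows by fingerprint_sha256 before writing keeps the weekly version-control diff limited to certificates that actually changed.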

The RACI matrix for certificate ownership runs in parallel. Five roles cover most organizations.

  • The PKI operations team is responsible for issuance, renewal automation, CA hierarchy, and the discovery program.
  • The application owner is responsible for the certificate's purpose, the FQDNs it covers, and the response to expiry alerts.
  • The platform team is responsible for the deployment substrate (load balancer, Kubernetes, mesh) and for ensuring certificates can be rotated without downtime.
  • The security team is responsible for policy enforcement, the approved CA list, and the risk scoring weights.
  • The business unit head is accountable for budget and for designating an application owner when one is missing.

Publish the matrix on the same page as the inventory dashboard so every escalation path is visible from a single URL. The structure aligns naturally with the CLM strategy framework a mature program eventually adopts.

Beyond Day 30: Turning Discovery into Continuous Monitoring

A 30-day inventory is a snapshot. A snapshot decays within weeks: every new ingress, every cluster autoscale event, every cloud account provisioned for a new product team adds certificates the playbook did not see. Beyond day 30 the program shifts from sprint to operations.

The weekly delta is the single most useful artifact. Rerun the network scan, CT log query, and cloud/Kubernetes API walks once per week. Compute the set difference against the previous week. Report three numbers: new certificates introduced, certificates removed (either rotated or decommissioned), and certificates whose risk score crossed a bin boundary. Send the delta as a one-page email to the PKI lead and the security architect every Monday. Treat anything in the high or critical bin that is new as an incident, not a finding.
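The three numbers in the weekly delta fall out of plain set operations on two snapshots. The sketch assumes each snapshot has been reduced to a dictionary from fingerprint to risk bin, which is a simplification of the full inventory row.

```python
def weekly_delta(previous, current):
    """Compare two {fingerprint: risk_bin} snapshots."""
    new = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    bin_changed = sorted(fp for fp in current.keys() & previous.keys()
                         if current[fp] != previous[fp])
    # New certificates that land straight in high or critical are incidents.
    incidents = [fp for fp in new if current[fp] in {"high", "critical"}]
    return {"new": new, "removed": removed,
            "bin_changed": bin_changed, "incidents": incidents}
```

The lengths of the first three lists are the numbers for the Monday email; the incidents list is what gets paged rather than reported.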

Integrate the inventory with renewal automation so discovery feeds action directly. Every certificate that the inventory marks as approaching expiry should trigger an ACME order if the issuer is automation-capable, a workflow ticket if not. The automated certificate management guide covers the protocol layer; the playbook's contribution is the ownership and exposure context that automation alone cannot provide. Without that context, automated renewal extends the life of certificates that should have been retired and adds cost without reducing risk.

Set an SLO on certificate age and one on inventory freshness. The certificate-age SLO is policy-driven: a 90-day lifespan policy with a 30-day renewal window translates to a simple rule, namely that no production certificate may have days_to_expiry < 30 without an active renewal ticket.

The freshness SLO is observability-driven: 90 percent of certificates in the inventory must have been observed in the last 7 days. Track both as Grafana panels backed by the same CSV the playbook produces. Anything that drifts is a discovery gap, not a renewal gap, and routes to the platform team. Continuous discovery is also the foundation of certificate discovery and inventory as a permanent capability rather than a project.
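Both SLOs can be computed from the inventory rows before they reach a dashboard. The sketch below is illustrative only: last_seen and environment are hypothetical column names (the schema stores a multi-valued environments field), and the sample rows are invented.

```python
from datetime import datetime, timedelta, timezone


def freshness_ratio(rows, now, window_days=7):
    """Share of certificates observed within the freshness window."""
    cutoff = now - timedelta(days=window_days)
    fresh = sum(1 for row in rows if row["last_seen"] >= cutoff)
    return fresh / len(rows) if rows else 1.0


def age_slo_violations(rows, threshold_days=30):
    """Production certs inside the renewal window with no renewal ticket."""
    return [row["fingerprint_sha256"] for row in rows
            if row["environment"] == "prod"
            and row["days_to_expiry"] < threshold_days
            and not row.get("remediation_ticket")]


# Invented sample rows and a fixed reference time.
now = datetime(2026, 7, 6, tzinfo=timezone.utc)
rows = [
    {"fingerprint_sha256": "aa", "environment": "prod", "days_to_expiry": 12,
     "remediation_ticket": "", "last_seen": now - timedelta(days=1)},
    {"fingerprint_sha256": "bb", "environment": "prod", "days_to_expiry": 12,
     "remediation_ticket": "PKI-1", "last_seen": now - timedelta(days=2)},
    {"fingerprint_sha256": "cc", "environment": "dev", "days_to_expiry": 5,
     "remediation_ticket": "", "last_seen": now - timedelta(days=20)},
    {"fingerprint_sha256": "dd", "environment": "prod", "days_to_expiry": 80,
     "remediation_ticket": "", "last_seen": now - timedelta(days=3)},
]
```

A freshness ratio below 0.9 signals a discovery gap, while a non-empty violations list signals a renewal gap, which keeps the two kinds of drift routed to different teams.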

Cadence matters as much as content for board reporting. A quarterly board update keeps the certificate program visible without consuming executive attention on details. The recommended structure is four slides: total certificates and trend over the last four quarters, percentage in each risk bin with quarter-over-quarter delta, top three outage near-misses prevented by the program, and the renewal automation coverage rate.

The narrative arc across quarters should be a steady decline in the high and critical bins and a steady rise in automation coverage. If neither curve moves, the program has stalled. If both move in the wrong direction, the inventory is regressing and the discovery cadence needs to tighten.

Where Evertrust Fits

A 30-day playbook proves the program is possible. Sustaining it is a different problem. The recurring cost of discovery, reconciliation, ownership mapping, risk scoring, and renewal correlation is what causes most inventory projects to fall behind reality within a quarter.

The teams that succeed long-term move the recurring portion of the work onto a platform that does it continuously and that publishes the same artifacts the manual playbook produced — the CSV, the delta, the risk bin, the renewal ticket, the board view — without the weekly engineering tax.

Evertrust takes the playbook artifacts and operates them as a service. Discovery runs continuously across networks, cloud accounts, Kubernetes clusters, AD CS forests, and ACME issuers.

Reconciliation against the issuing CAs is automatic. Ownership is captured per certificate and per business unit, with escalation paths wired to the same RACI the playbook defines.

Risk scoring uses the rubric your team approved, applied to every observed certificate, refreshed on every scan.

Renewal is automated against the protocols the inventory has already discovered, so the path from certificate observed to certificate renewed under policy is one workflow, not three.

To see how the playbook turns into a permanent capability, explore the certificate manager and Evertrust Certificate Lifecycle Management.
