# `SSLCertificate` Reference

The `SSLCertificate` class encapsulates an SSL certificate's data and allows exporting it in various formats (PEM, DER, JSON, or text). It is used within Crawl4AI whenever you set `fetch_ssl_certificate=True` in your `CrawlerRunConfig`.
## 1. Overview

Location: `crawl4ai/ssl_certificate.py`

```python
class SSLCertificate:
    """
    Represents an SSL certificate with methods to export in various formats.

    Main Methods:
    - from_url(url, timeout=10)
    - from_file(file_path)
    - from_binary(binary_data)
    - to_json(filepath=None)
    - to_pem(filepath=None)
    - to_der(filepath=None)
    ...

    Common Properties:
    - issuer
    - subject
    - valid_from
    - valid_until
    - fingerprint
    """
```
### Typical Use Case

- You enable certificate fetching in your crawl:

  ```python
  CrawlerRunConfig(fetch_ssl_certificate=True, ...)
  ```

- After `arun()`, if `result.ssl_certificate` is present, it is an instance of `SSLCertificate`.
- You can read basic properties (issuer, subject, validity) or export them in multiple formats.
## 2. Construction & Fetching

### 2.1 `from_url(url, timeout=10)`

Manually load an SSL certificate from a given URL (port 443). Typically used internally, but you can call it directly if you want:

```python
cert = SSLCertificate.from_url("https://example.com")
if cert:
    print("Fingerprint:", cert.fingerprint)
```
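Since `timeout` is part of the documented signature, you can also tighten it for hosts that should respond quickly:

```python
# Shorter connect timeout (seconds) than the default of 10
cert = SSLCertificate.from_url("https://example.com", timeout=5)
```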
### 2.2 `from_file(file_path)`

Load a certificate from a file containing DER (binary ASN.1) data. Rarely needed unless you have local cert files:

```python
cert = SSLCertificate.from_file("/path/to/cert.der")
```
### 2.3 `from_binary(binary_data)`

Initialize from raw binary certificate data, e.g. bytes captured from a socket or another source:

```python
cert = SSLCertificate.from_binary(raw_bytes)
```
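As an illustration of where such bytes might come from, here is a minimal standard-library sketch (the host name is a placeholder) that performs a TLS handshake itself and feeds the DER bytes to `from_binary`:

```python
import socket
import ssl

from crawl4ai.ssl_certificate import SSLCertificate

host = "example.com"
context = ssl.create_default_context()
context.check_hostname = False       # we only want the raw certificate,
context.verify_mode = ssl.CERT_NONE  # not chain validation

with socket.create_connection((host, 443), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        raw_bytes = tls.getpeercert(binary_form=True)  # DER-encoded bytes

cert = SSLCertificate.from_binary(raw_bytes)
if cert:
    print("Subject CN:", cert.subject.get("CN", ""))
```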
## 3. Common Properties

After obtaining an `SSLCertificate` instance (e.g. `result.ssl_certificate` from a crawl), you can read:

1. `issuer` (dict)
   - E.g. `{"CN": "My Root CA", "O": "..."}`
2. `subject` (dict)
   - E.g. `{"CN": "example.com", "O": "ExampleOrg"}`
3. `valid_from` (str)
   - NotBefore date/time, often in ASN.1/UTC format.
4. `valid_until` (str)
   - NotAfter date/time.
5. `fingerprint` (str)
   - The SHA-256 digest (lowercase hex), e.g. `"d14d2e..."`.
## 4. Export Methods

Once you have an `SSLCertificate` object, you can export or inspect it:

### 4.1 `to_json(filepath=None) → Optional[str]`

- Returns a JSON string containing the parsed certificate fields.
- If `filepath` is provided, saves it to disk instead, returning `None`.

Usage:

```python
json_data = cert.to_json()        # returns JSON string
cert.to_json("certificate.json")  # writes file, returns None
```
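Because the return value is a plain JSON string, you can round-trip it into a dict for programmatic checks; the field names here are an assumption, mirroring the properties listed above:

```python
import json

# Parse the exported JSON back into a dict
data = json.loads(cert.to_json())
print(data.get("issuer"), data.get("valid_until"))
```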
### 4.2 `to_pem(filepath=None) → Optional[str]`

- Returns a PEM-encoded string (the common format for web servers).
- If `filepath` is provided, saves it to disk instead.

```python
pem_str = cert.to_pem()          # in-memory PEM string
cert.to_pem("/path/to/cert.pem") # saved to file
```
### 4.3 `to_der(filepath=None) → Optional[bytes]`

- Returns the original DER (binary ASN.1) bytes.
- If `filepath` is specified, writes the bytes there instead.

```python
der_bytes = cert.to_der()
cert.to_der("certificate.der")
```
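Since `to_der` returns the original bytes, a quick round-trip through `from_file` makes an easy sanity check (sketch):

```python
# Export, reload, and confirm both objects describe the same certificate
cert.to_der("certificate.der")
reloaded = SSLCertificate.from_file("certificate.der")
assert reloaded and reloaded.fingerprint == cert.fingerprint
```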
### 4.4 (Optional) `export_as_text()`

- If you see a method like `export_as_text()`, it typically returns an OpenSSL-style textual representation.
- Not always needed, but it can help with debugging or manual inspection.
## 5. Example Usage in Crawl4AI

Below is a minimal sample showing how the crawler obtains an SSL certificate from a site, then reads or exports it:

```python
import asyncio
import os

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    tmp_dir = "tmp"
    os.makedirs(tmp_dir, exist_ok=True)

    config = CrawlerRunConfig(
        fetch_ssl_certificate=True,
        cache_mode=CacheMode.BYPASS
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=config)
        if result.success and result.ssl_certificate:
            cert = result.ssl_certificate

            # 1. Basic info
            print("Issuer CN:", cert.issuer.get("CN", ""))
            print("Valid until:", cert.valid_until)
            print("Fingerprint:", cert.fingerprint)

            # 2. Export to tmp/
            cert.to_json(os.path.join(tmp_dir, "certificate.json"))
            cert.to_pem(os.path.join(tmp_dir, "certificate.pem"))
            cert.to_der(os.path.join(tmp_dir, "certificate.der"))

if __name__ == "__main__":
    asyncio.run(main())
```
## 6. Notes & Best Practices

1. Timeout: `SSLCertificate.from_url` internally uses a default 10-second socket connect and wraps the connection in SSL.
2. Binary Form: The certificate is loaded in ASN.1 (DER) form, then re-parsed by `OpenSSL.crypto`.
3. Validation: This does not validate the certificate chain or trust store; it only fetches and parses (see the sketch after this list).
4. Integration: Within Crawl4AI, you typically just set `fetch_ssl_certificate=True` in `CrawlerRunConfig`; the final result's `ssl_certificate` is built automatically.
5. Export: If you need to store or analyze a cert, `to_json` and `to_pem` are the most universal formats.
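If you do need a trust check to complement note 3, the standard library can at least confirm whether the chain verifies against the system trust store. A minimal sketch (host name illustrative):

```python
import socket
import ssl

def chain_is_trusted(host: str, port: int = 443) -> bool:
    """Return True if a default-context TLS handshake (which validates the
    chain and hostname) succeeds against the system trust store."""
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with ssl.create_default_context().wrap_socket(sock, server_hostname=host):
                return True  # handshake succeeded => chain validated
    except ssl.SSLError:
        return False

print(chain_is_trusted("example.com"))
```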
## Summary

- `SSLCertificate` is a convenience class for capturing and exporting the TLS certificate from your crawled site(s).
- Common usage is via the `CrawlResult.ssl_certificate` field, accessible after setting `fetch_ssl_certificate=True`.
- It offers quick access to essential certificate details (`issuer`, `subject`, `fingerprint`) and easy export (PEM, DER, JSON) for further analysis or server usage.

Use it whenever you need insight into a site's certificate or require some form of cryptographic or compliance check.