# `SSLCertificate` Reference

The **`SSLCertificate`** class encapsulates an SSL certificate’s data and allows exporting it in various formats (PEM, DER, JSON, or text). It’s used within **Crawl4AI** whenever you set **`fetch_ssl_certificate=True`** in your **`CrawlerRunConfig`**.

## 1. Overview

**Location**: `crawl4ai/ssl_certificate.py`
```python
class SSLCertificate:
    """
    Represents an SSL certificate with methods to export in various formats.

    Main Methods:
    - from_url(url, timeout=10)
    - from_file(file_path)
    - from_binary(binary_data)
    - to_json(filepath=None)
    - to_pem(filepath=None)
    - to_der(filepath=None)
    ...

    Common Properties:
    - issuer
    - subject
    - valid_from
    - valid_until
    - fingerprint
    """
```
### Typical Use Case

1. You **enable** certificate fetching in your crawl by:
   ```python
   CrawlerRunConfig(fetch_ssl_certificate=True, ...)
   ```
2. After `arun()`, if `result.ssl_certificate` is present, it’s an instance of **`SSLCertificate`**.
3. You can **read** basic properties (issuer, subject, validity) or **export** the certificate in multiple formats.

---
## 2. Construction & Fetching

### 2.1 **`from_url(url, timeout=10)`**

Manually load an SSL certificate from a given URL (port 443). Typically used internally, but you can call it directly if you want:
```python
cert = SSLCertificate.from_url("https://example.com")
if cert:
    print("Fingerprint:", cert.fingerprint)
```
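The `timeout` parameter (10 seconds by default, per the signature above) can be tuned for slow or unreachable hosts. For example:

```python
# Allow a slower host more time than the 10-second default
cert = SSLCertificate.from_url("https://example.com", timeout=30)
if not cert:
    print("No certificate could be fetched within 30 seconds")
```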
### 2.2 **`from_file(file_path)`**

Load from a file containing certificate data in ASN.1 or DER. Rarely needed unless you have local cert files:

```python
cert = SSLCertificate.from_file("/path/to/cert.der")
```
### 2.3 **`from_binary(binary_data)`**

Initialize from raw DER bytes, e.g. if you captured them from a socket or another source:

```python
cert = SSLCertificate.from_binary(raw_bytes)
```
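If you need `raw_bytes` in the first place, one way (outside of Crawl4AI) is to grab the peer certificate in DER form with Python's standard `ssl` and `socket` modules. This is only a sketch of obtaining the bytes, and it assumes `SSLCertificate` is importable from `crawl4ai.ssl_certificate`, as the location above indicates:

```python
import socket
import ssl

from crawl4ai.ssl_certificate import SSLCertificate  # per the module location above

# Grab the peer certificate in DER (binary ASN.1) form with the standard library.
# Verification is disabled because we only want the raw bytes, not a trust check.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

with socket.create_connection(("example.com", 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        raw_bytes = tls.getpeercert(binary_form=True)

cert = SSLCertificate.from_binary(raw_bytes)
```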
---

## 3. Common Properties

After obtaining a **`SSLCertificate`** instance (e.g. `result.ssl_certificate` from a crawl), you can read:
1. **`issuer`** *(dict)*
   - E.g. `{"CN": "My Root CA", "O": "..."}`
2. **`subject`** *(dict)*
   - E.g. `{"CN": "example.com", "O": "ExampleOrg"}`
3. **`valid_from`** *(str)*
   - NotBefore date/time, often in ASN.1/UTC format.
4. **`valid_until`** *(str)*
   - NotAfter date/time.
5. **`fingerprint`** *(str)*
   - The SHA-256 digest (lowercase hex).
   - E.g. `"d14d2e..."`
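For example, given `result.ssl_certificate` from a successful crawl, reading these properties is plain attribute access:

```python
cert = result.ssl_certificate  # SSLCertificate instance from a crawl

print("Issuer:", cert.issuer)                     # dict, e.g. {"CN": "My Root CA", ...}
print("Subject CN:", cert.subject.get("CN", ""))
print("Valid from:", cert.valid_from)
print("Valid until:", cert.valid_until)
print("SHA-256 fingerprint:", cert.fingerprint)
```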
---

## 4. Export Methods

Once you have a **`SSLCertificate`** object, you can **export** or **inspect** it:
### 4.1 **`to_json(filepath=None)` → `Optional[str]`**

- Returns a JSON string containing the parsed certificate fields.
- If `filepath` is provided, saves it to disk instead, returning `None`.

**Usage**:

```python
json_data = cert.to_json()        # returns JSON string
cert.to_json("certificate.json")  # writes file, returns None
```
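Since the return value is an ordinary JSON string, you can load it back into a dictionary for further processing; `json` here is just the standard library module:

```python
import json

data = json.loads(cert.to_json())  # parse the exported JSON string into a dict
print(sorted(data.keys()))         # see which certificate fields were captured
```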
### 4.2 **`to_pem(filepath=None)` → `Optional[str]`**

- Returns a PEM-encoded string (common for web servers).
- If `filepath` is provided, saves it to disk instead.

```python
pem_str = cert.to_pem()           # in-memory PEM string
cert.to_pem("/path/to/cert.pem")  # saved to file
```
### 4.3 **`to_der(filepath=None)` → `Optional[bytes]`**

- Returns the original DER (binary ASN.1) bytes.
- If `filepath` is specified, writes the bytes there instead.

```python
der_bytes = cert.to_der()
cert.to_der("certificate.der")
```
### 4.4 (Optional) **`export_as_text()`**

- If you see a method like `export_as_text()`, it typically returns an OpenSSL-style textual representation.
- Not always needed, but handy for debugging or manual inspection; see the guarded call sketched below.
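Because the method is optional, a guarded call is the safest pattern; this is only a sketch, not a guaranteed part of the API:

```python
# export_as_text() may not exist in every release, so guard the call.
if hasattr(cert, "export_as_text"):
    print(cert.export_as_text())  # OpenSSL-style human-readable dump
```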
---

## 5. Example Usage in Crawl4AI

Below is a minimal sample showing how the crawler obtains an SSL cert from a site, then reads and exports it:
```python
import asyncio
import os
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    tmp_dir = "tmp"
    os.makedirs(tmp_dir, exist_ok=True)

    config = CrawlerRunConfig(
        fetch_ssl_certificate=True,
        cache_mode=CacheMode.BYPASS
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=config)
        if result.success and result.ssl_certificate:
            cert = result.ssl_certificate
            # 1. Basic Info
            print("Issuer CN:", cert.issuer.get("CN", ""))
            print("Valid until:", cert.valid_until)
            print("Fingerprint:", cert.fingerprint)

            # 2. Export
            cert.to_json(os.path.join(tmp_dir, "certificate.json"))
            cert.to_pem(os.path.join(tmp_dir, "certificate.pem"))
            cert.to_der(os.path.join(tmp_dir, "certificate.der"))

if __name__ == "__main__":
    asyncio.run(main())
```
---

## 6. Notes & Best Practices

1. **Timeout**: `SSLCertificate.from_url` internally uses a default **10s** timeout when connecting the socket and wrapping it in SSL.
2. **Binary Form**: The certificate is loaded in ASN.1 (DER) form, then re-parsed by `OpenSSL.crypto`.
3. **Validation**: This does **not** validate the certificate chain or trust store. It only fetches and parses, so checks such as expiry are up to you (see the sketch below).
4. **Integration**: Within Crawl4AI, you typically just set `fetch_ssl_certificate=True` in `CrawlerRunConfig`; the final result’s `ssl_certificate` is automatically built.
5. **Export**: If you need to store or analyze a cert, the `to_json` and `to_pem` exports are the most broadly useful formats.
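If you do want a rough expiry check on top of this, you can parse `valid_until` yourself (with `cert` being an `SSLCertificate` as in the examples above). The exact string format is not guaranteed here, so the helper below is a best-effort sketch that tries the common ASN.1 GeneralizedTime/UTCTime layouts and gives up otherwise:

```python
from datetime import datetime, timezone
from typing import Optional

def approx_days_left(valid_until: str) -> Optional[int]:
    """Best-effort expiry estimate. Tries ASN.1 GeneralizedTime ("YYYYMMDDHHMMSSZ")
    and UTCTime ("YYMMDDHHMMSSZ") layouts; returns None if the format is unrecognized."""
    for fmt in ("%Y%m%d%H%M%SZ", "%y%m%d%H%M%SZ"):
        try:
            expires = datetime.strptime(valid_until, fmt).replace(tzinfo=timezone.utc)
            return (expires - datetime.now(timezone.utc)).days
        except ValueError:
            continue
    return None  # unknown format; inspect cert.valid_until manually

days = approx_days_left(cert.valid_until)
if days is not None and days < 30:
    print(f"Certificate expires in about {days} days")
```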
---

### Summary

- **`SSLCertificate`** is a convenience class for capturing and exporting the **TLS certificate** from your crawled site(s).
- Common usage is via the **`CrawlResult.ssl_certificate`** field, accessible after setting `fetch_ssl_certificate=True`.
- It offers quick access to essential certificate details (`issuer`, `subject`, `fingerprint`) and is easy to export (PEM, DER, JSON) for further analysis or server usage.

Use it whenever you need **insight** into a site’s certificate or require some form of cryptographic or compliance check.