
Reorganize documentation into core/advanced/extraction sections for better navigation. Update terminal theme styles and add rich library for better CLI output. Remove redundant tutorial files and consolidate content into core sections. Add personal story to index page for project context. BREAKING CHANGE: Documentation structure has been significantly reorganized
179 lines
5.9 KiB
Markdown
179 lines
5.9 KiB
Markdown
# `SSLCertificate` Reference
|
||
|
||
The **`SSLCertificate`** class encapsulates an SSL certificate’s data and allows exporting it in various formats (PEM, DER, JSON, or text). It’s used within **Crawl4AI** whenever you set **`fetch_ssl_certificate=True`** in your **`CrawlerRunConfig`**.
|
||
|
||
## 1. Overview
|
||
|
||
**Location**: `crawl4ai/ssl_certificate.py`
|
||
|
||
```python
|
||
class SSLCertificate:
|
||
"""
|
||
Represents an SSL certificate with methods to export in various formats.
|
||
|
||
Main Methods:
|
||
- from_url(url, timeout=10)
|
||
- from_file(file_path)
|
||
- from_binary(binary_data)
|
||
- to_json(filepath=None)
|
||
- to_pem(filepath=None)
|
||
- to_der(filepath=None)
|
||
...
|
||
|
||
Common Properties:
|
||
- issuer
|
||
- subject
|
||
- valid_from
|
||
- valid_until
|
||
- fingerprint
|
||
"""
|
||
```
|
||
|
||
### Typical Use Case
|
||
1. You **enable** certificate fetching in your crawl by:
|
||
```python
|
||
CrawlerRunConfig(fetch_ssl_certificate=True, ...)
|
||
```
|
||
2. After `arun()`, if `result.ssl_certificate` is present, it’s an instance of **`SSLCertificate`**.
|
||
3. You can **read** basic properties (issuer, subject, validity) or **export** them in multiple formats.
|
||
|
||
---
|
||
|
||
## 2. Construction & Fetching
|
||
|
||
### 2.1 **`from_url(url, timeout=10)`**
|
||
Manually load an SSL certificate from a given URL (port 443). Typically used internally, but you can call it directly if you want:
|
||
|
||
```python
|
||
cert = SSLCertificate.from_url("https://example.com")
|
||
if cert:
|
||
print("Fingerprint:", cert.fingerprint)
|
||
```
|
||
|
||
### 2.2 **`from_file(file_path)`**
|
||
Load from a file containing certificate data in ASN.1 or DER. Rarely needed unless you have local cert files:
|
||
|
||
```python
|
||
cert = SSLCertificate.from_file("/path/to/cert.der")
|
||
```
|
||
|
||
### 2.3 **`from_binary(binary_data)`**
|
||
Initialize from raw binary. E.g., if you captured it from a socket or another source:
|
||
|
||
```python
|
||
cert = SSLCertificate.from_binary(raw_bytes)
|
||
```
|
||
|
||
---
|
||
|
||
## 3. Common Properties
|
||
|
||
After obtaining a **`SSLCertificate`** instance (e.g. `result.ssl_certificate` from a crawl), you can read:
|
||
|
||
1. **`issuer`** *(dict)*
|
||
- E.g. `{"CN": "My Root CA", "O": "..."}`
|
||
2. **`subject`** *(dict)*
|
||
- E.g. `{"CN": "example.com", "O": "ExampleOrg"}`
|
||
3. **`valid_from`** *(str)*
|
||
- NotBefore date/time. Often in ASN.1/UTC format.
|
||
4. **`valid_until`** *(str)*
|
||
- NotAfter date/time.
|
||
5. **`fingerprint`** *(str)*
|
||
- The SHA-256 digest (lowercase hex).
|
||
- E.g. `"d14d2e..."`
|
||
|
||
---
|
||
|
||
## 4. Export Methods
|
||
|
||
Once you have a **`SSLCertificate`** object, you can **export** or **inspect** it:
|
||
|
||
### 4.1 **`to_json(filepath=None)` → `Optional[str]`**
|
||
- Returns a JSON string containing the parsed certificate fields.
|
||
- If `filepath` is provided, saves it to disk instead, returning `None`.
|
||
|
||
**Usage**:
|
||
```python
|
||
json_data = cert.to_json() # returns JSON string
|
||
cert.to_json("certificate.json") # writes file, returns None
|
||
```
|
||
|
||
### 4.2 **`to_pem(filepath=None)` → `Optional[str]`**
|
||
- Returns a PEM-encoded string (common for web servers).
|
||
- If `filepath` is provided, saves it to disk instead.
|
||
|
||
```python
|
||
pem_str = cert.to_pem() # in-memory PEM string
|
||
cert.to_pem("/path/to/cert.pem") # saved to file
|
||
```
|
||
|
||
### 4.3 **`to_der(filepath=None)` → `Optional[bytes]`**
|
||
- Returns the original DER (binary ASN.1) bytes.
|
||
- If `filepath` is specified, writes the bytes there instead.
|
||
|
||
```python
|
||
der_bytes = cert.to_der()
|
||
cert.to_der("certificate.der")
|
||
```
|
||
|
||
### 4.4 (Optional) **`export_as_text()`**
|
||
- If you see a method like `export_as_text()`, it typically returns an OpenSSL-style textual representation.
|
||
- Not always needed, but can help for debugging or manual inspection.
|
||
|
||
---
|
||
|
||
## 5. Example Usage in Crawl4AI
|
||
|
||
Below is a minimal sample showing how the crawler obtains an SSL cert from a site, then reads or exports it. The code snippet:
|
||
|
||
```python
|
||
import asyncio
|
||
import os
|
||
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
|
||
|
||
async def main():
|
||
tmp_dir = "tmp"
|
||
os.makedirs(tmp_dir, exist_ok=True)
|
||
|
||
config = CrawlerRunConfig(
|
||
fetch_ssl_certificate=True,
|
||
cache_mode=CacheMode.BYPASS
|
||
)
|
||
|
||
async with AsyncWebCrawler() as crawler:
|
||
result = await crawler.arun("https://example.com", config=config)
|
||
if result.success and result.ssl_certificate:
|
||
cert = result.ssl_certificate
|
||
# 1. Basic Info
|
||
print("Issuer CN:", cert.issuer.get("CN", ""))
|
||
print("Valid until:", cert.valid_until)
|
||
print("Fingerprint:", cert.fingerprint)
|
||
|
||
# 2. Export
|
||
cert.to_json(os.path.join(tmp_dir, "certificate.json"))
|
||
cert.to_pem(os.path.join(tmp_dir, "certificate.pem"))
|
||
cert.to_der(os.path.join(tmp_dir, "certificate.der"))
|
||
|
||
if __name__ == "__main__":
|
||
asyncio.run(main())
|
||
```
|
||
|
||
---
|
||
|
||
## 6. Notes & Best Practices
|
||
|
||
1. **Timeout**: `SSLCertificate.from_url` internally uses a default **10s** socket connect and wraps SSL.
|
||
2. **Binary Form**: The certificate is loaded in ASN.1 (DER) form, then re-parsed by `OpenSSL.crypto`.
|
||
3. **Validation**: This does **not** validate the certificate chain or trust store. It only fetches and parses.
|
||
4. **Integration**: Within Crawl4AI, you typically just set `fetch_ssl_certificate=True` in `CrawlerRunConfig`; the final result’s `ssl_certificate` is automatically built.
|
||
5. **Export**: If you need to store or analyze a cert, the `to_json` and `to_pem` are quite universal.
|
||
|
||
---
|
||
|
||
### Summary
|
||
|
||
- **`SSLCertificate`** is a convenience class for capturing and exporting the **TLS certificate** from your crawled site(s).
|
||
- Common usage is in the **`CrawlResult.ssl_certificate`** field, accessible after setting `fetch_ssl_certificate=True`.
|
||
- Offers quick access to essential certificate details (`issuer`, `subject`, `fingerprint`) and is easy to export (PEM, DER, JSON) for further analysis or server usage.
|
||
|
||
Use it whenever you need **insight** into a site’s certificate or require some form of cryptographic or compliance check. |