**Important**: Avoid heavy tasks in `on_browser_created` since you don’t yet have a page context. If you need to *log in*, do so in **`on_page_context_created`**.
**Avoid Misusing Hooks**: Do not manipulate page objects in the wrong hook or at the wrong time, as it can crash the pipeline or produce incorrect results. A common mistake is attempting to handle authentication prematurely—such as creating or closing pages in `on_browser_created`.
> **Use the Right Hook for Auth**: If you need to log in or set tokens, use `on_page_context_created`. This ensures you have a valid page/context to work with, without disrupting the main crawling flow.
> **Identity-Based Crawling**: For robust auth, consider identity-based crawling (or passing a session ID) to preserve state. Run your initial login steps in a separate, well-defined process, then feed that session to your main crawl—rather than shoehorning complex authentication into early hooks. Check out [Identity-Based Crawling](../advanced/identity-based-crawling.md) for more details.
> **Be Cautious**: Overwriting or removing elements in the wrong hook can compromise the final crawl. Keep hooks focused on smaller tasks (like route filters, custom headers), and let your main logic (crawling, data extraction) proceed normally.
- Browser is up, but **no** pages or contexts yet.
- Light setup only—don’t try to open or close pages here (that belongs in `on_page_context_created`).
2.**`on_page_context_created`**:
- Perfect for advanced **auth** or route blocking.
- You have a **page** + **context** ready but haven’t navigated to the target URL yet.
3.**`before_goto`**:
- Right before navigation. Typically used for setting **custom headers** or logging the target URL.
4.**`after_goto`**:
- After page navigation is done. Good place for verifying content or waiting on essential elements.
5.**`on_user_agent_updated`**:
- Whenever the user agent changes (for stealth or different UA modes).
6.**`on_execution_started`**:
- If you set `js_code` or run custom scripts, this runs once your JS is about to start.
7.**`before_retrieve_html`**:
- Just before the final HTML snapshot is taken. Often you do a final scroll or lazy-load triggers here.
8.**`before_return_html`**:
- The last hook before returning HTML to the `CrawlResult`. Good for logging HTML length or minor modifications.
---
## When to Handle Authentication
**Recommended**: Use **`on_page_context_created`** if you need to:
- Navigate to a login page or fill forms
- Set cookies or localStorage tokens
- Block resource routes to avoid ads
This ensures the newly created context is under your control **before**`arun()` navigates to the main URL.
---
## Additional Considerations
- **Session Management**: If you want multiple `arun()` calls to reuse a single session, pass `session_id=` in your `CrawlerRunConfig`. Hooks remain the same.
- **Performance**: Hooks can slow down crawling if they do heavy tasks. Keep them concise.
- **Error Handling**: If a hook fails, the overall crawl might fail. Catch exceptions or handle them gracefully.
- **Concurrency**: If you run `arun_many()`, each URL triggers these hooks in parallel. Ensure your hooks are thread/async-safe.