diff --git a/README.md b/README.md index b270393e11c..ad0a49a148a 100755 --- a/README.md +++ b/README.md @@ -11,24 +11,29 @@

SeleniumBase

-

All-in-one Browser Automation Framework:
Web Crawling / Testing / Scraping / Stealth

- -

PyPI version SeleniumBase GitHub Actions SeleniumBase Docs SeleniumBase PyPI downloads

+

PyPI version SeleniumBase PyPI downloads
SeleniumBase GitHub Actions SeleniumBase Docs

+

Stealthy automation that passes every bot detection test.

+ +
+Verified using multiple different Chromium browsers:
Chrome, Chrome-for-Testing, Chromium, Edge, Brave.
+
+ +

🚀 Start | 🏰 Features | 🎛️ Options | 📚 Examples | -🪄 Scripts | -📱 Mobile +💻 Scripts | +🗾 Locale
-📘 The API | - 🔠 SyntaxFormats | +📗 API | +📘 Stealth API | + 🔠 DesignPatterns | 🔴 Recorder | -📊 Dashboard | -🗾 Locale +📊 Dashboard
🎖️ GUI | 📰 TestPage | @@ -44,7 +49,7 @@ 🚎 Tours
🤖 CI/CD | -❇️ JSMgr | +🟨 JSMgr | 🌏 Translator | 🎞️ Presenter | 🖼️ Visual | @@ -52,9 +57,7 @@

-📊 SeleniumBase is a complete framework for browser automation and testing with Python and pytest. Includes stealth options and other advanced features. - -🐙 Stealth modes: UC Mode and CDP Mode can bypass bot-detection, handle CAPTCHAs, and call methods from the Chrome Devtools Protocol. CDP Mode includes Stealthy Playwright Mode, which makes Playwright stealthy. +🐙 CDP Mode bypasses bot-detection and handles CAPTCHAs by driving the browser directly through the Chrome DevTools Protocol. Includes Stealthy Playwright Mode, which extends these advanced anti-detection patches to Playwright scripts. 📚 The [SeleniumBase/examples/](https://github.com/seleniumbase/SeleniumBase/tree/master/examples) folder includes over 100 ready-to-run examples. Examples that start with `test_` or end with `_test.py`/`_tests.py` are specifically designed to run with `pytest`. Other examples run directly with raw `python` (those files generally start with `raw_` to avoid confusion). @@ -64,13 +67,62 @@

⚙️ Stealthy architecture flowchart:

-Stealthy architecture flowchart +Stealthy architecture flowchart (For maximum stealth, use CDP Mode, which includes Stealthy Playwright Mode) -------- -

📗 This example scrapes Hacker News listings with Pure CDP Mode: +

πŸ“ This example verifies that Pure CDP Mode is stealthy on BrowserScan: + +```python +from seleniumbase import sb_cdp + +url = "https://www.browserscan.net/bot-detection" +sb = sb_cdp.Chrome(url, locale="en", ad_block=True) +sb.flash("Test Results", duration=3, pause=1) +sb.assert_element('strong:contains("Normal")') +print("Bot Not detected") +sb.flash('strong:contains("Normal")', duration=3, pause=2) +``` + +Stealthy architecture flowchart + +πŸ“ This also works as a drop-in replacement for Playwright (making it stealthy): + +```python +from playwright.sync_api import sync_playwright +from seleniumbase import sb_cdp + +sb = sb_cdp.Chrome(locale="en", ad_block=True) +endpoint_url = sb.get_endpoint_url() + +with sync_playwright() as p: + browser = p.chromium.connect_over_cdp(endpoint_url) + page = browser.contexts[0].pages[0] + page.goto("https://www.browserscan.net/bot-detection") + page.wait_for_timeout(500) + sb.flash("Test Results", duration=3, pause=1) + sb.assert_element('strong:contains("Normal")') + sb.flash('strong:contains("Normal")', duration=3, pause=2) +``` + +-------- + +For choosing which Chromium browser to use, you can set command-line options: + +```zsh +python SCRIPT.py --use-chromium # Use the unbranded Chromium browser +python SCRIPT.py --cft # Use Chrome-for-testing +python SCRIPT.py --edge # Use Microsoft Edge +python SCRIPT.py --brave # Use Brave browser +``` + +(Google Chrome is the default browser if not specified.) + +-------- + +

πŸ“ This example scrapes Hacker News listings with Pure CDP Mode: ```python from seleniumbase import sb_cdp @@ -80,12 +132,11 @@ sb = sb_cdp.Chrome(url) elements = sb.find_elements("span.titleline > a") for element in elements: print("* " + element.text) -sb.driver.stop() ``` -------- -

📗 This example saves Google Search results with UC + CDP Mode:
(Results are saved as PDF, HTML, and PNG files)

+

πŸ“ This example saves Google Search results with UC + CDP Mode:
(Results are saved as PDF, HTML, and PNG files)

```python from seleniumbase import SB @@ -106,7 +157,7 @@ with SB(uc=True, test=True) as sb: -------- -

📗 This example bypasses Cloudflare's challenge page with UC + CDP Mode: +

πŸ“ This example bypasses Cloudflare's challenge page with UC + CDP Mode: ```python from seleniumbase import SB @@ -128,7 +179,7 @@ with SB(uc=True, test=True, locale="en") as sb: ---- -

📗 This example handles a CAPTCHA page with Pure CDP Mode: +

πŸ“ This example handles a CAPTCHA page with Pure CDP Mode: ```python from seleniumbase import sb_cdp @@ -139,12 +190,11 @@ sb.sleep(2) sb.solve_captcha() sb.highlight('h1:contains("GitLab")') sb.highlight('button:contains("Sign in")') -sb.driver.stop() ``` -------- -

📗 This example tests an e-commerce site with pytest: +

πŸ“ This example tests an e-commerce site with pytest: ```python from seleniumbase import BaseCase @@ -174,7 +224,7 @@ class MyTestClass(BaseCase): -------- -

📗 This example tests another e-commerce site with pytest: +

πŸ“ This example tests another e-commerce site with pytest: ```zsh pytest test_coffee_cart.py --demo @@ -188,7 +238,7 @@ pytest test_coffee_cart.py --demo -

📗 This example covers multiple actions with pytest: +

πŸ“ This example covers multiple actions with pytest: ```zsh pytest test_demo_site.py diff --git a/examples/cdp_mode/ReadMe.md b/examples/cdp_mode/ReadMe.md index 2e0b43d3923..e7da2947e17 100644 --- a/examples/cdp_mode/ReadMe.md +++ b/examples/cdp_mode/ReadMe.md @@ -8,24 +8,15 @@

⚙️ Stealthy architecture flowchart:

-Stealthy architecture flowchart +Stealthy architecture flowchart ---- -### 🎞️ YouTube videos about CDP Mode: - - -

(Watch "Undetectable Automation 4" on YouTube! ▶️)

- ----- - - -

(Watch "Hacking websites with CDP" on YouTube! ▶️)

+### 🎞️ CDP Mode on YouTube: ----- - -

("Unlimited Free Web-Scraping with GitHub Actions" ▶️)

+ +

(Watch "Undetectable Automation: 5th Edition" on YouTube! ▶️)

---- @@ -758,6 +749,28 @@ element.get_parent() ---- +### 🎞️ YouTube videos about CDP Mode: + + +

(Watch "Undetectable Automation 4" on YouTube! ▶️)

+ +---- + + +

(Watch "Undetectable Automation: 5th Edition" on YouTube! ▶️)

+ +---- + + +

(Watch "Hacking websites with CDP" on YouTube! ▶️)

+ +---- + + +

("Unlimited Free Web-Scraping with GitHub Actions" ▶️)

+ +---- + SeleniumBase
SeleniumBase
diff --git a/examples/cdp_mode/playwright/ReadMe.md b/examples/cdp_mode/playwright/ReadMe.md index 411f9921cab..7ee4539a6f2 100644 --- a/examples/cdp_mode/playwright/ReadMe.md +++ b/examples/cdp_mode/playwright/ReadMe.md @@ -272,7 +272,7 @@ if __name__ == "__main__": #### 🎭 This flowchart shows how Stealthy Playwright Mode fits into CDP Mode: -Stealthy architecture flowchart +Stealthy architecture flowchart (See the [**CDP Mode** ReadMe](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/cdp_mode/ReadMe.md) for more information about that.) diff --git a/examples/cdp_mode/playwright/raw_browserscan_nested.py b/examples/cdp_mode/playwright/raw_browserscan_nested.py index d1eeb10f164..20e57919556 100644 --- a/examples/cdp_mode/playwright/raw_browserscan_nested.py +++ b/examples/cdp_mode/playwright/raw_browserscan_nested.py @@ -1,7 +1,7 @@ from playwright.sync_api import sync_playwright from seleniumbase import SB -with SB(uc=True, locale="en") as sb: +with SB(uc=True, locale="en", ad_block=True) as sb: sb.activate_cdp_mode() endpoint_url = sb.cdp.get_endpoint_url() @@ -9,8 +9,7 @@ browser = p.chromium.connect_over_cdp(endpoint_url) page = browser.contexts[0].pages[0] page.goto("https://www.browserscan.net/bot-detection") - page.wait_for_timeout(1000) - sb.cdp.flash("Test Results", duration=4) - page.wait_for_timeout(1000) + page.wait_for_timeout(500) + sb.cdp.flash("Test Results", duration=3, pause=1) sb.assert_element('strong:contains("Normal")') - sb.cdp.flash('strong:contains("Normal")', duration=4, pause=4) + sb.cdp.flash('strong:contains("Normal")', duration=3, pause=2) diff --git a/examples/cdp_mode/playwright/raw_browserscan_sync.py b/examples/cdp_mode/playwright/raw_browserscan_sync.py index fc9f00f20c2..4f2192d9aa9 100644 --- a/examples/cdp_mode/playwright/raw_browserscan_sync.py +++ b/examples/cdp_mode/playwright/raw_browserscan_sync.py @@ -1,15 +1,14 @@ from playwright.sync_api import sync_playwright from seleniumbase import sb_cdp -sb = 
sb_cdp.Chrome(locale="en") +sb = sb_cdp.Chrome(locale="en", ad_block=True) endpoint_url = sb.get_endpoint_url() with sync_playwright() as p: browser = p.chromium.connect_over_cdp(endpoint_url) page = browser.contexts[0].pages[0] page.goto("https://www.browserscan.net/bot-detection") - page.wait_for_timeout(1000) - sb.flash("Test Results", duration=4) - page.wait_for_timeout(1000) + page.wait_for_timeout(500) + sb.flash("Test Results", duration=3, pause=1) sb.assert_element('strong:contains("Normal")') - sb.flash('strong:contains("Normal")', duration=4, pause=4) + sb.flash('strong:contains("Normal")', duration=3, pause=2) diff --git a/examples/cdp_mode/playwright/raw_fingerprint_nested.py b/examples/cdp_mode/playwright/raw_fingerprint_nested.py new file mode 100644 index 00000000000..18411853d82 --- /dev/null +++ b/examples/cdp_mode/playwright/raw_fingerprint_nested.py @@ -0,0 +1,17 @@ +from playwright.sync_api import sync_playwright +from seleniumbase import SB + +with SB(uc=True, locale="en") as sb: + sb.activate_cdp_mode() + endpoint_url = sb.cdp.get_endpoint_url() + + with sync_playwright() as p: + browser = p.chromium.connect_over_cdp(endpoint_url) + page = browser.contexts[0].pages[0] + page.goto("https://demo.fingerprint.com/playground") + page.wait_for_timeout(500) + sb.cdp.flash('a[href*="browser-bot-detection"]', duration=3, pause=1) + bot_row_selector = 'table:contains("Bot") tr:nth-of-type(3)' + print(sb.get_text(bot_row_selector)) + sb.assert_text("Bot Not detected", bot_row_selector) + sb.cdp.flash(bot_row_selector, duration=3, pause=2) diff --git a/examples/cdp_mode/playwright/raw_fingerprint_sync.py b/examples/cdp_mode/playwright/raw_fingerprint_sync.py new file mode 100644 index 00000000000..9f22a811016 --- /dev/null +++ b/examples/cdp_mode/playwright/raw_fingerprint_sync.py @@ -0,0 +1,15 @@ +from playwright.sync_api import sync_playwright +from seleniumbase import sb_cdp + +sb = sb_cdp.Chrome(locale="en") +endpoint_url = sb.get_endpoint_url() + 
+with sync_playwright() as p: + browser = p.chromium.connect_over_cdp(endpoint_url) + page = browser.contexts[0].pages[0] + page.goto("https://demo.fingerprint.com/playground") + page.wait_for_timeout(500) + sb.flash('a[href*="browser-bot-detection"]', duration=3, pause=1) + bot_row_selector = 'table:contains("Bot") tr:nth-of-type(3)' + print(sb.get_text(bot_row_selector)) + sb.flash(bot_row_selector, duration=3, pause=2) diff --git a/examples/cdp_mode/raw_browserscan.py b/examples/cdp_mode/raw_browserscan.py index cbbc3e1c369..9028c381bd4 100644 --- a/examples/cdp_mode/raw_browserscan.py +++ b/examples/cdp_mode/raw_browserscan.py @@ -1,10 +1,9 @@ from seleniumbase import SB -with SB(uc=True, test=True, ad_block=True) as sb: +with SB(uc=True, test=True, locale="en", ad_block=True) as sb: url = "https://www.browserscan.net/bot-detection" sb.activate_cdp_mode(url) - sb.sleep(1) - sb.cdp.flash("Test Results", duration=4) - sb.sleep(1) + sb.cdp.flash("Test Results", duration=3, pause=1) sb.assert_element('strong:contains("Normal")') - sb.cdp.flash('strong:contains("Normal")', duration=4, pause=4) + print("Bot Not detected") + sb.cdp.flash('strong:contains("Normal")', duration=3, pause=2) diff --git a/examples/cdp_mode/raw_cdp_browserscan.py b/examples/cdp_mode/raw_cdp_browserscan.py new file mode 100644 index 00000000000..b318d2dee9e --- /dev/null +++ b/examples/cdp_mode/raw_cdp_browserscan.py @@ -0,0 +1,8 @@ +from seleniumbase import sb_cdp + +url = "https://www.browserscan.net/bot-detection" +sb = sb_cdp.Chrome(url, locale="en", ad_block=True) +sb.flash("Test Results", duration=3, pause=1) +sb.assert_element('strong:contains("Normal")') +print("Bot Not detected") +sb.flash('strong:contains("Normal")', duration=3, pause=2) diff --git a/examples/cdp_mode/raw_cdp_fingerprint.py b/examples/cdp_mode/raw_cdp_fingerprint.py new file mode 100644 index 00000000000..96a56739ae6 --- /dev/null +++ b/examples/cdp_mode/raw_cdp_fingerprint.py @@ -0,0 +1,9 @@ +from seleniumbase 
import sb_cdp + +url = "https://demo.fingerprint.com/playground" +sb = sb_cdp.Chrome(url) +sb.wait_for_element('a[href*="browser-bot-detection"]') +sb.flash('a[href*="browser-bot-detection"]', duration=3, pause=1) +bot_row_selector = 'table:contains("Bot") tr:nth-of-type(3)' +print(sb.get_text(bot_row_selector)) +sb.flash(bot_row_selector, duration=3, pause=2) diff --git a/examples/cdp_mode/raw_cdp_yc_news.py b/examples/cdp_mode/raw_cdp_yc_news.py index e9157e78ab5..55cb6ed0282 100644 --- a/examples/cdp_mode/raw_cdp_yc_news.py +++ b/examples/cdp_mode/raw_cdp_yc_news.py @@ -5,4 +5,3 @@ elements = sb.find_elements("span.titleline > a") for element in elements: print("* " + element.text) -sb.driver.stop() diff --git a/examples/cdp_mode/raw_fingerprint.py b/examples/cdp_mode/raw_fingerprint.py index 4aaf1a1e3e6..2092432d4e5 100644 --- a/examples/cdp_mode/raw_fingerprint.py +++ b/examples/cdp_mode/raw_fingerprint.py @@ -3,10 +3,8 @@ with SB(uc=True, test=True) as sb: url = "https://demo.fingerprint.com/playground" sb.activate_cdp_mode(url) - sb.sleep(1) - sb.cdp.highlight('a[href*="browser-bot-detection"]') + sb.cdp.flash('a[href*="browser-bot-detection"]', duration=3, pause=1) bot_row_selector = 'table:contains("Bot") tr:nth-of-type(3)' print(sb.get_text(bot_row_selector)) sb.assert_text("Bot Not detected", bot_row_selector) - sb.cdp.highlight(bot_row_selector) - sb.sleep(2) + sb.cdp.flash(bot_row_selector, duration=3, pause=2) diff --git a/examples/cdp_mode/raw_glassdoor.py b/examples/cdp_mode/raw_glassdoor.py index 631d004379c..43edfff5516 100644 --- a/examples/cdp_mode/raw_glassdoor.py +++ b/examples/cdp_mode/raw_glassdoor.py @@ -15,7 +15,7 @@ sb.sleep(0.5) sb.click('button[data-role-variant="primary"] span:contains("Search")') sb.sleep(2) - sb.click('[aria-label*="NASA"] img') + sb.click_if_visible('[aria-label*="NASA"] img') sb.sleep(2) print(sb.get_page_title()) sb.save_as_pdf_to_logs() diff --git a/examples/cdp_mode/raw_hilton.py 
b/examples/cdp_mode/raw_hilton.py index dc831a1dfd3..a30793966e9 100644 --- a/examples/cdp_mode/raw_hilton.py +++ b/examples/cdp_mode/raw_hilton.py @@ -16,7 +16,7 @@ sb.sleep(1.2) sb.click('button[aria-current="date"]') sb.sleep(1.2) - sb.click('button[data-testid="shop-modal-done-cta"]') + sb.click('[role="dialog"] button.btn--solid') sb.sleep(1.5) sb.click('button[data-testid="search-submit-button"]') sb.sleep(6.5) diff --git a/examples/presenter/uc_presentation_5.py b/examples/presenter/uc_presentation_5.py new file mode 100644 index 00000000000..6fd08a40d1b --- /dev/null +++ b/examples/presenter/uc_presentation_5.py @@ -0,0 +1,1026 @@ +# https://www.youtube.com/watch?v=R9HNsnbYh8o +from seleniumbase import BaseCase +BaseCase.main(__name__, __file__) + + +class UCPresentationClass(BaseCase): + def test_presentation_5(self): + self.open("data:,") + self.set_window_position(4, 40) + self._output_file_saves = False + self.create_presentation(theme="serif", transition="none") + self.add_slide("

Press SPACE to continue!

\n") + self.add_slide( + '' + ) + self.add_slide( + "

This continues the series that started here...

" + '' + ) + self.add_slide( + "

πŸ”Ή We're going to be using... πŸ”Ή


" + '' + ) + self.add_slide( + "

🔹 To bypass all the CAPTCHAs... 🔹

" + "

" + '' + ) + self.add_slide( + "

🔹 In all the places... 🔹



" + "✅ Local Desktop Machine - (Eg. Mac/Windows)" + "

" + "✅ Docker Container - (Eg. ubuntu:24.04)" + "

" + "✅ Linux Server - (Eg. GitHub Actions)

" + ) + self.add_slide( + "

🔹 With live demos on major sites... 🔹" + "


" + "✅ TikTok

" + "✅ Facebook

" + "✅ Amazon

" + "✅ LinkedIn

" + "✅ And many more!

" + ) + self.add_slide( + "

Get ready for some serious hacking!

" + '' + ) + self.add_slide( + "
Let's warm up with a few live demos
" + "of bypassing Cloudflare's Turnstile...


" + '
' + '' + ) + self.add_slide( + "With the tools I'll be covering in this video,
" + "you'll be able to web-scrape sites with confidence.
" + "

" + '' + '' + ) + self.add_slide( + "

This continues my Undetectable Automation series:

" + '' + ) + self.add_slide( + "

That all started with...

" + '' + ) + self.add_slide( + "
The Automation Entertainment Industry was born.
" + '' + ) + self.add_slide( + "

The field changes rapidly, so it's important to
" + "note that the first three videos are out-of-date.

" + '' + ) + self.add_slide( + "The first three used a modified Chromedriver
" + 'called "Undetected Chromedriver" for stealth.
' + '' + ) + self.add_slide( + "At some point, modifying Chromedriver
wasn't enough to make " + "WebDriver actions
stealthy because anti-bot services " + "improved.
" + '' + ) + self.add_slide( + "That led to the rise of CDP in order to bypass CAPTCHAs " + " and bot-detection services again.
" + '
' + "
(This was the focus of the 4th Undetectable Automation " + "video.)
" + ) + self.add_slide( + "Lots of changes have happened since the last
" + "video in the Undetectable Automation series:
" + '' + ) + self.add_slide( + "

Here are some advancements since then:" + "



" + "

\n" + ) + self.add_slide( + '' + '' + "
" + ) + self.add_slide( + '' + ) + self.add_slide( + "

Compared to standard JS, CDP actions are stealthy!

" + '' + ) + self.add_slide( + "

🔹 CDP alone isn't enough for stealth... 🔹" + "


" + "✅ The web browser's fingerprint must look like
" + "one coming from a human-controlled browser.
" + "

" + '' + ) + self.add_slide( + "
Browsers launched with regular Playwright or WebDriver" + "
have a poisoned fingerprint that results in bot-detection." + "

" + '' + ) + self.add_slide( + "
" + "To avoid this issue, you can launch the system's default " + "
browser... and THEN attach the automation framework!" + "

" + '' + ) + self.add_slide( + "

Some special configuration is needed" + "
to make this work in a stealthy way.

" + '' + ) + self.add_slide( + "

And you may need special methods" + "
for handling CAPTCHAs successfully

" + '' + ) + self.add_slide( + "

‡️ (That brings us to SeleniumBase..." + "
a framework that takes care of" + "
a lot of that work for you...)

" + ) + self.add_slide( + '' + ) + self.add_slide( + '' + ) + self.add_slide( + "Contrary to logical thinking,
" + "SeleniumBase no longer uses Selenium
" + "for some things (such as CDP Mode).
" + "
" + '

' + "
If that sounds confusing, " + "note that JavaScript does not use Java.
" + ) + self.add_slide( + "

SeleniumBase CDP Mode comes in 3 formats:" + "



" + "

\n" + ) + self.add_slide( + '' + ) + self.add_slide( + '

sb_cdp "sync" Stealthy Playwright Mode ' + 'example:

' + "
", + code=( + "from playwright.sync_api import sync_playwright" + "\n" + "from seleniumbase import sb_cdp\n\n" + 'sb = sb_cdp.Chrome()\n' + "endpoint_url = sb.get_endpoint_url()\n\n" + "with sync_playwright() as p:\n" + " browser = p.chromium.connect_over_cdp(endpoint_url)" + "\n" + " page = browser.contexts[0].pages[0]\n" + ' page.goto("https://copilot.microsoft.com")\n' + ' page.wait_for_selector("textarea#userInput")' + '\n' + ' query = "Playwright Python connect_over_cdp() ' + 'sync example"\n' + ' page.fill("textarea#userInput", query)\n' + ' page.wait_for_timeout(2000)\n' + ' page.click(\'button[' + 'data-testid="submit-button"]\')\n' + ' sb.sleep(5.25)\n' + ' sb.solve_captcha()\n' + " ...\n" + ), + ) + self.add_slide( + "
" + "It's time for that live demo where we use Stealthy
" + "Playwright Mode to bypass Copilot's CAPTCHA...
" + "

" + '' + ) + self.add_slide( + "
" + "Are you ready for another live demo?

" + "Let's do some Walmart scraping now...
" + ) + self.add_slide( + "
" + "Let's do another live demo
before returning to the learning." + "

" + "This time, we're web-scraping TikTok...

" + ) + self.add_slide( + "In addition to using CDP for controlling Chrome in a" + " stealthy way, you can also achieve stealth by using" + " tools that can control the mouse and keyboard." + "

PyAutoGUI is one such tool:" + '' + ) + self.add_slide( + "PyAutoGUI requires a headed browser to work." + "

" + " Since most Linux machines have headless displays that" + " don't support headed browsers, an external tool called" + " Xvfb must be used in order to simulate a headed browser" + " in a headless Linux environment..." + ) + self.add_slide( + '' + ) + self.add_slide( + "
Xvfb is automatically used when needed in SeleniumBase." + "
This makes stealthy automation easy from Linux servers." + "
" + '' + ) + self.add_slide( + "

Here's another GitHub Actions example:

" + '' + ) + self.add_slide( + "

I have a full YouTube tutorial on that:

" + '' + ) + self.add_slide( + "

Same story with stealthy Docker automation:

" + '' + ) + self.add_slide( + "

Note that Docker requires extra config to be stealthy!" + "

" + "These changes can be made from the Dockerfile." + "

" + '' + ) + self.add_slide( + "

For instance, some standard fonts are not
" + "installed from the default Docker config.

" + '' + ) + self.add_slide( + "

This is a problem because websites can see
" + "which fonts are installed on your system.

" + '' + ) + self.add_slide( + "

If your system is missing standard fonts,
" + "then websites know you're running from a server,
" + "and therefore they know you're using automation.

" + '' + ) + self.add_slide( + "

The SeleniumBase Dockerfile" + " has you covered:

" + '' + ) + self.add_slide( + "

It would be a big mistake to think that you can
" + "be stealthy in a regular Docker container without
" + "changing the default fonts and configuration...

" + '' + ) + self.add_slide( + "

In the spirit of online shopping,
" + "the next demo is on Nordstrom...

" + ) + self.add_slide( + "

And now for some Ralph Lauren...

" + ) + self.add_slide( + "

Let's wrap up this quick" + "
shopping session on Kohls...

" + ) + self.add_slide( + "
Now that we have the ingredients for stealth..." + "

" + "

\n" + ) + self.add_slide( + "We can use all that to bypass different CAPTCHAs!" + "

" + "Get ready for some live demos of that!" + "

" + '' + ) + self.add_slide( + "


🔹 One method call does it all: 🔹" + "




" + "

\n" + ) + self.add_slide( + "

sb.solve_captcha() handles all of these:" + "



" + "

\n" + ) + self.add_slide( + "It may seem a bit odd or illegal that
" + "we're bypassing all these CAPTCHAs...
" + "

" + "but things may not actually be as they seem..." + '' + ) + self.add_slide( + "

Important Notice:

" + "(Know the laws and legal implications of scraping!)" + '' + ) + self.add_slide( + '' + ) + self.add_slide( + '' + ) + self.add_slide( + '' + ) + self.add_slide( + "Now that we all know scraping public data is legal," + "

" + "let's go web-scraping on Facebook's public pages..." + ) + self.add_slide( + "If a site wants to hide its public data from scrapers," + "

" + "then it'll need to hide that data from public view..." + ) + self.add_slide( + "Note that if automation performs actions too quickly,
" + "websites may detect this as bot-traffic and block you...
" + "

" + "To avoid this:

" + "✅ Space out your actions so that the" + "
automation moves at human-like speed.
" + "

" + "✅ Random sleep() calls can help." + ) + self.add_slide( + "

This is what happens when some anti-bots detect you:

" + '' + ) + self.add_slide( + "

When that happens, your bot might not get a " + "chance to solve a CAPTCHA to prove its humanity.

" + '' + "


In other cases, your bot may face a challenge..." + "

" + ) + self.add_slide( + "

This is what happens when hCAPTCHA detects bots:" + "

" + '' + ) + self.add_slide( + "

This is what happens when reCAPTCHA detects bots:" + "

" + '' + ) + self.add_slide( + "

And this is what happens when Gandalf blocks you:" + "

" + '' + ) + self.add_slide( + "

Get ready for a web-scraping
" + "live demo on Amazon.com ...

" + ) + self.add_slide( + "

Let's take a look at that Amazon-scraping script:" + "

" + "
", + code=( + "from seleniumbase import SB" + "\n\n" + "with SB(uc=True, test=True, ad_block=True) as sb:" + "\n" + ' url = "https://www.amazon.com"\n' + " sb.activate_cdp_mode(url)\n" + " sb.sleep(2)\n" + " sb.click_if_visible('button" + "[alt=\"Continue shopping\"]')" + "\n" + " sb.sleep(2)\n" + ' sb.press_keys(\'input' + '[role="searchbox"]\', "TI-89\\n")\n' + ' sb.sleep(3)' + '\n' + ' for i in range(16):\n' + ' sb.cdp.scroll_down(16)\n' + ' print(sb.get_page_title())\n' + ' sb.save_as_pdf_to_logs()\n' + ' sb.save_page_source_to_logs()\n' + ' sb.save_screenshot_to_logs()\n' + ' print("Logs have been saved to: ' + './latest_logs/")\n' + ), + ) + self.add_slide( + "

Let's run two live demos on Nike.com:" + "



" + "

\n" + ) + self.add_slide( + "

Let's take a look at that Nike-scraping script:" + "

" + "
", + code=( + "from seleniumbase import SB" + "\n\n" + '' + 'with SB(uc=True, test=True, locale="en", pls="none") as sb:' + "\n" + ' url = "https://www.nike.com/"\n' + " sb.activate_cdp_mode(url)\n" + " sb.sleep(2.5)\n" + " " + 'sb.click(\'[data-testid="user-tools-container"] search\')' + "\n" + " sb.sleep(1.5)\n" + ' search = "Nike Air Force 1"\n' + ' sb.press_keys(\'input' + '[type="search"]\', search)\n' + ' sb.sleep(4)' + '\n' + ' details = \'ul[data-testid*="products"] ' + 'figure .details\'\n' + ' elements = sb.select_all(details)\n' + ' if elements:\n' + ' print(\'**** Found results for ' + '"%s": ****\' %% search)\n' + ' for element in elements:\n' + ' print("* " + element.text)\n' + ), + ) + self.add_slide( + "

Here's a friendly reminder that
" + "those sb.sleep() calls are strategic.

" + "Automating too quickly can get you into
" + "trouble on sites with bot-protection...
" + '' + ) + self.add_slide( + "

" + "Sending too much traffic to a website from the same
" + "IP address could also stop your automation runners.

" + '' + ) + self.add_slide( + "

And some IP ranges are flagged " + "by anti-bot services.


" + '⛔ Eg. "Non-residential" IP ranges (such as AWS)' + "

" + '' + ) + self.add_slide( + "Scraping a site from a non-residential
" + "IP range is a sure way to get caught...
" + ) + self.add_slide( + "

This would not be a real video about
" + "web-scraping unless I mentioned proxies.

" + "SeleniumBase has a proxy option for that." + "" + '' + ) + self.add_slide( + "

" + "Here's how to configure a proxy with SeleniumBase:" + "



\n' + ) + self.add_slide( + "
If you don't have a proxy, it's easy to get one.
" + "Lots of providers out there, like Bright Data.
" + "(Bright Data blogs about SeleniumBase a lot.)
" + "
" + '' + ) + self.add_slide( + "

You could also launch your own
" + "proxy server with SeleniumBase:

" + '

"sbase proxy"

' + "

(That's it!)

" + '' + ) + self.add_slide( + '

More configuration options for "sbase proxy":' + '

' + '' + ) + self.add_slide( + "

Since websites can detect your geolocation,
" + "it may be a good idea to change it at times.

" + "Fortunately, changing your geolocation/timezone
" + "can be done faster than changing your hair color.
" + '' + ) + self.add_slide( + "

" + "🌐 Here's how to configure those with SeleniumBase:" + "



\n' + ) + self.add_slide( + "

🌐 Timezone/Geolocation 🌏


" + "

Let's run a live demo to show how that works:" + "



" + "

\n" + ) + self.add_slide( + "

Another cool feature of CDP is the ability
" + "to intercept & modify requests in real time.

" + '' + ) + self.add_slide( + '' + ) + self.add_slide( + '' + ) + self.add_slide( + '' + ) + self.add_slide( + "

Time for a live demo of that...

" + "
(Intercepting/modifying requests)" + ) + self.add_slide( + "As you can see, SeleniumBase has all
" + "the CDP features you're looking for.

" + '' + ) + self.add_slide( + "If you're looking for powerful multi-threading,
" + "then ThreadPoolExecutor has you covered.
" + '' + ) + self.add_slide( + "

Sometimes you can defeat CAPTCHAs in advance
" + "if you already have the required cookies loaded
" + "in your web browser. (Eg: cf_clearance)
" + "

" + '' + ) + self.add_slide( + "

Getting the cf_clearance cookie is easy:" + "

Just go to a CF-protected site and take it!" + "" + '' + ) + self.add_slide( + "

" + "Get ready for a live demo of
" + "stealthy mobile emulation...

" + ) + self.add_slide( + "

For more info on stealthy mobile emulation,
" + " check out the YouTube video on it:" + "

" + '' + ) + self.add_slide( + "

There's also a follow-up mobile video
" + " that deals with changing the User Agent:" + "

" + '' + ) + self.add_slide( + "

" + "SeleniumBase has an option for using unbranded
" + "Chromium in place of regular Google Chrome.
" + "



\n" + '' + ) + self.add_slide( + "

" + "Here's how to use unbranded Chromium in scripts:" + "





\n' + "(Chromium is automatically downloaded
" + "if it hasn't yet been used by SeleniumBase)
" + ) + self.add_slide( + "

" + "SeleniumBase can download Chromium in advance:" + "



\n" + '

✅ sbase get chromium

' + '' + ) + self.add_slide( + "" + "Unlike Google Chrome, unbranded Chromium still
" + "supports loading extensions via command flags.
" + "
\n" + '' + ) + self.add_slide( + "" + "Additionally, unbranded Chromium may be stealthier
" + "than regular Google Chrome on certain websites.
" + "


\n" + "

Get ready for a web-scraping
" + "live demo on Reddit.com ...

" + "

" + ) + self.add_slide( + '' + ) + self.add_slide( + "reCAPTCHA comes in many flavors:

" + '' + "

Some flavors are stronger than others." + ) + self.add_slide( + "

πŸ” Google reCAPTCHA πŸ”


" + '' + '' + "

" + ) + self.add_slide( + "

πŸ” Google reCAPTCHA πŸ”


" + '

' + "

reCAPTCHA is very effective at catching bots.

" + "




" + ) + self.add_slide( + "

πŸ” Google reCAPTCHA πŸ”


" + '' + "

" + "

Even though Google reCAPTCHA looks similar
" + "to Cloudflare Turnstile, they are very different!

" + '' + "

" + ) + self.add_slide( + "" + "This may explain why Google reCAPTCHA is better:" + "

" + "

Since Google makes both reCAPTCHA and Chrome," + " reCAPTCHA has access to more data than other CAPTCHAs," + " and can therefore detect bots better." + "



" + ) + self.add_slide( + "Many websites are using an out-of-date reCAPTCHA:" + "

" + '' + "

" + "(Sites like these are vulnerable to web-scrapers)" + "
" + ) + self.add_slide( + "

Get ready for a web-scraping
" + "live demo on Pokemon.com ...

" + ) + self.add_slide( + "

πŸ” Google reCAPTCHA πŸ”



" + "In the case of encountering an Enterprise V3
" + "reCAPTCHA... that's a very different story...
" + "


" + '' + "

" + ) + self.add_slide( + "

" + "Note: Calling open(url) from UC Mode
" + "automatically activates CDP Mode now.
" + "


" + ) + self.add_slide( + '' + ) + self.add_slide( + "

" + "Naming conventions used in the SeleniumBase repo:" + "



\n" + "" + ) + self.add_slide( + "

" + "Finding stealthy examples in the SeleniumBase repo:" + "



\n" + 'βœ… See "SeleniumBase/examples/cdp_mode/"
' + 'for all examples made for stealth & CAPTCHAs.


' + '✅ For the Stealthy Playwright Mode examples, see:
' + '"SeleniumBase/examples/cdp_mode/playwright/"
' + ) + self.add_slide( + "

" + "Let's make a quick trip to the SeleniumBase repo:" + "


\n" + '' + ) + self.add_slide( + "" + "It's time for more CAPTCHA-bypass!" + "

\n" + "Get ready for a live demo of
" + "bypassing a Cloudflare CAPTCHA
" + "on Cloudflare's own login page ..." + "
" + "

" + '' + ) + self.add_slide( + "
" + "If bypassing CAPTCHAs isn't exciting enough for you,
" + "then treat every CAPTCHA-bypass like scoring a goal...
" + "
" + '' + ) + self.add_slide( + "

🤖 You've reached the AI section of this video! 🤖" + "



" + '' + ) + self.add_slide( + "

🤖 I've actually been using a lot of AI already! 🤖" + "



" + "✅ Many illustrations were AI-generated.
" + '' + ) + self.add_slide( + "

🤖 Thanks for all the free artwork, Gemini! 🤖" + "


" + '' + ) + self.add_slide( + "

Let's use Gemini's power to bypass CAPTCHAs!" + "



" + '' + ) + self.add_slide( + "

Every year the AI models get better and better" + "



" + '' + ) + self.add_slide( + "

Definitely an improvement over last year's models" + "


" + '' + ) + self.add_slide( + "

I also covered AI in my earlier video:" + "

" + 'Can AI tools help you with web-scraping & CAPTCHA-bypass?' + '
' + "


" + '
' + "
" + '(Spoiler alert: It depends on which AI you ask!)' + '
' + ) + self.add_slide( + "

It's time for the next live demo!

" + "


" + "

For that, we're scraping LinkedIn...

" + '' + ) + self.add_slide( + "

As you can see ...

" + "

" + "

SeleniumBase works well for web-scraping." + "

" + '' + ) + self.add_slide( + "

" + "If you need help creating scripts,
" + "just ask the AI! (maybe Gemini)
" + "


\n" + '
' + '✅ Or you can replicate a bunch of
' + 'AI agents to do the work for you...

' + ) + self.add_slide( + "

❓ Questions? ❓

" + "https://github.com/seleniumbase/SeleniumBase/discussions" + "

" + "

📌 Found a bug? 🐞

" + "https://github.com/seleniumbase/SeleniumBase/issues" + "
" + ) + self.add_slide( + "

📊 Final remarks 📣



" + "

🛠️ SeleniumBase gives you 🛠️
" + "the tools you need to succeed!" + "


" + "And tools to build lots of bots..." + "


" + ) + self.add_slide( + "
🏁 The End 🏁
" + '' + ) + self.begin_presentation(filename="uc_presentation.html") diff --git a/help_docs/ReadMe.md b/help_docs/ReadMe.md index 49839c38824..e38f6c5ef04 100644 --- a/help_docs/ReadMe.md +++ b/help_docs/ReadMe.md @@ -14,10 +14,10 @@ πŸ“š Examples | πŸ“± Emulator
-πŸͺ„ Console Scripts | +πŸ’» Console Scripts | 🌐 Grid
-πŸ“˜ Methods / APIs | +πŸ“— Methods / APIs | 🚎 Tours
πŸ”  Syntax Formats | @@ -26,7 +26,7 @@ ♻️ Boilerplates | πŸ—Ύ Locale Codes
-❇️ JS Manager | +🟨 JS Manager | πŸ–ΌοΈ Visual Testing
🌏 Translator | @@ -43,6 +43,8 @@ πŸŽ–οΈ GUI | πŸ‘€ UC Mode
+πŸ“˜ Stealth API +
πŸ™ CDP Mode

diff --git a/help_docs/how_it_works.md b/help_docs/how_it_works.md index 6a68889afc3..2b78c0cfff4 100644 --- a/help_docs/how_it_works.md +++ b/help_docs/how_it_works.md @@ -1,6 +1,6 @@ -

How SeleniumBase Works 👁️

+

How the SeleniumBase Test Framework Works 👁️

diff --git a/help_docs/js_package_manager.md b/help_docs/js_package_manager.md index 26390f356ec..f8c7d6c1a60 100644 --- a/help_docs/js_package_manager.md +++ b/help_docs/js_package_manager.md @@ -2,7 +2,7 @@

JS Package Manager and Code Generators

-

❇️ SeleniumBase lets you load JavaScript packages from any CDN link into any website via Python.

+

🟨 SeleniumBase lets you load JavaScript packages from any CDN link into any website via Python.

🎨 The following SeleniumBase solutions utilize this feature: @@ -29,7 +29,7 @@ cd examples/tour_examples pytest maps_introjs_tour.py --interval=1 ``` -

❇️ SeleniumBase includes powerful JS code generators for converting Python into JavaScript for using the supported JS packages. A few lines of Python in your tests might generate hundreds of lines of JavaScript.

+

🟨 SeleniumBase includes powerful JS code generators for converting Python into JavaScript for using the supported JS packages. A few lines of Python in your tests might generate hundreds of lines of JavaScript.

πŸ—ΊοΈ Here is some tour code in Python from maps_introjs_tour.py that expands into a lot of JavaScript.

@@ -44,21 +44,21 @@ self.export_tour(filename="maps_introjs_tour.js") self.play_tour() ``` -

❇️ For existing features, SeleniumBase already takes care of loading all the necessary JS and CSS files into the web browser. To load other packages, here are a few useful methods that you should know about:

+

🟨 For existing features, SeleniumBase already takes care of loading all the necessary JS and CSS files into the web browser. To load other packages, here are a few useful methods that you should know about:

```python self.add_js_link(js_link) ``` -

❇️ This example loads the IntroJS JavaScript library:

+

🟨 This example loads the IntroJS JavaScript library:

```python self.add_js_link("https://cdn.jsdelivr.net/npm/intro.js@5.1.0/intro.min.js") ``` -
❇️ You can load any JS package this way as long as you know the URL.
+
🟨 You can load any JS package this way as long as you know the URL.
-

❇️ If you're wondering how SeleniumBase does this, here's the full Python code from js_utils.py, which uses WebDriver's execute_script() method for making JS calls after escaping quotes with backslashes as needed:

+

🟨 If you're wondering how SeleniumBase does this, here's the full Python code from js_utils.py, which uses WebDriver's execute_script() method for making JS calls after escaping quotes with backslashes as needed:

```python def add_js_link(driver, js_link): @@ -78,19 +78,19 @@ def add_js_link(driver, js_link): driver.execute_script(script_to_add_js % js_link) ``` -

❇️ Now that you've loaded JavaScript into the browser, you may also want to load some CSS to go along with it:

+

🟨 Now that you've loaded JavaScript into the browser, you may also want to load some CSS to go along with it:

```python self.add_css_link(css_link) ``` -

❇️ Here's code that loads the IntroJS CSS:

+

🟨 Here's code that loads the IntroJS CSS:

```python self.add_css_link("https://cdnjs.cloudflare.com/ajax/libs/intro.js/2.9.3/introjs.css") ``` -

❇️ And here's the Python WebDriver code that makes this possible:

+

🟨 And here's the Python WebDriver code that makes this possible:

```python def add_css_link(driver, css_link): @@ -109,7 +109,7 @@ def add_css_link(driver, css_link): driver.execute_script(script_to_add_css % css_link) ``` -
❇️ Website tours are just one of the many uses of the JS Package Manager.
+
🟨 Website tours are just one of the many uses of the JS Package Manager.

πŸ›‚ The following example shows the JqueryConfirm package loaded into a website for creating fancy dialog boxes:

SeleniumBase @@ -127,7 +127,7 @@ pytest test_dialog_boxes.py

(Example from the Dialog Boxes ReadMe)

-
❇️ Since packages are loaded directly from a CDN link, you won't need other package managers like NPM, Bower, or Yarn to get the packages that you need into the websites that you want.
+
🟨 Since packages are loaded directly from a CDN link, you won't need other package managers like NPM, Bower, or Yarn to get the packages that you need into the websites that you want.
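
The loading mechanism this ReadMe describes (build a small JavaScript snippet that appends a `<script>` tag, escape any quotes in the URL with backslashes, then pass the snippet to WebDriver's `execute_script()`) can be sketched roughly as follows. This is a simplified illustration and not the code from js_utils.py: `escape_quotes`, `build_js_loader`, `load_js`, and `FakeDriver` are hypothetical names, and the fake driver only records the script so that the sketch runs without a browser.

```python
def escape_quotes(text):
    """Backslash-escape backslashes and quotes for use inside a JS string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("'", "\\'")


def build_js_loader(js_link):
    """Return JavaScript that appends a <script> tag for js_link to <head>."""
    safe_link = escape_quotes(js_link)
    # Adjacent string literals are joined first, then "%s" is filled in.
    return (
        'var s = document.createElement("script");'
        's.type = "text/javascript";'
        's.src = "%s";'
        'document.getElementsByTagName("head")[0].appendChild(s);' % safe_link
    )


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: records scripts, runs nothing."""

    def __init__(self):
        self.scripts = []

    def execute_script(self, script):
        self.scripts.append(script)


def load_js(driver, js_link):
    """Send the generated loader snippet to the driver."""
    driver.execute_script(build_js_loader(js_link))


driver = FakeDriver()
load_js(driver, "https://cdn.jsdelivr.net/npm/intro.js@5.1.0/intro.min.js")
print(driver.scripts[0])
```

With a real Selenium driver in place of `FakeDriver`, the same `load_js()` call would hand the snippet to the browser. Whether the package then loads still depends on the page, for example on its content security policy.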
-------- diff --git a/help_docs/locale_codes.md b/help_docs/locale_codes.md index 44e80ce0f43..1f5e2982cf0 100644 --- a/help_docs/locale_codes.md +++ b/help_docs/locale_codes.md @@ -8,10 +8,10 @@ You can specify a Language Locale Code to customize web pages on supported websi pytest --locale=CODE # Example: --locale=ru ``` -From the ``SB()`` and ``Driver()`` formats, you can also set the ``locale_code`` arg like this: +From the `SB()` and `Driver()` formats, you can also set the `locale` arg like this: ```python -locale_code="CODE" # Example: SB(locale_code="en") +locale="CODE" # Example: SB(locale="en") ```

List of Language Locale Codes:

diff --git a/mkdocs_build/requirements.txt b/mkdocs_build/requirements.txt index 8b74c775abe..0f6b71c814d 100644 --- a/mkdocs_build/requirements.txt +++ b/mkdocs_build/requirements.txt @@ -1,12 +1,12 @@ # mkdocs dependencies for generating the seleniumbase.io website # Minimum Python version: 3.10 (for generating docs only) -regex>=2026.3.32 -pymdown-extensions>=10.21.2 -pipdeptree>=2.35.1 +regex>=2026.5.9 +pymdown-extensions>=10.21.3 +pipdeptree>=2.35.2 python-dateutil>=2.8.2 Markdown==3.10.2 -click==8.3.3 +click==8.4.0 ghp-import==2.1.0 watchdog==6.0.0 cairocffi==1.7.1 diff --git a/requirements.txt b/requirements.txt index db1f402f73a..b32e8c85fcc 100755 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,5 @@ pip>=26.0.1;python_version<"3.10" -pip>=26.1;python_version>="3.10" +pip>=26.1.1;python_version>="3.10" packaging>=26.2 setuptools~=70.2;python_version<"3.10" setuptools>=82.0.1;python_version>="3.10" @@ -21,7 +21,7 @@ sbvirtualdisplay>=1.4.0 MarkupSafe>=3.0.3 Jinja2>=3.1.6 six>=1.17.0 -parse>=1.21.1 +parse>=1.22.0 parse-type>=0.6.6 colorama>=0.4.6 pyyaml>=6.0.3 @@ -29,12 +29,12 @@ pygments>=2.20.0 pyreadline3>=3.5.4;platform_system=="Windows" tabcompleter>=1.4.1 pdbp>=1.8.2 -idna>=3.13 +idna>=3.15 charset-normalizer>=3.4.7,<4 urllib3>=1.26.20,<2;python_version<"3.10" -urllib3>=1.26.20,<3;python_version>="3.10" +urllib3>=2.7.0,<3;python_version>="3.10" requests~=2.32.5;python_version<"3.10" -requests~=2.33.1;python_version>="3.10" +requests~=2.34.2;python_version>="3.10" sniffio==1.3.1 h11==0.16.0 outcome==1.3.0.post0 @@ -45,7 +45,7 @@ wsproto==1.2.0;python_version<"3.10" wsproto~=1.3.2;python_version>="3.10" websocket-client~=1.9.0 selenium==4.32.0;python_version<"3.10" -selenium==4.43.0;python_version>="3.10" +selenium==4.44.0;python_version>="3.10" cssselect==1.3.0;python_version<"3.10" cssselect>=1.4.0,<2;python_version>="3.10" sortedcontainers==2.4.0 @@ -60,7 +60,7 @@ pytest-html==4.0.2 pytest-metadata==3.1.1 pytest-ordering==0.6 
pytest-rerunfailures==16.0.1;python_version<"3.10" -pytest-rerunfailures==16.1;python_version>="3.10" +pytest-rerunfailures==16.2;python_version>="3.10" pytest-xdist==3.8.0 parameterized==0.9.0 behave==1.2.6 @@ -70,7 +70,7 @@ pyotp==2.9.0 python-xlib==0.33;platform_system=="Linux" PyAutoGUI>=0.9.54;platform_system=="Linux" markdown-it-py==3.0.0;python_version<"3.10" -markdown-it-py==4.0.0;python_version>="3.10" +markdown-it-py==4.2.0;python_version>="3.10" mdurl==0.1.2 rich>=15.0.0,<16 @@ -78,7 +78,7 @@ rich>=15.0.0,<16 # ("pip install -r requirements.txt" also installs this, but "pip install -e ." won't.) coverage>=7.10.7;python_version<"3.10" -coverage>=7.13.5;python_version>="3.10" +coverage>=7.14.0;python_version>="3.10" pytest-cov>=7.1.0 flake8==7.3.0 mccabe==0.7.0 diff --git a/seleniumbase/__version__.py b/seleniumbase/__version__.py index 3353a900f9b..1a42e8d286d 100755 --- a/seleniumbase/__version__.py +++ b/seleniumbase/__version__.py @@ -1,2 +1,2 @@ # seleniumbase package -__version__ = "4.48.4" +__version__ = "4.49.0" diff --git a/seleniumbase/console_scripts/ReadMe.md b/seleniumbase/console_scripts/ReadMe.md index 8f2291d988e..6a64218bd70 100644 --- a/seleniumbase/console_scripts/ReadMe.md +++ b/seleniumbase/console_scripts/ReadMe.md @@ -1,6 +1,6 @@ -

Console Scripts 🪄

+

Console Scripts 💻

🌟 SeleniumBase console scripts can do many things, such as downloading web drivers, creating test directories with config files, activating the SeleniumBase Recorder, launching the SeleniumBase Commander, translating tests into other languages, running a Selenium Grid, and more. diff --git a/seleniumbase/fixtures/base_case.py b/seleniumbase/fixtures/base_case.py index ea2fab9129b..044986ead7d 100644 --- a/seleniumbase/fixtures/base_case.py +++ b/seleniumbase/fixtures/base_case.py @@ -5082,6 +5082,14 @@ def activate_cdp_mode(self, url=None, **kwargs): self.driver.uc_open_with_cdp_mode(url, **kwargs) else: self.get_new_driver(undetectable=True) + if self.browser == "edge": + raise Exception( + 'For stealth with Edge, use "Pure CDP Mode"! Eg:\n' + '```python\n' + 'from seleniumbase import sb_cdp\n' + 'sb = sb_cdp.Chrome(url, browser="edge")\n' + '```' + ) self.driver.uc_open_with_cdp_mode(url, **kwargs) self.cdp = self.driver.cdp if hasattr(self.cdp, "solve_captcha"): @@ -16893,6 +16901,11 @@ def tearDown(self): ) raise Exception(message) # *** Start tearDown() officially *** + if hasattr(self.driver, "_already_quit") and self.driver._already_quit: + logging.warning( + " The driver was already quit in a mode" + " that quits the driver automatically." 
+ ) page_actions._reconnect_if_disconnected(self.driver) self.__slow_mode_pause_if_active() has_exception = self.__has_exception() diff --git a/seleniumbase/undetected/__init__.py b/seleniumbase/undetected/__init__.py index 889309e9a65..3b316a21671 100644 --- a/seleniumbase/undetected/__init__.py +++ b/seleniumbase/undetected/__init__.py @@ -517,8 +517,12 @@ def connect(self): with suppress(Exception): self.service.start() with suppress(Exception): - self.start_session() - time.sleep(0.0075) + already_quit = False + if hasattr(self, "_already_quit") and self._already_quit: + already_quit = True + if not already_quit: + self.start_session() + time.sleep(0.0075) with suppress(Exception): for window_handle in self.window_handles: self.switch_to.window(window_handle) @@ -559,6 +563,7 @@ def quit(self): try: logger.debug("Terminating the UC browser") os.kill(self.browser_pid, 15) + self._already_quit = True if "linux" in sys.platform: os.waitpid(self.browser_pid, 0) time.sleep(0.02) diff --git a/seleniumbase/undetected/cdp_driver/cdp_util.py b/seleniumbase/undetected/cdp_driver/cdp_util.py index b66e35143ae..2f3eea1964f 100644 --- a/seleniumbase/undetected/cdp_driver/cdp_util.py +++ b/seleniumbase/undetected/cdp_driver/cdp_util.py @@ -311,6 +311,7 @@ async def start( disable_csp: Optional[str] = None, # Disable content security policy extension_dir: Optional[str] = None, # Chrome extension directory use_chromium: Optional[str] = None, # Use the base Chromium browser + cft: Optional[str] = None, # Use the Chrome-for-Testing browser **kwargs: Optional[dict], ) -> Browser: """ @@ -528,6 +529,8 @@ async def start( browser_executable_path = bin_loc elif use_chromium or "--use-chromium" in arg_join: browser_executable_path = "_chromium_" + elif cft or "--cft" in arg_join: + browser_executable_path = "_cft_" if proxy is None and "--proxy" in arg_join: proxy_string = None if "--proxy=" in arg_join: diff --git a/seleniumbase/undetected/cdp_driver/config.py 
b/seleniumbase/undetected/cdp_driver/config.py index 0fc47b53fd5..48bd6971e84 100644 --- a/seleniumbase/undetected/cdp_driver/config.py +++ b/seleniumbase/undetected/cdp_driver/config.py @@ -2,12 +2,14 @@ import logging import os import pathlib +import platform import secrets import sys import tempfile import zipfile from contextlib import suppress from seleniumbase.config import settings +from seleniumbase.drivers import cft_drivers from seleniumbase.drivers import chromium_drivers from seleniumbase.fixtures import constants from seleniumbase.fixtures import shared_utils @@ -30,9 +32,8 @@ IS_MAC = shared_utils.is_mac() IS_LINUX = shared_utils.is_linux() IS_WINDOWS = shared_utils.is_windows() -CHROMIUM_DIR = os.path.dirname( - os.path.realpath(chromium_drivers.__file__) -) +CFT_DIR = os.path.dirname(os.path.realpath(cft_drivers.__file__)) +CHROMIUM_DIR = os.path.dirname(os.path.realpath(chromium_drivers.__file__)) class Config: @@ -147,6 +148,53 @@ def __init__( f"Defaulting to regular Chrome!" ) browser_executable_path = find_chrome_executable() + elif browser_executable_path == "_cft_": + from filelock import FileLock + binary_folder = None + if IS_MAC: + if shared_utils.is_arm_mac(): + binary_folder = "chrome-mac-arm64" + else: + binary_folder = "chrome-mac-x64" + elif IS_LINUX: + binary_folder = "chrome-linux64" + elif IS_WINDOWS: + if "64" in platform.architecture()[0]: + binary_folder = "chrome-win64" + else: + binary_folder = "chrome-win32" + binary_location = os.path.join(CFT_DIR, binary_folder) + gui_lock = FileLock(constants.MultiBrowser.DRIVER_FIXING_LOCK) + with gui_lock: + with suppress(Exception): + shared_utils.make_writable( + constants.MultiBrowser.DRIVER_FIXING_LOCK + ) + if not os.path.exists(binary_location): + from seleniumbase.console_scripts import sb_install + sys_args = sys.argv # Save a copy of sys args + sb_install.log_d( + "\nWarning: Chrome for Testing binary not found..." 
+ ) + sb_install.main(override="cft") + sys.argv = sys_args # Put back original args + binary_name = binary_location.split("/")[-1].split("\\")[-1] + if binary_name == "Google Chrome for Testing.app": + binary_name = "Google Chrome for Testing" + binary_location += "/Contents/MacOS/Google Chrome for Testing" + elif binary_name in ["chrome-mac-arm64", "chrome-mac-x64"]: + binary_name = "Google Chrome for Testing" + binary_location += "/Google Chrome for Testing.app" + binary_location += "/Contents/MacOS/Google Chrome for Testing" + if os.path.exists(binary_location): + mock_keychain = True + browser_executable_path = binary_location + else: + print( + f"{binary_location} not found. " + f"Defaulting to regular Chrome!" + ) + browser_executable_path = find_chrome_executable() self._browser_args = browser_args self.browser_executable_path = browser_executable_path self.headless = headless diff --git a/setup.py b/setup.py index 438933e663b..7c740a9496c 100755 --- a/setup.py +++ b/setup.py @@ -74,7 +74,10 @@ setup( name="seleniumbase", version=about["__version__"], - description="A complete web automation framework for end-to-end testing.", + description=( + "Stealthy automation (via CDP Mode) that passes every bot detection " + "test with any Chromium browser you want. Includes a test framework." 
+ ), long_description=long_description, long_description_content_type="text/markdown", url="https://github.com/seleniumbase/SeleniumBase", @@ -104,8 +107,23 @@ "webdriver", "seleniumbase", "sbase", - "crawling", + "web-crawling", + "web-scraping", "scraping", + "stealth", + "chromium", + "playwright", + "anti-detect", + "antidetect", + "undetected", + "bot-detection", + "fingerprint", + "recaptcha", + "cloudflare", + "turnstile", + "datadome", + "captcha", + "headless", ], classifiers=[ "Development Status :: 5 - Production/Stable", @@ -147,7 +165,7 @@ python_requires=">=3.9", install_requires=[ 'pip>=26.0.1;python_version<"3.10"', - 'pip>=26.1;python_version>="3.10"', + 'pip>=26.1.1;python_version>="3.10"', 'packaging>=26.2', 'setuptools~=70.2;python_version<"3.10"', # Newer ones had issues 'setuptools>=82.0.1;python_version>="3.10"', @@ -169,7 +187,7 @@ 'MarkupSafe>=3.0.3', "Jinja2>=3.1.6", "six>=1.17.0", - 'parse>=1.21.1', + 'parse>=1.22.0', 'parse-type>=0.6.6', 'colorama>=0.4.6', 'pyyaml>=6.0.3', @@ -177,12 +195,12 @@ 'pyreadline3>=3.5.4;platform_system=="Windows"', 'tabcompleter>=1.4.1', 'pdbp>=1.8.2', - 'idna>=3.13', + 'idna>=3.15', 'charset-normalizer>=3.4.7,<4', 'urllib3>=1.26.20,<2;python_version<"3.10"', - 'urllib3>=1.26.20,<3;python_version>="3.10"', + 'urllib3>=2.7.0,<3;python_version>="3.10"', 'requests~=2.32.5;python_version<"3.10"', - 'requests~=2.33.1;python_version>="3.10"', + 'requests~=2.34.2;python_version>="3.10"', 'sniffio==1.3.1', 'h11==0.16.0', 'outcome==1.3.0.post0', @@ -193,7 +211,7 @@ 'wsproto~=1.3.2;python_version>="3.10"', 'websocket-client~=1.9.0', 'selenium==4.32.0;python_version<"3.10"', - 'selenium==4.43.0;python_version>="3.10"', + 'selenium==4.44.0;python_version>="3.10"', 'cssselect==1.3.0;python_version<"3.10"', 'cssselect>=1.4.0,<2;python_version>="3.10"', 'sortedcontainers==2.4.0', @@ -208,7 +226,7 @@ 'pytest-metadata==3.1.1', 'pytest-ordering==0.6', 'pytest-rerunfailures==16.0.1;python_version<"3.10"', - 
'pytest-rerunfailures==16.1;python_version>="3.10"', + 'pytest-rerunfailures==16.2;python_version>="3.10"', 'pytest-xdist==3.8.0', 'parameterized==0.9.0', 'behave==1.2.6', # Newer ones had issues @@ -218,7 +236,7 @@ 'python-xlib==0.33;platform_system=="Linux"', 'PyAutoGUI>=0.9.54;platform_system=="Linux"', 'markdown-it-py==3.0.0;python_version<"3.10"', - 'markdown-it-py==4.0.0;python_version>="3.10"', + 'markdown-it-py==4.2.0;python_version>="3.10"', 'mdurl==0.1.2', 'rich>=15.0.0,<16', ], @@ -235,7 +253,7 @@ # Usage: coverage run -m pytest; coverage html; coverage report "coverage": [ 'coverage>=7.10.7;python_version<"3.10"', - 'coverage>=7.13.5;python_version>="3.10"', + 'coverage>=7.14.0;python_version>="3.10"', 'pytest-cov>=7.1.0', ], # pip install -e .[flake8] @@ -262,7 +280,7 @@ "pdfminer": [ 'pdfminer.six==20251107;python_version<"3.10"', 'pdfminer.six==20260107;python_version>="3.10"', - 'cryptography==47.0.0', + 'cryptography==48.0.0', 'cffi==2.0.0', 'pycparser==2.23;python_version<"3.10"', 'pycparser==3.0;python_version>="3.10"',