Putting Playwright into Stealth Mode to Avoid Being Detected as a Bot

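When you use Playwright to automate data collection, you will often run into sites that use anti-scraping techniques to check whether they are being accessed by an automation tool. To avoid being detected, you can run Playwright in stealth mode.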

Installing playwright-extra and the Puppeteer stealth plugin

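Stealth mode is not built in, so it needs a supporting package. There are few free options available (the most widely used is puppeteer-extra-plugin-stealth, but it has gone nearly two years without an update). Because bot detection is a cat-and-mouse game, the package will likely become less effective over time.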

The install commands are as follows:

# Install the plugin framework (it expects playwright itself to be installed as well)
npm install playwright-extra

# Install the stealth plugin
npm install puppeteer-extra-plugin-stealth

Using Playwright stealth mode

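In practice this only gets you past basic WebDriver detection. Beyond that, take it one step at a time, since the package has gone about two years without an update and has probably been abandoned. With the code below, stealth mode gets the test page to correctly display "You are not Chrome headless", but many sites that test for bot browsers more rigorously will still catch it.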

const { chromium } = require('playwright-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');

// Register the stealth plugin before launching the browser
chromium.use(pluginStealth());

const browserContextOptions = {
  viewport: { width: 1920, height: 1080 },
};

(async () => {
  const browser = await chromium.launch({
    headless: true,
    slowMo: 100,
  });
  try {
    const context = await browser.newContext({ ...browserContextOptions });
    const page = await context.newPage();
    await page.goto('https://arh.antoinevastel.com/bots/areyouheadless');
    // Give the page time to run its detection script
    await page.waitForTimeout(5000);
    // Capture the verdict shown on the page
    await page.screenshot({ path: 'test.png' });
    console.log('Finished!');
  } catch (error) {
    console.log(`Error: ${error.message}`);
  } finally {
    // Always close the browser, even if navigation or the screenshot fails
    await browser.close();
  }
})();
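
If you would rather not rely on a screenshot, you can read the most basic signals straight from the page. The following is only a rough sketch: `looksAutomated`, `collectSignals`, and the `RUN_STEALTH_CHECK` environment variable are hypothetical names made up for this example, and the two checks (`navigator.webdriver` and a "HeadlessChrome" user agent) are just the simplest signals I know of, not what any real detection service actually uses.

```javascript
// Hypothetical helper: returns the reasons a page "looks automated".
// The two rules here are illustrative assumptions, not a real detector.
function looksAutomated(signals) {
  const reasons = [];
  if (signals.webdriver === true) {
    reasons.push('navigator.webdriver is true');
  }
  if (/HeadlessChrome/.test(signals.userAgent || '')) {
    reasons.push('user agent contains HeadlessChrome');
  }
  return reasons;
}

// Reads the two signals from inside a Playwright page.
async function collectSignals(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
  }));
}

async function main() {
  // Required lazily so the helpers above can be used without a browser
  const { chromium } = require('playwright-extra');
  const pluginStealth = require('puppeteer-extra-plugin-stealth');
  chromium.use(pluginStealth());

  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto('about:blank');
    const reasons = looksAutomated(await collectSignals(page));
    console.log(reasons.length ? reasons : 'No basic automation signals found');
  } finally {
    await browser.close();
  }
}

// Hypothetical opt-in switch: only launch a browser when explicitly asked to
if (process.env.RUN_STEALTH_CHECK === '1') {
  main().catch((error) => console.log(`Error: ${error.message}`));
}
```

Run it with `RUN_STEALTH_CHECK=1 node check.js` (the file name is up to you). Commenting out the `chromium.use(...)` line should let you compare the result with and without the plugin. An empty result only means these two basic signals are clean; it says nothing about the stricter detection sites mentioned above.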