Selenium

We pull the Selenium image that includes the Chrome and Firefox drivers:

ChromeDriver downloads: http://chromedriver.chromium.org/downloads

docker run -ti iwanttobefreak/selenium

Inside the container we start IPython to try it out interactively:

ipython

Python 2.7.13 (default, Sep 26 2018, 18:42:22) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
In [1]:

Now we can run commands:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)

For example, let's open the Renfe website (to search for a train):

url = 'http://www.renfe.com'
driver.get(url)

And save a screenshot of the page to a file:

driver.save_screenshot('/tmp/selenium/renfe.png')
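
save_screenshot() does not create /tmp/selenium. If the directory is missing, Selenium returns False instead of raising, so the failure is easy to miss. A small helper (the name screenshot_path is hypothetical) that creates the directory first might look like this:

```python
import os


def screenshot_path(directory, name):
    """Return an absolute .png path inside `directory`, creating it if needed.

    save_screenshot() returns False (it does not raise) when the file
    cannot be written, so a missing directory is easy to overlook.
    """
    os.makedirs(directory, exist_ok=True)
    if not name.lower().endswith(".png"):
        name += ".png"  # Selenium warns when the name does not end in .png
    return os.path.abspath(os.path.join(directory, name))


# Usage with a live driver (not executed here):
# if not driver.save_screenshot(screenshot_path('/tmp/selenium', 'renfe')):
#     print('screenshot could not be written')
```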

Headless Firefox example with a timeout on the get call

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--headless")

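# Non-routable address: the connection never completes, so the timeout ends the call
url1 = 'http://10.255.255.1'
url1_timeout = 5  # seconds

driver = webdriver.Firefox(options=options)

driver.set_page_load_timeout(url1_timeout)
driver.get(url1)  # raises TimeoutException after url1_timeout seconds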

Headless Chrome example with a timeout on the get call

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')             # Chrome refuses to run as root (e.g. in Docker) without it
options.add_argument('--disable-dev-shm-usage')  # use /tmp instead of the small /dev/shm of containers

url1 = 'http://10.255.255.1'
url1_timeout = 5
driver = webdriver.Chrome(options=options)
driver.set_page_load_timeout(url1_timeout)
driver.get(url1)
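
Both examples above raise selenium.common.exceptions.TimeoutException when the page does not load in time. A minimal sketch of wrapping that in a helper (the name get_with_timeout is hypothetical); the fallback class only exists so the sketch can run where Selenium is not installed:

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:  # lets the sketch run without Selenium installed
    class TimeoutException(Exception):
        """Stand-in for selenium.common.exceptions.TimeoutException."""


def get_with_timeout(driver, url, timeout_s):
    """Load `url` with a page-load timeout of `timeout_s` seconds.

    Returns True if the page loaded, False if the timeout fired.
    Works with any of the drivers created above (Firefox or Chrome).
    """
    driver.set_page_load_timeout(timeout_s)
    try:
        driver.get(url)
    except TimeoutException:
        return False
    return True


# Usage with a live driver (not executed here):
# if not get_with_timeout(driver, 'http://10.255.255.1', 5):
#     print('page did not load in time')
```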

Python firefox remote headless

With a remote driver the browser runs on the Selenium server, so from our side it always behaves as headless: no window opens locally.

from selenium.webdriver import DesiredCapabilities, Remote

# Address of the Selenium hub
host = '172.30.10.18'
port = '4444'
url = f'http://{host}:{port}/wd/hub'  # the scheme is required

# Selenium 3 API: desired_capabilities was removed in Selenium 4.10 (pass options= there)
d = Remote(command_executor=url, desired_capabilities=DesiredCapabilities.FIREFOX)
# VERY important, to avoid the error:
# selenium.common.exceptions.ElementClickInterceptedException: Message: Element <input id="projects_" name="projects[]" type="checkbox"> is not clickable at point (179,16) because another element <a href="/"> obscures it
d.set_window_size(1920, 1080)
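
command_executor needs a full URL including the http:// scheme; a bare '172.30.10.18:4444/wd/hub' does not parse as a host and port. A sketch of a helper (the name hub_url is hypothetical) that builds the address in one place and rejects an empty host:

```python
from urllib.parse import urlsplit


def hub_url(host, port=4444, scheme="http"):
    """Build the command_executor URL for a Selenium hub or standalone server."""
    url = f"{scheme}://{host}:{port}/wd/hub"
    parts = urlsplit(url)
    # .port raises ValueError itself if the port is not a number
    if not parts.hostname or parts.port is None:
        raise ValueError(f"invalid hub address: {url!r}")
    return url


# Usage (not executed here):
# d = Remote(command_executor=hub_url('172.30.10.18', 4444), ...)
```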

Python firefox remote

1. Start the standalone server

version: '3.7'

services:

 hub:
  container_name: hub
  image: selenium/standalone-firefox
  # avoids "Failed to decode response from marionette" (see Errors)
  shm_size: '2gb'

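2. Get the container's IP address, for example with: docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hub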

3. Test it

from selenium import webdriver

host = "192.168.3.44"  # container IP from step 2
port = "4444"
desired_capabilities = {
    'browserName': 'firefox',
    'javascriptEnabled': True,
}

driver = webdriver.Remote(command_executor='http://' + host + ':' + port + '/wd/hub',
                          desired_capabilities=desired_capabilities)
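
Right after starting the container the server may still be booting, and creating the Remote driver too early fails. A minimal polling sketch, assuming the server answers <hub>/status with JSON shaped like {"value": {"ready": true}} (Selenium 4 Grid does; recent 3.x servers should too, but check yours). The names wait_for_hub and _fetch_json are hypothetical, and fetch is injectable so the loop can be tested without a server:

```python
import json
import time
import urllib.request


def _fetch_json(url, timeout):
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_hub(hub, timeout_s=30.0, interval_s=1.0, fetch=_fetch_json):
    """Poll `<hub>/status` until it reports ready; False if `timeout_s` runs out."""
    status_url = hub.rstrip("/") + "/status"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            body = fetch(status_url, interval_s)
            if body.get("value", {}).get("ready"):
                return True
        except (OSError, ValueError):
            pass  # not up yet (refused / timed out) or a non-JSON reply
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage before creating the Remote driver (not executed here):
# if not wait_for_hub("http://192.168.3.44:4444/wd/hub", timeout_s=60):
#     raise RuntimeError("Selenium server did not become ready")
```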

Python chrome headless local

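IMPORTANT: if the window size is not set, some elements may not be found in the DOM (headless Chrome has traditionally started with a small 800x600 window, and responsive pages can hide elements at that size).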

Example of a page where this happened:

https://www.linkedin.com/search/results/companies/?keywords=autoescuela&origin=SWITCH_SEARCH_VERTICAL

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'

options = Options()
options.add_argument('--headless')
options.add_argument("window-size=1920,1080")
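# Selenium 3 signature; in Selenium 4 pass service=Service(CHROMEDRIVER_PATH) (from selenium.webdriver.chrome.service)
driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)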

Python firefox headless local

Without a profile

IMPORTANT: if the window size is not set, some elements may not be found in the DOM.

Example of a page where this happened:

https://www.linkedin.com/search/results/companies/?keywords=autoescuela&origin=SWITCH_SEARCH_VERTICAL

from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox

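# Local headless
options = Options()
options.headless = True  # recent Selenium 4 releases: options.add_argument('-headless')
driver = Firefox(options=options)
driver.set_window_size(1920, 1080)  # see the IMPORTANT note above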

With a profile

from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox, FirefoxProfile

# Local headless, using an existing profile directory (placeholder path)
profile = FirefoxProfile('<path to my profile directory>')
options = Options()
options.headless = True
driver = Firefox(firefox_profile=profile, options=options)

Python firefox headless local with profile and options

https://stackoverflow.com/a/52898225/11436137

import os

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

cProfile = webdriver.FirefoxProfile()
dwnd_path = os.getcwd()
cProfile.set_preference('browser.download.folderList', 2)  # 2 = use browser.download.dir
cProfile.set_preference('browser.download.manager.showWhenStarting', False)
cProfile.set_preference('browser.download.dir', dwnd_path)  # the variable, not the string 'dwnd_path'
cProfile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/octet-stream,application/vnd.ms-excel')
options = Options()
options.headless = True
# Selenium 3 signature; Selenium 4 deprecates firefox_profile and executable_path (use options.profile and a Service)
driver = webdriver.Firefox(firefox_profile=cProfile, options=options, executable_path=r'C:\path\to\geckodriver.exe')
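
The same preferences can be kept in a dictionary and applied in a loop, which keeps integers and booleans as real Python values rather than strings. A sketch (the helper name download_prefs is hypothetical; the preference names are the ones used above):

```python
def download_prefs(download_dir, mime_types):
    """Firefox preferences that save files of `mime_types` to `download_dir` without asking."""
    return {
        "browser.download.folderList": 2,  # 2 = custom directory (browser.download.dir)
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


# Usage with the profile above (not executed here):
# prefs = download_prefs(os.getcwd(), ["application/octet-stream", "application/vnd.ms-excel"])
# for key, value in prefs.items():
#     cProfile.set_preference(key, value)
```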

Errors

selenium.common.exceptions.ElementClickInterceptedException: Message: Element <input id="projects_" name="projects[]" type="checkbox"> is not clickable at point (179,16) because another element <a href="/"> obscures it

Cause: a remote webdriver was started without specifying the window size.

Solution:

driver.set_window_size(1920, 1080)
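
If setting the window size is not enough (for example, a fixed banner really does cover the element), a common workaround, not part of these original notes, is to fall back to a JavaScript click. It skips the browser's hit-testing, so it can click things a real user could not. A sketch (the name safe_click is hypothetical; the fallback class only lets it run without Selenium installed):

```python
try:
    from selenium.common.exceptions import ElementClickInterceptedException
except ImportError:  # lets the sketch run without Selenium installed
    class ElementClickInterceptedException(Exception):
        """Stand-in for selenium.common.exceptions.ElementClickInterceptedException."""


def safe_click(driver, element):
    """Click `element`; if another element covers it, click it through JavaScript.

    Returns "native" or "javascript" depending on which path was used.
    """
    try:
        element.click()
        return "native"
    except ElementClickInterceptedException:
        # Calls the DOM click() on the element itself, ignoring overlays
        driver.execute_script("arguments[0].click();", element)
        return "javascript"
```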

selenium.common.exceptions.WebDriverException: Message: Failed to decode response from marionette

Cause:

* A window size was set with set_window_size() (probably irrelevant).
* The Firefox instance ran out of (shared) memory.

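Solution: set shm_size, the size of the container's /dev/shm. Docker's default is 64 MB, which is too little for Firefox; the docker-selenium README recommends 2g. docker-compose example: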

 easyredmine-backup-selenium:
  container_name: easyredmine-backup-selenium
  image: selenium/standalone-firefox
  #restart: unless-stopped
  # Mandatory, to avoid "Message: Failed to decode response from marionette" error
  shm_size: ${SHM_SIZE}
  #environment:
  # - START_XVFB=False
  networks:
   network-easyredmine-backup:
    aliases:
     - easyredmine-backup-selenium
  volumes:
   - ${DOCKER_HOST_DOWNLOAD_DIR}:${DOCKER_CONTAINER_DOWNLOAD_DIR}

PROXY

For Firefox: webdriver.DesiredCapabilities.FIREFOX['proxy']
For Chrome: webdriver.DesiredCapabilities.CHROME['proxy']

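from selenium import webdriver
from selenium.webdriver.firefox.options import Options

PROXY = "172.17.0.1:3128"

# This changes the shared default capabilities: Firefox drivers created later
# in the same process that start from these defaults also use the proxy
webdriver.DesiredCapabilities.FIREFOX['proxy'] = {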
    "httpProxy":PROXY,
    "ftpProxy":PROXY,
    "sslProxy":PROXY,
    "proxyType":"MANUAL"
}

options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)
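
The assignment above changes webdriver.DesiredCapabilities.FIREFOX for the whole process. A sketch of building the proxy on a copy instead (with_manual_proxy is a hypothetical helper; it only sets the HTTP and HTTPS proxies). The result can then be passed explicitly, for example as desired_capabilities to webdriver.Remote in Selenium 3:

```python
import copy


def with_manual_proxy(capabilities, proxy):
    """Return a copy of `capabilities` with a MANUAL HTTP/HTTPS proxy.

    The input dict (e.g. DesiredCapabilities.FIREFOX) is left untouched.
    """
    caps = copy.deepcopy(capabilities)
    caps["proxy"] = {
        "proxyType": "MANUAL",
        "httpProxy": proxy,
        "sslProxy": proxy,
    }
    return caps


# Usage (not executed here):
# caps = with_manual_proxy(webdriver.DesiredCapabilities.FIREFOX, "172.17.0.1:3128")
```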

Sessions

Firefox

Open Firefox locally and type about:profiles in the address bar. Create a new profile and assign it a directory. Launch that profile and browse so that it stores information, for example the WhatsApp login.

Then launch Selenium with that profile:

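# Selenium runs Firefox on a temporary copy of this profile: changes made during the session are not saved back
myprofile = webdriver.FirefoxProfile("<path to my profile directory>")
driver = webdriver.Firefox(firefox_profile=myprofile)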

Chrome

We only need to give Chrome a directory and it stores the session there. This worked with ChromeDriver 70 to 73, which can be downloaded from:
https://chromedriver.storage.googleapis.com/index.html

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

CHROMEDRIVER_PATH = '/chromedriver70/chromedriver'

options = Options()
options.add_argument('user-data-dir=/selenium/session')

options.add_argument("window-size=1920,1080")
driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)
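
Using an absolute path for user-data-dir keeps the session independent of the directory the script is started from. A sketch (user_data_dir_argument is a hypothetical helper) that also creates the directory on first use:

```python
import os


def user_data_dir_argument(path):
    """Return a Chrome 'user-data-dir=...' argument with an absolute path.

    The directory is created if missing, so the first run starts with an
    empty profile that Chrome then fills in.
    """
    abs_path = os.path.abspath(os.path.expanduser(path))
    os.makedirs(abs_path, exist_ok=True)
    return "user-data-dir=" + abs_path


# Usage (not executed here):
# options.add_argument(user_data_dir_argument('/selenium/session'))
```

Chrome locks a profile while it is in use, so two drivers cannot share the same user-data-dir at the same time.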

Whatsapp

Send a message

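from selenium.webdriver.common.by import By

# Open the chat with the contact
xpath = './/span[contains(@title, "Armando Bronca")]'
o = driver.find_element(By.XPATH, xpath)
o.click()

# Type the message (WhatsApp Web class names from the time of writing; they change often)
xpath = './/div[contains(@class, "_3u328 copyable-text selectable-text")]'
o = driver.find_element(By.XPATH, xpath)
o.send_keys('Message sent from selenium')

# Press the send button
xpath = './/button[contains(@class, "_3M-N-")]'
o = driver.find_element(By.XPATH, xpath)
o.click()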
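
The three steps above can be wrapped in a function. A sketch (send_whatsapp_message is a hypothetical name) reusing the same XPaths, which depend on WhatsApp Web's class names at the time and will likely need updating. The fallback only lets the sketch run without Selenium installed:

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:  # lets the sketch run without Selenium installed
    XPATH = "xpath"  # the value of By.XPATH

# XPaths copied from the example above
CONTACT_XPATH = './/span[contains(@title, "{name}")]'
INPUT_XPATH = './/div[contains(@class, "_3u328 copyable-text selectable-text")]'
SEND_XPATH = './/button[contains(@class, "_3M-N-")]'


def send_whatsapp_message(driver, contact, text):
    """Open the chat with `contact`, type `text` and press send."""
    driver.find_element(XPATH, CONTACT_XPATH.format(name=contact)).click()
    driver.find_element(XPATH, INPUT_XPATH).send_keys(text)
    driver.find_element(XPATH, SEND_XPATH).click()


# Usage with a logged-in driver (not executed here):
# send_whatsapp_message(driver, "Armando Bronca", "Message sent from selenium")
```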

Attach a file

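import os

from selenium.webdriver.common.by import By

# Open the chat with the contact
xpath = './/span[contains(@title, "Armando Bronca")]'
o = driver.find_element(By.XPATH, xpath)
o.click()

# Open the attach menu ("Adjuntar" is the title in the Spanish UI)
xpath = './/div[contains(@title, "Adjuntar")]'
o = driver.find_element(By.XPATH, xpath)
o.click()

# Give the file input the full path of the file
o = driver.find_element(By.XPATH, "//input[@type='file']")
o.send_keys(os.path.join(os.getcwd(), "tmp", "caron.png"))

# Send
xpath = './/span[contains(@data-icon, "send-light")]'
o = driver.find_element(By.XPATH, xpath)
o.click()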
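
send_keys() on a file input expects the absolute path of an existing file. A sketch of a helper (the name attachment_path is hypothetical) that builds the path and fails early with a clear error when the file is missing:

```python
import os


def attachment_path(relative_path, base_dir=None):
    """Absolute path of a file to send to an <input type="file">.

    Raises FileNotFoundError before the browser gets an unusable path.
    """
    path = os.path.abspath(os.path.join(base_dir or os.getcwd(), relative_path))
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


# Usage (not executed here):
# o.send_keys(attachment_path("tmp/caron.png"))
```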

Parents and children

We have the following HTML:

<div class="floatingWindowDiv" style="position: absolute; z-index: 1001; width: 222px; visibility: visible; top: 444px; display: block; left: 678px;">
  <div style="display: block; z-index: 400; position: static; width: 222px; left: 0px; top: 0px;" tabindex="-1">
    <div style="display: none;" class="masterMenu DropDownLoading">
      <span>Please Wait</span></div>
      <div class="masterMenu DropDownValueList" tabindex="-1" style="display: block; height: 150px;" loadingcompleted="true">
        <div tabindex="-1" class="masterMenuItem promptMenuOption" title="(All Column Values)" style="">
          <div class="promptDropdownNoBorderDiv">
            <input name="saw_263580_c_1" class="checkboxRadioButton" value="*)nqgtac(*" id="saw_263580_c_1_ck0" aria-labelledby="saw_263580_c_1_ck0_cblabel" tabindex="-1" style="" type="checkbox">
            <label class="checkboxRadioButtonLabel" for="saw_263580_c_1_ck0" id="saw_263580_c_1_ck0_cblabel">(All Column Values)</label>
          </div>
        </div>
        <div tabindex="-1" class="masterMenuItem promptMenuOption" title="NULL">
          <div class="promptDropdownNoBorderDiv">
            <input name="saw_263580_c_1" class="checkboxRadioButton" value="*)nqgtn(*" id="saw_263580_c_1_ck1" aria-labelledby="saw_263580_c_1_ck1_cblabel" tabindex="-1" type="checkbox">
            <label class="checkboxRadioButtonLabel" for="saw_263580_c_1_ck1" id="saw_263580_c_1_ck1_cblabel" style="">NULL</label>
          </div>
        </div>
        <div tabindex="-1" class="masterMenuItem promptMenuOption" title="AMA DE CASA">
          <div class="promptDropdownNoBorderDiv">
            <input name="saw_263580_c_1" class="checkboxRadioButton" value="AMA DE CASA" id="saw_263580_c_1_ck2" aria-labelledby="saw_263580_c_1_ck2_cblabel" tabindex="-1" style="" type="checkbox">
            <label class="checkboxRadioButtonLabel" for="saw_263580_c_1_ck2" id="saw_263580_c_1_ck2_cblabel">AMA DE CASA</label>
          </div>
        </div>
        <div tabindex="-1" class="masterMenuItem promptMenuOption" title="AUTONOMO CUENTA">
          <div class="promptDropdownNoBorderDiv">
            <input name="saw_263580_c_1" class="checkboxRadioButton" value="AUTONOMO CUENTA" id="saw_263580_c_1_ck3" aria-labelledby="saw_263580_c_1_ck3_cblabel" tabindex="-1" style="" type="checkbox">
            <label class="checkboxRadioButtonLabel" for="saw_263580_c_1_ck3" id="saw_263580_c_1_ck3_cblabel">AUTONOMO CUENTA</label>
          </div>
        </div>
      </div>
   </div>
</div>

We want to click the checkbox, which is an input field, but it is generated dynamically and has no stable identifier. What does identify it is the title attribute of the div that contains it (the parent box).

First we select the parent block, looking for the div with the title we want:

from selenium.webdriver.common.by import By

# Title of the option to click, e.g. 'NULL', 'AMA DE CASA' or 'CUENTA AJENA FIJO'
menu_click = 'AMA DE CASA'

xpath = '//div[@title="' + menu_click + '"]'
padre = driver.find_element(By.XPATH, xpath)

Then, inside that box, we select the input and click it:

xpath2 = './/input[@type="checkbox"]'
hijo = padre.find_element(By.XPATH, xpath2)
hijo.click()
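
Concatenating menu_click straight into the XPath breaks if a title contains a double quote. XPath 1.0 (what browsers evaluate) has no escape character, so a value containing both kinds of quote has to be built with concat(). A sketch (xpath_literal is a hypothetical helper):

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal."""
    if '"' not in text:
        return f'"{text}"'
    if "'" not in text:
        return f"'{text}'"
    # Both quote kinds present: split on " and rejoin with '"' inside concat()
    parts = text.split('"')
    return "concat(" + ", '\"', ".join(f'"{p}"' for p in parts) + ")"


# Usage (not executed here):
# xpath = '//div[@title=' + xpath_literal(menu_click) + ']'
```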

If we want to select the parent of an element, first we select the element:

from selenium.webdriver.common.by import By

xpath = '//label[text()="AMA DE CASA"]'
obj = driver.find_element(By.XPATH, xpath)

And then, from that element, its parent:

padre = obj.find_element(By.XPATH, "./..")
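
The parent/child logic can be checked offline against a simplified, well-formed copy of the HTML above (void input tags closed) with Python's xml.etree.ElementTree. This is a different tool from Selenium and supports only a subset of XPath (no text() or contains()), so [.='...'] replaces text()= here; treat it as a quick sketch for trying paths:

```python
import xml.etree.ElementTree as ET

# Simplified copy of the menu above
MENU = """
<div class="masterMenu DropDownValueList">
  <div class="masterMenuItem promptMenuOption" title="NULL">
    <div class="promptDropdownNoBorderDiv">
      <input type="checkbox" id="saw_263580_c_1_ck1"/>
      <label for="saw_263580_c_1_ck1">NULL</label>
    </div>
  </div>
  <div class="masterMenuItem promptMenuOption" title="AMA DE CASA">
    <div class="promptDropdownNoBorderDiv">
      <input type="checkbox" id="saw_263580_c_1_ck2"/>
      <label for="saw_263580_c_1_ck2">AMA DE CASA</label>
    </div>
  </div>
</div>
"""

root = ET.fromstring(MENU)

# Parent selected by title, then the checkbox inside it
parent = root.find(".//div[@title='AMA DE CASA']")
child = parent.find(".//input[@type='checkbox']")

# From the label up to its parent with ".."
label_parent = root.find(".//label[.='AMA DE CASA']/..")
```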

Save the page content to a file

Text (HTML)

https://stackoverflow.com/a/50420667/2695864

This was very useful in my case: with neither Chrome nor Firefox could I get the XPath of a popup, because it closed as soon as I clicked anywhere.

html = driver.execute_script("return document.body.innerHTML;")
with open("login.html", "w", encoding="utf-8") as f:  # explicit encoding avoids errors on non-ASCII pages
    f.write(html)
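
driver.page_source (the whole document) is an alternative to the execute_script call. A sketch (save_html is a hypothetical helper) that writes either one as UTF-8 to a timestamped file, so several dumps of the same page do not overwrite each other:

```python
import datetime
import os


def save_html(html, directory=".", prefix="page"):
    """Write `html` to <directory>/<prefix>_YYYYmmdd_HHMMSS.html as UTF-8; return the path."""
    os.makedirs(directory, exist_ok=True)
    stamp = f"{datetime.datetime.now():%Y%m%d_%H%M%S}"
    path = os.path.join(directory, f"{prefix}_{stamp}.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path


# Usage with a live driver (not executed here):
# save_html(driver.page_source, "dumps", "login")
```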

Screenshot

import datetime

fichero = f"{datetime.datetime.now():%Y%m%d_%H%M%S}"  # timestamp, e.g. 20190131_174502
driver.set_window_size(1080, 1800)  # width, height in pixels
driver.save_screenshot("captura.png")
driver.save_screenshot("captura_" + fichero + ".png")

Legido Wiki
