Extracting Dynamic Content from an iFrame with Selenium in Python
Accessing content inside an iFrame can be tricky, especially when the content is loaded dynamically. In this blog post, we’ll walk through an example of how to switch into an iFrame, click on an interactive tab, and save the loaded content to a file using Selenium in Python. This example is particularly useful when dealing with web applications that embed their core data or statistics in an iFrame, such as a sports stats dashboard.
Overview of the Problem
For our example, we’re looking at a page containing an iFrame with embedded player statistics. To fully capture this content, we’ll:
- Access the main page containing the iFrame.
- Switch into the iFrame to interact with its content.
- Click on a tab within the iFrame to reveal the desired information.
- Extract the HTML content and save it locally.
Implementing the Code
Here’s the code we’ll use, including detailed explanations of each part.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


# Function to initialize the Chrome WebDriver
def setup_driver():
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.maximize_window()
    return driver


# Function to access iFrame content, click on a tab, and save it
def fetch_iframe_content():
    driver = setup_driver()
    try:
        # Step 1: Open the page containing the iFrame
        url = 'https://estadisticascabb.gesdeportiva.es/partido/jYYXLuyG3WYVnbWC_RER9A==?a=1'
        driver.get(url)

        # Step 2: Wait until the iFrame loads
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, 'iframe')))

        # Step 3: Switch to the iFrame
        iframe = driver.find_element(By.TAG_NAME, 'iframe')
        driver.switch_to.frame(iframe)

        # Step 4: Click on the 'Statistics' tab within the iFrame
        stats_tab = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "li.pestana-partido.pestana-estadisticas"))
        )
        stats_tab.click()

        # Step 5: Wait for the stats content to load within the tab
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table.tabla-estadisticas"))
        )

        # Step 6: Get the loaded page source within the iFrame
        page_source = driver.page_source

        # Step 7: Save the content to a file
        with open("output.txt", "w", encoding='utf-8') as file:
            file.write(page_source)
        print("The iFrame content has been saved in 'output.txt'.")
    finally:
        # Close the driver, even if one of the steps above failed
        driver.quit()


# Run the function
if __name__ == "__main__":
    fetch_iframe_content()
Code Breakdown
- Set Up the Driver: setup_driver() initializes the Chrome WebDriver using webdriver-manager to ensure that the ChromeDriver is correctly installed and configured.
- Access the iFrame: We load the main page with driver.get() and wait for the iFrame to load with WebDriverWait. Using an explicit wait ensures that the code waits until the element we’re interacting with is ready.
- Switch to the iFrame: After locating the iFrame element, we switch the WebDriver’s focus into it, allowing us to interact with elements inside.
- Interact with the Tab: We use WebDriverWait and expected_conditions to wait until the desired tab is clickable. After it’s clicked, we add another wait to ensure the statistics table is present.
- Extract and Save Content: With the page source loaded inside the iFrame, we save it to a file (output.txt), allowing us to inspect or parse it further.
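Once the page source is saved, the statistics can be pulled out of it without a browser. The following is a minimal sketch using only Python's standard-library `html.parser`. It assumes the data lives in a table with the class `tabla-estadisticas` (the same class the script waits for); the names `StatsTableParser` and `extract_stats_rows` are hypothetical, and the real table layout on the site may need extra handling.

```python
from html.parser import HTMLParser


class StatsTableParser(HTMLParser):
    """Collect the cell text of every row in tables with a given CSS class."""

    def __init__(self, table_class="tabla-estadisticas"):
        super().__init__()
        self.table_class = table_class
        self.rows = []      # finished rows, each a list of cell strings
        self._depth = 0     # greater than 0 while inside a matching table
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table":
            # Enter a matching table, or track a table nested inside one
            if self._depth or self.table_class in classes:
                self._depth += 1
        elif self._depth and tag == "tr":
            self._row = []
        elif self._depth and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                # Join fragments and collapse runs of whitespace
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table":
            self._depth -= 1


def extract_stats_rows(html, table_class="tabla-estadisticas"):
    """Return the rows of all matching tables as lists of cell strings."""
    parser = StatsTableParser(table_class)
    parser.feed(html)
    parser.close()
    return parser.rows
```

To use it on the file written by the script, read output.txt with `open("output.txt", encoding="utf-8")` and pass its contents to `extract_stats_rows()`. For heavier parsing work, a dedicated HTML library would likely be more convenient; the standard library is used here only to keep the sketch dependency-free.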
Additional Tips and Considerations
- Avoid Using Time-Based Waits: In many cases, time.sleep() isn’t ideal since it pauses for a fixed duration, regardless of when the content actually arrives. Instead, WebDriverWait polls until the specific element appears or the timeout expires, making the process more efficient.
- Dynamic Content: If the content still doesn’t load as expected, you may need to identify a unique element within the statistics area and wait for it to appear. This ensures that you’re capturing the full data.
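To make the difference between a fixed sleep and a polling wait concrete, here is a small, Selenium-free sketch of the polling idea. It is an illustration of the concept only, not Selenium's actual implementation, and `wait_until` is a hypothetical helper name. The 0.5-second default interval mirrors WebDriverWait's default poll frequency.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, or raises TimeoutError
    once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep only briefly between checks instead of one long fixed pause
        time.sleep(poll_interval)
```

With a fixed `time.sleep(10)`, the script always pays the full ten seconds. With polling, it continues as soon as the condition holds and fails loudly if it never does.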
Potential Use Cases
This approach is ideal for scraping data from embedded content in web pages:
- Sports Statistics: Websites often embed player stats and game results in iFrames.
- Financial Dashboards: Interactive financial data or stock dashboards may be embedded and require specific access to extract.
Conclusion
Selenium makes it possible to interact with complex web structures like iFrames, allowing us to retrieve dynamic content. By integrating waits and switching between frames, we’ve created a solution that captures the data we need. This guide should serve as a strong foundation for similar projects involving embedded web content.