
EP19 | Advanced Python #8 | Web Scraping (BeautifulSoup, Selenium)

ygtoken 2025. 3. 19. 23:05


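Concepts covered in this post

Web scraping is a technique for automatically collecting data from websites.
This post covers the following topics.

The concept and principles of web scraping
Parsing HTML with BeautifulSoup
Crawling dynamic web data with Selenium
Saving the collected data to a CSV file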

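1️⃣ What is web scraping?

📌 What is web scraping?

A technique that fetches a website's HTML and extracts the needed information from it
Useful for news articles, stock data, product price comparison, and similar tasks

📌 Web scraping approaches

Approach	Description
requests + BeautifulSoup	Extracts data from static web pages
Selenium	Handles dynamic web pages whose content is generated by JavaScript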
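To make the static versus dynamic distinction concrete, here is a minimal sketch. The HTML string is a hypothetical page made up for illustration: its product list is filled in by JavaScript, so a static parse with BeautifulSoup sees the heading but none of the list items.

```python
from bs4 import BeautifulSoup

# Hypothetical page source: the product list is filled in later by JavaScript
RAW_HTML = """
<html>
  <body>
    <h1>Store</h1>
    <ul id="products"></ul>
    <script>
      const item = document.createElement("li");
      item.textContent = "Laptop";
      document.getElementById("products").appendChild(item);
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# Static content is already present in the raw HTML
heading = soup.find("h1").get_text(strip=True)

# The <li> items only exist after a browser runs the script,
# so a static parse finds none of them
items = soup.select("#products li")

print(heading)
print(len(items))
```

When the data you need is missing from the raw HTML like this, a browser-driven tool such as Selenium is usually the next thing to try.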

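2️⃣ Extracting HTML data with BeautifulSoup

📌 Installation (BeautifulSoup and requests)

pip install beautifulsoup4 requests

🔹 Fetching HTML data

import requests
from bs4 import BeautifulSoup

url = "https://example.com"
# Set a timeout so the request cannot hang indefinitely
response = requests.get(url, timeout=10)
# Raise an exception on 4xx/5xx responses instead of parsing an error page
response.raise_for_status()

# Parse the HTML
soup = BeautifulSoup(response.text, "html.parser")

# Get the page title
print(soup.title.text)

📌 Example output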

Example Domain

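🔹 Getting the text of a specific tag

# Extract the text of the first <h1> tag
# find() returns None if the page has no <h1>, in which case .text raises AttributeError
h1_text = soup.find("h1").text
print(h1_text)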
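Real pages do not always contain the tag you expect. The sketch below wraps find() in a small helper that falls back to a default value. The helper name first_text and the inline HTML are assumptions made for illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def first_text(soup, tag_name, default=""):
    """Return the stripped text of the first matching tag, or default if absent."""
    tag = soup.find(tag_name)
    if tag is None:
        return default
    # get_text(strip=True) removes leading and trailing whitespace
    return tag.get_text(strip=True)


html = "<html><body><h1>  Example Domain </h1></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(first_text(soup, "h1"))
print(first_text(soup, "h2", default="(no h2)"))
```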

🔹 Getting all links (<a> tags)

links = soup.find_all("a")

for link in links:
    # get() returns None when an <a> tag has no href attribute
    print(link.get("href"))

📌 Example output

https://www.iana.org/domains/example
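Many pages use relative links such as /about, which are not usable on their own. One possible way to handle this, sketched under the assumption that you know the URL the page was fetched from, is to resolve each href with urljoin from the standard library. The function name absolute_links and the sample HTML are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def absolute_links(html, base_url):
    """Return every href in the HTML as an absolute URL."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # href=True skips <a> tags that have no href attribute
    for link in soup.find_all("a", href=True):
        urls.append(urljoin(base_url, link["href"]))
    return urls


sample_html = """
<a href="/about">About</a>
<a href="https://www.iana.org/domains/example">More information</a>
<a name="top">No href here</a>
"""

for url in absolute_links(sample_html, "https://example.com/docs/"):
    print(url)
```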

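3️⃣ Crawling dynamic web pages with Selenium

📌 Installation (Selenium and the web driver)

pip install selenium

📌 Chrome web driver

Selenium 4.6 and later includes Selenium Manager, which locates or downloads a matching ChromeDriver automatically
With older Selenium versions, download ChromeDriver manually and make the executable available to your Python code

🔹 Opening a web page with Selenium

from selenium import webdriver

# Start the web driver (using Chrome)
driver = webdriver.Chrome()

# Open the web page
driver.get("https://example.com")

# Print the page title
print(driver.title)

# Close the browser
driver.quit()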

🔹 Finding a specific element (find_element())

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Find an element by tag name
element = driver.find_element(By.TAG_NAME, "h1")
print(element.text)

driver.quit()

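🔹 Typing into an input box and submitting with Enter

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://www.google.com")
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("Python")
search_box.send_keys(Keys.RETURN)  # Press Enter
driver.quit()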
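Once the browser has rendered a page, the two tools in this post can be combined: Selenium exposes the rendered HTML as driver.page_source, and BeautifulSoup can parse that string. In the sketch below, the function name extract_result_titles and the choice of h3 tags are assumptions; the markup of a real results page may differ and change over time.

```python
from bs4 import BeautifulSoup


def extract_result_titles(page_source):
    """Parse rendered HTML and return the text of every <h3> tag."""
    soup = BeautifulSoup(page_source, "html.parser")
    return [h3.get_text(strip=True) for h3 in soup.find_all("h3")]


# With a live Selenium session this would be called as:
#   titles = extract_result_titles(driver.page_source)
# A hypothetical rendered page stands in for driver.page_source here.
rendered = "<div><h3>Welcome to Python.org</h3><h3>Python Tutorial</h3></div>"
print(extract_result_titles(rendered))
```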

4️⃣ Saving scraped data to a CSV file

📌 Saving a CSV file (using pandas)

pip install pandas

import pandas as pd

data = {"Title": ["Example Domain"], "URL": ["https://example.com"]}
df = pd.DataFrame(data)

# index=False leaves the DataFrame index out of the file
df.to_csv("output.csv", index=False)

📌 CSV file contents (output.csv)

Title,URL
Example Domain,https://example.com
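In practice a scraper usually collects one record per page or per item, so a list of dictionaries is a convenient shape to hand to pandas. The records below are hypothetical placeholders, and the sketch writes to an in-memory buffer so the result can be inspected without touching the disk.

```python
import io

import pandas as pd

# Hypothetical scraped records: one dict per page
rows = [
    {"Title": "Example Domain", "URL": "https://example.com"},
    {"Title": "Pricing, Plans", "URL": "https://example.com/pricing"},
]

# Passing columns fixes the column order in the output file
df = pd.DataFrame(rows, columns=["Title", "URL"])

# Write to a buffer here; pass a path such as "output.csv" to write a file instead
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that pandas quotes the title containing a comma, so the file still parses into two columns. If the file will be opened in Excel and contains non-ASCII text, passing encoding="utf-8-sig" when writing to a path may help the text display correctly.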

📌 Practice problems: web scraping exercises

✅ Problem 1: Get a web page title with requests and BeautifulSoup

📌 Fetch the title of https://example.com and print it.

import requests
from bs4 import BeautifulSoup
# 🔽 Write your code here

📌 Sample solution

import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url, timeout=10)

soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.text)

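✅ Problem 2: Get all links with BeautifulSoup

📌 Print the href value of every <a> tag on the web page.

# 🔽 Write your code here

📌 Sample solution

# Reuses the soup object created in Problem 1
links = soup.find_all("a")

for link in links:
    print(link.get("href"))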

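✅ Problem 3: Run a Google search with Selenium

📌 Search Google for "Python" and print the title of the results page.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# 🔽 Write your code here

📌 Sample solution

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.google.com")

search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("Python")
search_box.send_keys(Keys.RETURN)

# Wait for the results page to load before reading the title
WebDriverWait(driver, 10).until(EC.title_contains("Python"))
print(driver.title)
driver.quit()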

✅ Problem 4: Save web data to a CSV file

📌 Save the data collected from a web page to output.csv.

import pandas as pd
# 🔽 Write your code here

📌 Sample solution

import pandas as pd

data = {"Title": ["Example Domain"], "URL": ["https://example.com"]}
df = pd.DataFrame(data)

df.to_csv("output.csv", index=False)


License: Attribution, NonCommercial, NoDerivatives
