
[Web Crawling Part 2] Crawling a Product List with selenium webdriver

by nowgeun 2023. 1. 29.

In Web Crawling Part 1, I covered a simple way to crawl a website using requests and bs4.

 

[Web Crawling Part 1] Crawling a Recipe List with requests and bs4 (jakely.tistory.com)

 

In this post, I'll use selenium webdriver to crawl a harder site: one with a more complex structure that relies on JavaScript.

 

My grandfather recently asked me to find him an inexpensive laptop for office work. So today I'll crawl a list of budget laptops in the 200,000 to 400,000 won range from Danawa, the site I usually rely on when buying computers and other IT products.


Prerequisites

The webdriver module of the selenium package launches a real web browser and lets you control it from Python. It works with several browsers, including Firefox and Chrome; this post uses Chrome.

First, download the program that webdriver needs in order to run. Get the chromedriver.exe that matches your Chrome version from the site below. (The Chrome browser itself must also be installed.)

 

Chrome web browser: www.google.com

ChromeDriver - WebDriver for Chrome - Downloads: chromedriver.chromium.org

Finally, install the selenium library and the setup is complete.

pip install selenium

Source code

[Launching an automated web browser with chromedriver]

Enter the path to the chromedriver you downloaded during setup in the code below and run it. An automated browser window opens and navigates to the Danawa website.

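from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import time
import pandas as pd

# Path to the chromedriver downloaded during setup
driver_path = '/my/path/chromedriver.exe'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path))

url = 'https://www.danawa.com/'

driver.get(url)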

While webdriver is in control, a banner below the address bar reads "Chrome is being controlled by automated test software."

 

[Search settings and crawl preparation]

The code below searches for "노트북" (laptop) in the Danawa search box, then sets the price filter underneath to a minimum of 200,000 won and a maximum of 400,000 won. It also changes the number of results shown per page from 30 to 90, so the crawl has fewer pages to step through.

 

## Web browsing automation
def search_item(product, lower, upper):
    # Type the search term into the search box
    driver.find_element(By.CLASS_NAME, 'search__box').find_element(By.TAG_NAME, 'input').send_keys(product)
    # Click the search button
    driver.find_element(By.XPATH, '//*[@id="srchFRM_TOP"]/fieldset/div[1]/button').click()
    # Set the minimum price
    driver.find_element(By.XPATH, '//*[@id="priceRangeMinPrice"]').send_keys(lower)
    # Set the maximum price
    driver.find_element(By.XPATH, '//*[@id="priceRangeMaxPrice"]').send_keys(upper)
    # Click the price filter's search button
    driver.find_element(By.XPATH, '//*[@id="productListArea"]/div[3]/div[2]/div[1]/button').click()


product = '노트북'  # search term: "laptop"
lower = '200000'  # 200,000 won
upper = '400000'  # 400,000 won
search_item(product, lower, upper)

# Wait for the search results to load (2 seconds)
time.sleep(2)

# Show 90 results per page
driver.find_element(By.XPATH, '//*[@id="productListArea"]/div[2]/div[2]/div[2]/select').click()
driver.find_element(By.CSS_SELECTOR, '#productListArea > div.prod_list_opts > div.view_opt > div.view_item.view_qnt > select > option:nth-child(3)').click()

# Wait for the search results to load (2 seconds)
time.sleep(2)

You can confirm that both the price range and the number of results per page have been applied.

 

[Data crawling]

When crawling with webdriver, there are several ways to locate the information you want, as listed below. (See the official documentation for a detailed guide.)

  • TAG_NAME
  • CSS_SELECTOR
  • ID
  • XPATH
  • LINK_TEXT
  • PARTIAL_LINK_TEXT
  • NAME
  • CLASS_NAME

You need to develop a feel for how to pull out the data you want by reading the page's HTML in the browser.

Using the developer tools and the element inspector (shown in the accompanying screenshot), I confirmed that the product list lives in the HTML tag <ul class='product_list'>. To locate that element I used its class name, as in `find_element(By.CLASS_NAME, 'product_list')`.
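To make the "find the container, then search inside it" idea concrete without a browser, here is a minimal sketch using the standard library's `xml.etree.ElementTree` in place of selenium. The markup is hypothetical and heavily simplified: only the class name `product_list`, the class name `prod_name`, and the `productItem` id prefix come from this post, and real pages are HTML that would not necessarily parse as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the URLs and product names are made up.
SAMPLE_HTML = """
<div id="productListArea">
  <ul class="product_list">
    <li id="productItem1"><p class="prod_name"><a href="https://example.com/1">Laptop A</a></p></li>
    <li id="adItem"><p class="prod_name"><a href="https://example.com/ad">Sponsored</a></p></li>
    <li id="productItem2"><p class="prod_name"><a href="https://example.com/2">Laptop B</a></p></li>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE_HTML)

# Step 1: find the container by its class attribute.
product_list = root.find(".//ul[@class='product_list']")

# Step 2: search inside the container only (the leading '.' keeps the path
# relative) and keep the <li> elements whose id starts with 'productItem'.
items = [li for li in product_list.findall("./li")
         if li.get("id", "").startswith("productItem")]

# Step 3: read the name and link from each item.
for li in items:
    link = li.find(".//p[@class='prod_name']/a")
    print(li.get("id"), link.text.strip(), link.get("href"))
```

The same three steps (container, items, fields) are what the selenium calls in this post perform against the live page.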

I tried to locate the elements in as many different ways as possible. Read through the code to see how I found the product name, price, description, and the button for the next page, then compare it against the corresponding HTML; that should make it easier to follow.

## Crawl the search result data
result_data = []

# Find and collect the information for one item
def get_product_info(e):
    # Product name
    product_name = e.find_element(By.CLASS_NAME, 'prod_name').find_element(By.TAG_NAME, 'a').text.strip()
    # Product link attached to the product name
    product_link = e.find_element(By.CLASS_NAME, 'prod_name').find_element(By.TAG_NAME, 'a').get_attribute('href')
    # Product description and specs
    product_spec = e.find_element(By.CLASS_NAME, 'prod_spec_set').text.strip()
    # List of price entries for the product
    product_pricelist = e.find_element(By.CLASS_NAME, 'prod_pricelist').find_elements(By.TAG_NAME, 'li')
    # Price of each entry
    product_prices = [pp.find_element(By.CLASS_NAME, 'price_sect').text.split(" ")[0].strip() for pp in product_pricelist]
    # Memory information shown next to each price
    product_mems = [pp.find_element(By.CLASS_NAME, 'memory_sect').text.strip() for pp in product_pricelist]

    # Combine the values found above into a list of rows, one row per price entry
    data = list(zip([product_name] * len(product_prices),
                    [product_spec] * len(product_prices),
                    [product_link] * len(product_prices),
                    product_prices,
                    product_mems))
    return data

# Step through the result pages and crawl each one
for i in range(2, 9999):
    # Items on the current result page
    # (note: an XPath that starts with '//' searches the whole document)
    result_list = WebDriverWait(driver, 10).until(
        lambda x: x.find_element(By.CLASS_NAME, 'product_list')
                   .find_elements(By.XPATH, "//li[starts-with(@id, 'productItem')]"))

    for r in result_list:
        result_data.extend(get_product_info(r))

    time.sleep(1)  # pause for 1 second

    if (i - 1) % 10 == 0:
        try:  # moving past a page that is a multiple of 10
            driver.find_element(By.CLASS_NAME, 'nav_next').click()  # click the "next" button
            print(f'Data Collecting on Page {i}')
        except Exception:
            print('No More Pages')
            break
    else:
        try:  # move to the next page
            driver.find_element(By.CLASS_NAME, 'number_wrap').find_element(By.PARTIAL_LINK_TEXT, f'{i}').click()
            print(f'Data Collecting on Page {i}')
        except Exception:
            print('No More Pages')
            break
    time.sleep(5)  # wait about 5 seconds for the page to load

# Finally, store everything in a pandas DataFrame
# Columns: product name, spec, link, price, memory/notes
df = pd.DataFrame(result_data, columns=['제품명', '스펙', '링크', '가격', '메모리/비고'])
# File name means "laptop product list"
df.to_csv('노트북제품리스트.tsv', sep='\t', encoding='cp949', quotechar='"')
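The `(i-1)%10 == 0` check in the loop above is easy to misread. It appears to assume that the page links are shown in groups of ten (1 to 10, 11 to 20, and so on), so pages 11, 21, 31, ... can only be reached through the "next" button, while every other page has its own numbered link. A minimal sketch of that decision as a pure function, where `next_page_action` and its return labels are hypothetical names used only for illustration:

```python
def next_page_action(i):
    """Decide how to reach page `i` from page `i - 1`.

    Assumes page links come in groups of ten (1-10, 11-20, ...), so the
    first page of each new group is reached through the 'next' button.
    """
    if (i - 1) % 10 == 0:
        return ("click_next_button", None)
    return ("click_page_link", str(i))


for page in (2, 10, 11, 12, 21):
    print(page, next_page_action(page))
```

Separating the decision from the click makes the paging rule easy to check without opening a browser.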

 

Result
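Each row in `result_data` pairs the product-level fields (name, spec, link) with one price entry, so a product with two price options produces two rows. The sketch below reproduces that shape with made-up values; they are illustrative only and are not scraped results. It also assumes that prices arrive as strings such as '389,000원', which is a guess about the page's format, and `build_rows` and `price_to_int` are hypothetical helper names.

```python
def build_rows(name, spec, link, prices, memos):
    """Repeat the product-level fields once per price entry, as zip() does in get_product_info."""
    n = len(prices)
    return list(zip([name] * n, [spec] * n, [link] * n, prices, memos))


def price_to_int(price_text):
    """Keep only the ASCII digits of a price string and convert them to an int."""
    digits = "".join(ch for ch in price_text if "0" <= ch <= "9")
    return int(digits) if digits else None


# Hypothetical values, for illustration only.
rows = build_rows(
    "Laptop A",
    "14 inch / 8GB / 256GB",
    "https://example.com/1",
    ["389,000원", "359,000원"],
    ["8GB", "4GB"],
)

for row in rows:
    print(row)

# With numeric prices, rows can be compared or sorted.
cheapest = min(rows, key=lambda r: price_to_int(r[3]))
print(cheapest)
```

Converting the price column to integers before saving would make it possible to sort or filter the table by price later.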


Wrap-up and full source code

In this post, we crawled a website with selenium and webdriver. The package has changed a little since I last used selenium, so it took me about an hour to get used to it. There really is no end to learning, haha...

If you have any questions or spot a mistake, please leave a comment!

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import time
import pandas as pd

# Path to the chromedriver downloaded during setup
driver_path = '/Users/jake1/Desktop/chromedriver.exe'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path))

url = 'https://www.danawa.com/'

driver.get(url)

## Web browsing automation
def search_item(product, lower, upper):
    # Type the search term into the search box
    driver.find_element(By.CLASS_NAME, 'search__box').find_element(By.TAG_NAME, 'input').send_keys(product)
    # Click the search button
    driver.find_element(By.XPATH, '//*[@id="srchFRM_TOP"]/fieldset/div[1]/button').click()
    # Set the minimum price
    driver.find_element(By.XPATH, '//*[@id="priceRangeMinPrice"]').send_keys(lower)
    # Set the maximum price
    driver.find_element(By.XPATH, '//*[@id="priceRangeMaxPrice"]').send_keys(upper)
    # Click the price filter's search button
    driver.find_element(By.XPATH, '//*[@id="productListArea"]/div[3]/div[2]/div[1]/button').click()


product = '노트북'  # search term: "laptop"
lower = '200000'  # 200,000 won
upper = '400000'  # 400,000 won
search_item(product, lower, upper)

# Wait for the search results to load (2 seconds)
time.sleep(2)

# Show 90 results per page
driver.find_element(By.XPATH, '//*[@id="productListArea"]/div[2]/div[2]/div[2]/select').click()
driver.find_element(By.CSS_SELECTOR, '#productListArea > div.prod_list_opts > div.view_opt > div.view_item.view_qnt > select > option:nth-child(3)').click()

# Wait for the search results to load (2 seconds)
time.sleep(2)

## Crawl the search result data
result_data = []

# Find and collect the information for one item
def get_product_info(e):
    # Product name
    product_name = e.find_element(By.CLASS_NAME, 'prod_name').find_element(By.TAG_NAME, 'a').text.strip()
    # Product link attached to the product name
    product_link = e.find_element(By.CLASS_NAME, 'prod_name').find_element(By.TAG_NAME, 'a').get_attribute('href')
    # Product description and specs
    product_spec = e.find_element(By.CLASS_NAME, 'prod_spec_set').text.strip()
    # List of price entries for the product
    product_pricelist = e.find_element(By.CLASS_NAME, 'prod_pricelist').find_elements(By.TAG_NAME, 'li')
    # Price of each entry
    product_prices = [pp.find_element(By.CLASS_NAME, 'price_sect').text.split(" ")[0].strip() for pp in product_pricelist]
    # Memory information shown next to each price
    product_mems = [pp.find_element(By.CLASS_NAME, 'memory_sect').text.strip() for pp in product_pricelist]

    # Combine the values found above into a list of rows, one row per price entry
    data = list(zip([product_name] * len(product_prices),
                    [product_spec] * len(product_prices),
                    [product_link] * len(product_prices),
                    product_prices,
                    product_mems))
    return data

# Step through the result pages and crawl each one
for i in range(2, 9999):
    # Items on the current result page
    # (note: an XPath that starts with '//' searches the whole document)
    result_list = WebDriverWait(driver, 10).until(
        lambda x: x.find_element(By.CLASS_NAME, 'product_list')
                   .find_elements(By.XPATH, "//li[starts-with(@id, 'productItem')]"))

    for r in result_list:
        result_data.extend(get_product_info(r))

    time.sleep(1)  # pause for 1 second

    if (i - 1) % 10 == 0:
        try:  # moving past a page that is a multiple of 10
            driver.find_element(By.CLASS_NAME, 'nav_next').click()  # click the "next" button
            print(f'Data Collecting on Page {i}')
        except Exception:
            print('No More Pages')
            break
    else:
        try:  # move to the next page
            driver.find_element(By.CLASS_NAME, 'number_wrap').find_element(By.PARTIAL_LINK_TEXT, f'{i}').click()
            print(f'Data Collecting on Page {i}')
        except Exception:
            print('No More Pages')
            break
    time.sleep(5)  # wait about 5 seconds for the page to load

# Finally, store everything in a pandas DataFrame
# Columns: product name, spec, link, price, memory/notes
df = pd.DataFrame(result_data, columns=['제품명', '스펙', '링크', '가격', '메모리/비고'])
# File name means "laptop product list"
df.to_csv('노트북제품리스트.tsv', sep='\t', encoding='cp949', quotechar='"')