[Web Crawling] Scrapy to S3



Scrapy to S3

  • Install Packages
  • settings.py

Install Packages

pip3 install boto3
pip3 install "scrapy-s3pipeline[s3]"
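To confirm that both packages landed in the active environment, a quick check along these lines may help. This is only a sketch: it assumes the import names are `boto3` and `s3pipeline` (the latter taken from the `ITEM_PIPELINES` entry in the settings below), and the `status` dictionary is a name chosen for illustration.

```python
from importlib.util import find_spec

# Import names assumed from the pip packages above and the ITEM_PIPELINES entry.
modules = ["boto3", "s3pipeline"]

# find_spec returns None when a top-level module cannot be located.
status = {name: find_spec(name) is not None for name in modules}

for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If either line prints `missing`, the install probably went to a different interpreter than the one running Scrapy.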

settings.py

  • Add the following entries to settings.py
  • AWS keys
AWS_ACCESS_KEY_ID = "AWS_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY = "AWS_SECRET_ACCESS_KEY"
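The string values above are placeholders for real keys. One possible alternative to writing the keys directly into settings.py is to read them from environment variables. The sketch below assumes the two variables have been exported in the shell before the spider runs; if they are not set, both settings end up as `None`.

```python
import os

# Assumption: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are exported
# in the shell that launches the spider. os.environ.get returns None
# when a variable is not set.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
```

This keeps the secret out of version control while leaving the setting names unchanged.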

ITEM_PIPELINES = {
    's3pipeline.S3Pipeline': 100,  # Add this line.
}

S3PIPELINE_URL = 's3://my-bucket/{name}/{time}/items.{chunk:07d}.jl.gz'
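The `{name}`, `{time}`, and `{chunk:07d}` parts of the URL are format placeholders. To get a feel for what an object key might look like, the template can be expanded with plain `str.format`. This is an illustration only, not the pipeline's internal code: the `render_key` helper, the spider name `quotes`, and the timestamp format are all assumptions made for the example.

```python
from datetime import datetime

url_template = "s3://my-bucket/{name}/{time}/items.{chunk:07d}.jl.gz"


def render_key(name, started_at, chunk_start):
    """Expand the template with example values (hypothetical helper)."""
    return url_template.format(
        name=name,  # spider name
        time=started_at.strftime("%Y-%m-%dT%H-%M-%S"),  # assumed timestamp format
        chunk=chunk_start,  # zero-padded to 7 digits by {chunk:07d}
    )


example = render_key("quotes", datetime(2024, 1, 2, 3, 4, 5), 100)
print(example)
# s3://my-bucket/quotes/2024-01-02T03-04-05/items.0000100.jl.gz
```

The `.jl.gz` suffix suggests gzip-compressed JSON Lines, so each chunk would hold one JSON object per line.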
