-2 votes
0 answers
14 views

Advice on Efficiently Tracking Product Data Updates (Python Web Scraping)

I’m working on a Python-based web scraping project that involves tracking updates to product table fields such as price_history, chat_count_history, like_count_history, view_count_history, ...
Zaki Mohammed
0 votes
0 answers
31 views

Crawling with Selenium: pagination doesn't work

I'm crawling a site to collect data. Everything works fine without pagination; the code below is the part that fails, and I need your help. When I run the code, an error message appears after about 30 seconds ...
손승운
0 votes
1 answer
36 views

Web crawling an Angular website in .NET 8.0

I want to crawl the Angular website https://v16.angular.io/docs. I have written this code for it: var playwright = await Playwright.CreateAsync(); var browser = await playwright.Chromium.LaunchAsync(new ...
Vinit Mapari
0 votes
0 answers
50 views

Scrapy crawl of a website loses some items

Here is my main spider class: import scrapy from xxx.items import WorkItem class XXXSpider(scrapy.Spider): name = "xxx" allowed_domains = ["example.com"] ...
Windy418
0 votes
1 answer
59 views

Scraping an HTML page after clicking a div tag using BeautifulSoup

I'm having trouble scraping the questions and answers from this website: https://tech12h.com/bai-hoc/trac-nghiem-lich-su-12-bai-1-su-hinh-thanh-trat-tu-gioi-moi-sau-chien-tranh-gioi-thu-hai The ...
Dinosaur
0 votes
0 answers
21 views

Simple crawler to scrape ICD-11 database using API requests

I tried to write this simple crawler to crawl the entire ICD-11 database (https://icd.who.int/browse/2024-01/foundation/en#455013390) and collect the titles and descriptions of all diseases, ...
Legion
1 vote
0 answers
28 views

Running a spider programmatically, integrated with a crawl library

I have the following code that runs a spider programmatically: import asyncio from scrapy.crawler import CrawlerProcess from scrapy_webcrawler.spiders.spider import WebCrawlerSpider class ...
Luis Ferreira
0 votes
0 answers
36 views

The specified selector is not loading in Apify

I'm building an Apify scraper to target transaction data on a dynamic webpage. The table containing these transactions loads asynchronously via AJAX, taking under 30 seconds. This is the page when ...
avall
-1 votes
1 answer
38 views

Python script to crawl an ADO project for a specific file and download it

I am trying to create a python script that will crawl Azure DevOps project for a file, and download it locally. However, I'm running into an issue where making the request to download the file isn't ...
Devin York DJ
0 votes
0 answers
40 views

How to crawl data using Selenium and split cell values in Google Sheets

Currently, I am retrieving all the <td> values and saving them to Google Sheets. On the page being crawled, the date and rank are located at td > div > div > span, b. So, when I ...
빵슈크림
0 votes
0 answers
64 views

Scrapy: Preventing Data Persistence and Cross-Request Contamination

Intro: I've edited the post to simplify and clarify the content and included the proposed solutions. All issues were resolved using download delays and dupefilter (as suggested by @wRAR). However, with ...
Isidre
0 votes
0 answers
15 views

Bing Search Engine Indexing

Has anyone encountered and fixed the issue where the Bing search engine shows the wrong website name in the header of a search result? How can this be fixed? Google Chrome shows it correctly, ...
MSBOT Mongolia
0 votes
1 answer
66 views

Querying AWS Athena the right way

I get a timeout when querying https://commoncrawl.org/overview data with Athena, and if the query succeeds it will cost me $1000 each time ($5 per TB with 200 TB (?)), which is far too much. This is, ...
fass33443423
0 votes
0 answers
42 views

How can I keep the Xvfb screen alive for a crawler that uses Chrome and Selenium WebDriver after the SSH session is closed?

I made a crawler with Selenium WebDriver and Chrome. The target website blocks Chrome's headless mode, so I needed to use Xvfb. The crawler runs on AWS EC2 (Amazon Linux 2023). While the SSH session ...
tsuneo
0 votes
1 answer
76 views

Scraping/crawling a website with multiple tabs using Python

I am seeking assistance in extracting data from a website with multiple tabs and saving it in a .csv format using Python and Selenium. The website in question is: https://www.amfiindia.com/research-...
Starlord22
