Newest 'web-crawler' Questions

-2 votes

0 answers

14 views

Advice on Efficiently Tracking Product Data Updates (Python Web Scraping)

I’m working on a Python-based web scraping project that involves tracking updates to product table fields such as: { { price_history, chat_count_history, like_count_history, view_count_history, ...

Zaki Mohammed

1

asked 8 hours ago

0 votes

0 answers

31 views

Crawling with Selenium: pagination doesn't work

I am crawling a site to collect data. Everything works without pagination; the code below is the problematic part. I need your help. When I run the code, an error message appears after about 30 seconds ...

손승운

11

asked Jan 3 at 2:36

0 votes

1 answer

36 views

Web Crawling in .net 8.0 for angular website

I want to crawl an Angular website (https://v16.angular.io/docs). I have written the following code for this: var playwright = await Playwright.CreateAsync(); var browser = await playwright.Chromium.LaunchAsync(new ...

Vinit Mapari

439

asked Dec 27, 2024 at 9:08

0 votes

0 answers

50 views

Scrapy crawls a website and loses some items

Here is my main spider class: import scrapy from xxx.items import WorkItem class XXXSpider(scrapy.Spider): name = "xxx" allowed_domains = ["example.com"] ...

Windy418

13

asked Dec 24, 2024 at 1:23

0 votes

1 answer

59 views

Scrape the HTML page after clicking a div tag using BeautifulSoup

I am having trouble scraping the questions and answers from this website: https://tech12h.com/bai-hoc/trac-nghiem-lich-su-12-bai-1-su-hinh-thanh-trat-tu-gioi-moi-sau-chien-tranh-gioi-thu-hai The ...

Dinosaur

5

asked Dec 20, 2024 at 6:25

0 votes

0 answers

21 views

Simple crawler to scrape ICD-11 database using API requests

I tried to make this simple crawler to crawl down the entire ICD-11 database (https://icd.who.int/browse/2024-01/foundation/en#455013390) and collect all the titles and descriptions of all diseases, ...

Legion

474

asked Dec 15, 2024 at 0:17

1 vote

0 answers

28 views

Run spider programmatically integrated with a crawl lib

I have the following code that runs a spider programmatically: import asyncio from scrapy.crawler import CrawlerProcess from scrapy_webcrawler.spiders.spider import WebCrawlerSpider class ...

Luis Ferreira

41

asked Dec 12, 2024 at 19:00

0 votes

0 answers

36 views

The specified selector is not loading in Apify

I'm building an Apify scraper to target transaction data on a dynamic webpage. The table containing these transactions loads asynchronously via AJAX, taking under 30 seconds. This is the page when ...

avall

1

asked Dec 12, 2024 at 15:56

-1 votes

1 answer

38 views

Python Script to crawl ADO Project for specific file and download it

I am trying to create a python script that will crawl Azure DevOps project for a file, and download it locally. However, I'm running into an issue where making the request to download the file isn't ...

Devin York DJ

55

asked Dec 10, 2024 at 18:10

0 votes

0 answers

40 views

How to crawl data using Selenium and split cell values in Google Sheets

Currently, I am retrieving all the <td> values and saving them to Google Sheets. On the page being crawled, the date and rank are located at td > div > div > span, b. So, when I ...

빵슈크림

9

asked Nov 27, 2024 at 10:29

0 votes

0 answers

64 views

Scrapy: Preventing Data Persistence and Cross-Request Contamination

Intro I've edited the post to simplify and clarify the content and included the proposed solutions. All issues were resolved using download delays and dupefilter (as suggested by @wRAR). However, with ...

Isidre

1

asked Nov 6, 2024 at 22:03

0 votes

0 answers

15 views

Bing Search Engine Indexing

Has anyone encountered and fixed the issue where the Bing search engine shows the wrong website name in the header of the search result? How can I fix this? Google Chrome shows it correctly, ...

MSBOT Mongolia

1

asked Oct 31, 2024 at 7:54

0 votes

1 answer

66 views

Querying AWS Athena the right way

I get a timeout querying https://commoncrawl.org/overview data with Athena, and if the query succeeds, it will cost me $1,000 per query ($5 per TB with 200 TB (?)), which is too much. This is, ...

fass33443423

117

asked Oct 20, 2024 at 6:28

0 votes

0 answers

42 views

How can I keep the Xvfb screen alive for a crawler that uses Chrome and Selenium WebDriver after the SSH session is closed?

I made a crawler program with Selenium WebDriver and Chrome. The target website blocks Chrome headless mode, so I needed to use Xvfb. The crawler runs on AWS EC2 (Amazon Linux 2023). While the SSH session ...

tsuneo

15

asked Oct 18, 2024 at 13:07

0 votes

1 answer

76 views

Scraping/Crawling a website with multiple tabs using python

I am seeking assistance in extracting data from a website with multiple tabs and saving it in a .csv format using Python and Selenium. The website in question is: https://www.amfiindia.com/research-...

Starlord22

159

asked Oct 15, 2024 at 11:05
