All Questions
Tagged with web-crawler parsing
142 questions
0
votes
1
answer
78
views
Why did the Scrapy shell not return any output?
I followed this tutorial because I wanted to learn web scraping.
https://www.datacamp.com/tutorial/making-web-crawlers-scrapy-python
When I got to the point of using CSS Selectors for Extraction, I ...
1
vote
0
answers
262
views
Python scraper for Facebook Marketplace isn't working anymore
Since yesterday, my Facebook Marketplace scraper has stopped fetching data. I'm currently using Scrapy because of its features. Am I making a mistake somewhere? Output has been shared on ...
-1
votes
1
answer
103
views
Scrapy crawled 0 pages, scraped 0 items. What should I check when troubleshooting?
I'm trying to parse the posts on this website to collect text for sentiment analysis. Here is the code that I'm working with.
# ~/dcscraper/dcscraper/spiders/spider.py
import scrapy
import ...
-2
votes
2
answers
104
views
How to parse HTML from NCBI (NIH)?
import requests
import re
import pandas as pd
from bs4 import BeautifulSoup
cds_id = "NP_001339842.1"
fasta_url = ("https://www.ncbi.nlm.nih.gov/protein/%s/?report=fasta" %...
1
vote
1
answer
284
views
First Python Scrapy Web Scraper Not Working
I took the DataCamp Web Scraping with Python course and am trying to run the 'capstone' web scraper in my own environment (the course takes place in a special in-browser environment). The code is ...
0
votes
1
answer
158
views
BeautifulSoup: how can I remove specific lines to get a ResultSet with only the text?
Below is my HTML code. I want to get only the hd data inside this table.
I want to get the necessary data and remove the data that includes "class="tltle">" ...
0
votes
0
answers
30
views
What is the proper term for recursively processing or "spidering" a JavaScript object with unknown keys?
As a mostly self-taught programmer, when I first learned Perl subroutines (like functions in other languages), it was common to write this kind of statement:
sub someFunction(@args) {
(el_1, el2, ...
0
votes
1
answer
48
views
Node.js crawler: response.body vs. response conversion to jQuery?
I use crawler, which is a crawler with cheerio built in, inside my Node.js project.
My crawler starts with a function like this (as in the example in the docs):
let c = new Crawler({
maxConnections: 10,
callback: (...
1
vote
1
answer
444
views
Substring any kind of HTML String
I need to split any kind of HTML code (a string) into a list of tokens.
For example:
"<abc/><abc/>" #INPUT
["<abc/>", "<abc/>"] #OUTPUT
or
"<...
0
votes
1
answer
266
views
How to use Scrapy FormRequest in a loop
I'm trying to create a spider that would put the words from a list into a site's search input one by one and then parse text from the resulting pages.
It works fine for one word, but I can't make it ...
0
votes
1
answer
1k
views
Filter Select Symfony Dom Crawler
I have HTML like this:
<select name="MySelect" id="MySelectID">
<option value="0">One</option>
<option value="1">Two</option...
1
vote
2
answers
80
views
Nutch/Elasticsearch terms definition
I used Nutch and Elasticsearch to crawl and parse 99 websites/links in order to index them in Elasticsearch so that I can use the search engine. It did crawl all 99 websites/links, but the end message I ...
0
votes
3
answers
206
views
How can I get text from a web page without any tags using bs4?
This is the data structure:
<div class = 'xxx' id = 'yyy'>
<div class id = 'zzz' class = 'kkk'>
<script type = 'bbb'>
// noise word
</script>
<strong class = '111'>...<...
1
vote
1
answer
268
views
How can I parse a specific string using bs4?
This is the HTML structure:
<a href="/main/list.nhn?mode=LS2D&mid=shm&sid1=100&sid2=264" class="snb_s11 nclicks(lmn_pol.mnl,,1)">BlueHouse <span ...
0
votes
1
answer
2k
views
Symfony crawler: how to get child nodes of retrieved children in a foreach?
I am working with Symfony's dom crawler.
XML looks like:
<Tours>
<Tour id="1">
<Termins>
<Termin>
...
And I have working code:
...