
Organizations

@deutschestextarchiv @zentrum-lexikographie


Hi there! 👋

Links

⚡  Web   |   ✍️  Blog   |   🐦  Twitter   |   🎞  YouTube   |   ☕  Coffee

Activity

🔭  Currently working on gathering texts on the Web and detecting word trends

Programming experience

🖩  First programs written on a TI-83 Plus in TI-BASIC




Pinned

  1. trafilatura

    Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

    Python · 3.7k stars · 268 forks

  2. htmldate

    Fast and robust date extraction from web pages, with Python or on the command line

    Python · 122 stars · 26 forks

  3. simplemma

    Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

    Python · 146 stars · 12 forks

  4. py3langid

    Forked from saffsd/langid.py

    Faster, modernized fork of the language identification tool langid.py

    Python · 49 stars · 8 forks

  5. courlan

    Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

    Python · 127 stars · 9 forks

  6. German-NLP

    Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

    453 stars · 66 forks
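The URL cleaning, filtering and deduplication mentioned in the courlan description can be illustrated with a minimal standard-library sketch. This is not courlan's API or implementation. The helpers `normalize_url` and `deduplicate`, the `TRACKING_PARAMS` set and the normalization rules are hypothetical choices made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to strip (an assumption for this sketch).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}
# Only HTTP(S) URLs are kept; each maps to its default port suffix.
DEFAULT_PORTS = {"http": ":80", "https": ":443"}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` so that equivalent URLs compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # Lowercase the network location (host names are case-insensitive;
    # this sketch does not special-case user:password@ prefixes).
    netloc = parts.netloc.lower()
    # Drop the port if it is the default one for the scheme.
    default_port = DEFAULT_PORTS.get(scheme)
    if default_port and netloc.endswith(default_port):
        netloc = netloc[: -len(default_port)]
    path = parts.path or "/"
    # Remove tracking parameters and sort the rest for a stable order.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(params))
    # The fragment is never sent to the server, so it is discarded.
    return urlunsplit((scheme, netloc, path, query, ""))


def deduplicate(urls: list[str]) -> list[str]:
    """Skip non-HTTP(S) URLs and keep the first copy of each normalized URL."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        if urlsplit(url.strip()).scheme.lower() not in DEFAULT_PORTS:
            continue  # e.g. mailto: or javascript: links
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


urls = [
    "HTTPS://Example.org:443/page?utm_source=x&b=2&a=1#top",
    "https://example.org/page?a=1&b=2",
    "mailto:someone@example.org",
    "http://example.org",
]
print(deduplicate(urls))
# ['https://example.org/page?a=1&b=2', 'http://example.org/']
```

Sorting query parameters makes equivalent URLs collapse to one key, but it assumes that parameter order carries no meaning. That holds for most sites, though not for all of them.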