Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test for broken links #2128

Open
ddbeck opened this issue May 23, 2018 · 5 comments
Open

Test for broken links #2128

ddbeck opened this issue May 23, 2018 · 5 comments
Labels
enhancement Nice to have features. idle Issues and pull requests with no recent activity linter Issues or pull requests regarding the tests / linter of the JSON files.

Comments

@ddbeck
Copy link
Collaborator

ddbeck commented May 23, 2018

From the discussion on #1990, it'd be nice to have automatic checks on URLs that appear in the BCD data, both in mdn_urls and in notes.

I have a little bit of a proof of concept in the form of a script I use as a local pre-commit hook. Unfortunately, it's not at all clever, uses some non-standard tools, and it's a fish shell script, so it's not exactly portable. There's probably a right way to do this but I'm not sure what that would look like.

Additionally, previously related issues #1998 and #1703 imply that it may be desirable to have some broken links (e.g., MDN pages that should exist, but don't).

@ddbeck ddbeck added enhancement Nice to have features. linter Issues or pull requests regarding the tests / linter of the JSON files. labels May 23, 2018
@connorshea
Copy link
Contributor

connorshea commented May 23, 2018

Two other scripts also exist to do this, although I think they only check the links in the mdn_url attribute:

This should probably be a command that's separate from the lint command so the linter continues to run quickly.

There's also the problem of master failing because of changes on MDN when this project didn't change anything, which is less than ideal.

@ExE-Boss
Copy link
Contributor

ExE-Boss commented May 25, 2018

At the very least, the linter should be a bit more clever about detecting MDN URLs (eg. if it finds something that matches the \bhttps?://developer\.mozilla\.org/ RegExp, it then checks if it matches \bhttps://developer\.mozilla\.org/docs/, and if not, it will print an error).

Right now, it only prints an error if it finds \bhttps?://developer\.mozilla\.org/\w{2}-\w{2}/, which only detects MDN URLs with a language code in them, but not MDN URLs that have ddocs instead of docs in them.

Another thing that might be a good idea to test is if the mdn_urls are descending (eg. sub-features start with the mdn_url of the parent) and print a warning if not (not an error, because there are edge‑cases on MDN where this might not necessarily apply).

I’ll make a PR soon to test for the above rules.

@queengooborg
Copy link
Contributor

@ExE-Boss This is an old issue, though still an important one. Do you intend to follow up on this?

@ExE-Boss
Copy link
Contributor

I completely forgot about this, thanks for reminding me.

@queengooborg
Copy link
Contributor

Alright, so I have a PR that tests the HTTP response codes from every URL within BCD (and I do mean EVERY URL). However, I've found that it's slow...super slow, due to how many links there are. (The test took quite literally an hour to complete, which is partially due to a little redundancy I found; see #5243 for that). It looks like a total of 365 files currently have issues -- this is probably a project for us to tackle.

@github-actions github-actions bot added the idle Issues and pull requests with no recent activity label May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Nice to have features. idle Issues and pull requests with no recent activity linter Issues or pull requests regarding the tests / linter of the JSON files.
5 participants
@ddbeck @connorshea @ExE-Boss @queengooborg and others