
Scanning a Website for Broken Links in Go

December 29, 2025
Technical Web

Yes, I know there are paid and free tools for doing this. And yes, I know there are tools for this that I can run locally.

But this exercise allowed me to try out the well-designed Go package github.com/gocolly/colly.

Colly is a web scraping framework for Go.

Here is how I used it to quickly scan my website (the one you are on right now) for broken links.
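For reference, the snippets below rely on roughly this import set (a sketch; depending on the version installed, the Colly import path may be github.com/gocolly/colly/v2):

import (
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/gocolly/colly"
)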

First I defined a type for links to check and the URL of the page they appear on:

type link struct {
	Url     string
	PageUrl string
}

I also wrote a rudimentary function to check if a link is okay:

func checkLink(l link) bool {
	req, err := http.NewRequest("GET", l.Url, nil)
	if err != nil {
		return false
	}
	// Some servers reject requests without a browser-like User-Agent.
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	// Drain the body so the underlying connection can be reused.
	io.Copy(io.Discard, resp.Body)
	// Treat 2xx and 3xx responses as okay.
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}
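A GET with the body discarded is the safer check, since some servers mishandle HEAD requests. If bandwidth were a concern, a hypothetical variant (checkLinkHead is my own name, not part of the program above) could try HEAD first and fall back to GET:

// Hypothetical variant: try a cheap HEAD request first, fall back to GET on failure.
func checkLinkHead(l link) bool {
	resp, err := http.Head(l.Url)
	if err == nil {
		resp.Body.Close()
		if resp.StatusCode >= 200 && resp.StatusCode < 400 {
			return true
		}
	}
	return checkLink(l)
}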

Make sure to set a timeout on the default HTTP client; otherwise a single slow or unresponsive server can stall the check indefinitely.

http.DefaultClient.Timeout = 10 * time.Second
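Mutating the shared default client affects everything else in the process that uses it, so a dedicated client is a reasonable alternative. A minimal sketch (the checkClient name is my own), with checkLink calling checkClient.Do(req) instead of http.DefaultClient.Do(req):

// A dedicated client keeps the timeout local to the link checker.
var checkClient = &http.Client{Timeout: 10 * time.Second}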

Next, let us define a worker function to check links as they are scanned from the website:

func checkLinks(links <-chan link) {
	seen := make(map[string]bool)
	for l := range links {
		if !seen[l.Url] {
			seen[l.Url] = true
			if !checkLink(l) {
				// The link is broken, print an error message indicating the URL of the page it is on
				fmt.Printf("Broken link %s on page %s\n", l.Url, l.PageUrl)
			}
		}
	}
}
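Because only one goroutine reads from the channel, the plain seen map is safe here. If I wanted several checker goroutines (a sketch, not part of the program above; checkLinksParallel is a made-up name and it needs the sync package), the dedup set would have to be guarded:

// Hypothetical variant: several checker goroutines sharing one dedup set.
func checkLinksParallel(links <-chan link, workers int) {
	var (
		mu   sync.Mutex
		seen = make(map[string]bool)
		wg   sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for l := range links {
				// The mutex guards the map; plain Go maps are not safe for concurrent access.
				mu.Lock()
				dup := seen[l.Url]
				seen[l.Url] = true
				mu.Unlock()
				if !dup && !checkLink(l) {
					fmt.Printf("Broken link %s on page %s\n", l.Url, l.PageUrl)
				}
			}
		}()
	}
	wg.Wait()
}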

Finally, the function to crawl the website:

func crawl(domain string, links chan link) {
	c := colly.NewCollector(
		colly.AllowedDomains(domain), // Limit crawling to the site being scanned only
		colly.Async(true),
	)

	c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2}) // Limit scan speed/parallelism

	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		// Resolve the href to an absolute URL before queueing it for checking.
		href := e.Request.AbsoluteURL(e.Attr("href"))
		if href == "" {
			return
		}

		links <- link{Url: href, PageUrl: e.Request.URL.String()}

		// Follow the link; AllowedDomains keeps the crawl on this site.
		e.Request.Visit(href)
	})

	c.Visit("https://" + domain) // Start the crawl from the homepage
	c.Wait()
}
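Colly also has an OnError callback, which can surface pages on the site itself that fail to load during the crawl. A small sketch that could be registered inside crawl alongside the OnHTML handler:

c.OnError(func(r *colly.Response, err error) {
	// Report pages that could not be fetched while crawling.
	fmt.Printf("Failed to fetch %s: %v\n", r.Request.URL, err)
})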

And finally, the main function to weave it all together. It closes the channel once the crawl finishes and waits for the checker to drain it, so the program does not exit with links still queued:

func main() {
	http.DefaultClient.Timeout = 10 * time.Second

	links := make(chan link, 100)
	done := make(chan struct{})
	go func() {
		checkLinks(links)
		close(done)
	}()

	crawl("hjr265.me", links)

	// No more links are coming; let the checker finish its queue before exiting.
	close(links)
	<-done
}

And that’s it. I can run this program to identify any broken links on my website.

Colly, an easy-to-use scraping framework, makes it possible to do more than just detect broken links. I may want to perform other routine audits, like checking whether my images are missing alt attributes, whether my pages have the correct meta tags, and more.
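For example, flagging images without alt text would only take another OnHTML handler on the same collector (a sketch in the spirit of the crawler above):

c.OnHTML("img", func(e *colly.HTMLElement) {
	// Flag images that are missing or have an empty alt attribute.
	if e.Attr("alt") == "" {
		fmt.Printf("Image %s on page %s has no alt text\n", e.Attr("src"), e.Request.URL)
	}
})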


This post is the 97th of my #100DaysToOffload challenge. Want to get involved? Find out more at 100daystooffload.com.

