How are you, my friends? With the exams drawing ever nearer, we students are finding ourselves very busy, so naturally I decided to make a small info scraper for URLs. Nothing too hectic, just a normal procrastinatory measure to ensure I waste some more study time. It only took about 40 minutes to write, so don't expect too much.
Currently it finds emails, href links and image sources, runs a whois, finds the host IP and checks for a robots.txt. I find it quite handy sometimes, so hopefully you will have a use for it. For those interested, here are the regexes I used for some of the scraping.
href links -> '/[^>]+href\s*=\s*["'](?!(?:#|javascript\s*:))([^"']+)[^>]*>.*?/si'
img src -> '@
emails -> '/[a-z0-9_\-+]{1,256}+@[a-z0-9\-]{1,256}+\.([a-z]{2,3})(?:\.[a-z]{2})?/i'
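If you want to play with these patterns outside the scraper, here is a rough Python sketch of how the href and email ones might be applied. This is an adaptation, not the scraper's actual code: the function names `extract_links` and `extract_emails` and the sample HTML are made up for illustration, the /si and /i modifiers are mapped to Python's re.S and re.I flags, and the possessive `{1,256}+` quantifiers are written as plain `{1,256}` so the patterns compile on older Python versions.

```python
import re

# Href pattern: captures the attribute value, skipping "#..." anchors and
# "javascript:" pseudo-links via the negative lookahead.
HREF_RE = re.compile(
    r"""[^>]+href\s*=\s*["'](?!(?:#|javascript\s*:))([^"']+)[^>]*>.*?""",
    re.S | re.I,
)

# Email pattern: local part, "@", one domain label, a 2-3 letter TLD and an
# optional 2-letter second level (as in ".co.uk").
EMAIL_RE = re.compile(
    r"[a-z0-9_\-+]{1,256}@[a-z0-9\-]{1,256}\.([a-z]{2,3})(?:\.[a-z]{2})?",
    re.I,
)


def extract_links(html):
    """Return every captured href value, in document order."""
    return HREF_RE.findall(html)


def extract_emails(text):
    """Return every full email match, in document order."""
    return [m.group(0) for m in EMAIL_RE.finditer(text)]


if __name__ == "__main__":
    # Hypothetical sample page, only here to exercise the two patterns.
    sample = """
    <a href="https://example.com/about">About</a>
    <a class="x" href='/contact'>Contact</a>
    <a href="#top">Top</a>
    <a href="javascript:void(0)">Noop</a>
    <p>Mail admin@example.com or sales@example.co.uk</p>
    """
    print(extract_links(sample))
    print(extract_emails(sample))
```

Note that the email pattern only allows a single label before the TLD, so an address like user@mail.example.com would not be matched in full.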
I'm sure these could probably be improved to find more, but they seem to work fine. The scraper is located HERE, check it out if you're interested ;)