As a web developer, it’s often handy to crawl one of your sites and see if any links are broken, or return plain 500 errors because something went wrong.
A classic tool for this is Xenu’s Link Sleuth. That tool is old though, no longer updated, and it’s a pure GUI tool. Since I couldn’t find what I wanted as a ready-to-use command line tool, I got down and wrote my own. It took a while, but recently it became functional enough to release as a v1 and open up to the world as an open source tool.
So I hereby present *drumroll*, Sitecrawler, the command line based site crawler (yeah, I know, naming is hard).
What can it do?
Crawl a site by following all links in a page. It only crawls internal links and HTML content.
Crawls links only once. No crawling loops please.
Export the crawled links to a CSV file, including the referrer links (handy for tracking 404s).
Limit crawl time to a fixed number of minutes for large sites.
Set the number of parallel jobs to use for crawling.
Add a delay to throttle requests on slow sites, or sites with rate limits.
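To give an idea of how the "crawl internal links only once and remember the referrer" part can work, here is a minimal sketch in Python. It is not Sitecrawler's actual code (that is written in .NET), and the names `crawl`, `to_csv` and the `fetch` callable are made up for this illustration. The `fetch` function is assumed to take a URL and return a status code plus the HTML body, so you can plug in any HTTP client you like.

```python
import csv
import io
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl of internal links, visiting each URL only once.

    `fetch` is a hypothetical callable: url -> (status_code, html_text).
    Returns a list of (url, status_code, referrer) rows.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}  # guards against crawling loops
    queue = deque([(start_url, "")])
    rows = []
    while queue:
        url, referrer = queue.popleft()
        status, html = fetch(url)
        rows.append((url, status, referrer))
        if status != 200:
            continue  # nothing to parse on error pages
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, url))
    return rows


def to_csv(rows):
    """Render the crawl result as CSV text, referrer included."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "status", "referrer"])
    writer.writerows(rows)
    return out.getvalue()
```

The `seen` set is what prevents loops: a link is queued the first time it shows up and ignored afterwards, so the referrer in the CSV is the first page that linked to it.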
It’s written in .NET 6, so it runs on Windows, Mac and Linux. Check it out on GitHub for more details and downloads. It’s proven useful for me already, so I hope it does the same for you.
In some countries, you aren’t allowed to surf freely on the internet. Sites are censored, your traffic is monitored, your privacy and freedom are limited. Tools like the Tor browser help to bypass these internet blockades, but they rely on Tor proxies. Running a Tor proxy takes some dedication and a certain technical background, but with the Snowflake project, anyone can help and be a part of the Tor network without effort.
All it takes is installing the Snowflake plugin in your browser, and you are set. When it’s active, you’ll be a middle-man getting restricted content from the web to censored users using the Tor browser. Don’t worry, there’s never a direct connection between you and whoever requested the page. It’s all routed through the Tor network, which anonymizes the traffic.
The plugin works on Chrome and Firefox. If you want more detailed info, check out the Tor wiki. You can even run a standalone version in Docker if you want to go full geek. ;)
So if you feel like giving some authoritarian dictator the finger, go ahead and install the plugin in your browser.
Facebook is probably the worst social media company out there, so it makes sense you don’t want their apps on your phone. But unfortunately your less privacy concerned friends are all gleefully using Facebook and Messenger and you don’t want to miss out.
I understand your pain. Here’s a simple guide to still use Zuck’s book on your phone, without the dreaded apps.
Step 1: get a new browser app
We’re going to use the mobile site, which works quite well. To separate all the Facebook traffic from our regular surfing habits and keep Zuck from snooping on us, we’ll use a completely different browser app.
Head over to the Google Play Store and search for “browser”. You’ll see a big list of browser apps, so you just have to pick one you’re not currently using. You are most likely using Chrome as your main browser, or the Samsung browser if you have a Samsung phone, so you can go for Firefox or the DuckDuckGo Privacy Browser as your alternative. Both are good browsers, and I’ve used them both for faking the Facebook. I even use Firefox as my main browser.
Step 2: open your newly acquired browser app, and surf to facebook.com.
After you log in, you’ll be able to use the mobile site pretty much like the app. Now, since this is a separate browser, you just leave your Facebook tab open. Next time you start your dedicated browser app for Facebook, you’ll be logged in already. Easy-peasy. Just don’t use this browser for anything else. If you do, Zuck will be able to follow you around on every site that has anything enabled related to Facebook or Instagram.
Step 3: set up messenger.
Messenger sucks because they want to force you to install the app when you use the mobile site to check your messages. There is a way around this though. Messenger still works on the desktop site, aka your PC/laptop, right? So we just have to tell Facebook we’re using that from our phone. You can do this by going to facebook.com in a second tab in your new browser. Now, tap the 3-dot menu in the menu bar and activate the “Desktopsite” checkbox. The page will refresh and look pretty much the same, but now it thinks you’re visiting it from a desktop PC. Now open the Facebook hamburger menu, choose Messenger and voilà, there you have all your messages and contacts.
The trick is to leave this second tab open on your phone as well, so you have quick access to your messages whenever you like. After not using it for a while, you might end up with a message telling you to install the app again. This is because the tab refreshed and is back in mobile-mode. When this happens, just go back into the 3-dot menu of your browser and check the “Desktopsite” checkbox again. After reloading the page, you’re set again. A minor inconvenience for the added privacy of not having Zuck’s spy-apps on your phone if you ask me. ;)
Step 4: change the icon.
If you want to get fancy, now is the time to long-press the icon of your now dedicated Facebook-browser app and change the icon to… the Facebook icon perhaps? I also change the name to something more appropriate, like Fakebook for example.
Step 5: convince your friends to not use Facebook, WhatsApp or Instagram.
For my little gfpg project I wanted to put a simple static website online without having to set up and maintain a web server. I read about going serverless with a static site using S3 on AWS, but I wanted to try that on Azure instead. BLOB storage seemed the obvious alternative to S3, but it took some searching around and finding the right documentation on MSDN to get it all up and running.
If you’re on a similar quest to publish some static content to Azure BLOB storage as a serverless website, this short guide will help you along.
First of all, we need to create an Azure BLOB storage account for the site. The most important part is to choose a general-purpose v2 Standard storage account as the account kind. This is the only type that supports hosting a static website. Guess who didn’t do that.
Next thing is to enable static hosting of your files. This will create a $web folder in your storage account, which will be the root folder of your website. It’s that simple.
Copy your files into the $web folder using the Storage explorer blade in the Storage account menu, or the Storage explorer app. You can already test your site using the Azure endpoint.
The Storage explorer is a quick and easy way to upload and manage your files in the BLOB storage account.
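If you'd rather script the copy step than click through the Storage explorer, the part worth getting right is the mapping from local files to blob names and content types. As far as I know, a blob is served with whatever content type is stored on it, so an HTML file that ends up as application/octet-stream gets downloaded instead of rendered. The sketch below only builds that upload plan with the Python standard library. The function name `plan_uploads` is made up for this example, and the actual upload call is left out on purpose because it depends on the tool or SDK you use.

```python
import mimetypes
from pathlib import Path


def plan_uploads(site_dir):
    """Return (local_path, blob_name, content_type) for every file in site_dir.

    blob_name is the path relative to site_dir with forward slashes,
    which is how it would appear under the $web container.
    """
    root = Path(site_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        blob_name = path.relative_to(root).as_posix()
        content_type, _encoding = mimetypes.guess_type(path.name)
        # Fall back to a generic binary type when the extension is unknown.
        plan.append((path, blob_name, content_type or "application/octet-stream"))
    return plan
```

Loop over the result and hand each entry to your uploader of choice, passing the content type along with the file.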
You can stop here if this is a personal project and you don’t need HTTPS support or a custom domain. In my case, I did want to go all the way, so here’s how to get that working as well.
Get a domain name. Make it sassy ;). Make sure your domain registrar allows you to edit the CNAME records for your domain. This is pretty standard, but not all cheap web hosters allow this and you need it later on to hook up your domain to Azure.
Create an HTTPS certificate for your site on Azure with just a few clicks. I was afraid this was going to be hard but it’s so damn easy it’s beautiful. There really is no excuse anymore to let your site just sit there on HTTP these days.
Last thing to do is set up some caching rules for the CDN. We don’t want to hit the “slow” BLOB storage all the time; we want requests to be served from the faster CDN instead. Depending on the option you chose for the CDN this will differ, but if you picked the Microsoft one you have to use the Standard rules engine to set your caching rules. If you picked Akamai or Verizon, you can use CDN caching rules instead. For a simple setup on the Microsoft CDN, go to the CDN settings Rules engine page, set a global cache expiration rule to override, and pick an expiration you like. After a few minutes you’ll see the cache header appear in your HTTP responses.
Here you can also create a rule to redirect HTTP traffic to HTTPS, so people don’t accidentally hit the insecure version.
One more tip on the CDN. You can also purge the CDN cache after you’ve pushed an update to your site, so the changes are applied before your CDN cache expires. This is handy if you’ve set a rather long expiration time because you don’t expect the site to change very often.
From the CDN account, you can purge content on a specific path, or everything at once.
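To check that the cache expiration rule actually took effect, you can look at the Cache-Control header your site returns. Here is a small Python sketch for that, using only the standard library. The function names are made up for this example, and it assumes the CDN reports the expiration as a max-age value in Cache-Control, which is the common case but worth verifying for your own setup.

```python
import re
import urllib.request


def max_age_seconds(cache_control):
    """Extract max-age (in seconds) from a Cache-Control value, or None."""
    if not cache_control:
        return None
    match = re.search(
        r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE
    )
    return int(match.group(1)) if match else None


def fetch_cache_control(url):
    """Send a HEAD request and return the Cache-Control header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Cache-Control")
```

Calling `max_age_seconds(fetch_cache_control(url))` with your own site's URL should give you the expiration you configured, in seconds. If it returns nothing, the rule hasn't propagated yet or the header is set differently than assumed here.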
Years ago I ran into a website offering crude design advice. I thought it would be funny to make something similar for programming advice or guidelines. I started with a one-page website with a bunch of tips and then after a while forgot about it. Recently I ran into that project again and figured I might as well put it out here for the heck of it.
So here it is, some good fucking programming guidelines for you developers out there to have a laugh with, or perhaps even find a few useful tips and links in. I swear, most of those tips are actually valid, even though they are presented in a tongue-in-cheek way.
So have fun with it. I know I did when I built the damn thing.