July 2022
Considerations to make when developing websites or applications.
⚠️ The following post is still in draft state and will be finished soon.
This post covers security and privacy topics relevant to web development. First, the hot topic of privacy and especially tracking through cookies: as a web developer one wants to know when and which privacy measures have to be taken. Second, some considerations and recommendations for developing software with the use of third-party code are discussed.
Cookies are a client-side feature implemented in the browser, but they are largely controlled from the server side. With every request all existing cookies for the domain are sent to the server, and the server can add, edit or remove them with its response. Cookies used to be the way to persist local user data between browsing sessions, but with the introduction of LocalStorage and SessionStorage this is no longer needed.
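As a rough sketch of this request/response flow, assuming a plain Node http server and a hypothetical sessionId cookie:

```js
const http = require('http')
const crypto = require('crypto')

http.createServer((request, response) => {
  // All cookies for this domain arrive with every request in a single header.
  const cookies = request.headers.cookie // e.g. "sessionId=abc123; theme=dark"

  if (!cookies || !cookies.includes('sessionId=')) {
    // On the first visit the server assigns an identifier through the response.
    const id = crypto.randomBytes(16).toString('hex')
    response.setHeader('Set-Cookie', `sessionId=${id}; HttpOnly; Path=/`)
  }

  response.end('Hello!')
}).listen(3000)
```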
Nowadays cookies are mostly used to store a small string identifier that lets the server recognize a previous visitor when the page is accessed again. Attached to this identifier the server can store additional data, like settings made earlier, to reflect those choices on the next visit. The server has arbitrary control over what data is attached to this identifier: if you enter your name, address or email somewhere, that data can also be stored. This stored data can be shared or sold to other parties without your knowledge. To prevent such abuse, which is hard to investigate from the outside, various privacy laws have been enacted. The next paragraph will focus more on these specific laws.
Because privacy laws are notoriously difficult and laborious to enforce, cookies themselves have become the target. Consumers are very sensitive to how their data is handled, and unintended leaks of user data have led to massive reputation damage for companies. Since there is no clear boundary as to what can lead to a scandal in the eyes of the public, companies try to do everything they can to avoid mishaps and show how dedicated they are to this issue. The following are some cases of companies going the extra mile to show their dedication to privacy:
Google removing third party cookies from Chrome
Apple making apps tracking the user an opt-in feature
Facebook disabling their third party interface
The definition of cookies used in legal texts like the GDPR has surprisingly little to do with the cookie technology built into the browser. Any request made to your server includes information like the IP address, device information and the originating geolocation, all of which count as personal information. Since it's not possible to get consent from the user before a request is sent, the ways this is handled vary greatly.
When your client or server sets cookies that could be used for tracking, this always needs to be documented and there needs to be a way for users to opt out of tracking.
The Keep Logged In checkbox displayed alongside many login forms is a good way to get explicit cookie consent without displaying a pop-up on the landing page.
Analytics can be a sensitive topic as well. Almost any analytics approach will need access to information that can be deemed personal. However, when done in a sufficiently anonymous way it's possible to gather valuable information while still respecting the privacy of the user. Analytics almost always requires a cookie to be set.
The popular serverless hosting platform Vercel recently acquired Splitbee, which offers privacy-friendly analytics through the use of first-party cookies. With this approach there is no need for cookie consent banners anymore.
When storing user data on your server one needs to adhere to a couple of privacy laws and outline the steps taken in a Privacy Policy. Such a privacy policy is only necessary if the website is sending personal information back to the server. Adding a privacy policy will not have any effect on the laws in place but can be used to avoid liability in some cases.
CCPA / GDPR
The following Privacy Policy Generator will automatically generate the necessary policy texts to be included on a website, based on which of the listed kinds of user data you select.
Three features of cryptography are a godsend when it comes to privacy and should be implemented whenever possible. The first is public-key cryptography, which provides a safe way to communicate between the user and your server no matter where the communication passes through. The second is encryption, which allows data to be stored without the possibility of being read by anyone, even if they have access to the stored data. The last is hashing through cryptographic one-way functions, which prevents attackers from logging in to the application even in the case of a password breach.
This feature needs to be installed on the server side, and secured connections can be identified by the use of the https:// protocol. Encrypting traffic used to cost quite a bit and was reserved for applications that transferred sensitive data. Let's Encrypt is already offering free encryption to millions of websites, and integration at the click of a button is supported by a lot of hosting providers. If your domain points to a site hosted for free on Vercel, all traffic will automatically be secure.
Encryption can be used to prevent an attacker from gaining access to the user data you are storing on your server in case of a breach. In this scenario an attacker can access the stored information but cannot decrypt it, as only you have access to the keys used for decryption, which aren't stored anywhere on the server.
Encryption can also be used to prevent anybody but the user from gaining access to the stored data. With this type of encryption the data is already encrypted on the user's side and cannot be read by the server. This approach can be more complicated, as the key to decrypt the data needs to be stored by the users themselves.
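The original post shows a code example at this point; a minimal sketch of what it might look like with Node's built-in crypto module is shown below (the choice of CBC mode and the hex encoding are assumptions made here):

```js
const crypto = require('crypto')

// 32-byte key for AES-256, kept in memory only and never written to disk.
const key = crypto.randomBytes(32)

function encrypt(plainText) {
  // A fresh, random Initialization Vector for every encryption.
  const iv = crypto.randomBytes(16)
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv)
  const encrypted = Buffer.concat([cipher.update(plainText, 'utf8'), cipher.final()])
  // The IV isn't secret and is stored alongside the ciphertext.
  return { iv: iv.toString('hex'), data: encrypted.toString('hex') }
}

function decrypt({ iv, data }) {
  const decipher = crypto.createDecipheriv('aes-256-cbc', key, Buffer.from(iv, 'hex'))
  return Buffer.concat([decipher.update(Buffer.from(data, 'hex')), decipher.final()]).toString('utf8')
}

console.log(decrypt(encrypt('some personal user data'))) // => 'some personal user data'
```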
Using the node crypto library there are several encryption algorithms available. In the above example we use the AES-256 algorithm, which is still considered safe. Similar to the salt used for hashing, the encryption algorithm requires an Initialization Vector in addition to the key. This vector should be as random as possible, doesn't need to be private, and it's best to create a new one for each encryption. The key on the other hand needs to stay private even in the case of a breach, to ensure the attacker cannot decrypt the data. In order to keep the key private it usually has to be entered manually when the server starts. The key will only be stored in memory, making it inaccessible to attackers. However, if the server has to be restarted the key needs to be entered again, meaning it still has to be kept somewhere outside the server.
Hashing through the use of one-way functions is a way to store passwords safely, so that even if an attacker steals the stored password hash there is no way they can log in without knowing the actual password.
The way this works is that upon registration the password is sent to the server and run through a function which returns a hash (a string of the same size for any input). The password is discarded and only the hash is stored in the database. When the user wants to log in they enter the password again and the server generates a new hash. If the new hash and the one in the database match, it's usually safe to assume the user has entered the correct password.
For some of the most popular hashing algorithms hackers have created huge lookup tables in order to reverse a hash. This issue is mitigated through the use of a salt: a simple key that's specific to your application and is added to every password before hashing. Assuming an attacker also gained access to the salt, the only way to reverse a hash would be to create a lookup table from scratch, which would take forever and be very expensive.
Node has built-in support for the popular MD4 and MD5 hash functions. Used to hash passwords, both of these nowadays pose a significant security risk and shouldn't be used for this purpose. MD4 has been disabled altogether starting with node 17. SHA-256 and SHA-512 are good options when used together with a salt and are built into node.
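A minimal sketch of the registration/login flow described above, using Node's built-in SHA-512 with an application-wide salt (both strings here are placeholders):

```js
const crypto = require('crypto')

// Application-wide salt, a placeholder for illustration; keep it out of the database.
const salt = 'some-application-specific-secret'

const hashPassword = (password) =>
  crypto.createHash('sha512').update(salt + password).digest('hex')

// On registration: store only the hash, never the password.
const storedHash = hashPassword('correct horse battery staple')

// On login: hash the submitted password again and compare with the stored hash.
const isValid = hashPassword('correct horse battery staple') === storedHash
console.log(isValid) // true
```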
Secure hashing and storage of passwords is a huge topic in itself. Security can be improved by using a separate salt for each password and running the hash algorithm for a predefined number of iterations. All of these features are baked into a package called pbkdf2.
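Node's built-in crypto module exposes the same primitive as crypto.pbkdf2Sync; a sketch combining a per-password salt with an iteration count might look like this (the 100,000 iterations and the sha512 digest are illustrative choices, not recommendations from the original post):

```js
const crypto = require('crypto')

const hashPassword = (password) => {
  // A separate, random salt per password, stored alongside the hash.
  const salt = crypto.randomBytes(16).toString('hex')
  const iterations = 100000
  const hash = crypto.pbkdf2Sync(password, salt, iterations, 64, 'sha512').toString('hex')
  return { salt, iterations, hash }
}

const verifyPassword = (password, { salt, iterations, hash }) =>
  crypto.pbkdf2Sync(password, salt, iterations, 64, 'sha512').toString('hex') === hash

const stored = hashPassword('hunter2')
console.log(verifyPassword('hunter2', stored)) // true
```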
Theoretically, using dependencies from npm that are developed anonymously as Open Source software poses a significant security risk. Practically speaking, and perhaps somewhat surprisingly, these vectors are rarely exploited.
npm allows anyone to publish packages from their computer that can then be downloaded by other developers. Building the code happens locally, and the distributed contents don't necessarily have to match the open source code published to GitHub. Most packages are run using node and by default have significant system access and the potential to wreak a lot of havoc in case they were corrupted.
As described in this Tweet, npm is investigating ways to ensure published code is linked to the source code it claims to use. The idea is to achieve this by cryptographically linking a build to its source with a new technology called sigstore.
With most popular packages the authors and their identities are known. People with malicious intent are unlikely to create an online identity that's several years old. To gain more trust in a package you can do your own research and decide whether the identity of the public authors is credible. npm is unlikely to provide some kind of verification of identity and associated liability, since there is currently no technology available to achieve this without major disruptions.
The most likely explanation why there hasn't been more abuse is that there simply are easier ways for hackers to cause trouble. In order to successfully publish malware to npm one would have to develop a package that looks useful, works and is successfully marketed. In the cases where malware was published to npm, it was mostly done by gaining access to an existing package that already had a significant install base and then publishing a bad version.
When publishing from your local computer it's important to make sure your system hasn't been compromised in the first place, and to enable two-factor authentication on npm to make it harder for anyone else to publish a new package. Share npm publish access with as few authors as possible. When publishing through a CI, for example with the release-npm-action, the tokens used will bypass two-factor authentication and can be used to publish arbitrary packages. Since these tokens are hidden behind your GitHub account it's important to secure that account. GitHub also offers two-factor authentication, and branches can be protected to make it harder to publish unwanted code that would later automatically be released.
Before GitHub Actions it was common to commit the bundled code into the repository. Generated code is much harder to read and verify during a review and so allows for malicious code to be merged into the repository. Reviewers are likely to miss undesired changes in a generated file, as such files are unreadable, often not displayed in the diff and usually very large. While bundles have disappeared there is still some generated code being committed. An example is the package-lock.json which contains all the links where the packages are to be downloaded. By simply redirecting one download an attacker is able to inject arbitrary code into the later distributed bundle. Whenever possible one should avoid committing generated code, even at the cost of a slight performance penalty, unless reviewers are aware that these files need to be verified carefully.
Currently, npm doesn't require any type of identification with which a malicious actor could be tracked down after a malicious package was published. Places like the Apple App Store have been very successful at preventing bad software from being distributed. Such a process is of course way too much to ask for with an Open Source package registry like npm. Still, adding a small developer fee in order to identify a publisher and avoid spam could be a solution. More important and practical would be a way to make sure the published contents actually match the build output of the attached repository, and that a repository with the source code is always required. To achieve this, npm would have to only accept published artifacts from a GitHub Action in the specified repository. Since npm has already been acquired by GitHub this is certainly a possibility.
Most of these improvements would make it significantly harder for developers to publish packages to npm. Because this simplicity has been one of the success factors for npm, it's unlikely that any of these steps will be taken. This could however change quickly if lots of safety-related issues appear.
An additional cryptographic step to secure your repository is to only allow signed commits. To achieve this, GPG needs to be set up before committing and the key has to be added on GitHub. The guide Generating a GPG key documents how to generate such a key. For macOS the gpg CLI binary is available at gpgtools.org. With the UI it's enough to enter a password (which is later required to commit), export the key and upload the contents of the exported file to GitHub in the GPG keys settings. SourceTree will automatically discover the GPG installation, and under Repository Settings... in the Security tab the key can be selected. When committing it's important to select Sign commit in the Commit Options.... The password is only required on the first commit and can later be stored in the keychain. To make use of signed commits make sure to select Require signed commits under the Branches tab in the repository settings. Note that branch protection rules will only be enforced for GitHub Team or Enterprise accounts. Signed commits will receive a green Verified label.
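For reference, the equivalent command-line steps might look roughly like this (the key ID is a placeholder):

```sh
# Generate a key interactively, then look up its ID.
gpg --full-generate-key
gpg --list-secret-keys --keyid-format=long

# Tell git which key to use and sign every commit by default.
git config --global user.signingkey <KEY_ID>
git config --global commit.gpgsign true

# Export the public key and paste the output into the GPG keys settings on GitHub.
gpg --armor --export <KEY_ID>
```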
Docker provides a way to run code inside a sandbox, so that even if some dependency contains malware the attacker cannot access any other sensitive data on your computer. Docker runs software similarly to how a browser runs the code of a website: instead of running directly on the main system, the code is executed in a sandboxed environment and only has access to files inside that environment. In the case of Docker this sandbox is called a container, and running such a container requires a certain overhead compared to just running the code directly on the system, where the whole system effectively is the container.
Apart from some performance downsides and additionally required configuration, it's quite easy to move your web development build into a container. The web page can still be accessed normally from your browser through localhost. The container only exposes things through HTTP, which is run in the browser and therefore already sandboxed.
Docker requires two configuration files: a docker-compose.yml and a Dockerfile. The compose file describes the container and how it's exposed, while the Dockerfile is more programmatic and tells the container the initialization steps to be run in sequence.
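The compose file walked through in the next paragraph is not reproduced here; a minimal reconstruction matching that description might look like this (the service name app is an assumption):

```yaml
version: '3'
services:
  app:
    build: .
    ports:
      # Route the vite default port 5173 to localhost:3000 on the host.
      - '3000:5173'
    volumes:
      # Copy changes from the root folder into the /app working directory.
      - .:/app
      # Keep the node_modules installed inside the container.
      - /app/node_modules
```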
Setting the version in the first line of the above docker-compose.yml is optional but useful for Docker to know which syntax the configuration has been written in. In this case we only need one service, which builds in the main folder and routes the vite default port 5173 to another one outside, so that the application can be accessed from the regular browser through localhost:3000 later on. Volumes are used to transfer data between the source code and the files in the container. The first volume ensures that all changes made in the root folder are copied over to the /app working directory inside the container. The second volume prevents the node_modules installed in the container from being overridden by missing local files.
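Likewise, a sketch of the Dockerfile the next paragraph walks through (the npm run dev script and the vite --host flag are assumptions):

```dockerfile
# Pin a specific node version, or use the current / lts tags instead.
FROM node:18.8.0

# Create the working directory the volumes in docker-compose.yml point to.
WORKDIR /app

# Copy the source files and install dependencies inside the container.
COPY . .
RUN npm install

# The port that docker-compose.yml routes out to localhost:3000.
EXPOSE 5173

# Start the development server; --host makes vite reachable from outside the container.
CMD ["npm", "run", "dev", "--", "--host"]
```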
The first thing to do in the Dockerfile is to add node. The version can either be specific like node:18.8.0 or use the tags current (stable release with the latest features) or lts (guaranteed long-term support, useful for production releases) to default to the most recent version. Then we create a working directory. After that we copy all source files into this new working directory. Then the dependencies are installed through npm. After that we expose the port that has already been routed out of the container in the docker-compose.yml file. The last step is to run the development server so that the page is served on that port. This makes the website accessible from outside through localhost:3000.
Optionally, a .dockerignore file can be added to avoid copying over certain files.
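Typical entries might be:

```
# Assumed typical entries: dependencies, build output and git metadata.
node_modules
dist
.git
```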
The container is started from the Terminal by running docker compose up. Node doesn't necessarily have to be installed on the main system and will be downloaded for this container. The website built inside the container can now be accessed through localhost:3000. Changes made to the local folder outside of the container will automatically be copied over into the container, where a rebuild of the website is triggered.
When using the above solution and making changes to the source code inside VSCode, a lot of features will be missing as there are no node_modules available. In order to use many VSCode features, arbitrary code from inside node_modules has to be executed on the main system, which is exactly what we were trying to avoid with the introduction of a container. Luckily, VSCode can also run inside the container and only transfer the results outside to the editor. The following overview outlines how VS Code communicates with the container.