Nowadays, when you start a new project, you will probably choose a third-party identity provider such as Google Identity or Azure Active Directory instead of storing passwords yourself. But as a developer, it is still good to know how the work is done under the hood.
Plain Text
The simplest way is to store the plain text password directly.
const Database = require('better-sqlite3');
const db = new Database(':memory:');
db.exec(`
CREATE TABLE IF NOT EXISTS users (
username VARCHAR(16) PRIMARY KEY,
password VARCHAR(16)
);
`);
function createAccount(username, password) {
  const result = db.prepare(`SELECT username FROM users WHERE username=?;`).get(username);
  if (result) throw new Error("user already exists");
  db.prepare(`INSERT INTO users (username, password) VALUES (?, ?);`).run(username, password);
}
function login(username, password) {
  const result = db.prepare(`SELECT username FROM users WHERE username=? AND password=?;`).get(username, password);
  if (!result) throw new Error("invalid username or password");
}
createAccount("yaox023", "123456");
login("yaox023", "123456"); // success
// login("xxxx", "123456") // fail
db.close();
The problem with storing plain text passwords is that if your database leaks, every user's password is exposed, which is a huge risk.
Encrypted
Since you don't want to save the plain text password, the first idea that comes to mind is to encrypt it. So now we have a secret key. We use this secret key to encrypt the password and store the encrypted version. When a user tries to log in, we query the encrypted password, decrypt it with the secret key, and compare the result with the password the user submitted.
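const crypto = require("crypto");
const SECRET = 'secret';
const ALGORITHM = 'aes-192-cbc';
// crypto.createCipher/createDecipher are deprecated and removed from recent Node.js versions,
// so derive a 24-byte key from the secret and use createCipheriv/createDecipheriv instead.
// The 'key-salt' string is an arbitrary value chosen for this demo.
const key = crypto.scryptSync(SECRET, 'key-salt', 24);
function encrypt(text) {
  const iv = crypto.randomBytes(16); // fresh IV for every message
  const cipher = crypto.createCipheriv(ALGORITHM, key, iv);
  const encrypted = cipher.update(text, "utf8", "hex") + cipher.final("hex");
  return iv.toString("hex") + ":" + encrypted; // keep the IV next to the ciphertext
}
function decrypt(text) {
  const [ivHex, encrypted] = text.split(":");
  const decipher = crypto.createDecipheriv(ALGORITHM, key, Buffer.from(ivHex, "hex"));
  return decipher.update(encrypted, "hex", "utf8") + decipher.final("utf8");
}
const text = "123456";
const encrypted = encrypt(text);
console.log(encrypted); // <iv hex>:<ciphertext hex>, different on every run
console.log(decrypt(encrypted)); // 123456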
The problem with encrypted passwords is similar to the problem with plain text. A hacker who obtains the encrypted passwords may also be able to obtain the secret key, and with it decrypt every password.
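To make the risk concrete, here is a minimal sketch of what an attacker can do once both the table and the key have leaked. The `leakedRows` data, the `key-salt` string, and the helper names are made up for illustration, and the cipher setup is just one reasonable way to use Node's built-in `crypto` module.

```javascript
const crypto = require('crypto');

// Hypothetical setup: the server's secret key, derived from a passphrase.
const key = crypto.scryptSync('secret', 'key-salt', 24);

function encrypt(text) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv('aes-192-cbc', key, iv);
  return iv.toString('hex') + ':' + cipher.update(text, 'utf8', 'hex') + cipher.final('hex');
}

function decrypt(stored) {
  const [ivHex, dataHex] = stored.split(':');
  const decipher = crypto.createDecipheriv('aes-192-cbc', key, Buffer.from(ivHex, 'hex'));
  return decipher.update(dataHex, 'hex', 'utf8') + decipher.final('utf8');
}

// Hypothetical leaked table: usernames with encrypted passwords.
const leakedRows = [
  { username: 'alice', password: encrypt('123456') },
  { username: 'bob', password: encrypt('qwerty') },
];

// With the key in hand, every password is recovered in a single pass.
const recovered = leakedRows.map((row) => ({
  username: row.username,
  password: decrypt(row.password),
}));
console.log(recovered);
```

Encryption is reversible by design, so the security of every account rests on one key staying secret.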
Hashed
As long as the server can recover the real password, there is a risk that it leaks, no matter how the stored version is encrypted. So we need a way to ensure two things: 1. we can authenticate users as normal, 2. even if a hacker gets the stored password, he cannot use it to log in. This is the perfect scenario for hash functions.
To put it simply, a hash function is a function that receives a text and returns a text. If the input text is the same, the output text is the same. If the input text is different, the output should, for all practical purposes, be different too. And one more requirement: this process cannot be done in reverse. For example, if the password is 123456, we input it into a hash function and get an output abcdefg. We store this output in the database. Every time a user sends his credentials, we run the same hash function and get the same output. So we haven't stored the real password, yet we can still authenticate the user. Because the process does not work in reverse, even if a hacker gets the stored hash value abcdefg, he cannot get the real password, so he cannot use it to log in.
MD5 and SHA-1 are typical hash functions, but both have known collision attacks and are no longer considered secure. For better security, we recommend using the SHA-256 algorithm to generate hash values.
const crypto = require('crypto');
const password = '123456';
const hashHex = crypto.createHash('sha256').update(password, 'utf8').digest('hex');
console.log(hashHex); // 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92
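As a sketch of how authentication works with hashes, the example below stores only the SHA-256 hex digest and recomputes it at login. The in-memory `users` map and the helper names `hashPassword`, `createAccount`, and `login` are assumptions for illustration, standing in for the database used earlier.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory store: username -> hex digest
const users = new Map();

function hashPassword(password) {
  return crypto.createHash('sha256').update(password, 'utf8').digest('hex');
}

function createAccount(username, password) {
  if (users.has(username)) throw new Error('user already exists');
  users.set(username, hashPassword(password));
}

function login(username, password) {
  const stored = users.get(username);
  if (stored === undefined) return false;
  const candidate = hashPassword(password);
  // Compare in constant time; both buffers are 32 bytes long.
  return crypto.timingSafeEqual(Buffer.from(stored, 'hex'), Buffer.from(candidate, 'hex'));
}

createAccount('yaox023', '123456');
console.log(login('yaox023', '123456')); // true
console.log(login('yaox023', '654321')); // false
console.log(users.get('yaox023') === '123456'); // false: the plain password is never stored
```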
Now the hacker cannot get the real password directly from the hash value. But that does not mean it is unbreakable.
A rainbow table is a technique that can be used to break this kind of password. The idea is that the hacker first computes the hash values of a large number of commonly used passwords. Conceptually, the rainbow table maps each hash value back to the password that produced it. When the hacker somehow gets a hashed value, he can look it up in the table to get the real password. As you can imagine, if the hacker has enough storage and fast enough equipment, he can build a very large rainbow table, which means a higher chance of breaking the hash value.
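The idea can be sketched as a precomputed lookup from hash to password. This is a simplification: real rainbow tables use hash chains to trade lookup time for storage, but a plain map is enough to show why unsalted hashes of common passwords are easy to reverse. The `commonPasswords` list is a tiny made-up sample.

```javascript
const crypto = require('crypto');

const sha256 = (text) => crypto.createHash('sha256').update(text, 'utf8').digest('hex');

// Hypothetical sample; a real attacker would use millions of entries.
const commonPasswords = ['123456', 'password', 'qwerty', 'abc123', 'letmein'];

// Precompute once: hash -> password
const lookup = new Map(commonPasswords.map((p) => [sha256(p), p]));

// Later, for a hash taken from a leaked database:
const leakedHash = sha256('123456');
console.log(lookup.get(leakedHash)); // 123456
console.log(lookup.get(sha256('a-long-unusual-passphrase'))); // undefined
```

The expensive part, hashing every candidate, is paid once and then reused against every leaked database that uses the same hash function.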
Salt
We learned above that a rainbow table is prebuilt from commonly used passwords. People tend to have similar habits when creating passwords, like 123456. So instead of computing hash values from passwords directly, we can add a random value to the password and then compute the hash value. This random value is called a salt.
const crypto = require('crypto');
const password = '123456';
const salt = "secret-salt"
const hashHex = crypto.createHash('sha256').update(password + salt, 'utf8').digest('hex');
console.log(hashHex); // 02006a518a7861998857cb1c65b8c2593ef379e44d6ac5e1b4d10c48571ec705
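As you can see, the hacker now needs a much bigger rainbow table to break the hash value, because a table prebuilt from common passwords no longer matches. In practice, the salt should be generated randomly for each user and stored next to the hash, so that a single table cannot cover every account.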
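A minimal sketch of per-user salting might look like the following. The in-memory `users` map and the helper names are assumptions for illustration. The salt is not a secret: it is stored alongside the hash so the server can recompute the same value at login.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory store: username -> { salt, hash }
const users = new Map();

function hashWithSalt(password, salt) {
  return crypto.createHash('sha256').update(password + salt, 'utf8').digest('hex');
}

function createAccount(username, password) {
  const salt = crypto.randomBytes(16).toString('hex'); // new random salt per user
  users.set(username, { salt, hash: hashWithSalt(password, salt) });
}

function login(username, password) {
  const user = users.get(username);
  if (!user) return false;
  return hashWithSalt(password, user.salt) === user.hash;
}

createAccount('alice', '123456');
createAccount('bob', '123456');
// Same password, different salts, so the stored hashes differ.
console.log(users.get('alice').hash === users.get('bob').hash); // false
console.log(login('alice', '123456')); // true
console.log(login('alice', 'wrong')); // false
```

Because every user gets a different salt, an attacker has to redo the precomputation for each account instead of reusing one table.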
Time
Yes, with the salt added, it is now harder for a hacker to crack the hash values. But with enough patience and very fast equipment, the hacker can still guess the password by simply brute-forcing it.
So how do we defend passwords against this kind of attack? Think about the brute-force approach: the hacker needs to try many times to hit the real password. Because the equipment is very fast, each try takes very little time. If we can somehow make each guess take much more time, the brute-force approach can no longer succeed in a reasonable amount of time.
So how do we make the hash function take more time? bcrypt is a hash function designed to tackle this problem.
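A rough back-of-the-envelope calculation shows the effect. The guess rates below are illustrative assumptions, not measurements: a fast hash at one billion guesses per second versus a deliberately slow hash at four guesses per second, applied to all eight-character lowercase passwords.

```javascript
// Number of candidate passwords: 26 letters, 8 characters.
const keyspace = 26 ** 8; // 208,827,064,576

// Assumed guess rates (hypothetical, for illustration only).
const fastGuessesPerSecond = 1e9; // a fast hash on dedicated hardware
const slowGuessesPerSecond = 4;   // a hash tuned to take about 250 ms per guess

function secondsToExhaust(space, guessesPerSecond) {
  return space / guessesPerSecond;
}

const SECONDS_PER_YEAR = 365 * 24 * 60 * 60;

const fastSeconds = secondsToExhaust(keyspace, fastGuessesPerSecond);
const slowYears = secondsToExhaust(keyspace, slowGuessesPerSecond) / SECONDS_PER_YEAR;

console.log(`fast hash: about ${Math.round(fastSeconds)} seconds`); // about 209 seconds
console.log(`slow hash: about ${Math.round(slowYears)} years`);     // about 1655 years
```

A quarter of a second is barely noticeable for one legitimate login, but it multiplies across every guess an attacker has to make.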
const bcrypt = require('bcrypt');
const saltRounds = 12; // cost factor: make this bigger so the hash function takes more time
const myPlaintextPassword = '123456';
const t1 = Date.now();
// bcrypt generates a random salt itself and embeds it in the resulting hash string
bcrypt.hash(myPlaintextPassword, saltRounds, function (err, hash) {
  if (err) throw err;
  console.log(hash);
  console.log(Date.now() - t1); // elapsed time in milliseconds
});
There is a balance to be struck here. Normal authentication still has to run smoothly, so we can't let the hash function take too much time.
Space
Another way to defend against the brute-force approach is to make the hash function use more memory. scrypt is a hash function designed to do exactly this.
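const scrypt = require("scrypt-js");
const password = Buffer.from("123456");
const salt = Buffer.from("aaabbb"); // fixed here for the demo; use a random per-user salt in practice
// N: CPU/memory cost (a power of 2), r: block size, p: parallelization
// Memory use is roughly 128 * N * r bytes, about 1 GiB with these values
const N = 1024 * 1024, r = 8, p = 1;
const dkLen = 32; // length of the derived key in bytes
const keyPromise = scrypt.scrypt(password, salt, N, r, p, dkLen);
keyPromise.then(function (key) {
  console.log("Derived Key (async): ", key);
});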
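For comparison, Node.js also ships scrypt in its built-in `crypto` module. The sketch below, with illustrative parameter values and hypothetical `hashPassword` and `verify` helpers, derives a key with a random salt and checks a login attempt. Since scrypt needs roughly `128 * N * r` bytes, `N = 2 ** 15` with `r = 8` uses about 32 MiB, and `maxmem` has to be raised above Node's default limit to allow that.

```javascript
const crypto = require('crypto');

// Illustrative parameters (assumptions; tune them for your own hardware).
const params = { N: 2 ** 15, r: 8, p: 1, maxmem: 64 * 1024 * 1024 };
const KEY_LENGTH = 32;

function hashPassword(password) {
  const salt = crypto.randomBytes(16); // random per-user salt
  const key = crypto.scryptSync(password, salt, KEY_LENGTH, params);
  return { salt, key };
}

// Hypothetical helper: recompute the key and compare in constant time.
function verify(password, stored) {
  const candidate = crypto.scryptSync(password, stored.salt, KEY_LENGTH, params);
  return crypto.timingSafeEqual(candidate, stored.key);
}

const stored = hashPassword('123456');
console.log(stored.key.toString('hex'));
console.log(verify('123456', stored)); // true
console.log(verify('654321', stored)); // false
```

Raising `N` increases both the time and the memory per guess, which makes large-scale parallel guessing on specialized hardware more expensive.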