Decrypting the Cryptography

17/08/2018

From non-cryptographer for non-cryptographers

 

Hmm, these files must be encrypted

Hollywood loves to throw in some words from cryptographic terminology to make the protagonist look like a genius or to pull the audience into the illusion of enjoying a well thought script. Although they usually hire highly qualified technical people to back their story, terms that are flying around tend to be quite elementary.

What I propose is that anyone who had mediocre (or even less) math skills in high school can understand why these techniques (algorithms) are considered to be secure.

My typical breakfast before school

As a student, I was like Jake Harper from the series Two and a Half Men. To quote from Charlie Harper: A D-Student with a perpetual erqhu (instead of writing the first letter followed by * characters, I used Caeser Cipher. Hint: I shifted letters by 3). In my defence: I did not have the latter issue.

So if I can grasp the concept, you can too.

Some Terminology

  • Cipher: noun. a secret or disguised way of writing; a code. verb. put (a message) into secret writing.
  • Prime: is a natural number greater than 1 that has no positive divisors other than 1 and itself.
  • Modulo Operation: the remainder after division of one number by another: 7 (mod 5) = 2. 12-hour clock works in (mod 12).
  • A large number in cryptography is like number of protons in the universe (10⁸⁴ ) large. So when you see large in this writing, think of some number like that.

And a little math before we dig in

If you are still here, I probably had your curiosity, now I need your attention.

There are two fundamental concepts which form the backbone of most cryptographic algorithms:

Integer factorization is the decomposition of an integer to the product of smaller numbers. If these numbers are all prime, then the output is the prime factors of that integer. e.g, 18 = 2 * 3² (2 and 3 being the prime factors).

The Discrete Logarithm is the following: logb a = x or b^x = a (find x for given a and b). e.g, 3⁴ = 81

a = 3
b = 81
x = 4

The part of Discrete Logarithm that really shines is when we use it with modular arithmetic: 3^k ≡ 4(mod 7). The value of k could be 4, 10 and infinitely other solutions.

What these two have in common is that there are no known efficient algorithms for both prime factorization and discrete logarithm problem with carefully selected values.

Keys and Locks

The game of encrypting and decrypting is the same with keys and locks; you lock what you need to be safe and use the key to access that item. When we encrypt a piece of information (data), we simply put a lock on that information. Obviously, only way to access that information should be via our key; which is called decryption.

Since machines understand nothing but numbers (even worse than that they only understand binary — 10011010 — as you know), they translate everything to numbers: Mails, duck-face selfies, your favorite song, basically anything any machine produces as output is a number in its mind. Our key is also going to be a number. So let’s produce a key.

My subconscious mind just produced a key: 717. This is perfectly safe right? How could you know the key I just produced. I do not need any of that aforementioned math. But wait… What am I going to do with this number? How is this going to help me safely send and receive mails or messages? Well it is not. You can take your message m and multiply it with your secret key k and tell your friend the key so he can divide the result with your key and find m (just multiplying the message would compromise so much information to attackers every time you encrypt a new message, and if you use different key every time, you would need to remember all those keys). But if you can send the key safely why didn’t you just send the message without encrypting? Let me introduce to you…

Public-key Cryptography

What if we had two keys instead of one?

I will produce one key which will be public, meaning, that I will broadcast this key to everybody. My public key will have the power to lock my treasure but only that. And I will have another key, a private key, to unlock my treasure. Just imagine you lock your door with a blue key and unlock it with a red key.

People will use the blue (public) key to encrypt their messages before sending them to me. And I will use the red (private) key to decrypt their messages. This way everybody can cipher their messages (which I am the recipient) but as the holder of the red key, only I can decipher them.

Key Generation

In any cryptosystem (I haven’t seen a counter example so far but I may be wrong), prime numbers have tremendous role to play. The key generation process usually starts with producing a pseudorandom prime number. It is called pseudorandom for there is no geniune randomness in nature except for radioactive decay. The idea is to be as random as possible using particular algorithms and techniques. Randomness is important in the sense that there should be no algorithm to find that number in an efficient manner.

There is one slight problem about prime numbers: when they are large, we cannot know for sure that they are indeed primes.

Why do we want them to be large? Well, which one is easier: To guess a number between 1 and 10 or 1 and 1,000,000,000,000,000?

It is highly inefficient to perform a non-probabalistic primality test to a large number. That is why, probabilistic algorithms such as Miller-Rabbin are used to check if our random number is prime, if not try another. The obvious problem with these are, they sometimes gives us false-positives (a concrete number is generated instead of a prime).

RSA

Initials of Rivest, Shamir and Adleman forms the name for this widely used public-key cryptosystem. I picked RSA since it is widely used and is IMHO easier to understand than some of its peers.

Forming the dictionary (you can go back and forth here as you see them):

  • m denotes the message to be encrypted (it could be a text or a photograph as all information should be converted to numbers before any manipulation).
  • c denotes ciphertext: a message that is ciphered, encrypted (just as the last word of Charlie Harper quote above).
  • n is the modulo base. It is the product of two large primes p and q.
  • ?(n) (Carmichael’s Totient Function) is computed as: lcm(p − 1, q − 1) (lcm stands for least common multiple).
  • e stands for our public (blue) key: 1 < eλ(n).
  • d is our private (red) key: de ≡ 1 (mod λ(n)).

For more in depth explanation visit: RSA key generation.

RSA Encryption

The ciphertext is generated as the following:

Although e is said to be our public key, n also is public, it is just not a key :] The sender computes ciphertext with this function using the blue key and sends it to me.

RSA Decryption

I am now going to use my red key to unlock this sacred message as the following:

Here comes the magic: To compute d one needs to compute ?(n). And to compute ?(n) she just needs to know what p and q are. Everybody knows that n is the product of these two, so she can go head and cast prime factorization on them. Since there is no known efficient algorithm to solve that, she has to wait (with an average computer) usually more than the age of universe to find p and q. Or she can just try all possible numbers (called a brute force attack or a naive approach because it is pretty naive :] ) but again, she has to wait probably about 15 billion years or even more as keys get larger.

I can compute d easily with my two private sub-keys p and q (I called them sub-keys for they are only to compute my primary private key, red key everytime I receive a new ciphertext). After that, just take exponent of c and voila, I decrypted the message!

Why so secure?

Using modular arithmetic with discrete logarithm produces the unlimited (or limited by our large numbers) possibilities of solutions when in fact only one of which is the one that has the capability of decrypting our ciphers. It is like hiding our key in a room full of fake keys with only one authentic key. And it is not just the possibilities, there is no way to efficiently find all these solutions.

By picking p and q as primes, we make use of one of our most resistant and defensive spells, the prime factorization. And this also ensures that there are only two numbers that form our modulo base n.

Other Cryptosystems

Different systems solve different problems. Signature algorithms solve the problem of proving that you are who you say you are, just like real signatures. Usually signed with a private key and authenticity is checked by anyone with the public key. It is like the reverse of RSA: This time I encrypt (sign) with my private key and anyone can decrypt (check the signature) with the public key. There is no secret message or information in this case.

Some Signature Algorithms:

  • DSA (Digital Signature Algorithm)
  • ElGamal Signature Scheme

AES is also a popular encryption algorithm. You might have heard it in the series Mr. Robot. AES makes use of only a private key but not a public key. The aim of the algorithm is just the encrypt (generally your own) data and access it with your private key when you need to. It stops the rest of the world to access your data.

Cryptographic Hash Algorithms suchs as SHA makes use of hash functions which are one-way functions. You state that you used for instance SHA-1 to hash your message and share the ciphertext with others. The only way to produce the same ciphertext would be to give the same message as input. The beauty of hash functions is that they produce insanely different outputs with the tiniest differences in the messages which makes it close to impossible to find the message efficiently. Hash functions are the backbone of the new hype Blockchain Technology. You can read Anıl Akarsu’s article to gain insights about Blockchains.

The numbers suffixed to algorithm names as in AES-256 and SHA-1 typically denote the bit-wise length of keys used in the algorithm. 1 in SHA-1 stands for 128-bits which means 2¹²⁸. This is why I said think of the number of protons :]

Is this just the tip of the iceberg?

Probably. The rest is also unknown to me. I have linked wikipedia pages for more enthusiastic folks. I aimed to dive and show a little under the surface here, but of course to really, really grasp it, stepping on those math legos is a must. No pain, no gain right?