What is OAuth 2.0?
OAuth 2.0 is a common term in the web development community, but it can be a bit complex to understand at first. Put simply, OAuth 2.0 is a standard protocol for authorizing access to resources that are hosted in another application, with the consent of the user. In this article we will try to understand what OAuth 2.0 is, what it is not, and how to implement it for your application.
What is OAuth 2.0
Let’s consider a scenario. You have found a new game online and you are smashing every level. You are really enjoying the game. At one point, the game offers to post your scores on your Facebook profile. You want to do it, to show off to your friends how good you are at this game. The game wants to do it to attract new players. The problem is that only you are allowed to post from your Facebook profile. So how can the game post from your profile on your behalf?
In the early days of the internet, the game would ask you for your Facebook username and password, and you would hand them over. You can imagine how dangerous this is. You are giving the game developer full access to your Facebook account. If they have malicious intentions, they can do whatever they want, including reading your private messages. Even if they don’t have any ill intention, your password is still vulnerable in case of a database breach, because the game’s database would have to keep your password in a recoverable form, such as plain text. So, this is a big NO.
To solve this problem, the web developer community came up with the OAuth protocol. With OAuth, the game app can ask for limited access (for example, only the ability to publish posts from your profile), and you can tell Facebook “Hey, allow them to post on my behalf”. Facebook will then allow the game to do just that, and nothing else from your profile.
There are two versions of the OAuth protocol. Version 1 is outdated and not suitable for modern applications. OAuth 2.0 is now the standard, and pretty much every modern application uses it. So, our discussion will be limited to version 2.
What OAuth 2.0 is not
The OAuth protocol is meant only for authorization, not authentication. The two words might sound similar, but they are not the same. Authentication means proving your identity to the server. Authorization means proving that you have permission to do the thing you are requesting. Authorization can happen without the server needing to know who you are. Sounds weird?
Suppose you need to withdraw money from your bank account. You can write a check and send anyone to the bank with it. The bank doesn’t necessarily know who is withdrawing the money. It just knows that the check is valid and the signature is valid, so the withdrawal must be authorized.
Although the OAuth 2.0 protocol wasn’t meant to be used for authentication, the developer community has extended it and come up with a way to use it for authentication. You might be familiar with the “Sign in with Google” or “Sign in with Facebook” buttons in web applications, which use this protocol underneath. This extension is called “OpenID Connect”, and we will not discuss it in this article.
How it works
Now it’s time to explore how this protocol works. First, you need to understand the following four roles:
- Resource Server: The resource server is the app that holds the protected data. Suppose a game wants to access your email contact list to send your contacts an invite to play the game. Your contact list is the protected resource, and the email server is the ‘resource server’ that holds this data.
- Authorization Server: The authorization server is the app that manages the authorization process. This can be a separate app or the same app as the resource server, depending on how the developer implemented it. For example, your Gmail contacts are managed by an app hosted at ‘contacts.google.com’, and the authorization is handled by a separate app hosted at ‘accounts.google.com’.
- Client: The client is the app that wants to access the protected resource. In our game example, the game is the client application. (The term ‘client’ is often used to mean the ‘front end’ part of a web app, but don’t mistake this client for just the frontend. Here, we mean the whole third-party system.) The client usually registers itself with the authorization server first. The authorization server provides it with an ID and a secret, and can then authenticate the client during the authorization process through this client ID and secret combination.
- Resource Owner: The resource owner is the user to whom the protected resource belongs. The user is the owner of the data, so he/she must first decide whether permission should be given to the client app. Only if the user gives consent will the authorization server give the client the requested access.
In short, the client app asks the user (resource owner) for access to the protected resource. When the resource owner grants access, the authorization server provides an access token to the client app. The client app then sends requests to the resource server with that access token. The resource server asks the authorization server whether the access token is valid. The authorization server checks the token and informs the resource server. If the token is valid, the resource server accepts the request from the client app.
The access token is a random string that doesn’t carry any meaning to the client (or it may carry some data, as a JWT does). The authorization server can recognize this token and extract its scopes. Scopes determine what actions are valid with this access token. For example, if only the ‘read’ scope is defined for an access token, the client can only read the protected resources with it, and cannot write, modify, or delete any of them.
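To make the token check described above concrete, here is a toy, in-memory sketch in Python. It is not how a real authorization server is built: the class names `AuthorizationServer` and `ResourceServer`, the `introspect` method, and the numeric return values are all assumptions made for this illustration, and a real deployment would perform these checks over HTTPS.

```python
import secrets


class AuthorizationServer:
    """Toy authorization server that remembers which scopes each token has."""

    def __init__(self):
        self._tokens = {}  # access token -> set of granted scopes

    def issue_token(self, scopes):
        # The token is just an unguessable random string.
        token = secrets.token_urlsafe(32)
        self._tokens[token] = set(scopes)
        return token

    def introspect(self, token):
        # Returns the granted scopes, or None if the token is unknown.
        return self._tokens.get(token)


class ResourceServer:
    """Toy resource server that asks the authorization server about tokens."""

    def __init__(self, auth_server):
        self._auth_server = auth_server

    def handle(self, token, required_scope):
        scopes = self._auth_server.introspect(token)
        if scopes is None:
            return 401  # unknown or invalid token
        if required_scope not in scopes:
            return 403  # valid token, but its scopes do not cover this action
        return 200


auth = AuthorizationServer()
api = ResourceServer(auth)
token = auth.issue_token(["read"])  # hypothetical scope name

print(api.handle(token, "read"))      # 200
print(api.handle(token, "write"))     # 403
print(api.handle("made-up", "read"))  # 401
```

The token itself tells the client nothing. Only the authorization server can map it back to the scopes that were granted.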
Usually, the access token has an expiry time for security reasons. Another token, called a ‘refresh token’, is often provided along with the access token. It can be used to get a new access token when the current one expires.
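As a minimal sketch of how a client might handle expiry, the Python snippet below checks whether a token is about to expire and builds the form-encoded body of a refresh request. The `grant_type=refresh_token` parameter comes from RFC 6749. The function names, the 30-second leeway, and the token value are assumptions for illustration only.

```python
import time
from urllib.parse import urlencode


def is_expired(issued_at, expires_in, now, leeway=30):
    """Treat the token as expired slightly early to avoid racing the clock."""
    return now >= issued_at + expires_in - leeway


def build_refresh_request(refresh_token):
    """Form-encoded body to POST to the token endpoint."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    })


issued_at = time.time()
print(is_expired(issued_at, 3600, now=issued_at + 10))    # False
print(is_expired(issued_at, 3600, now=issued_at + 3590))  # True
print(build_refresh_request("hypothetical-refresh-token"))
```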
There are four flows through which the authorization process can be done. Let’s discuss the flows one by one.
Authorization Code Grant
This flow relies heavily on redirections, so the client app must have access to a user agent (web browser) to perform them. Here are the steps for this flow:
- The client initiates the flow by taking the user (in a browser) to the authorization endpoint provided by the authorization server.
- The authorization server authenticates the user, i.e., the user logs in to his/her account if not logged in already. It then asks the user whether he/she wants to grant the access the client app is asking for.
- The user reviews the request and grants or denies it.
- If the user grants access, the authorization server redirects the user’s browser to a redirection endpoint provided earlier by the client app, with an ‘authorization code’ in the query parameters. The code has a very short expiry time.
- The client app’s server then sends a request to the authorization server with the ‘authorization code’ received in the previous step (server-to-server communication, no longer in the user’s browser).
- If the code matches, the authorization server responds with an access token and a refresh token. The client then stores the tokens and sends requests to the resource server, with the access token, when needed.
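The browser-facing part of these steps (sending the user to the authorization endpoint and reading the code from the redirect) can be sketched in Python as follows. The query parameter names follow RFC 6749. The endpoint URL, client ID, redirect URI, and scope name are made up, and the `state` value is an extra anti-forgery parameter from the RFC that this article does not otherwise cover.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

AUTHORIZE_ENDPOINT = "https://auth.example.com/authorize"  # hypothetical URL


def build_authorization_url(client_id, redirect_uri, scope, state):
    """Step 1: the URL the client sends the user's browser to."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


def read_callback(callback_url, expected_state):
    """Step 4: pull the authorization code out of the redirect URL."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch, ignoring this redirect")
    if "error" in query:
        # The user denied the request, or something else went wrong.
        raise PermissionError(query["error"][0])
    return query["code"][0]


state = secrets.token_urlsafe(16)
print(build_authorization_url(
    "game-client", "https://game.example.com/callback", "post", state))

callback = "https://game.example.com/callback?code=abc&state=" + state
print(read_callback(callback, state))  # abc
```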
Note that in this flow, the authorization server could have sent the access token at step 4, but instead it sent a code, which was used in step 5 to get the actual access token. This is done because the user’s browser is not the most secure place. Anyone in the middle who can log network traffic can log the URLs. If the access token were given in the URL, that middle party could get the token too. Server-to-server communication is more secure: when the client’s server sends the request to the authorization server from its backend, it is much harder for anyone in the middle to intercept it, and the client authenticates itself with its client secret, so a stolen code alone is not enough to obtain a token. So even if the user’s browser is compromised by a malicious script, the token is still protected. In this flow we provide a refresh token too, so when the access token expires, the client can use the refresh token to obtain a new one. The access token never gets exposed to the resource owner (the user), so there is less chance of it getting stolen.
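The server-to-server exchange in step 5 might look roughly like the Python sketch below. It only builds the request and parses a sample response, and it sends nothing over the network. The `grant_type`, `code`, and `redirect_uri` parameters and the HTTP Basic client authentication follow RFC 6749. The function names, credentials, and sample response values are assumptions for illustration.

```python
import base64
import json
from urllib.parse import urlencode


def basic_auth_header(client_id, client_secret):
    # Assumes the ID and secret contain no characters that need extra escaping.
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_token_request(code, redirect_uri, client_id, client_secret):
    """Headers and body for the POST to the token endpoint (step 5)."""
    headers = {
        "Authorization": basic_auth_header(client_id, client_secret),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    })
    return headers, body


def parse_token_response(raw_json):
    """Pick out the fields the client needs to store."""
    data = json.loads(raw_json)
    return data["access_token"], data.get("refresh_token"), data.get("expires_in")


headers, body = build_token_request(
    "abc", "https://game.example.com/callback", "game-client", "s3cret")
print(body)

# A made-up response in the JSON shape the token endpoint returns.
sample = ('{"access_token": "hypothetical-access", "token_type": "Bearer", '
          '"expires_in": 3600, "refresh_token": "hypothetical-refresh"}')
print(parse_token_response(sample))
```

Because the client secret travels only in this backend request, it never reaches the user’s browser.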
Implicit Grant
Not all apps have a server. Some apps run in the browser only, written in languages like JavaScript, and some are native apps that have their own complications. Such apps cannot use the authorization code grant flow. For them, we can use the implicit grant flow.
The implicit grant flow is the same as the authorization code grant flow, with one difference: at step 4 of the authorization code grant flow we provided an authorization code, but in the implicit grant flow we provide the access token directly. So there is no step 5. As discussed, this is less secure because the token is exposed on the user’s end.
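In the implicit flow, RFC 6749 has the authorization server place the token in the fragment part of the redirect URL (after the `#`). A browser-only app would normally read it in JavaScript. The sketch below uses Python only to stay consistent with the other examples, and the redirect URL and token value are made up.

```python
from urllib.parse import urlparse, parse_qs


def read_implicit_redirect(redirect_url):
    """Extract the access token from the URL fragment of the redirect."""
    fragment = urlparse(redirect_url).fragment
    params = parse_qs(fragment)
    if "error" in params:
        raise PermissionError(params["error"][0])
    return {
        "access_token": params["access_token"][0],
        "token_type": params["token_type"][0],
        "expires_in": int(params["expires_in"][0]),
    }


url = ("https://game.example.com/callback"
       "#access_token=hypothetical-token&token_type=bearer&expires_in=3600")
print(read_implicit_redirect(url)["access_token"])  # hypothetical-token
```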
Resource Owner Password Grant
This may sound weird, but in this flow, the user provides his/her username and password to the client, and the client’s server then sends a request to the authorization server with the username and password. “I thought we use the OAuth 2.0 protocol so that we don’t need to give our passwords to a third-party app”, you may say. Yes, that’s what I discussed earlier, but this is a special case. It is supposed to be used by clients that are highly trusted by the resource owner, for example, the device operating system. This flow can be used to get an access token and refresh token once. After that, the client no longer needs the username and password and should throw them away. The user can even change his/her password, and the client app will continue working as it no longer needs the password.
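A minimal Python sketch of the request body for this flow is shown below. The `grant_type=password`, `username`, and `password` parameters follow RFC 6749. The function name and the credentials are hypothetical.

```python
from urllib.parse import urlencode


def build_password_request(username, password, scope=None):
    """Form-encoded body to POST to the token endpoint."""
    params = {
        "grant_type": "password",
        "username": username,
        "password": password,
    }
    if scope:
        params["scope"] = scope
    return urlencode(params)


body = build_password_request("alice", "hunter2")
print(body)  # grant_type=password&username=alice&password=hunter2

# Once the token response arrives, the client should keep only the tokens
# and discard the username and password.
```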
Client Credentials Grant
The client credentials grant doesn’t require any involvement from the user. Usually, the owner of the protected resource is the client app itself, or the resource owner has previously established consent with the authorization server in some way (for example, the app is an internal service that the user wishes to use). The client can directly request an access token from the authorization server using its own credentials, i.e., its client ID and client secret. If they are valid, the authorization server responds with an access token. No refresh token is provided in this flow.
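As a rough sketch, a client credentials request could be assembled like this in Python. The `grant_type=client_credentials` parameter and the HTTP Basic header built from the client ID and secret follow RFC 6749. The function name, credentials, and scope name are assumptions for illustration, and the snippet does not send anything over the network.

```python
import base64
from urllib.parse import urlencode


def build_client_credentials_request(client_id, client_secret, scope=None):
    """Headers and body for the POST to the token endpoint."""
    # Assumes the ID and secret contain no characters that need extra escaping.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    params = {"grant_type": "client_credentials"}
    if scope:
        params["scope"] = scope
    return headers, urlencode(params)


headers, body = build_client_credentials_request(
    "internal-service", "s3cret", scope="read")
print(body)  # grant_type=client_credentials&scope=read
```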
Practical Implementation
If you are interested in seeing how we can implement this protocol in an app, I have created a repo on GitHub following the standard specified in the RFC 6749 document. I have linked the corresponding section of the document in each part of the code, so that you can see the practical implementation and check the specification to better understand why it is implemented this way.
I hope this helps you understand how OAuth 2.0 works and how to implement it in your own application!
(If you found this article helpful, give it some claps! If you want to connect with me, send me a request on Twitter @AhsanShihab_.)