hazza/jupyterhub


mirror of https://github.com/jupyterhub/jupyterhub.git synced 2025-10-09 19:13:03 +00:00


Min RK e1e34a14a2 update docs for allow_all, allow_existing_users

2024-03-19 09:40:05 +01:00

14 KiB


(authenticators-reference)=

Authenticators

The {class}`.Authenticator` is the mechanism for authorizing users to use the Hub and single-user notebook servers.

The default PAM Authenticator

JupyterHub ships with the default PAM-based Authenticator, for logging in with local user accounts via a username and password.

The OAuthenticator

Some login mechanisms, such as OAuth, don't map onto username and password authentication, and instead use tokens. When using these mechanisms, you can override the login handlers.

You can see an example implementation of an Authenticator that uses GitHub OAuth at OAuthenticator.

JupyterHub's OAuthenticator currently supports the following popular services:

Auth0
Bitbucket
CILogon
GitHub
GitLab
Globus
Google
MediaWiki
OpenShift

A generic implementation, which you can use for OAuth authentication with any provider, is also available.

The Dummy Authenticator

When testing, it may be helpful to use the {class}`~.jupyterhub.auth.DummyAuthenticator`. This allows any username and password unless a global password has been set. Once set, any username will still be accepted but the correct password will need to be provided.
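For a quick local test, a minimal jupyterhub_config.py might look like the sketch below. It assumes the dummy entry-point name and the DummyAuthenticator.password trait; the password value is a placeholder, and allow_all (new in JupyterHub 5.0) is set explicitly so the "anyone may log in" intent is visible. This is a config fragment (c is provided by JupyterHub when it loads the file), not a standalone script.

```python
# jupyterhub_config.py (testing only)
c.JupyterHub.authenticator_class = "dummy"
# Optional: require this shared placeholder password for every username
c.DummyAuthenticator.password = "a-shared-test-password"
# JupyterHub 5.0+: state explicitly that any username may log in
c.Authenticator.allow_all = True
```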

Additional Authenticators

Additional authenticators can be found on GitHub by searching for topic:jupyterhub topic:authenticator.

Technical Overview of Authentication

How the Base Authenticator works

The base authenticator uses simple username and password authentication.

The base Authenticator has one central method:

Authenticator.authenticate

{meth}`.Authenticator.authenticate`

This method is passed the Tornado RequestHandler and the POST data from JupyterHub's login form. Unless the login form has been customized, data will have two keys:

username
password

If authentication is successful, the authenticate method must return either:

- the username (non-empty str) of the authenticated user, or
- a dictionary with fields:
  - name: the username
  - admin: optional, a boolean indicating whether the user is an admin. In most cases it is better to use fine-grained RBAC permissions instead of giving users full admin privileges.
  - auth_state: optional, a dictionary of auth state that will be persisted
  - groups: optional, a list of JupyterHub group memberships

Otherwise, it must return None.
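The return contract above can be summarized as a small validation function. This is a hypothetical helper for illustration only (normalize_auth_result is not a JupyterHub API); it checks just the documented fields and their types.

```python
from typing import Any, Optional


def normalize_auth_result(result: Any) -> Optional[dict]:
    """Coerce an authenticate() return value into a dict, or None on failure.

    Hypothetical helper illustrating the documented contract; it is not
    part of JupyterHub's API.
    """
    if result is None:
        # authentication failed
        return None
    if isinstance(result, str):
        if not result:
            raise ValueError("username must be a non-empty string")
        return {"name": result}
    if isinstance(result, dict):
        if not result.get("name"):
            raise ValueError("a dict result needs a non-empty 'name'")
        admin = result.get("admin")
        if admin is not None and not isinstance(admin, bool):
            raise TypeError("'admin' must be a boolean")
        auth_state = result.get("auth_state")
        if auth_state is not None and not isinstance(auth_state, dict):
            raise TypeError("'auth_state' must be a dictionary")
        groups = result.get("groups")
        if groups is not None and not isinstance(groups, list):
            raise TypeError("'groups' must be a list of group names")
        return dict(result)
    raise TypeError(f"unsupported return type: {type(result).__name__}")
```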

Writing an Authenticator that looks up passwords in a dictionary requires only overriding this one method:

```python
from traitlets import Dict
from jupyterhub.auth import Authenticator

class DictionaryAuthenticator(Authenticator):

    passwords = Dict(config=True,
        help="""dict of username:password for authentication"""
    )

    async def authenticate(self, handler, data):
        # Accept the login only if the submitted password matches the stored one;
        # falling through returns None, which signals a failed login.
        if self.passwords.get(data['username']) == data['password']:
            return data['username']
```
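To see the same lookup logic run without a Hub, the sketch below swaps the jupyterhub base class for a plain class (DictionaryCheck is a hypothetical name) and drives the coroutine with asyncio.run. It also uses hmac.compare_digest instead of == for the password comparison; that hardening step is an addition here, not part of the example above.

```python
import asyncio
import hmac
from typing import Dict, Optional


class DictionaryCheck:
    """Stand-in for DictionaryAuthenticator with no jupyterhub dependency (hypothetical)."""

    def __init__(self, passwords: Dict[str, str]):
        self.passwords = passwords

    async def authenticate(self, handler, data) -> Optional[str]:
        expected = self.passwords.get(data["username"])
        if expected is None:
            return None  # unknown user
        # compare_digest avoids content-based short-circuiting, reducing timing leaks
        if hmac.compare_digest(expected.encode(), data["password"].encode()):
            return data["username"]
        return None


auth = DictionaryCheck({"alice": "wonderland"})
print(asyncio.run(auth.authenticate(None, {"username": "alice", "password": "wonderland"})))  # prints alice
```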

Normalize usernames

Since the Authenticator and Spawner both use the same username, sometimes you want to transform the name coming from the authentication service (e.g. turning email addresses into local system usernames) before adding them to the Hub service. Authenticators can define normalize_username, which takes a username and returns its normalized form. The default normalization is to cast names to lowercase.

For simple mappings, a configurable dict Authenticator.username_map is used to turn one name into another:

```python
c.Authenticator.username_map = {
    'service-name': 'localname'
}
```
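A standalone sketch of how email-style names could be normalized and then mapped. The email handling is a hypothetical example; the lowercase-then-username_map order is assumed to mirror the base class, so check your JupyterHub version before relying on it.

```python
from typing import Dict


def normalize_username(username: str, username_map: Dict[str, str]) -> str:
    # Hypothetical email handling: keep only the local part of an address
    if "@" in username:
        username = username.split("@", 1)[0]
    # Default normalization: lowercase
    username = username.lower()
    # Then apply the configured one-to-one mapping, if there is an entry
    return username_map.get(username, username)
```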

When using PAMAuthenticator, you can set c.PAMAuthenticator.pam_normalize_username = True, which normalizes usernames using PAM by round-tripping them (username to uid to username). This is useful if you use an external service that allows multiple usernames to map to the same user (such as Active Directory; yes, this really happens). When pam_normalize_username is on, usernames are not normalized to lowercase.

Validate usernames

In most cases, there is a very limited set of acceptable usernames. Authenticators can define validate_username(username), which should return True for a valid username and False for an invalid one. The primary effect this has is improving error messages during user creation.

The default behavior is to use configurable Authenticator.username_pattern, which is a regular expression string for validation.

To only allow usernames that start with 'w':

```python
c.Authenticator.username_pattern = r'w.*'
```
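A standalone sketch of pattern-based validation, assuming the pattern is applied with re.match (anchored at the start only), which is why r'w.*' reads as "starts with w". If you need the whole name to match, anchor the pattern with $. The function is hypothetical, not JupyterHub's implementation.

```python
import re


def validate_username(username: str, pattern: str) -> bool:
    if not username:
        return False
    if not pattern:
        return True  # no pattern configured: accept any non-empty name
    # re.match anchors only at the start of the string
    return re.match(pattern, username) is not None
```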

How to write a custom authenticator

You can use custom Authenticator subclasses to enable authentication via other mechanisms. One such example is using GitHub OAuth.

Because the username is passed from the Authenticator to the Spawner, a custom Authenticator and Spawner are often used together. For example, the Authenticator methods {meth}`.Authenticator.pre_spawn_start` and {meth}`.Authenticator.post_spawn_stop` are hooks that can be used for auth-related startup (e.g. opening PAM sessions) and cleanup (e.g. closing PAM sessions).

Registering custom Authenticators via entry points

As of JupyterHub 1.0, custom authenticators can register themselves via the jupyterhub.authenticators entry point metadata. To do this, in your setup.py add:

```python
setup(
  ...
  entry_points={
    'jupyterhub.authenticators': [
        'myservice = mypackage:MyAuthenticator',
    ],
  },
)
```

If you have added this metadata to your package, admins can select your authenticator with the configuration:

```python
c.JupyterHub.authenticator_class = 'myservice'
```

instead of the full

```python
c.JupyterHub.authenticator_class = 'mypackage:MyAuthenticator'
```

previously required. Additionally, configurable attributes for your authenticator will appear in jupyterhub help output and auto-generated configuration files via jupyterhub --generate-config.
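To check which names are registered in an environment, the standard library's importlib.metadata can list the entries in the jupyterhub.authenticators group. The helper below is a sketch; the branch on select() covers the API change between Python 3.9 and 3.10.

```python
from importlib.metadata import entry_points
from typing import List, Tuple


def authenticator_entry_points(group: str = "jupyterhub.authenticators") -> List[Tuple[str, str]]:
    """Return sorted (name, target) pairs registered under the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: filter by group via select()
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


for name, target in authenticator_entry_points():
    print(f"{name} -> {target}")
```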

(authenticator-allow)=

Allowing access

When dealing with logging in, there are generally two separate steps:

authentication: identifying who is logged in, and
authorization: deciding whether an authenticated user is allowed to access the Hub

{meth}`Authenticator.authenticate` is responsible for authenticating users. It is perfectly fine in the simplest cases for Authenticator.authenticate to be responsible for authentication and authorization, in which case authenticate may return None if the user is not authorized.

However, Authenticators also have two methods, {meth}`~.Authenticator.check_allowed` and {meth}`~.Authenticator.check_blocked_users`, which are called after successful authentication to further check if the user is allowed.

If check_blocked_users() returns False, authorization stops and the user is not allowed.

If check_allowed() returns True, authorization proceeds.
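The order of these two checks can be sketched as a plain function. This is an illustrative model of the flow described above (authorize is a hypothetical name, and the two checks are passed in as callables), not JupyterHub's internal code.

```python
from typing import Callable


def authorize(
    username: str,
    check_blocked_users: Callable[[str], bool],
    check_allowed: Callable[[str], bool],
) -> bool:
    # The blocked check runs first: False means the user is blocked, so stop
    if not check_blocked_users(username):
        return False
    # Otherwise the allow check decides whether authorization proceeds
    return check_allowed(username)
```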

:::{versionadded} 5.0
{attr}`Authenticator.allow_all` and {attr}`Authenticator.allow_existing_users` are new in JupyterHub 5.0.

By default, allow_all is True when allowed_users is empty, and allow_existing_users is True when allowed_users is not empty. This is to ensure backward-compatibility, but subclasses are free to pick more restrictive defaults.
:::

Overriding `check_allowed`

The base implementation of {meth}`~.Authenticator.check_allowed` checks:

if allow_all is True, return True
if username is in the allowed_users set, return True
else return False
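A standalone model of the base check_allowed together with the backward-compatible allow_all default described above. AllowConfig is a hypothetical dataclass standing in for the Authenticator's traits; allow_existing_users is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AllowConfig:
    allowed_users: Set[str] = field(default_factory=set)
    allow_all: Optional[bool] = None  # None means "use the default"

    def __post_init__(self):
        # Backward-compatible default: allow everyone only if no allow list is set
        if self.allow_all is None:
            self.allow_all = not self.allowed_users

    def check_allowed(self, username: str) -> bool:
        if self.allow_all:
            return True
        if username in self.allowed_users:
            return True
        return False
```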

If a custom Authenticator defines additional sources of allow configuration, such as membership in a group or other information, it should override check_allowed to account for this. allow_ configuration should generally be additive, i.e. if permission is granted by any allow configuration, a user should be authorized.

:::{note}
For backward-compatibility, it is the responsibility of Authenticator.check_allowed() to check .allow_all. This prevents the backward-compatible default values from granting permissions unexpectedly.
:::

If an Authenticator defines additional allow configuration, it must at least:

override check_allowed, and
override the default for allow_all

The default for allow_all in a custom authenticator should be either False or a dynamic default that is True only if no allow configuration is specified. False is recommended for authenticators which source much larger pools of users than are typically allowed to access a Hub (e.g. generic OAuth providers like Google, GitHub, etc.).

For example, here is how PAMAuthenticator extends the base class to add allowed_groups:

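```python
from traitlets import default
from jupyterhub.auth import LocalAuthenticator


class GroupAwareAuthenticator(LocalAuthenticator):
    # Illustrative class name; LocalAuthenticator (which PAMAuthenticator
    # extends) provides allowed_groups and check_allowed_groups.

    @default("allow_all")
    def _allow_all_default(self):
        if self.allowed_users or self.allowed_groups:
            # if any allow config is specified, default to False
            return False
        return True

    def check_allowed(self, username, authentication=None):
        # allow_all must be checked here; the default above only sets its value
        if self.allow_all:
            return True
        # allowed_groups can only grant access, never remove it
        if self.check_allowed_groups(username, authentication):
            return True
        # fall back to the base class check of allowed_users
        return super().check_allowed(username, authentication)
```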

Important points to note:

overriding the default for allow_all is required to avoid allow_all being True when allowed_groups is specified but allowed_users is not.
allow_all must be checked inside check_allowed.
allowed_groups strictly expands who is authorized; it does not restrict allowed_users. This is recommended for all allow_ configuration added by Authenticators.

Custom error messages

Any of these authentication and authorization methods may raise a Tornado web.HTTPError if you want to show a more informative login failure message rather than the generic one:

```python
from tornado import web

raise web.HTTPError(403, "informative message")
```

(authenticator-auth-state)=

Authentication state

JupyterHub 0.8 added the ability to persist state related to authentication, such as auth-related tokens. If such state should be persisted, .authenticate() should return a dictionary of the form:

```python
{
  'name': username,
  'auth_state': {
    'key': 'value',
  }
}
```

where username is the username that has been authenticated, and auth_state is any JSON-serializable dictionary.

Because auth_state may contain sensitive information, it is encrypted before being stored in the database. To store auth_state, two conditions must be met:

persisting auth state must be enabled explicitly via configuration
```
c.Authenticator.enable_auth_state = True
```
encryption must be enabled by the presence of JUPYTERHUB_CRYPT_KEY environment variable, which should be a hex-encoded 32-byte key. For example:
```
export JUPYTERHUB_CRYPT_KEY=$(openssl rand -hex 32)
```

JupyterHub uses Fernet to encrypt auth_state. To facilitate key-rotation, JUPYTERHUB_CRYPT_KEY may be a semicolon-separated list of encryption keys. If there are multiple keys present, the first key is always used to persist any new auth_state.
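The key handling above can be sketched with the standard library: secrets.token_hex(32) produces the same kind of value as openssl rand -hex 32, and the parser splits a semicolon-separated list while keeping its order, so the first entry is the one used for new auth_state. This sketch assumes hex keys only; it does not reproduce JupyterHub's own parsing or the Fernet encryption itself.

```python
import secrets
from typing import List


def generate_crypt_key() -> str:
    """32 random bytes as hex, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def parse_crypt_keys(value: str) -> List[bytes]:
    """Split a semicolon-separated key list, preserving order (keys[0] is the write key)."""
    keys = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key = bytes.fromhex(part)
        if len(key) != 32:
            raise ValueError(f"expected a 32-byte key, got {len(key)} bytes")
        keys.append(key)
    return keys


new_key, old_key = generate_crypt_key(), generate_crypt_key()
keys = parse_crypt_keys(f"{new_key};{old_key}")
print(len(keys))  # prints 2
```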

Using auth_state

Typically, if auth_state is persisted it is desirable to affect the Spawner environment in some way. This may mean defining environment variables, placing certificates in the user's home directory, etc. The {meth}`Authenticator.pre_spawn_start` method can be used to pass information from authenticator state to the Spawner environment:

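```python
from jupyterhub.auth import Authenticator


async def identify_user(handler, data):
    """Placeholder: work out the username from the request (your own logic)."""
    raise NotImplementedError


async def token_for_user(username):
    """Placeholder: obtain a token from your upstream service (your own logic)."""
    raise NotImplementedError


class MyAuthenticator(Authenticator):
    async def authenticate(self, handler, data=None):
        username = await identify_user(handler, data)
        upstream_token = await token_for_user(username)
        return {
            'name': username,
            'auth_state': {
                'upstream_token': upstream_token,
            },
        }

    async def pre_spawn_start(self, user, spawner):
        """Pass upstream_token to spawner via environment variable"""
        auth_state = await user.get_auth_state()
        if not auth_state:
            # auth_state not enabled
            return
        spawner.environment['UPSTREAM_TOKEN'] = auth_state['upstream_token']
```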

Note that environment variable names and values are always strings, so passing multiple values means setting multiple environment variables or serializing more complex data into a single variable, e.g. as a JSON string.
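A sketch of the JSON approach, with hypothetical variable names (UPSTREAM_GROUPS) and a hypothetical upstream_groups key in auth_state. Inside pre_spawn_start you could merge the result with spawner.environment.update(...), and the single-user process would decode the value with json.loads.

```python
import json
from typing import Any, Dict


def auth_state_to_env(auth_state: Dict[str, Any]) -> Dict[str, str]:
    """Build string-only environment variables from auth_state (hypothetical names)."""
    env = {}
    # A single string value maps directly onto one variable
    env["UPSTREAM_TOKEN"] = str(auth_state["upstream_token"])
    # Structured data is packed into one JSON string
    env["UPSTREAM_GROUPS"] = json.dumps(auth_state.get("upstream_groups", []))
    return env


env = auth_state_to_env({"upstream_token": "abc123", "upstream_groups": ["physics", "staff"]})
print(env["UPSTREAM_GROUPS"])  # prints ["physics", "staff"]
```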

Auth state can also be used to configure the spawner via config, without subclassing, by setting c.Spawner.auth_state_hook. This function will be called with (spawner, auth_state), only when auth_state is defined.

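For example, with KubeSpawner:

```python
def auth_state_hook(spawner, auth_state):
    # 'user_volumes' and 'user_mounts' are keys your Authenticator stores in auth_state
    spawner.volumes = auth_state['user_volumes']
    spawner.volume_mounts = auth_state['user_mounts']

c.Spawner.auth_state_hook = auth_state_hook
```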
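Because the hook is a plain function, it can be exercised without a Hub. In this sketch a SimpleNamespace stands in for the spawner, the KubeSpawner trait names volumes and volume_mounts are assumed, and the .get() defaults (an addition) keep the hook from failing when the authenticator did not store those keys.

```python
from types import SimpleNamespace


def auth_state_hook(spawner, auth_state):
    # Fall back to empty lists if the authenticator did not store these keys
    spawner.volumes = auth_state.get("user_volumes", [])
    spawner.volume_mounts = auth_state.get("user_mounts", [])


# A SimpleNamespace stands in for a KubeSpawner instance here
spawner = SimpleNamespace(volumes=[], volume_mounts=[])
auth_state_hook(
    spawner,
    {
        "user_volumes": [{"name": "data", "emptyDir": {}}],
        "user_mounts": [{"name": "data", "mountPath": "/data"}],
    },
)
print(spawner.volume_mounts)
```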

(authenticator-groups)=

Authenticator-managed group membership

:::{versionadded} 2.2
:::

Some identity providers may have their own concept of group membership that you would like to preserve in JupyterHub. This is now possible with Authenticator.manage_groups.

You can set the config:

```python
c.Authenticator.manage_groups = True
```

to enable this behavior. The default is False for Authenticators that ship with JupyterHub, but may be True for custom Authenticators. Check your Authenticator's documentation for manage_groups support.

If True, {meth}`.Authenticator.authenticate` and {meth}`.Authenticator.refresh_user` may include a field groups which is a list of group names the user should be a member of:

Membership will be added for any group in the list
Membership in any groups not in the list will be revoked
Any groups not already present in the database will be created
If None is returned, no changes are made to the user's group membership
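The four rules above amount to set differences between the user's current groups and the returned list. plan_group_sync is a hypothetical helper that only computes what would change; it does not touch JupyterHub's database.

```python
from typing import Iterable, Optional, Set, Tuple


def plan_group_sync(
    current: Set[str],
    desired: Optional[Iterable[str]],
    existing_groups: Set[str],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (groups to join, groups to leave, groups to create)."""
    if desired is None:
        # None: leave membership untouched
        return set(), set(), set()
    desired = set(desired)
    to_add = desired - current          # membership added for listed groups
    to_remove = current - desired       # membership revoked for unlisted groups
    to_create = desired - existing_groups  # groups missing from the database
    return to_add, to_remove, to_create
```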

If authenticator-managed groups are enabled, all group-management via the API is disabled.

pre_spawn_start and post_spawn_stop hooks

Authenticators use two hooks, {meth}`.Authenticator.pre_spawn_start` and {meth}`.Authenticator.post_spawn_stop`, both called with (user, spawner), to pass additional state information between the authenticator and a spawner. These hooks are typically used for auth-related startup, e.g. opening a PAM session, and auth-related cleanup, e.g. closing a PAM session.

JupyterHub as an OAuth provider

Beginning with version 0.8, JupyterHub is an OAuth provider.
