Merge pull request #4399 from minrk/more-db-doc

add some more detail and examples to database doc
2025-10-15 14:03:02 +00:00 · 2023-03-22 14:19:59 +01:00
parent 64a253dbef 1430e02fa8
commit c6b4577c0a
3 changed files with 39 additions and 11 deletions
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -114,7 +114,7 @@ jobs:
          fi
          if [ "${{ matrix.db }}" == "mysql" ]; then
              echo "MYSQL_HOST=127.0.0.1" >> $GITHUB_ENV
-              echo "JUPYTERHUB_TEST_DB_URL=mysql+mysqlconnector://root@127.0.0.1:3306/jupyterhub" >> $GITHUB_ENV
+              echo "JUPYTERHUB_TEST_DB_URL=mysql+mysqldb://root@127.0.0.1:3306/jupyterhub" >> $GITHUB_ENV
          fi
          if [ "${{ matrix.ssl }}" == "ssl" ]; then
              echo "SSL_ENABLED=1" >> $GITHUB_ENV
@@ -175,7 +175,7 @@ jobs:
              pip install "jupyter_server==${{ matrix.jupyter_server }}"
          fi
          if [ "${{ matrix.db }}" == "mysql" ]; then
-              pip install mysql-connector-python
+              pip install mysqlclient
          fi
          if [ "${{ matrix.db }}" == "postgres" ]; then
              pip install psycopg2-binary
--- a/docs/source/explanation/database.md
+++ b/docs/source/explanation/database.md
@@ -95,8 +95,14 @@ The Hub and its database are not involved in most requests to single-user server

 JupyterHub supports a variety of database backends via [SQLAlchemy][].
 The default is sqlite, which works great for many cases, but you should be able to use many backends supported by SQLAlchemy.
-Usually, this will mean PostgreSQL or MySQL, both of which are well tested with JupyterHub.
+Usually, this will mean PostgreSQL or MySQL, both of which are officially supported and well tested with JupyterHub, but others may work as well.
+See [SQLAlchemy's docs][sqlalchemy-dialect] for how to connect to different database backends.
+Doing so generally involves:

+1. installing a Python package that provides a client implementation, and
+2. setting [](JupyterHub.db_url) to connect to your database with the specified implementation
+
+[sqlalchemy-dialect]: https://docs.sqlalchemy.org/en/20/dialects/
 [sqlalchemy]: https://www.sqlalchemy.org

 ### Default backend: SQLite
@@ -109,14 +115,16 @@ For production systems, SQLite has some disadvantages when used with JupyterHub:

 - `upgrade-db` may not always work, and you may need to start with a fresh database
 - `downgrade-db` **will not** work if you want to rollback to an earlier
-  version, so backup the `jupyterhub.sqlite` file before upgrading
+  version, so back up the `jupyterhub.sqlite` file before upgrading (JupyterHub automatically creates a date-stamped backup file when upgrading SQLite)

 The sqlite documentation provides a helpful page about [when to use SQLite and
 where traditional RDBMS may be a better choice](https://sqlite.org/whentouse.html).

 ### Picking your database backend (PostgreSQL, MySQL)

-When running a long term deployment or a production system, we recommend using a full-fledged relational database, such as [PostgreSQL](https://www.postgresql.org) or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE` statement.
+When running a long-term deployment or a production system, we recommend using a full-fledged relational database, such as [PostgreSQL](https://www.postgresql.org) or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE` statement, which is used in some database upgrade steps.
+
+In general, you select your database backend with [](JupyterHub.db_url), and can further configure it (usually not necessary) with [](JupyterHub.db_kwargs).

 ## Notes and Tips

@@ -132,14 +140,25 @@ multiple processes which might try to access the file at the same time.
 ### PostgreSQL

 We recommend using PostgreSQL for production if you are unsure whether to use
-MySQL or PostgreSQL or if you do not have a strong preference. There is
-additional configuration required for MySQL that is not needed for PostgreSQL.
+MySQL or PostgreSQL or if you do not have a strong preference.
+There is additional configuration required for MySQL that is not needed for PostgreSQL.
+
+For example, to connect to a postgres database with psycopg2:
+
+1. install psycopg2: `pip install psycopg2` (or `psycopg2-binary` to avoid compilation, which is [not recommended for production][psycopg2-binary])
+2. set authentication via environment variables `PGUSER` and `PGPASSWORD`
+3. configure [](JupyterHub.db_url):
+
+   ```python
+   c.JupyterHub.db_url = "postgresql+psycopg2://my-postgres-server:5432/my-db-name"
+   ```
+
+[psycopg2-binary]: https://www.psycopg.org/docs/install.html#psycopg-vs-psycopg-binary

 ### MySQL / MariaDB

- You should use the `pymysql` sqlalchemy provider (the other one, MySQLdb,
-  isn't available for py3).
- You also need to set `pool_recycle` to some value (typically 60 - 300)
+- You should probably use the `pymysql` or `mysqlclient` sqlalchemy provider, or another backend [recommended by sqlalchemy](https://docs.sqlalchemy.org/en/20/dialects/mysql.html#dialect-mysql)
+- You also need to set `pool_recycle` to some value (typically 60 - 300; JupyterHub defaults to 60)
  which depends on your MySQL setup. This is necessary since MySQL kills
  connections serverside if they've been idle for a while, and the connection
  from the hub will be idle for longer than most connections. This behavior
@@ -153,3 +172,12 @@ additional configuration required for MySQL that is not needed for PostgreSQL.
  correctly. Later versions of MariaDB and MySQL should set these values by
  default, as well as have a default `DYNAMIC` `row_format` and pose no trouble
  to users.
+
+For example, to connect to a mysql database with mysqlclient:
+
+1. install mysqlclient: `pip install mysqlclient`
+2. configure [](JupyterHub.db_url):
+
+   ```python
+   c.JupyterHub.db_url = "mysql+mysqldb://myuser:mypassword@my-sql-server:3306/my-db-name"
+   ```
--- a/jupyterhub/tests/test_db.py
+++ b/jupyterhub/tests/test_db.py
@@ -35,7 +35,7 @@ def generate_old_db(env_dir, hub_version, db_url):
        pkgs.append('sqlalchemy<2')

    if 'mysql' in db_url:
-        pkgs.append('mysql-connector-python')
+        pkgs.append('mysqlclient')
    elif 'postgres' in db_url:
        pkgs.append('psycopg2-binary')
    check_call([env_pip, 'install'] + pkgs)
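
Note on the `db_url` examples in `database.md`: both follow the SQLAlchemy pattern `dialect+driver://user:password@host:port/database`. A minimal sketch of assembling such a URL is below, assuming a hypothetical helper `build_db_url` (not part of JupyterHub or SQLAlchemy) and a made-up password. It percent-encodes the credentials, since characters such as `@` or `/` in a password would otherwise be read as URL delimiters.

```python
from urllib.parse import quote


def build_db_url(dialect, driver, host, port, database, user=None, password=None):
    """Hypothetical helper: assemble dialect+driver://user:password@host:port/database."""
    auth = ""
    if user:
        # Percent-encode credentials so reserved characters do not break URL parsing.
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{dialect}+{driver}://{auth}{host}:{port}/{database}"


# MySQL via mysqlclient (SQLAlchemy driver name: mysqldb); the password is illustrative.
mysql_url = build_db_url(
    "mysql", "mysqldb", "my-sql-server", 3306, "my-db-name",
    user="myuser", password="p@ss/word",
)

# PostgreSQL via psycopg2, with credentials left to PGUSER / PGPASSWORD.
postgres_url = build_db_url("postgresql", "psycopg2", "my-postgres-server", 5432, "my-db-name")

print(mysql_url)
print(postgres_url)
```

In `jupyterhub_config.py`, the resulting string would then be assigned with `c.JupyterHub.db_url = mysql_url`.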