Add capacity planning doc
docs/source/admin/capacity-planning.md  (new file, 246 lines)
@@ -0,0 +1,246 @@
# Capacity planning

General capacity planning advice for JupyterHub is hard to give,
because it depends almost entirely on what your users are doing,
and what JupyterHub users do varies _wildly_ in terms of resource consumption.

**There is no single answer to "I have X users, what resources do I need?" or "How many users can I support with this machine?"**
Here are three _typical_ Jupyter use patterns that require vastly different resources:

- negligible resources because computation is mostly idle,
  e.g. students learning programming for the first time
- very intense, sustained load, e.g. training machine learning models
- _mostly_ idle, but needs a lot of resources for short periods of time
  (interactive research often looks like this)

But just because there's no single answer doesn't mean we can't help.
So we have gathered here some useful information to help you make your decisions
about what resources you need based on how your users work,
including the relative invariants in terms of resources that JupyterHub itself needs.
## JupyterHub infrastructure

JupyterHub consists of a few components that are always running.
These take up very little resources,
especially relative to the resources consumed by users when you have more than a few.

As an example, an instance of mybinder.org (running JupyterHub 1.5.0),
typically running with ~100-150 users, has:

| Component | CPU (mean/peak) | Memory (mean/peak) |
| --------- | --------------- | ------------------ |
| Hub       | 4% / 13%        | (230 MB / 260 MB)  |
| Proxy     | 6% / 13%        | (47 MB / 65 MB)    |

So it would be pretty generous to allocate ~25% of one CPU core
and ~500MB of RAM to overall JupyterHub infrastructure.
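If you want to make that reservation explicit, most deployment methods let you set it directly. A minimal sketch, assuming a [zero-to-jupyterhub][] deployment where the hub's resources are set through Helm values (the numbers are just the generous estimate above, not a recommendation):

```yaml
# A sketch only: reserve ~25% of a core and ~500MB for JupyterHub's own infrastructure,
# assuming the zero-to-jupyterhub Helm chart (hub.resources is passed to the hub pod).
hub:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
```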
The rest is going to be up to your users.
Per-user overhead from JupyterHub is typically negligible
up to at least a few hundred concurrent active users.

![mybinder.org hub components cpu/memory usage](../images/mybinder-hub-components-cpu-memory.png)
## Factors

### Static vs elastic resources

A big factor in planning resources is:
**how much does it cost to change your mind?**
If you are using a single shared machine with local storage,
migrating to a new one because it turns out your users don't fit might be very costly,
because you have to get a new machine, set it up, and maybe even migrate user data.

On the other hand, if you are using ephemeral resources,
such as node pools in Kubernetes,
changing resource types costs close to nothing
because nodes can automatically be added or removed as needed.

Take that cost into account when you are picking how much memory or cpu to allocate to users.

Static resources (like [the-littlest-jupyterhub][]) provide more **stable, predictable costs**,
while elastic resources (like [zero-to-jupyterhub][]) tend to provide **lower overall costs**
(especially when deployed with monitoring allowing cost optimizations over time),
at the price of being **less predictable**.

[the-littlest-jupyterhub]: https://the-littlest-jupyterhub.readthedocs.io

[zero-to-jupyterhub]: https://zero-to-jupyterhub.readthedocs.io
(limits-requests)=

### Limit vs Request

Many scheduling tools like Kubernetes have two separate ways of allocating resources to users.
A **Request** or **Reservation** describes how much of a resource is _set aside_ for each user.
Often, this doesn't have any practical effect other than deciding when a given machine is considered 'full'.
If you are using expandable resources like an autoscaling Kubernetes cluster,
'requesting' more resources than fit on currently running nodes is what triggers launching a new node and adding it to the pool (a cluster **scale-up event**).
If you are running on a single VM, the request determines how many users you can run at the same time, full stop.

A **Limit**, on the other hand, actually enforces a cap on how much of a resource any given user can consume.
We'll see more information on what happens when users try to exceed their limits [below](oversubscription).

In the strictest, safest case, you can have these two numbers be the same.
That means that each user is _limited_ to fit within the resources allocated to it.
This avoids **[oversubscription](oversubscription)** of resources (allowing use of more than you have available),
at the expense (in a literal, this-costs-money sense) of reserving lots of usually-idle capacity.

But when deploying JupyterHub,
you will likely find that a relatively small fraction of users use lots more resources than others,
making oversubscription attractive (to a point).

Having a gap between the request and the limit means you can fit a number of _typical_ users on a node (based on the request),
but still limit how much a runaway user can gobble up for themselves.
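For concreteness, here is how that request/limit gap might be expressed. This is a sketch only: it assumes a Kubernetes-based deployment with [zero-to-jupyterhub][], whose Helm chart calls the request a _guarantee_, and the numbers are illustrative rather than recommendations.

```yaml
# Illustrative zero-to-jupyterhub values: reserve a little per user, cap at more.
singleuser:
  memory:
    guarantee: 1G # request: used when packing users onto nodes / deciding to scale up
    limit: 3G # hard cap: a server exceeding this is killed, not slowed down
  cpu:
    guarantee: 0.5 # request: half a core set aside per user
    limit: 2 # cap: leaves room for short bursts of parallelism
```

In a plain Kubernetes pod spec, the same idea appears as `resources.requests` and `resources.limits`.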
(oversubscription)=

### Oversubscribed CPU is okay, running out of memory is bad

An important consideration when assigning resources to users is:

> What happens when users need more than I've given them?

A good summary to keep in mind: **when tasks don't get enough CPU, things are slow.
When they don't get enough memory, things are broken.**
This means it's very important that users have enough memory,
but much less important that they always have exclusive access to all the CPU they can use.

This relates to [Limits and Requests](limits-requests),
because these are the consequences of your limits and/or requests not matching what users actually try to use.

A table of mismatched resource allocation situations and their consequences:

| issue                                                     | consequence                                                                            |
| --------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| Requests too high                                         | Unnecessarily high cost and/or low capacity.                                            |
| CPU limit too low                                         | Poor performance experienced by users.                                                  |
| CPU oversubscribed (too-low request + too-high limit)     | Poor performance across the system; may crash, if severe.                               |
| Memory limit too low                                      | Servers killed by the Out-of-Memory (OOM) killer; lost work for users.                  |
| Memory oversubscribed (too-low request + too-high limit)  | System memory exhaustion - all kinds of hangs, crashes, and weird errors. Very bad.     |

Note that the 'oversubscribed' problem case is where the request is lower than _typical_ usage,
meaning that the total reserved resources aren't enough for the total _actual_ consumption.
This doesn't mean that _all_ your users exceed the request,
just that the _limit_ gives enough room for the _average_ user to exceed the request.
### Example case for oversubscribing memory

Take, for example, this system and sampling of user behavior:

- System memory = 8G
- memory request = 1G, limit = 3G
- typical 'heavy' user: 2G
- typical 'light' user: 0.5G

This will assign 8 users to those 8G of RAM (remember: only requests are used for deciding when a machine is 'full').
As long as the 8 users' total _actual_ usage stays under 8G, everything is fine.
But the _limit_ allows a total of 24G to be used,
which would be a mess if everyone used their full limit.
But _not_ everyone uses the full limit, which is the point!

This pattern is fine if 1/8 of your users are 'heavy', because _typical_ usage will be ~0.7G per user,
and your total usage will be ~5.5G (1 × 2 + 7 × 0.5 = 5.5).

But if _50%_ of your users are 'heavy' you have a problem, because that means your users will be trying to use 10G (4 × 2 + 4 × 0.5 = 10),
which you don't have.

You can make guesses at these numbers, but the only _real_ way to get them is to measure (more [below](measuring)).
### CPU:memory ratio

Most of the time, you'll find that only one resource is the limiting factor for your users.
Most often it's memory, but for certain tasks, it could be CPU (or even GPUs).

Many cloud deployments have just one or a few fixed ratios of cpu to memory
(e.g. 'general purpose', 'high memory', and 'high cpu').
Setting your secondary resource allocation according to this ratio,
after selecting the more important limit, results in a balanced resource allocation.

For instance, some of Google Cloud's ratios are:

| node type   | GB RAM / CPU core |
| ----------- | ----------------- |
| n2-highmem  | 8                 |
| n2-standard | 4                 |
| n2-highcpu  | 1                 |
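For example, if memory is your primary constraint and your nodes have 4 GB of RAM per core (like `n2-standard`), you can derive the CPU numbers from the memory numbers. A sketch only, in plain Kubernetes `resources` terms, with illustrative values:

```yaml
# Illustrative only: memory chosen first, CPU derived from the node's 4 GB-per-core ratio.
resources:
  requests:
    memory: 1G
    cpu: 250m # 1 GB / 4 GB per core = 0.25 cores
  limits:
    memory: 4G
    cpu: 1 # 4 GB / 4 GB per core = 1 core
```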
### Idleness

Jupyter being an interactive tool means people tend to spend a lot more time reading and thinking than actually running resource-intensive code.
This significantly affects how much _cpu_ a typical active user needs,
but often does not significantly affect the _memory_.

Ways to think about this:

- More idle users means unused CPU.
  This generally means setting your CPU _limit_ higher than your CPU _request_.
- What do your users do when they _are_ running code?
  Is it typically single-threaded local computation in a notebook?
  If so, there's little reason to set a limit higher than 1 CPU core.
- Do typical computations take a long time, or just a few seconds?
  Longer typical computations mean it's more likely for users to be trying to use the CPU at the same moment,
  suggesting a higher _request_.
- Even with idle users, parallel computation adds up quickly - one user fully loading 4 cores and 3 using almost nothing still averages to more than a full CPU core per user.
- Long-running intense computations suggest higher requests.

Again, using mybinder.org as an example: we run around 100 users on 8-core nodes,
and still see fairly _low_ overall CPU usage on each user node.
The limit here is actually Kubernetes' pods-per-node limit, not memory _or_ CPU.
This is likely an extreme case, as many Binder users come from clicking links on webpages
without any actual intention of running code.

![mybinder.org node cpu usage is mostly \<50%](../images/mybinder-load5.png)
## More tips

### Start strict and generous, then measure

A good tip, in general, is to give your users as many resources as you can afford and think they _might_ use.
Then, use resource usage metrics like Prometheus to analyze what your users _actually_ need,
and tune accordingly.
Remember: **Limits affect your user experience and stability. Requests mostly affect your costs**.

For example, a sensible starting point (lacking any other information) might be:
```yaml
request:
  cpu: 0.5
  mem: 2G
limit:
  cpu: 1
  mem: 2G
```
(more memory if significant computations are likely - machine learning models, data analysis, etc.)

Some actions:

- If you see out-of-memory killer events, increase the limit (or talk to your users!)
- If you see typical memory usage well below your limit, reduce the request (but not the limit)
- If _nobody_ uses that much memory, reduce your limit
- If CPU is your limiting scheduling factor and your CPUs are mostly idle,
  reduce the cpu request (maybe even to 0!).
- If CPU usage continues to be low, increase the limit to 2 or 4 to allow bursts of parallel execution (see the sketch below).
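Applied to the starting point above, a few rounds of that tuning might leave you with something like the following. This is a hypothetical outcome, shown as zero-to-jupyterhub-style `singleuser` values (where the 'request' is called a `guarantee`), not a recommendation:

```yaml
# Hypothetical post-tuning values, following the actions above.
singleuser:
  cpu:
    guarantee: 0 # CPUs were mostly idle, so stop reserving them for scheduling
    limit: 4 # allow bursts of parallel execution
  memory:
    guarantee: 1G # typical usage measured well below the original request
    limit: 2G # nobody hit the 2G limit, so it stays put
```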
(measuring)=

### Measuring user resource consumption

It is _highly_ recommended to deploy monitoring services such as [Prometheus][]
and [Grafana][] to get a view of your users' resource usage.
This is the only way to truly know what your users need.
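One concrete piece of that setup: JupyterHub itself exposes Prometheus metrics at `/hub/metrics`, which you can scrape alongside the usual node and container metrics. A sketch of a scrape job with a hypothetical hostname; depending on your JupyterHub version and configuration, this endpoint may require an authenticated token:

```yaml
# Hypothetical Prometheus scrape job for JupyterHub's own metrics.
scrape_configs:
  - job_name: jupyterhub
    metrics_path: /hub/metrics
    scheme: https
    static_configs:
      - targets: ["hub.example.org"] # replace with your hub's address
```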
JupyterHub has some experimental [grafana dashboards][] you can use as a starting point,
to keep an eye on your resource usage.
Here are some sample charts (again from mybinder.org),
showing >90% of users using less than 10% CPU and 200MB,
but a few outliers near the limit of 1 CPU and 2GB of RAM.
This is the kind of information you can use to tune your requests and limits.

![mybinder.org user resources](../images/mybinder-user-resources.png)

[prometheus]: https://prometheus.io
[grafana]: https://grafana.com
[grafana dashboards]: https://github.com/jupyterhub/grafana-dashboards
BIN  docs/source/images/mybinder-hub-components-cpu-memory.png  (new binary file, 1017 KiB, not shown)
BIN  docs/source/images/mybinder-load5.png  (new binary file, 607 KiB, not shown)
BIN  docs/source/images/mybinder-user-resources.png  (new binary file, 1.8 MiB, not shown)
@@ -9,6 +9,7 @@ well as other information relevant to running your own JupyterHub over time.
:maxdepth: 2

troubleshooting
admin/capacity-planning
admin/upgrading
admin/log-messages
changelog