Add capacity planning doc
docs/source/admin/capacity-planning.md  (new file, 246 lines)
@@ -0,0 +1,246 @@
# Capacity planning

General capacity planning advice for JupyterHub is hard to give,
because it depends almost entirely on what your users are doing,
and what JupyterHub users do varies _wildly_ in terms of resource consumption.

**There is no single answer to "I have X users, what resources do I need?" or "How many users can I support with this machine?"**
Here are three _typical_ Jupyter use patterns that require vastly different resources:

- negligible resources because computation is mostly idle,
  e.g. students learning programming for the first time
- very intense, sustained load, e.g. training machine learning models
- _mostly_ idle, but needs a lot of resources for short periods of time
  (interactive research often looks like this)

But just because there's no single answer doesn't mean we can't help.
So we have gathered here some useful information to help you make your decisions
about what resources you need based on how your users work,
including the relative invariants in terms of resources that JupyterHub itself needs.
## JupyterHub infrastructure

JupyterHub consists of a few components that are always running.
These take up very little resources,
especially relative to the resources consumed by users when you have more than a few.

As an example, an instance of mybinder.org (running JupyterHub 1.5.0),
typically running with ~100-150 users, has:

| Component | CPU (mean/peak) | Memory (mean/peak) |
| --------- | --------------- | ------------------ |
| Hub       | 4% / 13%        | (230 MB / 260 MB)  |
| Proxy     | 6% / 13%        | (47 MB / 65 MB)    |

So it would be pretty generous to allocate ~25% of one CPU core
and ~500MB of RAM to overall JupyterHub infrastructure.
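If you want to make that reservation explicit, most deployment methods let you set it directly. A minimal sketch, assuming a [zero-to-jupyterhub][] deployment where the hub's resources are set through Helm values (the numbers are just the generous estimate above, not a recommendation):

```yaml
# A sketch only: reserve ~25% of a core and ~500MB for JupyterHub's own infrastructure,
# assuming the zero-to-jupyterhub Helm chart (hub.resources is passed to the hub pod).
hub:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
```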
The rest is going to be up to your users.
Per-user overhead from JupyterHub is typically negligible
up to at least a few hundred concurrent active users.

![mybinder.org hub components cpu/memory usage](../images/mybinder-hub-components-cpu-memory.png)
## Factors

### Static vs elastic resources

A big factor in planning resources is:
**how much does it cost to change your mind?**
If you are using a single shared machine with local storage,
migrating to a new one because it turns out your users don't fit might be very costly,
because you have to get a new machine, set it up, and maybe even migrate user data.

On the other hand, if you are using ephemeral resources,
such as node pools in Kubernetes,
changing resource types costs close to nothing
because nodes can automatically be added or removed as needed.

Take that cost into account when you are picking how much memory or cpu to allocate to users.

Static resources (like [the-littlest-jupyterhub][]) provide more **stable, predictable costs**,
while elastic resources (like [zero-to-jupyterhub][]) tend to provide **lower overall costs**
(especially when deployed with monitoring allowing cost optimizations over time),
at the price of being **less predictable**.

[the-littlest-jupyterhub]: https://the-littlest-jupyterhub.readthedocs.io

[zero-to-jupyterhub]: https://zero-to-jupyterhub.readthedocs.io
(limits-requests)=

### Limit vs Request

Many scheduling tools like Kubernetes have two separate ways of allocating resources to users.
A **Request** or **Reservation** describes how much of a resource is _set aside_ for each user.
Often, this doesn't have any practical effect other than deciding when a given machine is considered 'full'.
If you are using expandable resources like an autoscaling Kubernetes cluster,
'requesting' more resources than fit on currently running nodes is what triggers launching a new node and adding it to the pool (a cluster **scale-up event**).
If you are running on a single VM, the request determines how many users you can run at the same time, full stop.

A **Limit**, on the other hand, actually enforces a cap on how much of a resource any given user can consume.
We'll see more information on what happens when users try to exceed their limits [below](oversubscription).

In the strictest, safest case, you can have these two numbers be the same.
That means that each user is _limited_ to fit within the resources allocated to it.
This avoids **[oversubscription](oversubscription)** of resources (allowing use of more than you have available),
at the expense (in a literal, this-costs-money sense) of reserving lots of usually-idle capacity.

But when deploying JupyterHub,
you will likely find that a relatively small fraction of users use lots more resources than others,
making oversubscription attractive (to a point).

Having a gap between the request and the limit means you can fit a number of _typical_ users on a node (based on the request),
but still limit how much a runaway user can gobble up for themselves.
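For concreteness, here is how that request/limit gap might be expressed. This is a sketch only: it assumes a Kubernetes-based deployment with [zero-to-jupyterhub][], whose Helm chart calls the request a _guarantee_, and the numbers are illustrative rather than recommendations.

```yaml
# Illustrative zero-to-jupyterhub values: reserve a little per user, cap at more.
singleuser:
  memory:
    guarantee: 1G # request: used when packing users onto nodes / deciding to scale up
    limit: 3G # hard cap: a server exceeding this is killed, not slowed down
  cpu:
    guarantee: 0.5 # request: half a core set aside per user
    limit: 2 # cap: leaves room for short bursts of parallelism
```

In a plain Kubernetes pod spec, the same idea appears as `resources.requests` and `resources.limits`.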
(oversubscription)=

### Oversubscribed CPU is okay, running out of memory is bad

An important consideration when assigning resources to users is:

> What happens when users need more than I've given them?

A good summary to keep in mind: **when tasks don't get enough CPU, things are slow.
When they don't get enough memory, things are broken.**
This means it's very important that users have enough memory,
but much less important that they always have exclusive access to all the CPU they can use.

This relates to [Limits and Requests](limits-requests),
because these are the consequences of your limits and/or requests not matching what users actually try to use.

A table of mismatched resource allocation situations and their consequences:

| issue                                                     | consequence                                                                            |
| --------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| Requests too high                                         | Unnecessarily high cost and/or low capacity.                                            |
| CPU limit too low                                         | Poor performance experienced by users.                                                  |
| CPU oversubscribed (too-low request + too-high limit)     | Poor performance across the system; may crash, if severe.                               |
| Memory limit too low                                      | Servers killed by the Out-of-Memory (OOM) killer; lost work for users.                  |
| Memory oversubscribed (too-low request + too-high limit)  | System memory exhaustion - all kinds of hangs, crashes, and weird errors. Very bad.     |

Note that the 'oversubscribed' problem case is where the request is lower than _typical_ usage,
meaning that the total reserved resources aren't enough for the total _actual_ consumption.
This doesn't mean that _all_ your users exceed the request,
just that the _limit_ gives enough room for the _average_ user to exceed the request.
### Example case for oversubscribing memory

Take, for example, this system and sampling of user behavior:

- System memory = 8G
- memory request = 1G, limit = 3G
- typical 'heavy' user: 2G
- typical 'light' user: 0.5G

This will assign 8 users to those 8G of RAM (remember: only requests are used for deciding when a machine is 'full').
As long as the 8 users' total _actual_ usage stays under 8G, everything is fine.
But the _limit_ allows a total of 24G to be used,
which would be a mess if everyone used their full limit.
But _not_ everyone uses the full limit, which is the point!

This pattern is fine if 1/8 of your users are 'heavy', because _typical_ usage will be ~0.7G per user,
and your total usage will be ~5.5G (1 × 2 + 7 × 0.5 = 5.5).

But if _50%_ of your users are 'heavy' you have a problem, because that means your users will be trying to use 10G (4 × 2 + 4 × 0.5 = 10),
which you don't have.

You can make guesses at these numbers, but the only _real_ way to get them is to measure (more [below](measuring)).
### CPU:memory ratio

Most of the time, you'll find that only one resource is the limiting factor for your users.
Most often it's memory, but for certain tasks, it could be CPU (or even GPUs).

Many cloud deployments have just one or a few fixed ratios of cpu to memory
(e.g. 'general purpose', 'high memory', and 'high cpu').
Setting your secondary resource allocation according to this ratio,
after selecting the more important limit, results in a balanced resource allocation.

For instance, some of Google Cloud's ratios are:

| node type   | GB RAM / CPU core |
| ----------- | ----------------- |
| n2-highmem  | 8                 |
| n2-standard | 4                 |
| n2-highcpu  | 1                 |
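For example, if memory is your primary constraint and your nodes have 4 GB of RAM per core (like `n2-standard`), you can derive the CPU numbers from the memory numbers. A sketch only, in plain Kubernetes `resources` terms, with illustrative values:

```yaml
# Illustrative only: memory chosen first, CPU derived from the node's 4 GB-per-core ratio.
resources:
  requests:
    memory: 1G
    cpu: 250m # 1 GB / 4 GB per core = 0.25 cores
  limits:
    memory: 4G
    cpu: 1 # 4 GB / 4 GB per core = 1 core
```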
### Idleness

Jupyter being an interactive tool means people tend to spend a lot more time reading and thinking than actually running resource-intensive code.
This significantly affects how much _cpu_ a typical active user needs,
but often does not significantly affect the _memory_.

Ways to think about this:

- More idle users means unused CPU.
  This generally means setting your CPU _limit_ higher than your CPU _request_.
- What do your users do when they _are_ running code?
  Is it typically single-threaded local computation in a notebook?
  If so, there's little reason to set a limit higher than 1 CPU core.
- Do typical computations take a long time, or just a few seconds?
  Longer typical computations mean it's more likely for users to be trying to use the CPU at the same moment,
  suggesting a higher _request_.
- Even with idle users, parallel computation adds up quickly - one user fully loading 4 cores and 3 using almost nothing still averages to more than a full CPU core per user.
- Long-running intense computations suggest higher requests.

Again, using mybinder.org as an example: we run around 100 users on 8-core nodes,
and still see fairly _low_ overall CPU usage on each user node.
The limit here is actually Kubernetes' pods-per-node limit, not memory _or_ CPU.
This is likely an extreme case, as many Binder users come from clicking links on webpages
without any actual intention of running code.

![mybinder.org node cpu usage is mostly \<50%](../images/mybinder-load5.png)
## More tips

### Start strict and generous, then measure

A good tip, in general, is to give your users as many resources as you can afford and think they _might_ use.
Then, use resource usage metrics like Prometheus to analyze what your users _actually_ need,
and tune accordingly.
Remember: **Limits affect your user experience and stability. Requests mostly affect your costs**.

For example, a sensible starting point (lacking any other information) might be:
```yaml
request:
  cpu: 0.5
  mem: 2G
limit:
  cpu: 1
  mem: 2G
```
(more memory if significant computations are likely - machine learning models, data analysis, etc.)

Some actions:

- If you see out-of-memory killer events, increase the limit (or talk to your users!)
- If you see typical memory usage well below your limit, reduce the request (but not the limit)
- If _nobody_ uses that much memory, reduce your limit
- If CPU is your limiting scheduling factor and your CPUs are mostly idle,
  reduce the cpu request (maybe even to 0!).
- If CPU usage continues to be low, increase the limit to 2 or 4 to allow bursts of parallel execution (see the sketch below).
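Applied to the starting point above, a few rounds of that tuning might leave you with something like the following. This is a hypothetical outcome, shown as zero-to-jupyterhub-style `singleuser` values (where the 'request' is called a `guarantee`), not a recommendation:

```yaml
# Hypothetical post-tuning values, following the actions above.
singleuser:
  cpu:
    guarantee: 0 # CPUs were mostly idle, so stop reserving them for scheduling
    limit: 4 # allow bursts of parallel execution
  memory:
    guarantee: 1G # typical usage measured well below the original request
    limit: 2G # nobody hit the 2G limit, so it stays put
```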
(measuring)=

### Measuring user resource consumption

It is _highly_ recommended to deploy monitoring services such as [Prometheus][]
and [Grafana][] to get a view of your users' resource usage.
This is the only way to truly know what your users need.
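One concrete piece of that setup: JupyterHub itself exposes Prometheus metrics at `/hub/metrics`, which you can scrape alongside the usual node and container metrics. A sketch of a scrape job with a hypothetical hostname; depending on your JupyterHub version and configuration, this endpoint may require an authenticated token:

```yaml
# Hypothetical Prometheus scrape job for JupyterHub's own metrics.
scrape_configs:
  - job_name: jupyterhub
    metrics_path: /hub/metrics
    scheme: https
    static_configs:
      - targets: ["hub.example.org"] # replace with your hub's address
```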
JupyterHub has some experimental [grafana dashboards][] you can use as a starting point,
to keep an eye on your resource usage.
Here are some sample charts (again from mybinder.org),
showing >90% of users using less than 10% CPU and 200MB,
but a few outliers near the limit of 1 CPU and 2GB of RAM.
This is the kind of information you can use to tune your requests and limits.

![mybinder.org user resources](../images/mybinder-user-resources.png)

[prometheus]: https://prometheus.io
[grafana]: https://grafana.com
[grafana dashboards]: https://github.com/jupyterhub/grafana-dashboards
BIN  docs/source/images/mybinder-hub-components-cpu-memory.png  (new binary file, 1017 KiB, not shown)
BIN  docs/source/images/mybinder-load5.png  (new binary file, 607 KiB, not shown)
BIN  docs/source/images/mybinder-user-resources.png  (new binary file, 1.8 MiB, not shown)
@@ -9,6 +9,7 @@ well as other information relevant to running your own JupyterHub over time.
:maxdepth: 2

troubleshooting
admin/capacity-planning
admin/upgrading
admin/log-messages
changelog