# Capacity planning

General capacity planning advice for JupyterHub is hard to give,
because it depends almost entirely on what your users are doing,
and what JupyterHub users do varies _wildly_ in terms of resource consumption.

**There is no single answer to "I have X users, what resources do I need?" or "How many users can I support with this machine?"**

Here are three _typical_ Jupyter use patterns that require vastly different resources:

- **Learning**: negligible resources because computation is mostly idle,
  e.g. students learning programming for the first time
- **Production code**: very intense, sustained load, e.g. training machine learning models
- **Bursting**: _mostly_ idle, but needs a lot of resources for short periods of time
  (interactive research often looks like this)

But just because there's no single answer doesn't mean we can't help.
We have gathered here some useful information to help you decide
what resources you need based on how your users work,
including the relative invariants in terms of resources that JupyterHub itself needs.

## JupyterHub infrastructure

JupyterHub consists of a few components that are always running.
These consume very few resources,
especially relative to the resources consumed by users once you have more than a few.

As an example, an instance of mybinder.org (running JupyterHub 1.5.0)
with typically ~100-150 users has:

| Component | CPU (mean/peak) | Memory (mean/peak) |
| --------- | --------------- | ------------------ |
| Hub       | 4% / 13%        | 230 MB / 260 MB    |
| Proxy     | 6% / 13%        | 47 MB / 65 MB      |

So it would be pretty generous to allocate ~25% of one CPU core
and ~500MB of RAM to the overall JupyterHub infrastructure.

The rest is going to be up to your users.
Per-user overhead from JupyterHub is typically negligible
up to at least a few hundred concurrent active users.

```{figure} /images/mybinder-hub-components-cpu-memory.png
JupyterHub component resource usage for mybinder.org.
```

## Factors to consider

### Static vs elastic resources

A big factor in planning resources is:
**how much does it cost to change your mind?**
If you are using a single shared machine with local storage,
migrating to a new one because it turns out your users don't fit might be very costly.
You will have to get a new machine, set it up, and maybe even migrate user data.

On the other hand, if you are using ephemeral resources,
such as node pools in Kubernetes,
changing resource types costs close to nothing
because nodes can automatically be added or removed as needed.

Take that cost into account when you are picking how much memory or CPU to allocate to users.

Static resources (like [the-littlest-jupyterhub][]) provide **stable, predictable costs**,
while elastic resources (like [zero-to-jupyterhub][]) tend to provide **lower overall costs**
(especially when deployed with monitoring allowing cost optimizations over time)
at the price of being **less predictable**.

[the-littlest-jupyterhub]: https://the-littlest-jupyterhub.readthedocs.io

[zero-to-jupyterhub]: https://z2jh.jupyter.org

(limits-requests)=

### Limit vs Request for resources

Many scheduling tools like Kubernetes have two separate ways of allocating resources to users.
A **Request** or **Reservation** describes how much of a resource is _set aside_ for each user.
Often, this doesn't have any practical effect other than deciding when a given machine is considered 'full'.
If you are using expandable resources like an autoscaling Kubernetes cluster,
a new node must be launched and added to the pool if you 'request' more resources than fit on currently running nodes (a cluster **scale-up event**).
If you are running on a single VM, this describes how many users you can run at the same time, full stop.

A **Limit**, on the other hand, enforces a cap on how much of a resource any given user can consume.
For more information on what happens when users try to exceed their limits, see [](oversubscription).

In the strictest, safest case, you can make these two numbers the same.
That means that each user is _limited_ to fit within the resources allocated to them.
This avoids **[oversubscription](oversubscription)** of resources (allowing use of more than you have available),
at the expense (in a literal, this-costs-money sense) of reserving lots of usually-idle capacity.

However, you will often find that a small fraction of users use far more resources than the rest.
In this case you may give users limits that _go beyond the amount of resources requested_.
This is called **oversubscribing** the resources available to users.

Having a gap between the request and the limit means you can fit a number of _typical_ users on a node (based on the request),
but still limit how much a runaway user can gobble up for themselves.
A minimal sketch of expressing such a gap in JupyterHub configuration is shown below.

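For spawners that support these options (for example KubeSpawner on Kubernetes), the gap can be expressed directly with JupyterHub's spawner traits in `jupyterhub_config.py`. The values here are purely illustrative:

```python
# jupyterhub_config.py: set a per-user request/limit gap.
# These Spawner traits are part of JupyterHub, but whether they are enforced
# depends on the spawner in use (KubeSpawner supports all four).

# Requests: set aside per user, and used to decide when a node is 'full'.
c.Spawner.mem_guarantee = "1G"
c.Spawner.cpu_guarantee = 0.5

# Limits: the most a single user may consume.
c.Spawner.mem_limit = "3G"
c.Spawner.cpu_limit = 2
```
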
(oversubscription)=

### Oversubscribed CPU is okay, running out of memory is bad

An important consideration when assigning resources to users is: **What happens when users need more than I've given them?**

A good summary to keep in mind:

> When tasks don't get enough CPU, things are slow.
> When they don't get enough memory, things are broken.

This means it's **very important that users have enough memory**,
but much less important that they always have exclusive access to all the CPU they can use.

This relates to [Limits and Requests](limits-requests),
because these are the consequences of your limits and/or requests not matching what users actually try to use.

A table of mismatched resource allocation situations and their consequences:

| issue | consequence |
| -------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| Requests too high | Unnecessarily high cost and/or low capacity. |
| CPU limit too low | Poor performance experienced by users. |
| CPU oversubscribed (too-low request + too-high limit) | Poor performance across the system; may crash, if severe. |
| Memory limit too low | Servers killed by the out-of-memory (OOM) killer; lost work for users. |
| Memory oversubscribed (too-low request + too-high limit) | System memory exhaustion: all kinds of hangs, crashes, and weird errors. Very bad. |

Note that the 'oversubscribed' problem case is the one where the request is lower than _typical_ usage,
meaning that the total reserved resources aren't enough for the total _actual_ consumption.
This doesn't mean that _all_ your users exceed the request,
just that the _limit_ gives enough room for the _average_ user to exceed the request.

All of these considerations are important _per node_.
Larger nodes mean more users per node, and therefore more users to average over.
It also means more chances for multiple outliers on the same node.

### Example case for oversubscribing memory

Take, for example, this system and sampling of user behavior:

- System memory = 8G
- memory request = 1G, limit = 3G
- typical 'heavy' user: 2G
- typical 'light' user: 0.5G

This will assign 8 users to those 8G of RAM (remember: only requests are used for deciding when a machine is 'full').
As long as those 8 users' total _actual_ usage stays under 8G, everything is fine.
But the _limit_ allows a total of 24G to be used,
which would be a mess if everyone used their full limit.
But _not_ everyone uses the full limit, which is the point!

This pattern is fine if 1/8 of your users are 'heavy', because _typical_ usage will be ~0.7G,
and your total usage will be ~5.5G (`1 × 2 + 7 × 0.5 = 5.5`).

But if _50%_ of your users are 'heavy' you have a problem, because that means they will be trying to use 10G (`4 × 2 + 4 × 0.5 = 10`),
which you don't have.

You can make guesses at these numbers, but the only _real_ way to get them is to measure (see [](measuring)).
The example's arithmetic is sketched below.

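Here is that arithmetic as a small, self-contained Python sketch, so you can try other mixes of users (all figures mirror the example above):

```python
def expected_usage(n_users, heavy_fraction, heavy_gb=2.0, light_gb=0.5):
    """Expected total memory use on a node, given a mix of user types."""
    n_heavy = round(n_users * heavy_fraction)
    n_light = n_users - n_heavy
    return n_heavy * heavy_gb + n_light * light_gb

node_gb = 8      # system memory
request_gb = 1   # per-user request: 8 users 'fit' on the node
n_users = node_gb // request_gb

for heavy_fraction in (1 / 8, 0.5):
    usage = expected_usage(n_users, heavy_fraction)
    status = "ok" if usage <= node_gb else "OVERSUBSCRIBED: running out of memory is likely"
    print(f"{heavy_fraction:.0%} heavy users -> {usage:.1f}G of {node_gb}G ({status})")
```
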
### CPU:memory ratio

Most of the time, you'll find that only one resource is the limiting factor for your users.
Most often it's memory, but for certain tasks it could be CPU (or even GPUs).

Many cloud deployments have just one or a few fixed ratios of CPU to memory
(e.g. 'general purpose', 'high memory', and 'high cpu').
Selecting the allocation for the more important resource first,
then setting the other according to this ratio,
results in a balanced resource allocation.

For instance, some of Google Cloud's ratios are:

| node type   | GB RAM / CPU core |
| ----------- | ----------------- |
| n2-highmem  | 8                 |
| n2-standard | 4                 |
| n2-highcpu  | 1                 |

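For example, if memory is your limiting resource, you can derive a matching CPU request from your node type's ratio; a quick sketch using the n2-standard ratio above:

```python
# Derive the CPU request from the memory request and the node's GB-per-core
# ratio, so that users fill memory and CPU at the same rate.
gb_per_core = 4        # n2-standard: 4 GB RAM per CPU core
mem_request_gb = 2     # chosen first, because memory is the limiting resource
cpu_request = mem_request_gb / gb_per_core

print(f"request {cpu_request} CPU cores alongside {mem_request_gb}G of memory")
# -> request 0.5 CPU cores alongside 2G of memory
```
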
(idleness)=

### Idleness

Jupyter being an interactive tool means people tend to spend a lot more time reading and thinking than actually running resource-intensive code.
This significantly affects how much _CPU_ a typical active user needs,
but often does not significantly affect the _memory_.

Ways to think about this:

- More idle users means unused CPU.
  This generally means setting your CPU _limit_ higher than your CPU _request_.
- What do your users do when they _are_ running code?
  Is it typically single-threaded local computation in a notebook?
  If so, there's little reason to set a limit higher than 1 CPU core.
- Do typical computations take a long time, or just a few seconds?
  Longer typical computations mean it's more likely for users to be trying to use the CPU at the same moment,
  suggesting a higher _request_.
- Even with idle users, parallel computation adds up quickly: one user fully loading 4 cores while 3 others use almost nothing still averages to more than a full CPU core per user.
- Long-running, intense computations suggest higher requests.

Again, using mybinder.org as an example: we run around 100 users on 8-core nodes,
and still see fairly _low_ overall CPU usage on each user node.
The limit here is actually Kubernetes' pods-per-node limit, not memory _or_ CPU.
This is likely an extreme case, as many Binder users come from clicking links on webpages
without any actual intention of running code.

```{figure} /images/mybinder-load5.png
mybinder.org node CPU usage is low with 50-150 users sharing just 8 cores
```

### Concurrent users and culling idle servers

Related to [](idleness), all of these resource consumption figures and limits are calculated based on **concurrently active users**,
not total users.
You might have 10,000 users of your JupyterHub deployment, but only 100 of them running at any given time.
That 100 is the main number you need for your capacity planning.
JupyterHub costs scale very little with the number of _total_ users,
up to a point.

There are two important definitions of an **active user**:

- Are they _actually_ there (i.e. a human interacting with Jupyter, or running code that might be unattended)?
- Is their server running (this is where resource reservations and limits are actually applied)?

Connecting those two definitions (how long should servers keep running when their humans aren't using them?) is an important area of deployment configuration, usually implemented via the [JupyterHub idle culler service][idle-culler].

[idle-culler]: https://github.com/jupyterhub/jupyterhub-idle-culler

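As a concrete starting point, here is a minimal sketch of running the idle culler as a hub-managed service in `jupyterhub_config.py`, adapted from the idle culler's own documentation (the one-hour timeout is illustrative):

```python
# jupyterhub_config.py: run jupyterhub-idle-culler as a hub-managed service.
import sys

c.JupyterHub.services = [
    {
        "name": "jupyterhub-idle-culler-service",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            "--timeout=3600",  # cull servers idle for an hour (illustrative)
        ],
    }
]

# Grant the service only the permissions it needs (JupyterHub >= 2.0 scopes).
c.JupyterHub.load_roles = [
    {
        "name": "jupyterhub-idle-culler-role",
        "scopes": [
            "list:users",
            "read:users:activity",
            "read:servers",
            "delete:servers",
        ],
        "services": ["jupyterhub-idle-culler-service"],
    }
]
```
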
There are many considerations when it comes to culling idle servers, and the right choices depend on your answers to questions like:

- How much does it save me to shut down user servers? (e.g. keeping an elastic cluster small, or keeping a fixed-size deployment available to active users)
- How much does it cost my users to have their servers shut down? (e.g. lost work if shut down prematurely)
- How easy do I want it to be for users to keep their servers running? (e.g. Do they want to run unattended simulations overnight? Do you want them to?)

Like many other things in this guide, there are many correct answers leading to different configuration choices.
For more detail on culling configuration and considerations, consult the [JupyterHub idle culler documentation][idle-culler].

## More tips

### Start strict and generous, then measure

A good tip, in general, is to give your users as many resources as you can afford and think they _might_ use.
Then, use resource usage metrics from a tool like Prometheus to analyze what your users _actually_ need,
and tune accordingly.
Remember: **Limits affect your user experience and stability. Requests mostly affect your costs.**

For example, a sensible starting point (lacking any other information) might be:

```yaml
request:
  cpu: 0.5
  mem: 2G
limit:
  cpu: 1
  mem: 2G
```

(more memory if significant computations are likely: machine learning models, data analysis, etc.)

Some actions to take based on what you then observe:

- If you see out-of-memory killer events, increase the limit (or talk to your users!)
- If you see typical memory usage well below your limit, reduce the request (but not the limit)
- If _nobody_ uses that much memory, reduce your limit
- If CPU is your limiting scheduling factor and your CPUs are mostly idle,
  reduce the CPU request (maybe even to 0!)
- If CPU usage continues to be low, increase the limit to 2 or 4 to allow bursts of parallel execution

(measuring)=

### Measuring user resource consumption

It is _highly_ recommended to deploy monitoring services such as [Prometheus][]
and [Grafana][] to get a view of your users' resource usage.
This is the only way to truly know what your users need.

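If you want the raw numbers, you can also query Prometheus directly. A sketch, assuming a Kubernetes deployment where single-user servers run in pods named `jupyter-<username>` (KubeSpawner's default) and cAdvisor metrics are collected; the server address and metric names may differ in your setup:

```python
# Query Prometheus for current per-user memory usage (working set).
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS = "http://prometheus.example.org"  # hypothetical address

# Sum memory across containers in each single-user server pod.
query = 'sum by (pod) (container_memory_working_set_bytes{pod=~"jupyter-.*"})'

with urlopen(f"{PROMETHEUS}/api/v1/query?{urlencode({'query': query})}") as resp:
    data = json.load(resp)

for item in data["data"]["result"]:
    gb = float(item["value"][1]) / 1e9
    print(f"{item['metric']['pod']}: {gb:.2f} GB")
```
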
JupyterHub has some experimental [grafana dashboards][] you can use as a starting point
to keep an eye on your resource usage.
Here are some sample charts (again from mybinder.org),
showing >90% of users using less than 10% CPU and 200MB,
but a few outliers near the limit of 1 CPU and 2GB of RAM.
This is the kind of information you can use to tune your requests and limits.

![Snapshot from JupyterHub's Grafana dashboards on mybinder.org](/images/mybinder-user-resources.png)

[prometheus]: https://prometheus.io
[grafana]: https://grafana.com
[grafana dashboards]: https://github.com/jupyterhub/grafana-dashboards

### Measuring costs

Measuring costs may be as important as measuring your users' activity.
If you are using a cloud provider, you can often use cost thresholds and quotas to get notified when your spending is too high,
e.g. "Have AWS send me an email if I hit X spending trajectory on week 3 of the month."
You can then use this information to tune your resources based on what you can afford.
You can combine this information with user resource consumption to figure out if you have a problem,
e.g. "my users really do need X resources, but I can only afford to give them 80% of X."
This information may prove useful when asking your budget-approving folks for more funds.

### Additional resources

There are lots of other resources for cost and capacity planning that may be specific to JupyterHub and/or your cloud provider.

Here are some useful links to other resources:

- [Zero to JupyterHub](https://z2jh.jupyter.org) documentation on:
  - [projecting costs](https://z2jh.jupyter.org/en/latest/administrator/cost.html)
  - [configuring user resources](https://z2jh.jupyter.org/en/latest/jupyterhub/customizing/user-resources.html)
- Cloud platform cost calculators:
  - [Google Cloud](https://cloud.google.com/products/calculator/)
  - [Amazon AWS](https://calculator.aws)
  - [Microsoft Azure](https://azure.microsoft.com/en-us/pricing/calculator/)