> Coordinator sees Node A has significantly fewer rows (logical count) than the cluster average. It flags Node A as "underutilized."
OK, so you are dealing with a classic: you measure A, but what matters is B. For "load" balancing, a decent metric is, well, response time (and jitter).
For data partitioning, I guess the number of rows is not the right metric? Change it to number*avg_size or something?
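As a rough illustration of the number*avg_size idea, here is a minimal Python sketch. The `NodeStats` type, its field names, and the sample numbers are all made up for the example; the point is only that a node with few but large rows can look "underutilized" by row count while holding the most data.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    row_count: int        # logical rows stored on the node
    avg_row_bytes: float  # mean row size, assumed to be sampled on the node


def estimated_bytes(node: NodeStats) -> float:
    """Size-weighted load: row count times average row size."""
    return node.row_count * node.avg_row_bytes


def least_loaded(nodes: list[NodeStats]) -> NodeStats:
    """Pick the node with the smallest estimated data volume."""
    return min(nodes, key=estimated_bytes)


# Hypothetical cluster: A holds few large rows, B holds many small rows.
nodes = [
    NodeStats("A", row_count=1_000, avg_row_bytes=50_000.0),
    NodeStats("B", row_count=40_000, avg_row_bytes=200.0),
]
print(least_loaded(nodes).name)
```

Row count alone would pick A as the emptiest node here, while the size-weighted estimate picks B.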
If you can't measure the thing directly, then take a look at stuff like a "PID controller". This can be approached as a typical control loop problem, although in 99% of cases doing PID for software systems is overkill.
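For reference, a control loop of that kind might look like the sketch below. It is a textbook discrete PID, not anything from the original poster's system: the gains, the 100 ms latency target, and the linear `toy_latency_ms` model are assumptions chosen so the example converges, and there is no anti-windup handling.

```python
class PIDController:
    """Textbook discrete PID: output = Kp*e + Ki*sum(e*dt) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement: float, dt: float) -> float:
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first sample (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def toy_latency_ms(weight: float) -> float:
    """Hypothetical node model: latency grows linearly with its traffic share."""
    return 50.0 + 100.0 * weight


def run_loop(steps: int = 200) -> float:
    """Steer a node's traffic weight until its latency sits at the setpoint."""
    pid = PIDController(kp=0.001, ki=0.002, kd=0.0, setpoint=100.0)
    weight = 1.0
    for _ in range(steps):
        correction = pid.update(toy_latency_ms(weight), dt=1.0)
        weight = min(1.0, max(0.0, 1.0 + correction))  # clamp to [0, 1]
    return weight
```

With this toy model the loop settles near a weight of 0.5, which is where the modelled latency equals the 100 ms target. A real node's response is neither linear nor instantaneous, so the gains would need tuning.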
otterley 1 hour ago
It's not clear whether you're using Kubernetes, but the Kubernetes way of dealing with this problem is to declare a memory reservation (i.e., a request) along with the container specification. The amount of the reservation will be deducted from the host's available memory for scheduling purposes, regardless of whether the container actually consumes the reserved amount. It's also a best practice to configure the memory limit to be identical to the reservation, so if the container exceeds the reserved amount, the kernel will terminate it via the OOM killer.
Of course, for this to work, you have to figure out what that reserved amount should be. That is an exercise for the implementer (i.e., you).
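To make the reservation accounting concrete, here is a toy Python model of scheduling by declared requests. It is not the real kube-scheduler algorithm; the `Node` type, the `schedule` function, and the "most free memory wins" rule are assumptions for illustration. The only behavior it mirrors from the description above is that a request is deducted from the node's schedulable memory whether or not the container ever uses it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    allocatable_mib: int
    reserved_mib: int = 0
    pods: list = field(default_factory=list)

    @property
    def free_mib(self) -> int:
        # Schedulable memory is based on reservations, not on actual usage.
        return self.allocatable_mib - self.reserved_mib


def schedule(pod_name: str, request_mib: int, nodes: list) -> Optional[Node]:
    """Place the pod on the fitting node with the most unreserved memory."""
    candidates = [n for n in nodes if n.free_mib >= request_mib]
    if not candidates:
        return None  # no node can honor the request; the pod stays pending
    target = max(candidates, key=lambda n: n.free_mib)
    target.reserved_mib += request_mib
    target.pods.append(pod_name)
    return target


# Hypothetical two-node cluster.
cluster = [Node("node-a", allocatable_mib=4096), Node("node-b", allocatable_mib=2048)]
first = schedule("db-0", 3000, cluster)
second = schedule("db-1", 1500, cluster)
third = schedule("db-2", 1500, cluster)
```

In this run the third pod does not fit anywhere, even if the first two are idle, because placement only looks at the declared requests.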
> Attempting to enumerate every resource variable (CPU, IOPS, RSS, Disk, logical count) into a single scoring function feels like an NP-hard trap.
Yeah, don't do that. Figure out what resources your applications need and then declare them, and let the scheduler find the best node based on the requirements you've specified.
> We are trying to write a "God Equation" for our load balancer. We started with row_count, which failed. We looked at disk usage, but that doesn't correlate with RAM because of lazy loading.
A few things come to mind...
First, you're talking about a load balancer, but it's not clear that you're trying to balance load! A good metric to use for load balancing is one whose value is proportional to response latency.
It smells like you're trying to provision resources based on an optimistic prediction of your working set size. Perhaps you need a more pessimistic prediction. It might also be that you're relying too heavily on the kernel to handle paging, when what you really need is a cache tuned for your application that is scan-resistant, coupled with O_DIRECT for I/O.
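One common way to get scan resistance is a segmented LRU, sketched below in Python. This is one possible design, not necessarily what the commenter had in mind; the class name and segment sizes are illustrative, and the O_DIRECT side is left out entirely because it needs aligned buffers and platform-specific handling. New keys land in a small probation segment and only move to the protected segment when touched again, so a one-pass scan cannot flush the hot set.

```python
from collections import OrderedDict


class SegmentedLRU:
    """Two-segment LRU: entries must be reused before they can displace hot data."""

    def __init__(self, probation_size: int, protected_size: int):
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.probation = OrderedDict()  # keys seen once
        self.protected = OrderedDict()  # keys seen at least twice

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)
            self._promote(key, value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:
            self.probation.pop(key)  # a second touch counts as reuse
            self._promote(key, value)
        else:
            self.probation[key] = value
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)  # evict the oldest one-hit key

    def _promote(self, key, value):
        self.protected[key] = value
        if len(self.protected) > self.protected_size:
            # Demote the coldest protected entry instead of dropping it outright.
            old_key, old_value = self.protected.popitem(last=False)
            self.probation[old_key] = old_value
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)
```

After two hot keys have been read once each, inserting a long run of never-reused keys only churns the probation segment; the hot keys stay cached.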
bcoates 1 hour ago
Memory pressure (and a lot of other overload conditions) usually makes latency worse. Does that show up in your system? Latency backpressure is a pretty conventional thing to do. You're going to want some way to close the loop back to your load balancer. If you're doing open-loop control (sending a "fair share" of traffic to each node and assuming it can handle it), issues like the ones you describe will keep coming up.
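A minimal sketch of closing that loop, in Python: the balancer keeps a smoothed latency per node and hands out traffic in inverse proportion to it. The class name, the EWMA smoothing factor, and the inverse-latency weighting are assumptions for illustration, not a description of any particular load balancer.

```python
class LatencyFeedbackBalancer:
    """Weights nodes by the inverse of their smoothed response latency."""

    def __init__(self, nodes, alpha: float = 0.2):
        self.alpha = alpha  # EWMA smoothing factor (assumed value)
        self.ewma_ms = {node: None for node in nodes}

    def observe(self, node, latency_ms: float) -> None:
        """Feed back one measured response latency for a node."""
        prev = self.ewma_ms[node]
        if prev is None:
            self.ewma_ms[node] = latency_ms
        else:
            self.ewma_ms[node] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def weights(self) -> dict:
        """Return each node's traffic share; slower nodes get less."""
        known = [v for v in self.ewma_ms.values() if v is not None]
        if not known:
            share = 1.0 / len(self.ewma_ms)
            return {node: share for node in self.ewma_ms}
        fallback = min(known)  # unmeasured nodes are assumed to be as fast as the best
        inverse = {
            node: 1.0 / max(v if v is not None else fallback, 1e-6)
            for node, v in self.ewma_ms.items()
        }
        total = sum(inverse.values())
        return {node: inv / total for node, inv in inverse.items()}
```

If a node starts paging and its latency climbs, its share shrinks on the next call to `weights()` without the balancer needing to know anything about RSS, IOPS, or row counts.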
This is a Hard Problem and you might be trying to get away with an unrealistically small amount of overprovisioning.
wmf 51 minutes ago
Have you measured Pressure Stall Information (from /proc/pressure/memory) or active pages from /proc/meminfo?
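For anyone who wants to try this, a small Python sketch for reading both signals follows. It assumes a Linux kernel with PSI enabled (4.20 or later) so that /proc/pressure/memory exists; the function names are made up for the example, and the parsers take text so they can be exercised without a live /proc.

```python
def parse_psi(text: str) -> dict:
    """Parse PSI lines like 'some avg10=0.12 avg60=0.05 avg300=0.01 total=123'."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo into {name: value}; most values are in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def read_memory_signals() -> dict:
    """Read the stall percentages and active page counters from a live system."""
    with open("/proc/pressure/memory") as f:
        psi = parse_psi(f.read())
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    return {
        "some_avg10": psi["some"]["avg10"],  # % of time some task stalled on memory
        "full_avg10": psi["full"]["avg10"],  # % of time all non-idle tasks stalled
        "active_kb": meminfo["Active"],
        "active_file_kb": meminfo.get("Active(file)"),
    }
```

The `avg10` values are percentages of wall-clock time over the last 10 seconds, so a sustained nonzero `full` figure is a direct sign that the node is short on memory.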
> Attempting to enumerate every resource variable (CPU, IOPS, RSS, Disk, logical count) into a single scoring function feels like an NP-hard trap.
That's perfect for machine learning.
See https://kubernetes.io/docs/concepts/configuration/manage-res...