Think about scheduling measurements and processing them

Peter J. Holzer 2022-11-27 10:19:37 +01:00
parent dd1b3056ac
commit cf32c72b8f
2 changed files with 83 additions and 0 deletions

doc/multiqueue Normal file

@@ -0,0 +1,35 @@
Each job has a period p and a maximum delay d (<= p)
At startup we start every job serially
(potential problem: What if this takes longer than the minimum
period? We could sort the jobs by p asc)
Alternatively enqueue every job at t=now
(potential problem: This may clump the jobs together more than
necessary)
In any case:
If a job just finished and there are runnable jobs, we start the next
one in the queue.
At every tick (1/second?) we check whether there are runnable jobs.
For each runnable job we compute an overdue score «(now - t) / d».
If the maximum score is >= random.random() we start that job.
This is actually incorrect. Need to adjust for the ticks. Divide the
score by «d / tick_length»? But if we do that we have no guarantee that
the job will be started with at most d delay. We need a score function
which reaches 1 before the delay d is exhausted.
«score = 1 / (t + d - now)» works. It's a uniform distribution, which is
probably not ideal. I think I want the CDF to rise more steeply at the
start. But I can adjust that later if necessary.
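A minimal sketch of this per-tick check (Python; the Job fields t and d
and the function names are assumed, and the guard against a zero or
negative denominator is mine):

    import random

    TICK = 1.0   # seconds between checks; the score formula assumes 1 s ticks

    def maybe_start(jobs, now):
        # jobs: objects with .t (scheduled start) and .d (max delay), in seconds
        runnable = [j for j in jobs if j.t <= now]
        if not runnable:
            return None
        def score(j):
            remaining = j.t + j.d - now
            # at or past the deadline: force the start (guard added here)
            return float("inf") if remaining <= 0 else 1 / remaining
        best = max(runnable, key=score)
        if score(best) >= random.random():
            return best   # caller starts this job and reschedules it
        return None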
We reschedule that job.
at t + p?
at now + p?
at x + p where x is computed from the last n start times?
I think this depends on how we schedule them initially: If we
started them serially they are probably already well spaced out, so
t + p is a good choice. If we scheduled them all immediately, it
isn't. The second option probably drifts the most. The third seems
reasonable in all cases.
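A sketch of the third option, computing x from the last n actual start
times (assuming we keep them per job; the averaging is my interpretation
of "computed from the last n start times"):

    from statistics import mean

    def next_start(start_times, p, n=5):
        # start_times: actual start times of this job, oldest first
        # (assumes at least one recorded start)
        last = start_times[-n:]
        k = len(last)
        # shift each start forward to the newest slot (spacing exactly p),
        # average to smooth out jitter, then add one period
        x = mean(s + (k - 1 - i) * p for i, s in enumerate(last))
        return x + p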

doc/processing.pipe Normal file

@@ -0,0 +1,48 @@
have a list of dependencies
should probably be specified in a rather high level notation with
wildcards and substitutions and stuff.
E.g. something like this:
measure=bytes_used -> measure=time_until_full
This would match all series with measure=bytes_used and use them to
compute a new series with measure=time_until_full and all other
dimensions unchanged.
That looks too simple, but do I need anything more complicated?
Obviously I also need to specify the filter. And I may not even need
to specify the result as the filter can determine that itself.
Although that may be bad for reusability.
Similarly for auxiliary inputs: in the example above we also need the
corresponding measure=bytes_usable timeseries. The filter can
determine that itself, but maybe it's better to specify that in the
rule?
At run-time we expand the rules and just use the ids.
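A sketch of what a rule and its run-time expansion could look like
(Python dicts; the field names and the series representation are
assumptions, not an existing format):

    RULE = {
        "match":  {"measure": "bytes_used"},          # input pattern
        "aux":    [{"measure": "bytes_usable"}],      # auxiliary inputs
        "filter": "time_until_full",                  # which filter to run
        "output": {"measure": "time_until_full"},     # other dimensions kept
    }

    def expand(rule, all_series):
        # all_series: dicts like {"id": 42, "measure": "bytes_used",
        # "host": "db1", "fs": "/var"}
        jobs = []
        for s in all_series:
            if all(s.get(k) == v for k, v in rule["match"].items()):
                dims = {k: v for k, v in s.items() if k != "id"}
                jobs.append({"input": s["id"],
                             "aux": [{**dims, **a} for a in rule["aux"]],
                             "output": {**dims, **rule["output"]},
                             "filter": rule["filter"]})
        return jobs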
I think we want to decouple the processing from the data acquisition, so
the web service should just write the changed timeseries into a queue.
Touch a file with the key as the name in a spool dir. The processor can
then check if there is anything in the spool dir and process it. The
result of the filter is then again added to the spool dir (make sure
there are no circular dependencies! Hmm, that's up to the user I guess?
Or each generated series could have a rate limit?)
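A sketch of that spool-dir handshake (the directory and the function
names are made up):

    import pathlib

    SPOOL = pathlib.Path("/var/spool/measurements")   # hypothetical location

    def enqueue(key):
        # the web service just touches a file named after the key
        (SPOOL / str(key)).touch()

    def process_spool(run_filters):
        for entry in list(SPOOL.iterdir()):
            key = entry.name
            entry.unlink()                     # claim it before processing
            for new_key in run_filters(key):   # filters may create new series
                enqueue(new_key)               # beware of cycles / rate limits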
In addition to filters (which create new data) we also need some kind of
alerting system. That could just be a filter which produces no data but
does something else instead, like sending an email. So I'm not sure
whether it makes sense to distinguish these.
We should record all the used inputs (recursively) for each generated
series (do we actually want to store the transitive closure or just the
direct inputs? We can expand that when necessary). Doing that for each
datapoint is overkill, but we should mark each input with a "last seen"
timestamp so that we can ignore or scrub inputs which are no longer
used.
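A sketch of tracking direct inputs with a "last seen" timestamp (a plain
in-memory dict here; the real schema is left open):

    import time

    # inputs[generated_series_id][input_series_id] = last time this input
    # contributed to the generated series
    inputs: dict[int, dict[int, float]] = {}

    def record_inputs(generated_id, input_ids, now=None):
        now = time.time() if now is None else now
        seen = inputs.setdefault(generated_id, {})
        for i in input_ids:
            seen[i] = now

    def stale_inputs(generated_id, max_age):
        # inputs that have not contributed for max_age seconds can be
        # ignored or scrubbed
        cutoff = time.time() - max_age
        return [i for i, t in inputs.get(generated_id, {}).items() if t < cutoff]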
Do we need a negative/timeout trigger? I.e. if a timeseries which is
used as an input is NOT updated in time, trigger the filter anyway so
that it can take appropriate action? If we have that, how do we filter
out obsolete measurements? We don't want to get alerted that we haven't
gotten any disk usage data from a long-discarded host for 5 years. For
now I think we rely on other still-active checks to fail if a
measurement fails to run.