Think about scheduling measurements and processing them
parent dd1b3056ac, commit cf32c72b8f
@@ -0,0 +1,35 @@
Each job has a period p and a maximum delay d (<= p)

At startup we start every job serially
(potential problem: What if this takes longer than the minimum
period? We could sort the jobs by p asc)
Alternatively enqueue every job at t=now
(potential problem: This may clump the jobs together more than
necessary)
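A rough sketch of the two startup options (Python; the Job fields and the serial/immediate helpers are placeholders, not a fixed design):

    import time

    class Job:
        def __init__(self, name, period, max_delay, run):
            assert max_delay <= period
            self.name = name
            self.p = period      # period p
            self.d = max_delay   # maximum delay d (<= p)
            self.run = run       # the measurement callable
            self.t = None        # next scheduled start time

    def start_serially(jobs):
        # run every job once at startup, one after the other; sorting by
        # p ascending so short-period jobs are not starved if the whole
        # pass takes longer than the minimum period
        for job in sorted(jobs, key=lambda j: j.p):
            job.run()
            job.t = time.time() + job.p

    def enqueue_all_now(jobs):
        # alternative: make every job runnable immediately (may clump
        # the jobs together more than necessary)
        now = time.time()
        for job in jobs:
            job.t = now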
In any case:
If a job just finished and there are runnable jobs, we start the next
one in the queue.

At every tick (1/second?) we check whether there are runnable jobs.
For each runnable job we compute an overdue score «(now - t) / d».
If the maximum score is >= random.random() we start that job.
This is actually incorrect. We need to adjust for the ticks. Divide the
score by «d / tick_length»? But if we do that we have no guarantee that
the job will be started with at most d delay. We need a function which
exceeds 1 at this point.
«score = 1 / (t + d - now)» works. It's a uniform distribution, which is
probably not ideal. I think I want the CDF to rise steeper at the start.
But I can adjust that later if necessary.
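A minimal sketch of the per-tick check with the corrected score (Python, reusing the Job fields from the sketch above; the tick length is assumed to be one second and `start` is whatever actually runs and then reschedules the job):

    import random
    import time

    TICK = 1.0  # tick length in seconds (the "1/second" above)

    def overdue_score(job, now):
        # «score = 1 / (t + d - now)», measured in ticks: it stays below 1
        # while more than one tick remains and exceeds 1 once the deadline
        # t + d is within one tick, so the delay never exceeds d.
        remaining = job.t + job.d - now
        return float("inf") if remaining <= 0 else TICK / remaining

    def on_tick(jobs, start):
        now = time.time()
        runnable = [j for j in jobs if j.t is not None and j.t <= now]
        if not runnable:
            return
        best = max(runnable, key=lambda j: overdue_score(j, now))
        if overdue_score(best, now) >= random.random():
            start(best)  # run it and reschedule (see the options below)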
We reschedule that job.
at t + p?
at now + p?
at x + p where x is computed from the last n start times?
I think this depends on how we schedule them initially: If we
started them serially they are probably already well spaced out, so
t + p is a good choice. If we scheduled them all immediately, it
isn't. The second probably drifts the most. The third seems reasonable
in all cases.
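One way to compute x from the last n start times, as a sketch (the averaging scheme and the choice of n are my own, the notes leave this open; assumes at least one recorded start):

    def next_start(starts, p, n=5):
        # option three: x + p, with x derived from the last n actual
        # start times; subtracting i*p lines the old starts up on the
        # current phase, so averaging smooths out individual delays
        # instead of letting them accumulate (now + p) or preserving a
        # bad initial spacing forever (t + p)
        recent = list(starts)[-n:]
        x = sum(s - i * p for i, s in enumerate(recent)) / len(recent)
        return x + len(recent) * p

With starts spaced exactly p apart this reduces to «last start + p»; a single late start only shifts the next one by a fraction of the delay.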
@@ -0,0 +1,48 @@
Have a list of dependencies.
Should probably be specified in a rather high-level notation with
wildcards and substitutions and stuff.

E.g. something like this:
measure=bytes_used -> measure=time_until_full
This would match all series with measure=bytes_used and use them to
compute a new series with measure=time_until_full and all other
dimensions unchanged.

That looks too simple, but do I need anything more complicated?
Obviously I also need to specify the filter. And I may not even need
to specify the result as the filter can determine that itself.
Although that may be bad for reusability.
Similar for auxiliary inputs. In the example above we also need the
corresponding measure=bytes_usable timeseries. The filter can
determine that itself, but maybe it's better to specify that in the
rule?

At run-time we expand the rules and just use the ids.
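A sketch of what rule expansion could look like (Python; representing a series key as a dict of dimensions and a rule as match/output/aux dicts is my assumption, following the bytes_used example above):

    # a series key is a dict of dimensions, e.g.
    # {"measure": "bytes_used", "host": "web1", "disk": "sda"}

    def matches(match, key):
        return all(key.get(dim) == val for dim, val in match.items())

    def expand(rule, all_keys):
        # «measure=bytes_used -> measure=time_until_full»: every matching
        # series yields an output key with only the listed dimensions
        # replaced; auxiliary inputs (e.g. bytes_usable) are spelled out
        # in the rule and share the remaining dimensions
        for key in all_keys:
            if matches(rule["match"], key):
                yield {
                    "inputs": [key] + [dict(key, **a) for a in rule.get("aux", [])],
                    "output": dict(key, **rule["output"]),
                }

    rule = {
        "match": {"measure": "bytes_used"},
        "output": {"measure": "time_until_full"},
        "aux": [{"measure": "bytes_usable"}],
    }
    # at run-time each expanded key would then be resolved to a series id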
I think we want to decouple the processing from the data acquisition, so
the web service should just write the changed timeseries into a queue.
Touch a file with the key as the name in a spool dir. The processor can
then check if there is anything in the spool dir and process it. The
result of the filter is then again added to the spool dir (make sure
there are no circular dependencies! Hmm, that's up to the user I guess?
Or each generated series could have a rate limit?)
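A minimal sketch of the spool-dir handoff (Python; the path and the process_key hook are placeholders, and encoding the key safely as a filename is glossed over):

    import os

    SPOOL_DIR = "/var/spool/metrics"  # placeholder

    def notify_changed(series_key):
        # web service side: after storing new datapoints, touch a file
        # named after the series key in the spool dir
        os.makedirs(SPOOL_DIR, exist_ok=True)
        path = os.path.join(SPOOL_DIR, series_key)
        with open(path, "a"):
            pass
        os.utime(path, None)

    def drain_spool(process_key):
        # processor side: handle whatever is in the spool dir; removing
        # the file before processing means a change arriving while we
        # run simply re-queues the key
        for name in os.listdir(SPOOL_DIR):
            os.remove(os.path.join(SPOOL_DIR, name))
            process_key(name)  # a filter may call notify_changed() again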
In addition to filters (which create new data) we also need some kind of
alerting system. That could just be a filter which produces no data but
does something else instead, like sending an email. So I'm not sure
whether it makes sense to distinguish these.
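If alerting is just a filter that returns no data, both kinds could share one interface; a sketch (all names here are made up):

    def send_email(subject):
        print("ALERT:", subject)  # stand-in for a real notification channel

    class TimeUntilFull:
        # ordinary filter: consumes inputs, returns data for a new series
        def run(self, inputs):
            points = []  # ... compute time_until_full from the inputs ...
            return {"measure": "time_until_full", "points": points}

    class DiskAlmostFullAlert:
        # "alerting filter": same interface, but only a side effect and
        # no returned data, so nothing new lands in the spool dir
        def run(self, inputs):
            send_email("disk almost full: %r" % (inputs,))
            return None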
We should record all the used inputs (recursively) for each generated
series (do we actually want to store the transitive closure or just the
direct inputs? We can expand that when necessary.). Doing that for each
datapoint is overkill, but we should mark each input with a "last seen"
timestamp so that we can ignore or scrub inputs which are no longer
used.
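A sketch of recording only the direct inputs plus a last-seen timestamp, and expanding the closure on demand (Python; the in-memory dict stands in for whatever store is actually used):

    import time

    # output series key -> {direct input key -> when it last contributed}
    direct_inputs = {}

    def record_inputs(output_key, input_keys):
        # called once per filter run, not per datapoint
        seen = direct_inputs.setdefault(output_key, {})
        now = time.time()
        for key in input_keys:
            seen[key] = now  # the "last seen" timestamp

    def live_inputs(output_key, max_age):
        # ignore (or eventually scrub) inputs not seen for max_age seconds
        cutoff = time.time() - max_age
        return [k for k, t in direct_inputs.get(output_key, {}).items()
                if t >= cutoff]

    def all_inputs(output_key):
        # only direct inputs are stored; expand the closure when necessary
        result, todo = set(), [output_key]
        while todo:
            for key in direct_inputs.get(todo.pop(), {}):
                if key not in result:
                    result.add(key)
                    todo.append(key)
        return result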
Do we need a negative/timeout trigger? I.e. if a timeseries which is
used as an input is NOT updated in time, trigger the filter anyway so
that it can take appropriate action? If we have that, how do we filter
out obsolete measurements? We don't want to get alerted that we haven't
gotten any disk usage data from a long-discarded host for 5 years. For
now I think we rely on other still active checks to fail if a
measurement fails to run.