ltsdb/doc/processing.pipe

Each filter has a list of dependencies, which should probably be
specified in a rather high-level notation with wildcards and
substitutions. E.g. something like this:
measure=bytes_used -> measure=time_until_full
This would match all series with measure=bytes_used and use them to
compute a new series with measure=time_until_full and all other
dimensions unchanged.
That looks too simple, but do I need anything more complicated?
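If it ever does need to be more explicit, the wildcards and
substitutions could be spelled out, e.g. (purely hypothetical
syntax):

    host=*,disk=*,measure=bytes_used
        -> host=$host,disk=$disk,measure=time_until_full

That is just the implicit "all other dimensions unchanged" behaviour
written out.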
Obviously I also need to specify the filter. And I may not even need
to specify the result as the filter can determine that itself.
Although that may be bad for reusability.
Similar for auxiliary inputs. In the example above we also need the
corresponding measure=bytes_usable timeseries. The filter can
determine that itself, but maybe it's better to specify that in the
rule?
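If we do name them in the rule, it might look like this (again
hypothetical syntax, with the filter and the auxiliary input spelled
out):

    measure=bytes_used, aux measure=bytes_usable
        -> measure=time_until_full  via extrapolate_fill_time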
At run-time we expand the rules and just use the ids.
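A minimal sketch of that expansion, assuming a series is identified
by its dimension dict and the id is just a canonical string (all
names here are made up):

    def series_id(dims):
        # Canonical id: sorted dimension=value pairs.
        return ",".join(f"{k}={v}" for k, v in sorted(dims.items()))

    def expand(rule, all_series):
        # Yield (input_id, output_id) for every series matching the rule.
        for dims in all_series:
            if dims.get("measure") != rule["from"]:
                continue
            out = dict(dims, measure=rule["to"])  # other dimensions unchanged
            yield series_id(dims), series_id(out)

    rule = {"from": "bytes_used", "to": "time_until_full"}
    series = [{"host": "a", "disk": "sda", "measure": "bytes_used"}]
    for in_id, out_id in expand(rule, series):
        print(in_id, "->", out_id)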
I think we want to decouple the processing from the data acquisition, so
the web service should just write the changed timeseries into a queue.
Touch a file with the key as the name in a spool dir. The processor can
then check if there is anything in the spool dir and process it. The
result of the filter is then again added to the spool dir (make sure
there are no circular dependencies! Hmm, that's up to the user I guess?
Or each generated series could have a rate limit?)
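A sketch of that spool mechanism (the paths and names are
assumptions):

    import os

    SPOOL = "/var/spool/ltsdb"  # hypothetical location

    def enqueue(key):
        # Web service side: touch a file named after the series key.
        open(os.path.join(SPOOL, key), "a").close()

    def process_spool(run_filters):
        # Processor side: handle every spooled key, re-spool results.
        for key in os.listdir(SPOOL):
            os.unlink(os.path.join(SPOOL, key))  # claim before processing
            for out_key in run_filters(key):     # filters return generated keys
                enqueue(out_key)                 # outputs feed back into the queue

A per-series rate limit would then just be a check in enqueue()
before touching the file.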
In addition to filters (which create new data) we also need some kind of
alerting system. That could just be a filter which produces no data but
does something else instead, like sending an email. So I'm not sure
whether it makes sense to distinguish these.
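A sketch of that, treating an alert as a filter with a side effect
and no output series (interface, threshold and addresses are all
made up):

    import smtplib
    from email.message import EmailMessage

    def disk_full_alert(key, datapoints):
        # datapoints: (timestamp, seconds_until_full) pairs, newest last
        if datapoints and datapoints[-1][1] < 3600:
            msg = EmailMessage()
            msg["Subject"] = f"disk nearly full: {key}"
            msg["From"] = "ltsdb@example.com"
            msg["To"] = "ops@example.com"
            msg.set_content(f"{key} is predicted to fill within the hour")
            smtplib.SMTP("localhost").send_message(msg)
        return []  # no new series, so nothing gets re-spooled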
We should record all the used inputs (recursively) for each generated
series (do we actually want to store the transitive closure or just the
direct inputs? We can expand that when necessary.). Doing that for each
datapoint is overkill, but we should mark each input with a "last seen"
timestamp so that we can ignore or scrub inputs which are no longer
used.
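A sketch of that bookkeeping (the schema is an assumption):

    import time

    inputs = {}  # output_id -> {input_id: last_seen_timestamp}

    def record_input(output_id, input_id):
        inputs.setdefault(output_id, {})[input_id] = time.time()

    def direct_inputs(output_id, max_age=90 * 86400):
        # Direct inputs only; expand recursively if the closure is
        # needed. Inputs not seen within max_age can be ignored or
        # scrubbed.
        cutoff = time.time() - max_age
        return [i for i, seen in inputs.get(output_id, {}).items()
                if seen >= cutoff]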
Do we need a negative/timeout trigger? I.e. if a timeseries which is
used as an input is NOT updated in time, trigger the filter anyway so
that it can take appropriate action? If we have that, how do we filter
out obsolete measurements? We don't want to get alerted that we haven't
gotten any disk usage data from a long-discarded host for 5 years. For
now I think we rely on other, still-active checks to fail if a
measurement fails to run.
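If we add such a trigger later, the last-seen stamps above would
bound it; something like this (thresholds made up):

    import time

    def overdue_inputs(last_seen, timeout=2 * 3600, max_age=30 * 86400):
        # Overdue but not obsolete: missed the update window, yet seen
        # recently enough that the series is presumably still in use.
        now = time.time()
        return [i for i, seen in last_seen.items()
                if timeout < now - seen <= max_age]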