Commit f7dcfaaa authored by Michał Woźniak

ingestion grace period now expressed in seconds, for consistency; README mostly complete

parent a746ad9c
# Docker Matomo Log Analytics
# Watchful Matomo Log Analytics (for Matomo 3.x)
Matomo's [log analytics](https://matomo.org/log-analytics/), dockerized and watching specified directories for logs to ingest.
Matomo's [log analytics](https://matomo.org/log-analytics/), automatically watching specified directories for logs to ingest.
the log analytics script is wrapped in an [`inotifywait`](https://linux.die.net/man/1/inotifywait) loop to automagically pick up logs from specified directories.
The log analytics script is wrapped in an [`inotifywait`](https://linux.die.net/man/1/inotifywait) loop to automagically pick up logs from specified directories.
## Environment variables
## Requirements
- `WATCH_PATHS` (default: `"/srv/logs/"`)
The `logwatch.py` script requires Matomo's `import_logs.py` log analytics script (branch `3-x.dev`) to be available for import. Since `import_logs.py` only runs on Python 2.7, so does `logwatch.py`. The requirements of `import_logs.py` need to be satisfied, and the `inotify_simple` and `signal` modules need to be available.
Whitespace-separated list of paths to watch. These *should* be directories, which will be watched recursively. Once an `inotify` event fires, all files matching the glob pattern `*.log` contained in this directory will be ingested, one by one.
## Operation
- `WATCH_DELAY` (default: `"0.5"`)
The script sets [inotify](https://en.wikipedia.org/wiki/Inotify) watches on the listed directories.
Delay between detecting changes and starting to process the files, in seconds (decimals are supported). Since `inotifywait` will detect the *first* change, if there is a large batch of changes happening (for example, a batch of large logfiles being copied into the directory), starting to load the files immediately would lead to unexpected results.
When files matching the `--logfiles-glob` pattern are detected, the script waits until `--ingestion-grace-period` seconds have passed after all activity stops, and then ingests the batch of detected files one by one. Ingested files are either renamed (using `--prefix-ingested` and `--suffix-ingested`) or deleted (`--delete-ingested`).
## Volume
If an unrecoverable error occurs during ingestion of a file, the file is either renamed (using `--prefix-failed` and `--suffix-failed`) or deleted (`--delete-failed`) — unless `--exit-on-error` is used, in which case the script immediately exits with an error message.
The "`/srv/logs`" directory is exposed through the [`VOLUME` Dockerfile directive](https://docs.docker.com/engine/reference/builder/#volume), and is also configured as the default location to watch in `WATCH_PATHS`.
After all files in the batch are processed (either ingested or failed), Matomo's report processing is automagically triggered, unless `--no-auto-archive` is used.
While ingestion is in progress, new files are *not* added to the batch. Once processing of a batch ends, if any inotify events have occurred since processing started, all files matching the configured glob are added to a new batch, which is then processed. If no inotify events have occurred since processing of the batch started, the script waits for new events.
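The grace-period batching described above can be sketched roughly as follows. This is an illustrative simulation only, not the code in `logwatch.py`: the function name `group_into_batches` and the plain list of timestamps are assumptions made for the example. The real script reacts to live inotify events, and the time spent ingesting a batch is ignored here.

```python
def group_into_batches(event_times, grace_period):
    """Group event timestamps (in seconds) into batches.

    A batch is closed once no further event arrives within
    `grace_period` seconds of the previous event.
    """
    batches = []
    current = []
    for t in sorted(event_times):
        # a gap longer than the grace period means activity had stopped,
        # so the files seen so far would have been ingested as one batch
        if current and t - current[-1] > grace_period:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# events at 0s, 1s and 3s are close together; the event at 20s starts a new batch
print(group_into_batches([0.0, 1.0, 3.0, 20.0, 21.5], grace_period=5))
```

With a grace period of 5 seconds, the first three events end up in one batch and the last two in another.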
## Usage
Run `./logwatch.py --help` to get help. All `import_logs.py` options are supported, plus these additional ones:
- `--logfiles-glob` (default: `"*.log"`)
Only files matching this shell glob expression will be ingested. It's
important to make sure that the glob does not match already ingested files after
the prefix and suffix are applied! See `--prefix-ingested` and `--suffix-ingested`.
- `--ingestion-grace-period` (default: `5`)
Delay (in seconds; fractions are supported) between noticing a logfile to be processed and starting ingesting it.
This is part of the built-in heuristic for determining that a file is not being modified
or moved anymore and can be safely ingested.
- `--delete-ingested` (default: False)
Delete successfully ingested logfiles.
- `--prefix-ingested` (default: `"ingested/"`)
Rename ingested logfiles using this prefix; prefix can indicate directories (in
which case it should contain '/'), and is then relative to the directory a given
logfile was originally in: when watching several directories, a prefix of
'ingested/' will place ingested files in './ingested/' subdirectories of
respective watched directories. Directories will be created if needed. This option
is ignored if `--delete-ingested` is used.
- `--suffix-ingested` (default: `".ingested"`)
Rename ingested logfiles using this suffix; it cannot contain any '/' characters.
This option is ignored if `--delete-ingested` is used.
- `--exit-on-error` (default: False)
Exit when ingestion errors are encountered.
- `--delete-failed` (default: False)
Delete logfiles which failed to be ingested.
- `--prefix-failed` (default: `"failed/"`)
Rename logfiles that failed to be ingested using this prefix; prefix can
have directories (in which case it should contain '/'), and is then relative
to the directory a given logfile was originally in: when watching several
directories, a prefix of 'failed/' will place such files in './failed/'
subdirectories of respective watched directories. Directories will be created
if needed. This prefix will also be used for files containing information
on what error was encountered and at which line.
This option is ignored if `--delete-failed` is used.
- `--suffix-failed` (default: `".failed"`)
Rename logfiles that failed to be ingested using this suffix; it cannot
contain any '/' characters. This option is ignored if `--delete-failed` is used.
- `--no-auto-archive` (default: False)
Do not automatically run auto-archiving of Matomo reports. By default
auto-archiving is triggered after a batch of logfiles is ingested.
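To illustrate how the prefix and suffix options interact with `--logfiles-glob`, here is a minimal sketch. The helpers `renamed_path` and `still_matches_glob` are hypothetical and not part of `logwatch.py`. The sketch assumes that the new name is built as the original directory, then the prefix, the original file name and the suffix, and that the glob is matched against the file name only.

```python
import fnmatch
import os


def renamed_path(logfile, prefix="ingested/", suffix=".ingested"):
    """Hypothetical helper: where a processed logfile would end up."""
    if "/" in suffix:
        raise ValueError("suffix cannot contain '/'")
    directory, name = os.path.split(logfile)
    # the prefix is relative to the directory the logfile was originally in
    return os.path.join(directory, prefix + name + suffix)


def still_matches_glob(path, logfiles_glob="*.log"):
    """Check whether a renamed file's name would still match the glob."""
    return fnmatch.fnmatch(os.path.basename(path), logfiles_glob)


done = renamed_path("/logs/nginx/access.log")
print(done)                      # /logs/nginx/ingested/access.log.ingested
print(still_matches_glob(done))  # False

# with an empty suffix, the renamed file keeps a name matching '*.log'
print(still_matches_glob(renamed_path("/logs/nginx/access.log", suffix="")))  # True
```

The last line shows the situation the `--logfiles-glob` description warns about: a renamed file whose name still matches the glob.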
## Docker usage
Run the image with the log directories you want to watch mounted as volumes. Specify the options and the directories to watch directly as the command (`logwatch.py` is the entrypoint script, and the default command is `--help`).
### Example docker-compose service
```yaml
# loading nginx logfiles into matomo
logwatch:
    build: https://git.rys.io/libre/watchful-matomo-log-analytics.git
    volumes:
        - "/var/log/old/nginx/:/logs/nginx/"
        - "/var/log/old/apache/:/logs/apache/"
    command:
        - --prefix-ingested
        - ingested/
        - --prefix-failed
        - failed/
        - --enable-http-errors
        - --enable-http-redirects
        - --enable-static
        - --enable-bots
        - --logfiles-glob
        - '*.gz'
        - --url
        - https://matomo.example.org
        - --token-auth
        - <matomo_token>
        - /logs/nginx/
        - /logs/apache/
```
......@@ -73,23 +73,33 @@ class Configuration(import_logs.Configuration):
# in case we want to add options
self.parser = self._create_parser()
# fix the basics
self.parser.usage='Usage: %prog [options] log_dir [ log_dir [...] ]'
self.parser.description="""Watch HTTP access log directories and import HTTP access logs to Matomo
log_dir is the path to a directory with server access log files (uncompressed, .gz, or .bz2).
You may also watch many log file directories at once.
By default, the script will try to produce clean reports and will exclude bots, static files, discard http error and redirects, etc. This is customizable, see below."""
self.parser.epilog="""About Watchful Matomo Log Analytics: https://git.rys.io/libre/watchful-matomo-log-analytics/ ;
This script is based on Matomo Server Log Analytics: https://matomo.org/log-analytics/"""
self.parser.add_option(
'--logfiles-glob',
dest='logfiles_glob',
default="*.log",
help="Only files matching this shell glob expression will be ingested. It's "
"important to make sure that the glob does not match ingested files after "
"prefix and suffix is applied!"
"important to make sure that the glob does not match already ingested files after "
"prefix and suffix is applied! See --prefix-ingested and --suffix-ingested."
)
self.parser.add_option(
'--ingestion-grace-period',
dest='ingestion_grace_period',
type='int',
default=5000,
help="Delay (in ms) between noticing a logfile to be processed and starting ingesting it."
"This is part of the built-in heuristic for determining that a file is not being modified "
"or moved anymore and can be safely ingested."
type='float',
default=5,
help="Delay (in seconds; fractions are supported) between noticing a logfile to be "
"processed and starting ingesting it. This is part of the built-in heuristic for "
"determining that a file is not being modified or moved anymore and can be "
"safely ingested."
)
self.parser.add_option(
......@@ -97,14 +107,14 @@ class Configuration(import_logs.Configuration):
dest='delete_ingested',
action='store_true',
default=False,
help="Delete ingested logfiles."
help="Delete successfully ingested logfiles."
)
self.parser.add_option(
'--prefix-ingested',
dest='prefix_ingested',
default="ingested/",
help="Rename ingested logfiles using this prefix; prefix can have directories (in "
help="Rename ingested logfiles using this prefix; prefix can indicate directories (in "
"which case it should contain '/'), and is then relative to the directory a given "
"logfile was originally in: when watching several directories, a prefix of "
"'ingested/' will place ingested files in './ingested/' subdirectories of "
......@@ -133,7 +143,7 @@ class Configuration(import_logs.Configuration):
dest='delete_failed',
action='store_true',
default=False,
help="Delete logfiles which failed to be ingsted."
help="Delete logfiles which failed to be ingested."
)
......@@ -398,7 +408,7 @@ if len(logfiles_busy) > 0:
# And see the corresponding events:
timestamp = time.time()
time.sleep(config.options.ingestion_grace_period / 1000)
time.sleep(config.options.ingestion_grace_period)
while True:
new_timestamp = time.time()
logging.debug("looped in %0.5fs" % (new_timestamp - timestamp))
......@@ -406,7 +416,7 @@ while True:
logfiles_free = list(set(logfiles_free + logfiles_busy))
logfiles_busy = []
for event in inotify.read(config.options.ingestion_grace_period, config.options.ingestion_grace_period):
for event in inotify.read(config.options.ingestion_grace_period * 1000, config.options.ingestion_grace_period * 1000):
if event.wd not in watches:
continue
f = os.path.join(watches[event.wd], event.name)
......@@ -556,5 +566,5 @@ while True:
# get the watches going again
setup_watches()
time.sleep(config.options.ingestion_grace_period / 1000)
time.sleep(config.options.ingestion_grace_period)
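The unit change in the hunks above can be illustrated with a small standalone sketch. It uses the standard library `optparse` module directly instead of the `Configuration` class from the diff, and the variable names `sleep_seconds` and `read_timeout_ms` are made up for the example. The option is parsed as a float number of seconds, used as-is for sleeping, and multiplied by 1000 where a value in milliseconds is passed on, as in the `inotify.read()` call.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option(
    '--ingestion-grace-period',
    dest='ingestion_grace_period',
    type='float',
    default=5,
)

# fractions of a second are accepted; remaining arguments are the log directories
options, args = parser.parse_args(['--ingestion-grace-period', '0.5', '/logs/nginx/'])

sleep_seconds = options.ingestion_grace_period           # seconds, e.g. for time.sleep()
read_timeout_ms = options.ingestion_grace_period * 1000  # milliseconds

print(sleep_seconds, read_timeout_ms, args)
```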