So Mastodon takes up space, right? How do we manage that?

Currently we just have all our storage on the same Linode compute instance that we run the Mastodon instance from. That should probably change at some point to using S3-compatible object storage.

But for now, what the heck takes up all the space?

What Gets Stored

Database

The database is relatively small since it mostly stores text! For reference, as of August 15 (roughly 8 months after the instance started), it is 13.5GB, with statuses making up 4.5GB of that.

Media Storage

The instance is currently configured to remove media (images, video, audio) from other instances from the cache after 1 day - after that, the media is redownloaded from the hosting instance when accessed.

Of course, we host all media from our members indefinitely, and actively encourage them to post big images and videos because that fills up their stat counters.

Cache

Data from other instances, apart from statuses and other text data, is stored in the system cache: mastodon/live/public/system/cache - this currently makes up most of our storage needs.

It includes

  • account data, including avatars and header images, which are stored indefinitely and thus need to be pruned
  • media from other instances (images, videos in posts)
  • custom emojis from other instances
  • link preview cards for embedded links
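
To see which of these categories is actually eating the disk, something like the following can help. This is a minimal sketch, not something from our deployment: the function name cache_usage is made up for illustration, and the default path is the relative cache path mentioned above, so adjust it to wherever the instance really lives.

```shell
# Print the size (in KiB) of each top-level directory in the cache,
# largest first. The default path is an assumption; pass the real
# cache directory as the first argument.
cache_usage() {
  dir="${1:-mastodon/live/public/system/cache}"
  # -s: one total per directory, -k: KiB units so a numeric sort works
  du -sk "$dir"/*/ 2>/dev/null | sort -rn
}

# Usage: cache_usage /path/to/mastodon/live/public/system/cache
```

The first line of output is the biggest directory, which is usually the one worth pruning first.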

Object Storage

Our media is configured to be served via a reverse proxy at https://media.neuromatch.social

Proxy

First we set up the reverse proxy as described in the Linode and Mastodon docs. We modified the nginx configuration slightly: it uses a different cache than the regular Mastodon cache, and proxy_cache_path is declared outside the server block, because nginx doesn't allow it inside one as the Linode docs have it.

proxy_cache_path /var/cache/nginx-object-storage keys_zone=CACHEOBJECT:10m inactive=7d max_size=10g;

server {
  # ...
  location @s3 {
    # ...
    proxy_cache CACHEOBJECT;
    # ...
  }
}

Transition

To transition data from our existing instance, we used rclone.

docs:

Install:

curl -O https://downloads.rclone.org/rclone-current-linux-amd64.zip
unzip rclone-current-linux-amd64.zip
cd rclone-*-linux-amd64

# Copy binary file

sudo cp rclone /usr/bin/
sudo chown root:root /usr/bin/rclone
sudo chmod 755 /usr/bin/rclone

# Install manpage

sudo mkdir -p /usr/local/share/man/man1
sudo cp rclone.1 /usr/local/share/man/man1/
sudo mandb
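
Once rclone is installed and a remote is configured (via rclone config), the copy itself might look roughly like this. This is a sketch under assumptions, not a record of the exact command we ran: the remote name objstore and the bucket name example-bucket in the usage line are placeholders, and copy_media is a made-up wrapper name. It uses rclone copy rather than rclone sync so nothing on the destination gets deleted.

```shell
# Thin wrapper around `rclone copy`. RCLONE can be overridden to point
# at a different binary; extra flags are passed through unchanged.
RCLONE="${RCLONE:-rclone}"

copy_media() {
  src="$1"
  dest="$2"
  shift 2
  # --progress prints live transfer stats
  "$RCLONE" copy "$src" "$dest" --progress "$@"
}

# Usage (placeholder remote and bucket; --dry-run only lists what would be copied):
#   copy_media mastodon/live/public/system objstore:example-bucket --dry-run
```

Running with --dry-run first is a cheap way to check the remote and paths before moving any real data.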

Managing Storage

Media

Media storage is configured from the Mastodon admin interface - https://neuromatch.social/admin/settings/content_retention

Warning: Do not set the "Content cache retention period", as it will remove posts and boosts from other servers! That includes removing the bookmarks, favorites, and boosts of our members! Bad to do!

Cache Pruning

See Maintenance#Cache Pruning

We use tootctl to periodically prune data from the server by running these commands every month. At the moment this keeps our storage in a sustainable range:

# remove remote cover images from accounts that nobody on the instance is following
tootctl media remove --remove-headers
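
To check whether a pruning run is actually buying us anything, the command above can be wrapped so it reports how much space it freed. This is only a sketch: prune_and_report is a made-up name, CACHE_DIR defaults to the relative cache path from this page, and it assumes tootctl is runnable as-is by whoever runs the script (on a typical install that means the mastodon user, with the environment set up for production).

```shell
# Measure the cache before and after pruning and print the difference.
# TOOTCTL and CACHE_DIR are assumptions; override them for your setup.
TOOTCTL="${TOOTCTL:-tootctl}"
CACHE_DIR="${CACHE_DIR:-mastodon/live/public/system/cache}"

prune_and_report() {
  before=$(du -sk "$CACHE_DIR" | cut -f1)
  # Same command as above; stop without reporting if it fails
  "$TOOTCTL" media remove --remove-headers || return 1
  after=$(du -sk "$CACHE_DIR" | cut -f1)
  echo "freed $((before - after)) KiB from $CACHE_DIR"
}
```

Saved as a script, this could be called from a monthly cron job or systemd timer; how we schedule it is not settled here.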

We might also want to add these commands, which make sense to run but don't necessarily contribute a ton to our storage burden. Want to check with the rest of Tech WG first before I do these (-Jonny 23-08-15)

# remove remote accounts that no longer exist
# excludes accounts with confirmed activity in the last week in case the server is down
tootctl accounts cull

# remove files that do not belong to media attachments.
tootctl media remove-orphans

Logs

Logs end up being a surprisingly large contributor to storage burden!

Systemd

Most of the large logs are managed by systemd/journald. To prevent these from growing out of control, we set a global maximum disk usage of 5G, and then set a relatively small cap for each individual file so that they are rotated/archived frequently. We additionally delete all log files older than one month, if they manage to escape these caps.

In /etc/systemd/journald.conf

[Journal]
Storage=auto
Compress=yes
SystemMaxUse=5G
SystemMaxFileSize=64M
SystemMaxFiles=10
MaxRetentionSec=1month

After editing this configuration, restart journald to apply it:

sudo systemctl restart systemd-journald
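
As a rough sanity check on how these caps interact: as far as we understand journald.conf(5), SystemMaxFiles limits the number of archived journal files, so with the values above archived system logs should top out around 10 x 64M = 640M, well under the 5G SystemMaxUse cap. The helper below does that arithmetic from a config file. It is a sketch: journal_archive_cap_mb is a made-up name, and it only understands sizes written with an M suffix.

```shell
# Approximate cap on archived journal size, in MB:
# SystemMaxFiles * SystemMaxFileSize (M suffix only).
journal_archive_cap_mb() {
  conf="${1:-/etc/systemd/journald.conf}"
  # Take the last uncommented occurrence of each setting
  size=$(sed -n 's/^SystemMaxFileSize=\([0-9][0-9]*\)M$/\1/p' "$conf" | tail -n 1)
  files=$(sed -n 's/^SystemMaxFiles=\([0-9][0-9]*\)$/\1/p' "$conf" | tail -n 1)
  # Bail out if either setting is missing or uses another unit
  [ -n "$size" ] && [ -n "$files" ] || return 1
  echo $((size * files))
}

# Usage: journal_archive_cap_mb /etc/systemd/journald.conf
```

Comparing that number with the output of journalctl --disk-usage shows how close actual usage is to the caps.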

Nginx

See Wiki/Nginx#Log_Rotation

Reference