FAQ

Where is the data?

As a rule of thumb, our data is stored in the London region (eu-west-2), in either RDS (tier-0, MySQL) or Elasticsearch (tier-1). For the EURITO project we also use Neo4j (tier-1), and in the distant future we will use Neo4j for tier-2 (i.e. a knowledge graph).

Why don’t you use Aurora rather than MySQL?

Aurora is definitely cheaper for stable production and business processes, but not for research and development. You are charged for every byte of data you have ever consumed, which quickly spirals out of control for big-data development. Maybe one day we’ll consider migrating back, once the situation stabilises.

Where are the production machines?

Production machines (EC2) run in Ohio (us-east-2).

Where is the latest config?

We use git-crypt to encrypt our configuration files whilst allowing them to be versioned in git (meaning that we can also roll back configuration). To unlock the encrypted configuration, install git-crypt, run bash install.sh from the project root, and finally unlock the configuration using the key found here.

Where do I start with Elasticsearch?

All Elasticsearch indexes (aka “databases” to the rest of the world), mappings (aka “schemas”) and whitelisting can be found here.

I’d recommend using Postman for spinning up and knocking down indexes. Practice this on a new cluster (which you can spin up from the above link): PUT an index (remember: “database”) with a mapping (“schema”), insert a “row” with POST, and then delete the index with DELETE. You will quickly learn that it’s very easy to delete everything in Elasticsearch.
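As a rough illustration of that PUT, POST and DELETE practice run, the sketch below builds the three requests without sending them. The endpoint http://localhost:9200, the index name practice_index and the single title field are placeholders invented for this example, and the mapping body is written in the typeless style, which may need adjusting for the Elasticsearch version of your cluster.

```python
import json
from urllib.request import Request


def build_practice_requests(base_url, index):
    """Build (but do not send) the PUT, POST and DELETE practice requests."""
    headers = {"Content-Type": "application/json"}
    # Hypothetical mapping ("schema") with one text field.
    mapping = {"mappings": {"properties": {"title": {"type": "text"}}}}
    # Hypothetical document ("row") matching the mapping above.
    row = {"title": "hello world"}
    return [
        # PUT creates the index ("database") with its mapping.
        Request(f"{base_url}/{index}", data=json.dumps(mapping).encode(),
                headers=headers, method="PUT"),
        # POST inserts one document into the index.
        Request(f"{base_url}/{index}/_doc", data=json.dumps(row).encode(),
                headers=headers, method="POST"),
        # DELETE removes the whole index, documents included.
        Request(f"{base_url}/{index}", method="DELETE"),
    ]


# Placeholder endpoint and index name: assumptions, not real project values.
requests_to_send = build_practice_requests("http://localhost:9200", "practice_index")
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

To actually run the sequence, each request could be passed to urllib.request.urlopen, ideally against a throwaway cluster only, since the final DELETE is irreversible.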

Troubleshooting

I’m having problems using the config files!

We use git-crypt to encrypt our configuration files whilst allowing them to be versioned in git (meaning that we can also roll back configuration). To unlock the encrypted configuration, install git-crypt, run bash install.sh from the project root, and finally unlock the configuration using the key.

How do I restart the Apache server after downtime?

sudo service httpd restart

How do I restart the Luigi server after downtime?

sudo su - luigi

source activate py36

luigid --background --pidfile /var/run/luigi/luigi.pid --logdir /var/log/luigi

How do I perform initial setup to ensure the batchables will run?

  • AWS CLI needs to be installed and configured:

pip install awscli

aws configure

The AWS Access Key ID and Secret Access Key are set up in IAM > Users > Security Credentials. The default region name should be eu-west-1 to enable the error emails to be sent. In AWS SES, the sender and receiver email addresses need to be verified.

  • The config files need to be accessible and the PATH and LUIGI_CONFIG_PATH need to be amended accordingly
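A quick way to sanity-check these two prerequisites is a small preflight script. The sketch below is illustrative only: the function name check_batch_setup is made up for this example, and it checks just that an aws executable is on the PATH and that LUIGI_CONFIG_PATH points at an existing file. It does not verify credentials, the region or SES.

```python
import os
import shutil


def check_batch_setup(env=None, which=shutil.which):
    """Return a list of human-readable setup problems (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    # The AWS CLI must be installed and discoverable on the PATH.
    if which("aws") is None:
        problems.append("aws CLI not found on PATH")
    # Luigi reads its configuration from the file named in LUIGI_CONFIG_PATH.
    config_path = env.get("LUIGI_CONFIG_PATH")
    if not config_path:
        problems.append("LUIGI_CONFIG_PATH is not set")
    elif not os.path.isfile(config_path):
        problems.append(f"LUIGI_CONFIG_PATH does not point to a file: {config_path}")
    return problems


# Report any problems found in the current environment.
for problem in check_batch_setup():
    print("setup problem:", problem)
```

The env and which parameters exist only so the check can be exercised without touching the real environment.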

How can I send/receive emails from Luigi?

You should set the environment variable LUIGI_EMAIL by adding export LUIGI_EMAIL="<your.email@something>" to your .bashrc. You can test this with luigi TestNotificationsTask --local-scheduler --email-force-send. Make sure your email address has been registered under AWS SES.

How do I add a new user to the server?

  • add the user with useradd --create-home username
  • add sudo privileges following these instructions
  • add the user to the ec2-user group with sudo usermod -a -G ec2-user username
  • set a temp password with passwd username
  • their home directory will be /home/username/
  • copy .bashrc to their home directory
  • create folder .ssh in their home directory
  • copy .ssh/authorized_keys to the same folder in their home directory (copy it, don't move it!)
  • cd to their home directory and perform the below
  • chown their copy of .ssh/authorized_keys to their username: chown username .ssh/authorized_keys
  • clone the nesta repo
  • copy core/config files
  • set the password to be changed at next login with chage -d 0 username
  • share the temp password and core pem file

If necessary, run sudo chmod g+w /var/tmp/batch
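The command-line parts of the checklist above can be collected into a dry-run helper that only prints the commands for a given username, so they can be reviewed before anyone runs them. This is a sketch under several assumptions: the helper name new_user_commands is invented, the sudo prefixes are added for illustration, and template_home (the directory the existing .bashrc and .ssh/authorized_keys are copied from) defaults to /home/ec2-user purely as a guess. The manual steps (sudo privileges, cloning the repo, copying config files, sharing the password and pem file) are deliberately left out.

```python
def new_user_commands(username, template_home="/home/ec2-user"):
    """Return the shell commands for the scripted steps, without running them."""
    home = f"/home/{username}"
    return [
        # Create the account and its home directory.
        f"sudo useradd --create-home {username}",
        # Add the account to the ec2-user group.
        f"sudo usermod -a -G ec2-user {username}",
        # Set a temporary password (prompts interactively).
        f"sudo passwd {username}",
        # Copy (never move) the shell config and authorised keys.
        f"sudo cp {template_home}/.bashrc {home}/",
        f"sudo mkdir -p {home}/.ssh",
        f"sudo cp {template_home}/.ssh/authorized_keys {home}/.ssh/",
        # Give the new user ownership of their copy of the keys file.
        f"sudo chown {username} {home}/.ssh/authorized_keys",
        # Force a password change at next login.
        f"sudo chage -d 0 {username}",
    ]


# "alice" is a placeholder username used only for this example.
for command in new_user_commands("alice"):
    print(command)
```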