FAQ¶
Where is the data?¶
As a general rule of thumb, our data is stored in the London region (eu-west-2), in either RDS (tier-0, MySQL) or Elasticsearch (tier-1). For the EURITO project we also use Neo4j (tier-1), and in the distant future we will use Neo4j for tier-2 (i.e. a knowledge graph).
Why don’t you use Aurora rather than MySQL?¶
Aurora is definitely cheaper for stable production and business processes, but not for research and development. You are charged for every byte of data you have ever consumed, which quickly spirals out of control for big-data development. Maybe one day we’ll consider migrating back, once the situation stabilises.
Where are the production machines?¶
Production machines (EC2) run in Ohio (us-east-2).
Where is the latest config?¶
We use git-crypt to encrypt our configuration files whilst allowing them to be versioned in git (meaning that we can also roll back configuration). To unlock the configuration encryption, you should install git-crypt, then run bash install.sh from the project root, and finally unlock the configuration using the key found here.
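The unlock steps can be sketched as a small shell function. This is a minimal sketch, not the project's official procedure: the KEY_FILE location and the unlock_config name are hypothetical, and it assumes the key is a symmetric git-crypt key file that you have already downloaded.

```shell
# Hypothetical key location: point this at wherever you saved the key.
KEY_FILE="${KEY_FILE:-$HOME/nesta.key}"

# Run from the project root: checks prerequisites, runs the installer,
# then unlocks the encrypted configuration with the key file.
unlock_config() {
  if [ ! -f install.sh ]; then
    echo "install.sh not found: run this from the project root" >&2
    return 1
  fi
  if ! command -v git-crypt >/dev/null 2>&1; then
    echo "git-crypt is not installed" >&2
    return 1
  fi
  bash install.sh && git-crypt unlock "$KEY_FILE"
}
```

Calling unlock_config from the project root either unlocks the configuration or reports which prerequisite is missing.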
Where do I start with Elasticsearch?¶
All Elasticsearch indexes (aka “databases” to the rest of the world), mappings (aka “schemas”) and whitelisting can be found here.
I’d recommend using Postman for spinning up and knocking down indexes. Practice this on a new cluster (which you can spin up from the above link): use a PUT request to create an index (remember: “database”) with a mapping (“schema”), a POST request to insert a “row”, and a DELETE request to delete the index. You will quickly learn that it’s very easy to delete everything in Elasticsearch.
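If you prefer the command line to Postman, the same three requests can be sketched with curl. Everything specific here is an assumption for illustration: ES_URL, the practice_index name, the title field and the function names are placeholders, and the mapping body uses the typeless format of Elasticsearch 7 or later (older versions expect a type name inside "mappings").

```shell
# Placeholders: point ES_URL at your practice cluster, never at production.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-practice_index}"

# PUT: create the index ("database") with a mapping ("schema").
create_index() {
  curl -s -X PUT "$ES_URL/$INDEX" \
    -H 'Content-Type: application/json' \
    -d '{"mappings": {"properties": {"title": {"type": "text"}}}}'
}

# POST: insert one document ("row"); Elasticsearch assigns the ID.
insert_row() {
  curl -s -X POST "$ES_URL/$INDEX/_doc" \
    -H 'Content-Type: application/json' \
    -d '{"title": "hello world"}'
}

# DELETE: drop the whole index, including every document in it.
delete_index() {
  curl -s -X DELETE "$ES_URL/$INDEX"
}
```

Running create_index, insert_row and delete_index in that order mirrors the practice exercise. Note that delete_index asks for no confirmation.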
Troubleshooting¶
I’m having problems using the config files!¶
We use git-crypt to encrypt our configuration files whilst allowing them to be versioned in git (meaning that we can also roll back configuration). To unlock the configuration encryption, you should install git-crypt, then run bash install.sh from the project root, and finally unlock the configuration using the key.
How do I restart the apache server after downtime?¶
sudo service httpd restart
How do I restart the luigi server after downtime?¶
sudo su - luigi
source activate py36
luigid --background --pidfile /var/run/luigi/luigi.pid --logdir /var/log/luigi
How do I perform initial setup to ensure the batchables will run?¶
- AWS CLI needs to be installed and configured:
  pip install awscli
  aws configure
  AWS Access Key ID and Secret Access Key are set up in IAM > Users > Security Credentials
  Default region name should be eu-west-1 to enable the error emails to be sent
  In AWS SES the sender and receiver email addresses need to be verified
- The config files need to be accessible and the PATH and LUIGI_CONFIG_PATH need to be amended accordingly
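A quick pre-flight check for the setup above might look like the following. It is only a sketch: the check_batch_setup name is hypothetical, it assumes LUIGI_CONFIG_PATH should point at an existing file, and it cannot tell whether your SES addresses have been verified.

```shell
# Reports each missing prerequisite and returns non-zero if any check fails.
check_batch_setup() {
  rc=0
  if command -v aws >/dev/null 2>&1; then
    # Read the default region stored by "aws configure".
    region="$(aws configure get region 2>/dev/null || true)"
    if [ "$region" != "eu-west-1" ]; then
      echo "default region is '${region:-unset}', expected eu-west-1" >&2
      rc=1
    fi
  else
    echo "aws CLI not found on PATH (try: pip install awscli)" >&2
    rc=1
  fi
  if [ -z "${LUIGI_CONFIG_PATH:-}" ]; then
    echo "LUIGI_CONFIG_PATH is not set" >&2
    rc=1
  elif [ ! -e "$LUIGI_CONFIG_PATH" ]; then
    echo "LUIGI_CONFIG_PATH points to a missing file: $LUIGI_CONFIG_PATH" >&2
    rc=1
  fi
  return "$rc"
}
```

Running check_batch_setup before launching a batch job lists anything that still needs attention.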
How can I send/receive emails from Luigi?¶
You should set the LUIGI_EMAIL environment variable by adding the following line to your .bashrc:
export LUIGI_EMAIL="<your.email@something>"
You can test this with:
luigi TestNotificationsTask --local-scheduler --email-force-send
Make sure your email address has been registered under AWS SES.
How do I add a new user to the server?¶
- add the user with: useradd --create-home username
- add sudo privileges following these instructions
- add to ec2 user group with: sudo usermod -a -G ec2-user username
- set a temp password with: passwd username
- their home directory will be /home/username/
- copy .bashrc to their home directory
- create folder .ssh in their home directory
- copy .ssh/authorized_keys to the same folder in their home directory (DON’T MOVE IT!!)
- cd to their home directory and perform the below
- chown their copy of .ssh/authorized_keys to their username: chown username .ssh/authorized_keys
- clone the nesta repo
- copy core/config files
- set password to be changed next login: chage -d 0 username
- share the temp password and core pem file
If necessary:
- sudo chmod g+w /var/tmp/batch
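The scriptable steps above can be collected into a dry-run helper, sketched below. Several things are assumptions rather than documented practice: the add_server_user and run names are hypothetical, every command is prefixed with sudo, and .bashrc and authorized_keys are copied from the invoking user's home directory. Granting sudo privileges, cloning the repo, copying core/config and sharing credentials are left as manual steps.

```shell
# Prints each command by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

add_server_user() {
  user="${1:-}"
  if [ -z "$user" ]; then
    echo "usage: add_server_user USERNAME" >&2
    return 1
  fi
  home="/home/$user"
  run sudo useradd --create-home "$user"
  run sudo usermod -a -G ec2-user "$user"
  run sudo passwd "$user"                 # prompts for the temp password
  run sudo cp "$HOME/.bashrc" "$home/.bashrc"
  run sudo mkdir -p "$home/.ssh"
  # Copy, never move: the original authorized_keys must stay in place.
  run sudo cp "$HOME/.ssh/authorized_keys" "$home/.ssh/authorized_keys"
  run sudo chown "$user" "$home/.ssh/authorized_keys"
  run sudo chage -d 0 "$user"             # force a password change at next login
}
```

Calling add_server_user username with the default DRY_RUN=1 only prints the commands, so you can review them before running for real.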