Where is the data?

As a rule of thumb, our data is stored in the London region (eu-west-2), in either RDS (tier-0, MySQL) or Elasticsearch (tier-1). For the EURITO project we also use Neo4j (tier-1), and in the distant future we will use Neo4j for tier-2 (i.e. a knowledge graph).

Why don’t you use Aurora rather than MySQL?

Aurora is definitely cheaper for stable production and business processes, but not for research and development. You are charged for every byte of data you have ever consumed, which quickly spirals out of control for big-data development. Maybe one day we’ll consider migrating back, once the situation stabilises.

Where are the production machines?

Production machines (EC2) run in Ohio (us-east-2).

Where is the latest config?

One of our priorities is to implement a decent config management system. Our latest read-only config for accessing tier-0 (rawish SQL data) can be found here, and a fairly up-to-date config directory (which you can paste into nesta/core/config) can be found here. If you want to use exactly what Joel has been using, feel free to sudo cp from /home/ec2-user/nesta/nesta/core/config. For example, you can find the latest Elasticsearch indexes and endpoints here.

Where do I start with Elasticsearch?

All Elasticsearch indexes (aka “databases” to the rest of the world), mappings (aka “schemas”) and whitelisting can be found here.

I’d recommend using Postman for spinning up and knocking down indexes. Practice on a new cluster (which you can spin up from the above link): use PUT to create an index (remember: “database”) with a mapping (“schema”), POST to insert a “row”, and DELETE to delete the index. You will quickly learn that it’s very easy to delete everything in Elasticsearch.
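
If you would rather script the practice run than click through a GUI, the same PUT / POST / DELETE cycle can be sketched with curl. Treat this as a sketch only: the endpoint URL, the index name practice_v0 and the single title field are made up for illustration, and the mapping body assumes Elasticsearch 7 or later (older versions nest the properties under a type name). DRY_RUN defaults to 1, so the requests are only printed until you set DRY_RUN=0.

```shell
# Hypothetical endpoint and index name: substitute your own practice cluster.
ES_ENDPOINT="${ES_ENDPOINT:-https://my-practice-cluster.example.com}"
ES_INDEX="${ES_INDEX:-practice_v0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the requests, 0 = actually send them

# Send (or print) one request: es_request METHOD PATH [JSON_BODY]
es_request() {
    method="$1"; req_path="$2"; body="${3:-}"
    if [ "$DRY_RUN" = "1" ]; then
        # Print "METHOD URL [BODY]" instead of touching the cluster.
        printf '%s %s/%s%s\n' "$method" "$ES_ENDPOINT" "$req_path" "${body:+ $body}"
        return 0
    fi
    if [ -n "$body" ]; then
        curl -sS -X "$method" "$ES_ENDPOINT/$req_path" \
             -H 'Content-Type: application/json' -d "$body"
    else
        curl -sS -X "$method" "$ES_ENDPOINT/$req_path"
    fi
}

# 1. PUT an index ("database") with a mapping ("schema").
es_request PUT "$ES_INDEX" '{"mappings": {"properties": {"title": {"type": "text"}}}}'
# 2. POST a document ("row") into the index.
es_request POST "$ES_INDEX/_doc" '{"title": "hello world"}'
# 3. DELETE the whole index, documents and mapping included.
es_request DELETE "$ES_INDEX"
```

Only ever point ES_ENDPOINT at a throwaway cluster: step 3 removes the index without asking for confirmation.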


How do I restart the apache server after downtime?

sudo service httpd restart

How do I restart the luigi server after downtime?

sudo su - luigi

source activate py36

luigid --background --pidfile /var/run/luigi/ --logdir /var/log/luigi

How do I perform initial setup to ensure the batchables will run?

  • AWS CLI needs to be installed and configured:

pip install awscli

aws configure

    • AWS Access Key ID and Secret Access Key are set up in IAM > Users > Security Credentials
    • Default region name should be eu-west-1 to enable the error emails to be sent
    • In AWS SES the sender and receiver email addresses need to be verified

  • The config files need to be accessible and the PATH and LUIGI_CONFIG_PATH need to be amended accordingly
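
As a rough illustration of the setup above, the environment can be prepared and sanity-checked along these lines. The clone location (NESTA_ROOT), the config filename luigi.cfg and the EXTRA_BIN_DIR placeholder are all assumptions, so adjust them to your machine; check_setup is a hypothetical helper, not part of nesta.

```shell
# Assumed location: adjust to wherever the repo was cloned.
NESTA_ROOT="${NESTA_ROOT:-$HOME/nesta}"
CONFIG_DIR="$NESTA_ROOT/nesta/core/config"

# luigi reads its config file location from LUIGI_CONFIG_PATH.
# The filename luigi.cfg is an assumption.
export LUIGI_CONFIG_PATH="$CONFIG_DIR/luigi.cfg"

# Prepend whichever directory holds the executables the batchables need
# (EXTRA_BIN_DIR is a placeholder, not a real nesta setting).
EXTRA_BIN_DIR="${EXTRA_BIN_DIR:-$HOME/bin}"
export PATH="$EXTRA_BIN_DIR:$PATH"

# Hypothetical helper: report anything that would stop the batchables running.
check_setup() {
    rc=0
    if [ ! -r "$LUIGI_CONFIG_PATH" ]; then
        echo "missing config: $LUIGI_CONFIG_PATH"
        rc=1
    fi
    if command -v aws >/dev/null 2>&1; then
        # The default region must be eu-west-1 for the error emails.
        region="$(aws configure get region 2>/dev/null)"
        if [ "$region" != "eu-west-1" ]; then
            echo "default region is '$region', expected eu-west-1"
            rc=1
        fi
    else
        echo "aws CLI not found; install it with: pip install awscli"
        rc=1
    fi
    return "$rc"
}
```

Running check_setup after sourcing these lines prints one message per problem and returns non-zero if anything is missing. It does not check the SES email verification, which has to be done in the AWS console.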

How do I add a new user to the server?

  • add the user with useradd --create-home username
  • add sudo privileges following these instructions
  • add them to the ec2-user group with sudo usermod -a -G ec2-user username
  • set a temp password with passwd username
  • their home directory will be /home/username/
  • copy .bashrc to their home directory
  • create folder .ssh in their home directory
  • copy .ssh/authorized_keys to the same folder in their home directory (DON’T MOVE IT!!)
  • cd to their home directory and perform the below
  • chown their copy of .ssh/authorized_keys to their username: chown username .ssh/authorized_keys
  • clone the nesta repo
  • copy core/config files
  • set the password to be changed at next login: chage -d 0 username
  • share the temp password and core pem file

If necessary: sudo chmod g+w /var/tmp/batch
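
The account-related steps above could be collected into a small helper, sketched below. It assumes you are logged in as a sudo-capable user and that .bashrc and .ssh/authorized_keys are copied from /home/ec2-user (SRC_HOME); the function name add_server_user is made up for illustration. Granting sudo privileges, cloning the repo and copying the config files are left as manual steps. DRY_RUN defaults to 1, so the commands are only printed.

```shell
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them
SRC_HOME="${SRC_HOME:-/home/ec2-user}"   # assumed source of .bashrc and keys

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: add_server_user USERNAME
add_server_user() {
    user="$1"
    user_home="/home/$user"
    run sudo useradd --create-home "$user"
    run sudo usermod -a -G ec2-user "$user"
    run sudo passwd "$user"                          # prompts for the temp password
    run sudo cp "$SRC_HOME/.bashrc" "$user_home/"
    run sudo mkdir -p "$user_home/.ssh"
    # Copy, never move: the original authorized_keys must stay in place.
    run sudo cp "$SRC_HOME/.ssh/authorized_keys" "$user_home/.ssh/"
    run sudo chown "$user" "$user_home/.ssh/authorized_keys"
    run sudo chage -d 0 "$user"                      # force a password change at next login
}
```

Calling add_server_user alice prints the eight commands for review; rerun with DRY_RUN=0 once they look right.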