This should work on Linux, Mac, and Windows.
- install Docker
- if you're on Windows, run this so that Git leaves line endings unchanged:
git config --global core.autocrlf false
- clone this repo
- unzip the data package zipfile into the `./data` directory:
cd data
unzip /path/to/data.zip
- start services required for the build:
# docker build needs buildkit enabled
export DOCKER_BUILDKIT=1
# copy sample .env file
cp .env-sample .env
# starts the database and other services:
docker compose up --build -d
- to do a full run of the data build (loading and transforming the data):
# run the entire build
# on windows, add "winpty" to the front
docker compose run elt build-all
You'll see a "Finished build" message when the build completes. You can then connect to the postgres database on localhost:5433 using a SQL client (like DBeaver), or use psql through the database container:
docker exec -it landlord-tracker-db-1 psql -U postgres
- to run the address report (replace 123 MAIN ST with a real address of an apartment complex):
docker compose run elt python3 -m engels.reports.report_address "123 MAIN ST"
- to stop the stack, run `docker compose down`. This stops the postgres database, but the data should be preserved and available again the next time you run `docker compose up`.
Note: on Windows, add "winpty" to the front of docker commands to get interactivity and avoid output buffering.
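To connect from the host rather than through docker exec, here is a minimal sketch using the standard libpq environment variables, which psql reads. The host, port, and user come from the build step above; whether a password is required is not covered here and is left as an assumption.

```shell
# Connection settings from the steps above (localhost:5433, user postgres).
# psql and other libpq-based clients read these variables.
export PGHOST=localhost
export PGPORT=5433
export PGUSER=postgres

# With the variables set, a host-side psql (if installed) needs no flags:
#   psql
# Assumption: if a password is required, psql will prompt for it.
```

Most graphical SQL clients take the same host, port, and user in their connection dialog.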
To (re)run only the load or transforms as you are doing development:
# re-runs only the initial load of database tables
docker compose run elt load
# re-runs only the transforms
docker compose run elt transform
# you can also run bash in the container and run other things
docker compose run elt bash
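If you run these commands often, a small wrapper can save typing and add the "winpty" prefix automatically. This is only a sketch: the function names `tty_prefix` and `elt` are hypothetical, and it assumes the Windows shell is Git Bash or similar, where `uname -s` starts with MINGW, MSYS, or CYGWIN.

```shell
# Print "winpty" when the given uname string looks like Git Bash, MSYS, or Cygwin,
# and print nothing otherwise. Defaults to the current system's `uname -s`.
tty_prefix() {
  case "${1:-$(uname -s)}" in
    MINGW*|MSYS*|CYGWIN*) echo "winpty" ;;
    *) : ;;
  esac
}

# Run a command in the elt service, e.g. `elt load` or `elt transform`.
elt() {
  # $(tty_prefix) is left unquoted on purpose so an empty prefix disappears.
  $(tty_prefix) docker compose run elt "$@"
}
```

With this in your shell profile, `elt load`, `elt transform`, and `elt bash` would behave like the commands above on every platform.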
After running the transform step, you can view the updated dbt-generated HTML documentation at http://localhost:8000 to get a thousand-foot view of the pipelines.
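If the documentation page does not load right away, a short polling loop can tell you when it is reachable. This is a sketch only: `wait_for_url` is a hypothetical helper, it assumes curl is installed on the host, and the attempt count and delay are arbitrary defaults.

```shell
# Poll a URL until it responds or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-10}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body,
    # --max-time: give up on a single request after 5 seconds
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_url http://localhost:8000 && echo "docs are up"` would print the message once the page responds.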
Table names use these prefixes:
- `pre_`: data preprocessed from raw files
- `raw_`: raw (source) tables, imported into the database directly from extract files
- `int_`: intermediate tables; these exist only to help create other tables
- `stg_`: staged data; cleaned versions of raw tables, suitable for querying
- `ent_`: derived entity tables built from staging
- `vld_`: tables used for validation
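
As a quick illustration of the naming convention, the sketch below maps a table name to its layer by prefix. The function name `table_layer` and the example table name are hypothetical; only the prefixes come from the list above.

```shell
# Print the layer a table belongs to, based on its name prefix.
table_layer() {
  case "$1" in
    pre_*) echo "preprocessed" ;;
    raw_*) echo "raw" ;;
    int_*) echo "intermediate" ;;
    stg_*) echo "staging" ;;
    ent_*) echo "entity" ;;
    vld_*) echo "validation" ;;
    *)     echo "unknown" ;;
  esac
}

# Example with a made-up table name:
table_layer stg_example
```

When browsing the database, the `stg_` and `ent_` tables are the ones described above as cleaned and derived, so they are usually the place to start querying.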