# Fetching data and results

The geoclaw_tsunami_tutorial/GTT/datasets directory contains this description of how large datasets (both input data and sample results) are handled in the tutorials, along with a few simple examples that are currently used mostly for testing.

Using GeoClaw on a realistic problem often requires large input data files, e.g. topo files containing topography DEMs and dtopo files prescribing the earthquake motion. These data files are not included in the GitHub repository and must be downloaded separately in order to run the code. Often they can be downloaded directly from the NCEI server or other existing repositories, but in some cases they are stored on the GeoClaw server. Some notebooks included in the tutorial download and save these files (or cropped and coarsened versions of them, as in the notebook Make topofiles for Copalis Beach). In other cases an example includes a script fetch_input_data.py that fetches the required input data.
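A fetch_input_data.py-style script typically just downloads each required file if it is not already present. The sketch below illustrates the idea; the server URL and filenames are hypothetical placeholders, and the real tutorial scripts use helpers from GTT_tools.py rather than this code.

```python
# Minimal sketch of a fetch_input_data.py-style script.
# The URL and file names below are hypothetical; actual examples
# define their own lists and use tools from GTT_tools.py.
import os
import urllib.request

data_url = 'https://example.com/geoclaw_data'      # placeholder server
input_files = ['topofile.tt3', 'dtopofile.dtt3']   # placeholder files

def fetch_file(fname, url=data_url, dest_dir='input_data', force=False):
    """Download url/fname into dest_dir, skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, fname)
    if os.path.isfile(path) and not force:
        print(f'{path} already exists, skipping (use force=True to refetch)')
        return path
    urllib.request.urlretrieve(f'{url}/{fname}', path)
    print(f'Downloaded {path}')
    return path

if __name__ == '__main__':
    for fname in input_files:
        fetch_file(fname)
```

Skipping files that already exist makes the script cheap to rerun from a notebook or a Makefile without repeating large downloads.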

Some of the examples also include a script fetch_sample_results.py that fetches a subdirectory sample_results containing examples of what running the GeoClaw code should produce, often the plots that result from also doing make plots or from running other postprocessing scripts. There are two reasons for providing sample results:

- Some examples take hours to run, and you may want to explore the results without having to run the code.

- Building the Jupyter Book (i.e., converting a markdown file such as $GTT/datasets/README.md into the html version you are now reading) is done on GitHub every time a change to these tutorials is pushed. If the resulting webpage is to include figures that illustrate the results being discussed (such as Sample results for example1), those plots must be available on the computer building the html. The sample_results directory is downloaded directly from a code cell at the top of the file $GTT/CopalisBeach/example1/results.md, which executes $GTT/CopalisBeach/example1/fetch_sample_results.py.
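A code cell of that kind only needs to run the fetch script if the results are not already on disk. A minimal sketch of such a cell, assuming a hypothetical helper name (the actual cell simply executes fetch_sample_results.py):

```python
# Sketch of a code cell that could sit at the top of a results.md page:
# run the example's fetch script unless sample_results already exists.
# ensure_sample_results is a hypothetical helper, not part of GTT_tools.py.
import os
import subprocess
import sys

def ensure_sample_results(example_dir):
    """Fetch sample_results for example_dir if it is not already present."""
    results_dir = os.path.join(example_dir, 'sample_results')
    if os.path.isdir(results_dir):
        return results_dir  # already downloaded, nothing to do
    script = os.path.join(example_dir, 'fetch_sample_results.py')
    subprocess.run([sys.executable, script], check=True, cwd=example_dir)
    return results_dir
```

Guarding on the directory's existence keeps repeated book builds fast and avoids refetching when the plots are already cached on the build machine.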

## The data repository and cache

The top level of this repository, geoclaw_tsunami_tutorial, contains a subdirectory GTT (which is where $GTT points, if you followed the Suggested Workflow). It also contains a subdirectory $GTT_data_repository, which is nearly empty and contains only some tools for building the remote repository; you can ignore it.

A script like $GTT/CopalisBeach/example1/fetch_sample_results.py specifies a list of files or directories to download (in this case only sample_results) and then uses tools from the module $GTT/common_code/GTT_tools.py to fetch the corresponding zip file from the online repository, unzip it, and put the contents in the proper location within GTT.
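The fetch-unzip-place step can be sketched as follows. This is an illustration only: the function name fetch_and_unzip and the layout of the remote repository are assumptions, not the actual API of GTT_tools.py.

```python
# Sketch of the fetch-and-unzip step that a fetch_sample_results.py-style
# script delegates to GTT_tools.py. The helper name and the assumption
# that each item is stored remotely as name.zip are hypothetical.
import os
import urllib.request
import zipfile

def fetch_and_unzip(name, remote_url, dest_dir):
    """Download remote_url/name.zip and extract it into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, name + '.zip')
    urllib.request.urlretrieve(f'{remote_url}/{name}.zip', zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)  # e.g. creates dest_dir/sample_results/
    os.remove(zip_path)          # tidy up the downloaded archive
    return os.path.join(dest_dir, name)
```

Storing each item as a single zip archive keeps the download to one HTTP request per item, and extracting directly into the example directory places sample_results where the results.md page expects to find it.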