World Meteorological Organization

Country Profile Database

Deliverables

Technology

  • JavaScript
  • C#
  • CakePHP
  • PostgreSQL
  • PostGIS
  • MySQL
  • Mapbox custom maps

The World Meteorological Organization (WMO) is a specialized agency of the United Nations. It is the UN system's authoritative voice on the state and behaviour of the Earth's atmosphere, its interaction with the oceans, the climate it produces and the resulting distribution of water resources.

For the WMO, we significantly expanded, restructured and enhanced the Country Profile Database, a web application that collects and displays information about each member country. The information is gathered from a large number of very different data sources and presented in a unified web interface.

Background

Most countries in the world are members of the World Meteorological Organization, and a large amount of data is available for each of them. Historically, this information has been stored in many different data sources, such as online databases, Excel files, or other programs such as FileMaker Pro. Some data has been collected for so long that it is only available as text files on an FTP server... which was itself an upgrade of the previous system of daily radio transmissions and manually kept logs.

Due to the extremely fragmented nature of all this information, it was difficult to get an overview of which data and knowledge were available within the organization. The goal of the Country Profile Database was to make the data from all those different sources available through one newly developed web interface.

Technology

The technology required to achieve this is split into three components: extracting the data from the source materials, normalizing that data and importing it into a new data structure, and finally presenting it in a modern web application.

Data extraction

The data extraction part of this project is handled by a custom Windows application that runs periodically. The application is Windows-bound because it needs to access and drive programs that only run on Windows. It is written in C# and contains logic such as “connect and log in to this FTP server, navigate to this directory, grab the most recent CSV file, convert it to XML, and upload it to our own server”. Other data sources include Microsoft Access and FileMaker Pro databases, Excel files, HTML tables on web pages, regular CSV files, directories of files on a network server, and data obtained as JSON through web APIs.

All of these data sources are converted to XML as an intermediate format and then uploaded to our own server, where the resulting data is normalized and imported.
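Purely as an illustration of what one such extraction rule looks like, here is a minimal sketch in JavaScript (Node.js 18+, using the basic-ftp package). The real extractor is a C# Windows application, so the host names, paths and field names below are assumptions, not production code.

    // Hypothetical sketch only (Node.js 18+, "basic-ftp" npm package); the real
    // extractor is a C# Windows application, and all names here are made up.
    const ftp = require("basic-ftp");
    const fs = require("fs");

    async function extractLatestCsvAsXml() {
      // "Connect and log in to this FTP server, navigate to this directory,
      // grab the most recent CSV file..."
      const client = new ftp.Client();
      try {
        await client.access({ host: "ftp.example.int", user: "anonymous", password: "guest" });
        const files = await client.list("/climate/monthly");
        const latest = files
          .filter((f) => f.name.endsWith(".csv"))
          .sort((a, b) => b.modifiedAt - a.modifiedAt)[0];
        await client.downloadTo("latest.csv", `/climate/monthly/${latest.name}`);
      } finally {
        client.close();
      }

      // "...convert it to XML..." (naive conversion: no escaping or validation)
      const [header, ...rows] = fs
        .readFileSync("latest.csv", "utf8")
        .trim()
        .split("\n")
        .map((line) => line.split(","));
      const xml =
        "<records>" +
        rows
          .map((row) => "<record>" + row.map((v, i) => `<${header[i]}>${v}</${header[i]}>`).join("") + "</record>")
          .join("") +
        "</records>";

      // "...and upload it to our own server."
      await fetch("https://cpdb.example.int/import", {
        method: "POST",
        headers: { "Content-Type": "application/xml" },
        body: xml,
      });
    }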

Data import

The data obtained by the extraction process is normalized and imported by the data importer. Where necessary, information such as country designations is converted to a standardized format, which makes it easier to link information from different sources together. The information is then stored in a relational database (MySQL) for easy retrieval.
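As a rough sketch of that normalization step, assuming country designations are mapped onto ISO 3166-1 alpha-3 codes and stored via the mysql2 package; the actual schema and mapping tables differ:

    // Illustrative sketch only; the real importer's schema and mappings differ.
    const mysql = require("mysql2/promise");

    // Map the many designations found in source files onto one canonical code
    // (assumption: ISO 3166-1 alpha-3 is used as the canonical key).
    const COUNTRY_ALIASES = {
      "republic of korea": "KOR",
      "korea, rep. of": "KOR",
      "viet nam": "VNM",
      "vietnam": "VNM",
    };

    function normalizeCountry(designation) {
      const key = designation.trim().toLowerCase();
      if (!(key in COUNTRY_ALIASES)) {
        throw new Error(`Unknown country designation: ${designation}`);
      }
      return COUNTRY_ALIASES[key];
    }

    async function importRecord(record) {
      const conn = await mysql.createConnection({ host: "localhost", user: "cpdb", database: "cpdb" });
      try {
        // Storing a normalized country code lets records from very different
        // sources be linked together later.
        await conn.execute(
          "INSERT INTO observations (country_code, source, property, value) VALUES (?, ?, ?, ?)",
          [normalizeCountry(record.country), record.source, record.property, record.value]
        );
      } finally {
        await conn.end();
      }
    }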

Web application

The web application, finally, brings all this data together. Information is accessible by country or by information source. Where the source is available online, a link to the original data is provided.

Data is cross-linked to allow for easier discovery, and presented in attractive interactive graphs or maps where possible.

Online maps

Some of the data is geographical in nature. To make it easier to comprehend, we wanted to display it on an interactive online map, so that users can quickly get an overview of the data. A typical solution for such a problem would be to create a Google Maps widget with some custom data markers. In our case, however, that was not an option: country borders, country names and area designations are politically highly sensitive subjects. For that reason, we chose to create our own map using the excellent TileMill software, and to automatically update that map whenever new data becomes available.

The map itself lives in an ‘.mbtiles’ file. When a user requests the online map, our purpose-built software processes this file so that the user is presented with just the information they need.
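For context, an .mbtiles file is essentially an SQLite database containing a tiles table. The sketch below shows only the general idea of serving tiles from such a file, assuming Node.js with express and better-sqlite3; the purpose-built software mentioned above does considerably more (tailoring what each user sees), and the file name here is hypothetical.

    // Minimal tile-serving sketch; the purpose-built software does much more.
    const express = require("express");
    const Database = require("better-sqlite3");

    const db = new Database("wmo-basemap.mbtiles", { readonly: true }); // hypothetical file
    const tileQuery = db.prepare(
      "SELECT tile_data FROM tiles WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?"
    );

    const app = express();
    app.get("/tiles/:z/:x/:y.png", (req, res) => {
      const z = Number(req.params.z);
      const x = Number(req.params.x);
      const y = Number(req.params.y);
      // MBTiles stores rows in TMS order, so flip the XYZ y coordinate.
      const row = tileQuery.get(z, x, Math.pow(2, z) - 1 - y);
      if (!row) return res.status(404).end();
      res.type("png").send(row.tile_data);
    });

    app.listen(3000);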

Climate graphs

One of the available data sources that needed to be displayed was historical climate data for the past 30 years. This data is gathered by individual weather stations, and the goal was to create an interactive graph for each of those stations. Some countries, such as the United States, have up to 250 individual weather stations in places as diverse as the South Pole and the desert. This poses some interesting problems.

First of all, it would be nice if the different climate graphs could be compared with each other. This means we have to decide on common values for all graphs, for both the horizontal and the vertical axis, because otherwise comparing them becomes difficult. But if you decide on common values for the temperature axis, and some of the data comes from the South Pole while other data comes from the desert, then the temperature range would have to run from -60°C to +40°C! This would render almost all the other graphs unusable: the range is so big that you can no longer see the differences between places. A similar problem arises when determining a scale for precipitation measurements in India, for example. Since India contains the wettest place in the world (over 4,000 mm of rain per year!), a common scale that accommodates that number would flatten the precipitation graphs for other Indian cities into nearly straight lines.

To deal with this issue, we adopted a different approach that calculates a range that works or almost works for at least 90% of the graphs. Outliers that don't fit are clearly marked and use a different scale.
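A minimal sketch of how such a shared range could be computed, assuming the 5th and 95th percentiles of station extremes and rounding to 5 °C steps; the exact rules used in production may differ:

    // Sketch of a "fits ~90% of stations" temperature axis; the percentile and
    // rounding choices here are assumptions, not the production rules.
    function commonTemperatureRange(stations) {
      // stations: [{ name, minTemp, maxTemp }, ...] in °C
      const sortedMins = stations.map((s) => s.minTemp).sort((a, b) => a - b);
      const sortedMaxs = stations.map((s) => s.maxTemp).sort((a, b) => a - b);

      // Take the 5th percentile of the minima and the 95th percentile of the
      // maxima, so roughly 90% of stations fit the shared scale.
      const p = (arr, q) => arr[Math.min(arr.length - 1, Math.floor(q * arr.length))];
      const low = Math.floor(p(sortedMins, 0.05) / 5) * 5;  // round down to 5 °C
      const high = Math.ceil(p(sortedMaxs, 0.95) / 5) * 5;  // round up to 5 °C

      // Stations that fall outside the shared range get their own scale and a marker.
      const outliers = stations.filter((s) => s.minTemp < low || s.maxTemp > high);
      return { low, high, outliers };
    }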

Secondly, there is a performance issue with trying to embed 250 interactive climate graphs in a single web page. Try to imagine embedding 250 different YouTube videos in a single page and you will see why this can become a problem.

In our initial setup, which we tested on a country with only 5 or 6 climate graphs, the graphs were added to the page and rendered as soon as the page had loaded. This worked well and without many problems. On loading the country profile page for the United States, however, the browser would become completely unresponsive and appear to have crashed for up to a minute while it was busy initializing and rendering 250 interactive climate graphs at the same time.

We addressed this problem by keeping track of what the user was looking at and postponing the rendering of any chart until it had almost scrolled into view. This strategy worked quite well, but when you scrolled down too quickly, the page would still stutter and freeze while it tried to render the graphs you were now supposed to see. To alleviate this, a background rendering process was added that renders climate graphs in the background at low intensity (occupying at most 33% of the user's CPU).
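The sketch below re-creates this strategy with current browser APIs (IntersectionObserver for the "almost scrolled into view" part, a timed background loop for the low-intensity rendering); the drawClimateGraph call and the 50 ms work / 100 ms rest duty cycle are assumptions, not the production code:

    // Modern sketch of the lazy + background rendering strategy.
    const pending = Array.from(document.querySelectorAll(".climate-graph"));

    function renderChart(el) {
      if (el.dataset.rendered) return;
      el.dataset.rendered = "true";
      drawClimateGraph(el, JSON.parse(el.dataset.series)); // hypothetical chart call
    }

    // 1. Render a chart just before it scrolls into view.
    const observer = new IntersectionObserver(
      (entries) => entries.forEach((e) => e.isIntersecting && renderChart(e.target)),
      { rootMargin: "400px" } // start ~400px before the chart becomes visible
    );
    pending.forEach((el) => observer.observe(el));

    // 2. Meanwhile, render the remaining charts in the background at low
    //    intensity: work for ~50 ms, then rest for ~100 ms (about 1/3 duty cycle).
    function backgroundRender() {
      const started = performance.now();
      while (pending.length && performance.now() - started < 50) {
        renderChart(pending.shift());
      }
      if (pending.length) setTimeout(backgroundRender, 100);
    }
    setTimeout(backgroundRender, 1000);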

Third, it was necessary to be able to access each chart's underlying data. This could (one would think) easily be solved by including that data in an HTML table in the page's source code. However, with 250 charts this inflates the HTML to a few megabytes, creates a humongous DOM, and the browser will try to render all of it, even if the resulting table isn't visible by default. We got around this last set of problems by storing each chart's data in a JSON object and only inserting the HTML table when the user requests the data (by clicking the 'data' tab). This prevents the large download, the humongous DOM, and the browser sluggishness that comes with them.
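A minimal sketch of that pattern, with hypothetical element IDs and JSON shape:

    // Build the data table only when the user opens the 'data' tab.
    document.querySelectorAll(".data-tab").forEach((tab) => {
      tab.addEventListener("click", () => {
        const container = document.getElementById(tab.dataset.target);
        if (container.querySelector("table")) return; // already built

        // The chart's data lives in a JSON object, not in the initial HTML.
        const series = JSON.parse(container.dataset.series); // [{ month, tempC, precipMm }, ...]
        const rows = series
          .map((r) => `<tr><td>${r.month}</td><td>${r.tempC}</td><td>${r.precipMm}</td></tr>`)
          .join("");
        container.insertAdjacentHTML(
          "beforeend",
          `<table><thead><tr><th>Month</th><th>Temp (°C)</th><th>Precipitation (mm)</th></tr></thead><tbody>${rows}</tbody></table>`
        );
      });
    });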

Performance optimization

Initially, the Country Profile Database displayed all the correct information, but it loaded very, very slowly. A quick analysis revealed a number of causes, one of which was that the page made at least one network request for each category of data. For a regular page this meant at least 60 different network requests, half of them hitting the CakePHP backend and/or the database. In terms of performance, this is one of the worst scenarios there is.

About half of those requests were AJAX requests initiated on page load. Each AJAX request went through the whole CakePHP stack to obtain data from the MySQL database and render an HTML fragment, which was then inserted at the appropriate place in the DOM. We got rid of this byzantine architecture and made sure to hit CakePHP only once where possible: the HTML that used to be fetched by AJAX is now rendered into the correct place from the start. This eliminated 25 requests and significantly sped up page loading. Additionally, we introduced a caching layer so that, where possible, each page is served from the cache instead of being regenerated. This reduced loading times even further.

Finally, we reduced the size of the HTML document sent to the user. The original page contained some very (very) large tables, some with thousands of rows. These were turned into interactive tables with pagination and search using the excellent DataTables plugin. That, however, also meant the plugin would only display the first 10 or 20 records, while the hundreds of other rows remained invisible even though they had been transferred and rendered by the browser. We replaced this approach with a mechanism that stores the first page of the table's data in a JSON object, which is then used to dynamically create an interactive table when it scrolls into view. As soon as the interactive table is displayed, a background request pre-fetches the data for the following page. This way we avoid large HTML documents and large DOMs, the navigation still feels quick and snappy, and we only render the table when we really need to.
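Sketched below under assumptions about the markup and JSON shape (the selectors, URLs and column format are illustrative): the first page of data initializes a DataTable once its placeholder scrolls into view, and the next page is fetched in the background.

    // Sketch of the deferred DataTables setup; not the production implementation.
    function initDeferredTable(placeholder) {
      // Assumption: the placeholder holds an empty <table> plus a JSON payload
      // of shape { columns: [...], rows: [[...], ...], nextPageUrl: "..." }.
      const firstPage = JSON.parse(placeholder.dataset.firstPage);

      const table = $(placeholder.querySelector("table")).DataTable({
        data: firstPage.rows,
        columns: firstPage.columns.map((title) => ({ title })),
        pageLength: 10,
      });

      // Pre-fetch the next page in the background so paging feels instant.
      if (firstPage.nextPageUrl) {
        fetch(firstPage.nextPageUrl)
          .then((r) => r.json())
          .then((next) => table.rows.add(next.rows).draw(false));
      }
    }

    // Only build the interactive table once its placeholder scrolls into view.
    const tableObserver = new IntersectionObserver((entries, obs) => {
      entries.forEach((e) => {
        if (e.isIntersecting) {
          initDeferredTable(e.target);
          obs.unobserve(e.target);
        }
      });
    });
    document.querySelectorAll(".deferred-table").forEach((el) => tableObserver.observe(el));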

With this final optimization, even the largest country profile pages achieved acceptable response times.

Interested?

If you have a large project and you need a serious professional to extend or optimize it - look no further.