OSGeo Events, FOSS4G 2008

Providing access to terabytes of Earth Observation data in an international organization - infrastructure and services

Paul Hasenohr, Armin Burger

Building: Cape Town International Convention Centre
Room: Langjan Room (Room 2.6a)
Date: 2008-10-01 08:30 AM – 10:00 AM
Last modified: 2008-08-28


The Joint Research Centre (JRC) of the European Commission stores over 60 TB of low, medium, high and very high resolution satellite imagery in a heterogeneous manner. The scientific units within the JRC are the users of these data, and they often manage them as well, either by choice or for lack of an alternative solution. An internal project called Community Image Data portal (CID) has been set up to rationalize the situation. One of its core activities was to create a central repository with catalogue, processing and dissemination facilities, favouring the use of open source technologies; the task now is to keep it running and expand it further.
The user requirements were collected by means of a survey last year and combined with the requirements from the IT department, from the management and finally from the CID team. We then decided on the approach to take in order to address these requirements and proceeded with the implementation.
Earth Observation data users mainly require a central catalogue referencing all datasets available at the JRC, as well as a central archive that is reliable, has a back-up facility and provides fast file-based access to data. Furthermore, data in this central archive should be available via geographic web services and web mapping, with flexible authentication and authorization schemes. Meanwhile, the IT department places a strong emphasis on network security, as the archived data are available from outside the JRC via web services and from inside via file protocols. The CID team wishes to favour the use of open source software, to maximise the use of existing knowledge, and to choose a design allowing for the setup of scalable and modular systems (service-wise and performance-wise).
In order to address the storage and network security related requirements, we opted for a NAS serving the NFSv4 and CIFS protocols and chose to rely on the existing IT infrastructure for backups (a dedicated NAS connected to a tape library). Minimal downtime and service reliability are ensured by making almost all services highly available. This implies having an appropriate hardware infrastructure and setting up all services in failover, where possible with load balancing capacity. The flexibility required for authentication and authorization is met by keeping the user directory, the authentication process and the authorization process strictly separated, and by providing Single Sign-On to all web-based services. With respect to data services, the approach taken has been to use open protocols (e.g. FTP, OGC) or standard ones such as ECWP and KML, and to make sure that their implementation within the CID environment allows the Intellectual Property Rights applicable to the datasets being served to be enforced.
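The separation of user directory, authentication and authorization described above can be illustrated with a minimal sketch. All class names, users, groups and dataset names below are hypothetical and chosen only for illustration; in the CID environment these roles are played by dedicated services rather than by in-memory objects, and secrets would never be held in plain text.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    uid: str
    groups: frozenset


class UserDirectory:
    """Knows who exists and which groups they belong to (hypothetical stand-in for a directory service)."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def lookup(self, uid):
        groups = self._entries.get(uid)
        return None if groups is None else User(uid, frozenset(groups))


class Authenticator:
    """Verifies identity only; knows nothing about groups or datasets."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def verify(self, uid, secret):
        expected = self._secrets.get(uid)
        if expected is None:
            return False
        # Constant-time comparison of the supplied and stored secret.
        return hmac.compare_digest(expected.encode("utf-8"), secret.encode("utf-8"))


class Authorizer:
    """Maps datasets to the groups allowed to read them; knows nothing about secrets."""

    def __init__(self, dataset_groups):
        self._dataset_groups = {k: frozenset(v) for k, v in dataset_groups.items()}

    def is_allowed(self, user, dataset):
        allowed = self._dataset_groups.get(dataset, frozenset())
        return bool(user.groups & allowed)


def can_access(directory, authenticator, authorizer, uid, secret, dataset):
    """Combine the three independent components into one access decision."""
    if not authenticator.verify(uid, secret):
        return False
    user = directory.lookup(uid)
    return user is not None and authorizer.is_allowed(user, dataset)


# Illustrative data only.
directory = UserDirectory({"alice": ["forestry"], "bob": ["guests"]})
authenticator = Authenticator({"alice": "s3cret", "bob": "hunter2"})
authorizer = Authorizer({"licensed_imagery": ["forestry"]})
```

Because each component holds only its own data, any one of them could in principle be replaced (for example, a different directory back end) without touching the other two, which is the flexibility the paragraph above refers to.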
After having gathered the requirements, analysed them and decided on the general approach to take, the actual implementation of the CID portal could start.
The hardware infrastructure currently consists of 22 servers, 14 of which are virtual machines running on two XenEnterprise servers. Apart from the two virtualization engines and three Windows servers, all machines run Debian 4. The storage is provided by a NetApp FAS6030 purchased for this purpose. Owing to the cost of this type of device, it has been decided not to make it redundant for the time being. As a consequence, it is the only critical component of the CID portal that is not redundant.
The base infrastructure relies on MySQL and PostgreSQL with PostGIS as database engines, on CAS, Kerberos, Samba and SASL for authentication, and on OpenLDAP as the user directory.
The data dissemination services provided are WMS, WCS, KML (via PHPMapscript and Python Mapscript), CSW (via Geonetwork), ECWP (via Image Web Server) and FTP (via a version of PureFTP customized to work with virtual file structures). The web portal is built on Drupal, Typo3 and p.mapper.
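As an illustration of how a client might address one of the services listed above, the following sketch assembles a WMS 1.1.1 GetMap request URL. The endpoint URL and layer name are placeholders, not actual CID addresses, and the helper function is hypothetical; only the request parameters themselves come from the WMS 1.1.1 specification.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (minx, miny, maxx, maxy) in the units of the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
example_url = build_getmap_url(
    "http://example.org/wms", ["example_layer"], (-10, 35, 30, 60), 800, 500
)
```

A server enforcing access rights would typically require the client to authenticate before answering such a request; that step is outside the scope of this sketch.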

The paper will describe the user requirements, the overall architecture and its technical implementation, with a focus on the FOSS aspects. It concludes with a first review of user feedback.