Science Gateways Institute NSF NIA CERSER ECSU
SGI@ECSU
Vision, Goals, Science Grand Challenges
Science today is rarely an isolated activity. Researchers work in large, distributed, sometimes multinational teams. They use shared instruments and high-end computing facilities. They interact with real-time sensor data and catalogued data collections. Many have adapted to the increasingly digital and virtual nature of science and science teams in their own ways, with varying levels of sophistication. Often the goal is to accomplish an immediate task, with little time for planning a more robust infrastructure. We survey here some of the outstanding previous work and remaining challenges that motivate our proposed institute and our choices for advisory panel members.

Bioinformatics: CIPRES (Mark Miller)
DNA sequencing tools have produced a flood of new data that represents a great opportunity for new scientific discoveries. However, the sheer quantity of data overwhelms current pipelines and analysis methods. In phylogenetics, the limiting factor is no longer sequence data but computational power. The highly optimized analysis codes available through CIPRES have exploded in popularity, and the gateway's reputation has spread virally. Its users come from six continents, and about 1,000 new users are supported every month with very low overhead. More than 25% of all XSEDE users who charge jobs do so through the CIPRES gateway. As one user writes, "instead of 10 days to complete all the analyses with a few local computers, I have publication quality trees overnight." Productivity gains of this kind allow researchers to pursue new scientific questions.

Astronomy: The Dark Energy Survey (August Evrard)
Cosmology and astrophysics rely primarily on light collected by large telescope projects. As in other areas of science, two factors drive the need for science gateways: i) the "beyond petabyte" scale of the catalogs and images being produced globally, and ii) the ability to generate sophisticated simulations that connect the collected light to the underlying physical and measurement processes, starting at the source and following the enormous path length the radiation travels before it reaches the telescope. Efforts at this scale naturally involve multinational teams. While the challenges of collecting and distributing data at this scale are widely recognized, supporting science analysis with simulations is emerging as a new challenge. Especially important for new discoveries will be the ability of simulations to respond to data analysis in near real time. Today, simulation support is done post facto, reacting to new data on timescales of months. Science gateways coupled to massive back-end computing could shorten that timescale to weeks or days, enabling new survey-analysis methodologies that could dramatically accelerate discovery in cosmology and astrophysics.

Biomedicine and Genomics: The Galaxy Project (James Taylor)
Galaxy [18-20] is used worldwide for data-intensive biomedical research and genomic analysis, with a variety of options for back-end computing platforms. Galaxy developers observed two problems: biomedical researchers often lack software expertise (neither programming nor software engineering is taught as part of their curricula), and simulations in the life sciences are often not reproducible. Galaxy provides a large collection of robust tools, supports data provenance, captures the context of a simulation so that it can be both communicated and reproduced, and supports data sharing among researchers.

Nanotechnology: nanoHUB (Michael McLennan)
nanoHUB.org [21, 22], a science gateway for the nanotechnology community, has seen its number of active users almost double every year since its inception. In the past 12 months alone, nanoHUB.org served more than 396,000 visitors from 172 countries. Of those, a core audience of 192,000 spent at least 15 minutes on the site viewing seminars, downloading teaching materials, or interacting with simulation and modeling tools contributed by a worldwide community. During the same period, more than 11,000 registered users launched a total of 405,000 simulation jobs. nanoHUB.org is widely used in both research and education: the academic literature contains more than 700 citations to nanoHUB.org and its tools, seminars, and other resources, and its resources have been used in 379 courses at 131 institutions of higher education.

Computational Chemistry: GridChem and ParamChem (Alex MacKerell)
Computational chemistry is a rapidly expanding field, spanning many time and length scales as well as domains from materials science to biology. In studying fatigue in materials, for example, analysis methods range from ab initio and atomistic methods to structural dynamics [23]. Science gateways such as GridChem and ParamChem link multiple codes so that individual scientists do not need to handle input and output conversions between them. These gateways are also an effective way to launch and monitor large parameter optimization studies.

Earthquake Engineering: NEES (Rudi Eigenmann)
The NSF-sponsored George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) is another example of a thriving science gateway [24]. In 2009, operations for the NEES project moved to Purdue University, at which point the existing web infrastructure was replaced with the HUBzero platform. Since then, the NEES.org site has grown to support more than 96,000 visitors and 20,000 active users per year. The site offers 890 resources, including seminars, teaching materials, documents, and 55 computational tools. It also houses data from more than 200 experimental projects in its Project Warehouse area, along with eight other well-known databases related to structural engineering and earthquake damage.

Nuclear Physics: Advanced Workflows (James Vary)
Vary's group at Iowa State University, with support from NSF's PetaApps and DOE's INCITE programs, uses some of the nation's largest supercomputers for its high-energy nuclear theory analyses. Vary is interested in improving the ability of graduate students and postdocs to perform productive research with forefront tools. He is also looking to science gateways for solutions to the challenges of large-scale data analysis and large-team collaboration.

Pharmacology: pharmaHUB
The pharmaHUB.org site for pharmaceutical engineering was seeded by NSF in 2007 and has since grown to serve industry, academia, and the Food and Drug Administration (FDA) [25]. In the past 12 months, pharmaHUB.org served more than 28,000 visitors, including a core audience of 19,000 users. In partnership with the National Institute for Pharmaceutical Technology and Education (NIPTE) and the FDA, it houses an Excipients Knowledge Base containing product reference data, test methods, and experimental measurements. The project is developing rigorous multiscale models that predict how a drug product, such as a tablet, breaks apart and dissolves, and how it makes its way into the bloodstream of statistically varying virtual patients, based on appropriate pharmacokinetic and pharmacodynamic models.
