Summer 2010 STEM Summer Scholars at Indiana University-Bloomington
Evaluation of Cloud Storage for Preservation and Distribution of Polar Data
Mentors: Marlon Pierce, Yu (Marie) Ma, Xiaoming Gao, and Jun Wang
Abstract: The team's goal was to find a service that could both store the large amount of data that PolarGrid has collected and ensure that the data will be preserved so that future researchers can continue to use it. For this reason, the team looked to cloud storage services for a solution. Cloud storage is the storage of data that is accessible as a service over a network.
In this case, the team decided to research online storage using Amazon Web Services (AWS), investigating what AWS is, how reliable it is, how much data it can store, and whether data would be lost over an extended period of time. AWS is a cloud computing platform offered by Amazon.com that is made up of a collection of computing services, also known as web services. Within AWS, the Simple Storage Service (S3) provides a user-friendly way of storing data over the Internet. The project then shifted to investigating S3 in more detail and whether it provides the services PolarGrid needs. The group researched several questions about S3. One concerned the reliability guarantee in S3's Service Level Agreement (SLA), which sets out the service terms promised to the user. Another concerned the advertised “durability” guarantee of 99.9999999%. What did Amazon mean by “durability”? What does that percentage guarantee? Does it apply over a lifetime or only a few days? What is the likelihood of losing irreplaceable field data over various time scales (years, decades, and longer)?
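To make the last question concrete, the sketch below converts a durability figure into expected file losses over time. It is a minimal sketch under loud assumptions: it reads the quoted 99.9999999% as an annual, per-object survival probability (so the annual loss probability per file is 1e-9) and treats file losses as independent events; the text does not confirm either interpretation. The file count is the approximately 300,000 files PolarGrid holds, and the function names are hypothetical.

```python
import math

# ASSUMPTION: the quoted 99.9999999% is read as an annual, per-object survival
# probability, so the annual loss probability per object is 1 - 0.999999999.
ANNUAL_LOSS_PROB = 1e-9
NUM_FILES = 300_000  # approximate PolarGrid file count from the text


def expected_losses(num_objects: int, annual_loss: float, years: float) -> float:
    """Expected number of objects lost after `years` years."""
    # Per-object loss probability over the period: 1 - (1 - p) ** years.
    # log1p/expm1 keep the result accurate when p is tiny.
    per_object = -math.expm1(years * math.log1p(-annual_loss))
    return num_objects * per_object


def prob_any_loss(num_objects: int, annual_loss: float, years: float) -> float:
    """Probability that at least one object is lost after `years` years."""
    # ASSUMPTION: losses are independent, so P(no loss) = (1 - p) ** (N * years).
    return -math.expm1(num_objects * years * math.log1p(-annual_loss))


if __name__ == "__main__":
    for years in (1, 10, 50, 100):
        lost = expected_losses(NUM_FILES, ANNUAL_LOSS_PROB, years)
        p_any = prob_any_loss(NUM_FILES, ANNUAL_LOSS_PROB, years)
        print(f"{years:>3} years: expected files lost = {lost:.4g}, P(any loss) = {p_any:.4g}")
```

Under these assumptions, about 300,000 × 1e-9 = 3e-4 files would be expected to be lost per year, and while the probability of losing any file stays small it grows roughly in proportion to the number of years stored.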
On the financial side, the group investigated how cost-efficient it would be for PolarGrid to use this service. PolarGrid has 26 terabytes of data in over 300,000 files, and the group was responsible for determining how PolarGrid would be charged: for the amount of data stored, for the length of time the data is stored, or both. The aim of the project was to answer these questions so that PolarGrid could have a secure place to store its large volume of data.
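One way to frame the billing question is a simple model in which storage is charged per gigabyte per month, so the cost depends on both how much is stored and for how long, plus a one-time per-request charge for uploading each file. The sketch below assumes that model; the rates are hypothetical placeholders, not AWS prices, and the 1 TB = 1024 GB conversion and function names are likewise assumptions.

```python
# ASSUMPTION: all rates used below are hypothetical placeholders, not AWS prices.
GB_PER_TB = 1024  # ASSUMPTION: binary terabytes
DATA_TB = 26
NUM_FILES = 300_000


def storage_cost(data_gb: float, months: float, price_per_gb_month: float) -> float:
    """Storage charge: volume x duration x rate (the 'both' case in the text)."""
    return data_gb * months * price_per_gb_month


def upload_request_cost(num_files: int, price_per_1000_puts: float) -> float:
    """One-time request charge for uploading each file once."""
    return num_files / 1000 * price_per_1000_puts


def total_cost(data_tb: float, num_files: int, months: float,
               price_per_gb_month: float, price_per_1000_puts: float) -> float:
    """Storage over the whole period plus the initial upload requests."""
    data_gb = data_tb * GB_PER_TB
    return (storage_cost(data_gb, months, price_per_gb_month)
            + upload_request_cost(num_files, price_per_1000_puts))


if __name__ == "__main__":
    HYPOTHETICAL_GB_MONTH = 0.10       # placeholder rate per GB-month
    HYPOTHETICAL_PER_1000_PUTS = 0.01  # placeholder rate per 1000 uploads
    for years in (1, 5, 10):
        cost = total_cost(DATA_TB, NUM_FILES, 12 * years,
                          HYPOTHETICAL_GB_MONTH, HYPOTHETICAL_PER_1000_PUTS)
        print(f"{years:>2} years: ${cost:,.2f} (hypothetical rates)")
```

In this model the storage term grows linearly with the retention period, while the upload charge is paid only once, so for long-term preservation the per-GB-month rate dominates.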
Spring 2009-2010 Undergraduate Research Experience
A Comparison of Job Duration Utilizing High Performance Computing on a Distributed Grid
Mentor: Je'aime Powell
The PolarGrid team was tasked with testing the central manager system at Elizabeth City State University to ensure that it was prepared for grid computing. This was achieved by installing the Condor 7.4.0 client on iMac workstations located in Dixon Hall, Lane Hall, and E.V. Wilkins on the campus of Elizabeth City State University. Condor allows jobs to be submitted to the central manager and distributed to one or more nodes. The job the team submitted to Condor was a compiled C++ implementation of the Sieve of Eratosthenes. This program generated prime numbers from 0 to 500,00 and served as the test workload for the job submission process. The compiled program, referenced in the script files, was submitted to the central manager through Condor, which then distributed the jobs to available nodes for processing.
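For readers unfamiliar with the test workload, the Sieve of Eratosthenes finds all primes up to a limit by repeatedly crossing off the multiples of each prime, starting from that prime's square. The team's program was written in C++; the sketch below is an independent Python illustration of the same algorithm, not the team's code, and the function name is hypothetical.

```python
def sieve_of_eratosthenes(limit: int) -> list[int]:
    """Return all primes p with 2 <= p <= limit."""
    if limit < 2:
        return []
    # is_prime[i] == 1 means i has not yet been crossed off.
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross off multiples of p starting at p*p; smaller multiples
            # were already crossed off by smaller primes.
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    print(sieve_of_eratosthenes(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Because each composite number is crossed off by its smallest prime factor's pass, the work grows only slightly faster than the limit itself, which keeps the job's run time predictable for timing comparisons.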
After each successful job submission, log files recorded the elapsed time it took to process each individual job. These data were imported into Minitab, a statistical analysis software package. An analysis of variance (ANOVA) was then performed to determine whether the elapsed times differed significantly across node counts at the 5 percent significance level. The ANOVA provided statistical evidence that increasing the number of nodes decreased the elapsed time, indicating a performance increase.
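The core of a one-way ANOVA is the F statistic, the ratio of between-group variance to within-group variance. The team used Minitab; the sketch below is a standard-library Python illustration of the same calculation, with hypothetical elapsed times grouped by node count (not the team's data) and a hypothetical function name.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # HYPOTHETICAL elapsed times in seconds, keyed by node count (not the team's data).
    elapsed = {
        1: [41.2, 40.8, 42.0, 41.5],
        2: [22.1, 21.7, 22.9, 22.4],
        4: [12.3, 11.9, 12.8, 12.1],
    }
    f_stat, df_b, df_w = one_way_anova_f(list(elapsed.values()))
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
    for nodes, times in elapsed.items():
        print(f"{nodes} node(s): mean elapsed time {fmean(times):.2f} s")
    # Reject equal means at the 5% level if F exceeds the F table's critical
    # value for (df_b, df_w) degrees of freedom.
```

A significant F statistic only indicates that at least one group mean differs; the direction of the effect (shorter times with more nodes) comes from comparing the group means or from a post-hoc test. If SciPy is available, scipy.stats.f_oneway(*groups) returns the same F statistic together with a p-value.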
Summer 2009 Undergraduate Research Experience
A Comparative Analysis of Localized Command Line Execution, Remote Execution through Command Line, and TORQUE Submissions of MATLAB® Scripts for the Charting of CReSIS Flight Path Data
Mentor: Je'aime Powell
Abstract: The PolarGrid team was tasked with providing the Center for Remote Sensing of Ice Sheets (CReSIS) with data on how signal processing through the CReSIS Synthetic Aperture RADAR Processor (CSARP) could use clustered computing resources without requiring MATLAB®'s proprietary Distributed Computing Environment. This research centered on running MATLAB® from the command line and on scripted distribution through the TORQUE high-performance computing scheduler.
The team used flight path information from the Greenland 2007 field deployment. These data were imported into MATLAB® and converted from text files into MATLAB® script files. With these MEX files, the team created a MATLAB® script that plotted the flight path data on a graph with latitude on the x-axis and longitude on the y-axis.
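The text-to-array step can be sketched in Python as follows, assuming a hypothetical file layout in which each line holds whitespace-separated columns with latitude and longitude at known positions, and comment lines start with # or %. The actual layout of the Greenland 2007 files is not described in the text, so the column indices are parameters; the function name and sample values are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_flight_path(
    lines: Iterable[str], lat_col: int = 0, lon_col: int = 1
) -> Tuple[List[float], List[float]]:
    """Parse whitespace-separated text lines into (latitudes, longitudes).

    ASSUMPTION: hypothetical layout with latitude and longitude in the given
    column positions; blank lines and lines starting with '#' or '%' are skipped.
    """
    lats: List[float] = []
    lons: List[float] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "%")):
            continue
        fields = stripped.split()
        lats.append(float(fields[lat_col]))
        lons.append(float(fields[lon_col]))
    return lats, lons


if __name__ == "__main__":
    # HYPOTHETICAL sample lines, not real Greenland 2007 flight data.
    sample = ["% lat lon", "69.10 -49.50", "", "69.20 -49.40"]
    lats, lons = parse_flight_path(sample)
    print(lats, lons)  # → [69.1, 69.2] [-49.5, -49.4]
```

With matplotlib installed, plotting the latitudes on the x-axis against the longitudes on the y-axis (for example, plt.plot(lats, lons)) would reproduce the axis arrangement described above.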
The team took the master script that creates the chart and ran jobs from the MATLAB® command line on Madogo [Elizabeth City State University’s cluster] and Quarry [Indiana University’s cluster]. The team then compared execution times on Madogo versus Quarry. A second comparison tested TORQUE job submission against MATLAB® command-line submission to see which was more efficient. Lastly, the average execution times of all three data sets were compared at the 5% significance level to determine whether there was a statistically significant difference between command-line jobs and TORQUE submissions. The paper focuses on the procedure used to complete the research and the conclusions reached.
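As an illustration of the TORQUE route, the sketch below generates a PBS batch script that runs a MATLAB® script non-interactively; the resulting file could then be submitted with qsub. The job name, the MATLAB script name plot_flight_path, the resource requests, and the function name are hypothetical; the team's actual submission scripts are not given in the text.

```python
def make_torque_script(
    job_name: str,
    matlab_script: str,
    walltime: str = "00:30:00",
    nodes: int = 1,
    ppn: int = 1,
) -> str:
    """Return the text of a PBS batch script that runs a MATLAB script headlessly."""
    return "\n".join(
        [
            "#!/bin/bash",
            f"#PBS -N {job_name}",               # job name shown by qstat
            f"#PBS -l nodes={nodes}:ppn={ppn}",  # requested nodes and processors per node
            f"#PBS -l walltime={walltime}",      # maximum run time
            "#PBS -j oe",                        # merge stdout and stderr into one file
            'cd "$PBS_O_WORKDIR"',               # run from the directory qsub was called in
            # -r runs the given statements; 'exit' ends MATLAB when the script finishes.
            f'matlab -nodisplay -nosplash -r "{matlab_script}; exit"',
            "",
        ]
    )


if __name__ == "__main__":
    # HYPOTHETICAL job and script names.
    print(make_torque_script("flightpath", "plot_flight_path"))
```

For example, writing the returned text to plot_job.pbs and running qsub plot_job.pbs would queue the job, while running the same matlab -nodisplay -nosplash -r command directly in a shell corresponds to the command-line case being compared.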
Spring semester 2009 Undergraduate Research Experience
A Study of pH, Salinity, and Clarity of Water Samples from Various Locations Around the World
Mentor: Kaiem L. Frink
Abstract: Water quality varies from community to community and throughout the world. Some variations are natural, while others are induced by humans. The research team processed water quality results for samples from Austin, Texas; Barcelona, Spain; Pasquotank County, North Carolina; Indianapolis, Indiana; and Lake Cavalier in Portsmouth, Virginia, and compared them against the North Carolina Administrative Code, past CERSER research group results, and local (surrounding county), statewide, national, and international water quality standards. Microsoft Excel, with its built-in statistical capabilities, was used to display the team's results and illustrate the differences between these locations. Using LaMotte testing kits, the team measured pH, which indicates acidity, as well as salinity, dissolved oxygen, alkalinity, hardness, nitrate, and conductivity; DiST WP readings were also taken. The team used Google Earth's KMZ functionality to create an aerial dataset and configured the Microsoft Excel spreadsheet to communicate with Google Earth. The 2009 Water Quality Team also developed WSR (Water Sample Request) forms to record water quality data and maintain its integrity.
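The KMZ step can be sketched as follows. A KMZ file is a ZIP archive containing a KML document (conventionally named doc.kml), and each sample site can be a Placemark whose coordinates are written in longitude,latitude,altitude order. This is a minimal sketch, not the team's workflow (which linked an Excel spreadsheet to Google Earth); the function names, site names, coordinates, and pH values are hypothetical placeholders, not the team's measurements.

```python
import io
import zipfile
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(samples):
    """Build a KML document with one Placemark per (name, lat, lon, ph) sample."""
    placemarks = []
    for name, lat, lon, ph in samples:
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(name)}</name>\n"
            f"    <description>pH {ph:.2f}</description>\n"
            # KML coordinates are longitude,latitude,altitude.
            f"    <Point><coordinates>{lon},{lat},0</coordinates></Point>\n"
            "  </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<kml xmlns="{KML_NS}">\n<Document>\n'
        + "\n".join(placemarks)
        + "\n</Document>\n</kml>\n"
    )


def write_kmz(path_or_file, kml_text):
    """Write a KMZ archive: a ZIP file whose main entry is doc.kml."""
    with zipfile.ZipFile(path_or_file, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        kmz.writestr("doc.kml", kml_text)


if __name__ == "__main__":
    # HYPOTHETICAL placeholder sites and readings, not the team's data.
    sites = [("Sample site A", 36.30, -76.20, 7.10), ("Sample site B", 39.77, -86.16, 7.80)]
    buffer = io.BytesIO()
    write_kmz(buffer, build_kml(sites))  # or pass a filename such as "samples.kmz"
    print(f"KMZ size: {len(buffer.getvalue())} bytes")
```

Opening the resulting .kmz file in Google Earth would show one pin per site, with the pH reading in each pin's description balloon.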