Resource Management Portal “CHReME” with Web Interface for Scientific Users
Nisha Kurkure
C-DAC, India
C-DAC in HPC (High Performance Computing)
National PARAM Supercomputing Facility.
10 Gbps PARAMnet III.
Reconfigurable Computing Systems (RCS): commercial deployment of bio-informatics sequence search.
Design and deployment of high-end HPC systems at premier institutions such as IITM Pune, JNU Delhi, TIFR Mumbai, BDU Tiruchirappalli, NIO Goa and NCL Pune.
Maintenance and application support for HPC systems at the University of Hyderabad, NCMRWF, JNU Delhi, IIT Delhi, NIO Goa and IITM Pune.
Deployed a Grid File System (GFS) prototype on a Grid test bed between Pune and Bangalore; GFS addresses data-location independence for HPC applications.
C-DAC Activities
Enabling Technologies: HPC, Language Computing, Speech Technology, e-Security, Geomatics, Ubiquitous Computing, Embedded Systems, VLSI, Broadband & Wireless, Software Technologies, …
End-to-End Solutions: Science & Engineering, Strategic Sectors, Health, e-Governance, Education, Power & Industrial Sector, Agriculture, Rural Areas
Need to develop CHReME…
The growth of parallel applications has made efficient, easy usage and management of HPC system resources a serious challenge
In domain-specific organizations, installation and porting of the relevant applications is difficult, since the activity is computational in nature rather than part of the users' domain expertise
It is tedious for users to run heterogeneous scientific application codes from the command line
Facilitate ease of use of large clusters as well as their increased adoption
Ability to address large and varied data sets efficiently
Handle the workload of an ever-increasing user base, many of them novices
About CHReME – Salient Features
Developed on an open-source platform and ecosystem
Fully integrated environment with industry-standard schedulers
User-friendly web-based GUI providing access to various cluster resources
Submission, monitoring and management of jobs through the GUI
Easy and secure access to all cluster resources from a remote host machine
Optimum utilization of cluster resources
CHReME – Snapshot
CHReME – Additional Features
Creation and management of various cluster resources, viz. queues, parallel environments and users
Timely email notification regarding job status (a notification sketch follows this list)
Provides application-specific interfaces for pre-compiled applications in various domains, viz. weather forecasting
Fully customizable, with scope for integrating numerous HPC applications
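As an illustration of the notification feature, here is a minimal sketch of a job-status email sent with the JavaMail API. The SMTP host, addresses and the NotificationService class are hypothetical, not taken from CHReME itself.

```java
import java.util.Properties;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

// Hypothetical helper: mails a job-status change to the job owner.
public class NotificationService {
    public static void notifyJobStatus(String userEmail, String jobId,
                                       String status) throws MessagingException {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.org");  // assumed SMTP relay

        Session session = Session.getInstance(props);
        MimeMessage msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("portal@example.org"));
        msg.setRecipient(Message.RecipientType.TO, new InternetAddress(userEmail));
        msg.setSubject("Job " + jobId + " is now " + status);
        msg.setText("Your job " + jobId + " changed state to: " + status);
        Transport.send(msg);
    }
}
```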
Weather Research and Forecasting Model (WRF)
The Weather Research and Forecasting (WRF) Model is designed to serve both operational forecasting and atmospheric research needs
It features multiple dynamical cores, a three-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism as well as system extensibility
WRF is suitable for a broad spectrum of applications across scales ranging from meters to thousands of kilometers
WRF & CHReME Portal
CHReME WRF Portal is an integrated solution with a WRF application execution interface
CHReME WRF Portal guides users through the entire cycle of execution, i.e.:
Pre-processing (WPS)
Running the WRF model
Post-processing of the model output
The GUI gives users a modular and seamless approach to WRF execution by:
Creating various workflows
Storing the workflow execution stages
Setting up environment variables
Submitting the WRF execution jobs
Monitoring the status of the workflows
Visualizing the output (a sketch of the stage pipeline follows this list)
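The execution cycle can be pictured as an ordered pipeline of stages, each gated on the success of the previous one. Below is a minimal sketch of that idea, not CHReME's actual code; WrfStage and the submitAndWait helper are hypothetical names.

```java
import java.util.List;

// Hypothetical model of the WRF execution cycle as an ordered pipeline.
enum WrfStage { GEOGRID, UNGRIB, METGRID, REAL, WRF }

public class WorkflowRunner {
    // Assumed helper: submits the stage's job script to the scheduler
    // and blocks until it finishes, returning true on success.
    static boolean submitAndWait(WrfStage stage) {
        System.out.println("Submitting " + stage + " to the scheduler...");
        return true; // placeholder: a real portal would poll the scheduler
    }

    public static void main(String[] args) {
        List<WrfStage> pipeline = List.of(WrfStage.values());
        for (WrfStage stage : pipeline) {
            if (!submitAndWait(stage)) {
                System.err.println(stage + " failed; stopping the workflow.");
                return; // later stages depend on earlier outputs
            }
        }
        System.out.println("WRF workflow completed; output ready for visualization.");
    }
}
```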
WRF Portal: Workflow Creation Snapshot
WRF Portal: Workflow Creation
To create a new workflow, the user specifies a name for it
The workflow name is a single label applied to all the steps the user performs to run the WRF model, i.e.:
Geogrid
Ungrib
Metgrid
Real.exe
Wrf.exe
After entering the workflow name, the user is directed to the step of configuring the workflow details
The user creates a workflow and performs all its steps only once; thereafter, the user simply loads the workflow and runs the model (a persistence sketch follows)
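One simple way to realize "create once, then just load and run" is to persist each workflow's state to disk. The sketch below uses java.util.Properties purely for illustration; the file layout and field names are assumptions, not CHReME's actual storage format.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Properties;

// Hypothetical persistence of a workflow's name and completed stages.
public class WorkflowStore {
    public static void save(String name, String lastCompletedStage,
                            String wpsDir) throws IOException {
        Properties p = new Properties();
        p.setProperty("workflow.name", name);
        p.setProperty("workflow.lastStage", lastCompletedStage);
        p.setProperty("workflow.wpsDir", wpsDir);
        try (FileWriter w = new FileWriter(name + ".workflow")) {
            p.store(w, "CHReME-style workflow state (illustrative)");
        }
    }

    public static Properties load(String name) throws IOException {
        Properties p = new Properties();
        try (FileReader r = new FileReader(name + ".workflow")) {
            p.load(r);
        }
        return p; // caller resumes from p.getProperty("workflow.lastStage")
    }
}
```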
WRF Portal: Geogrid run snapshot
WRF Portal: Geogrid run
The user enters the values required for running geogrid
The user can view the namelist.wps file present in the WPS directory
The portal edits the namelist.wps file according to the values entered by the user (a rewrite sketch appears after this list)
A browse feature is provided for selecting folder paths
After pressing the Execute button, the user is redirected to a page showing the running status of geogrid; the geogrid run itself is submitted through the scheduler
Running geogrid produces the following outputs:
The NetCDF files generated by geogrid
The log generated while running geogrid
The processing details generated while running geogrid
Any errors generated while running geogrid
The status can be monitored from the job progress area
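To illustrate how a portal can rewrite namelist.wps from form values, here is a minimal sketch that replaces a key's whole line with a regular expression. The updateNamelist helper, the file location and the chosen keys are assumptions for the example, not CHReME's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical rewrite of a Fortran-namelist entry from a user-entered value.
public class NamelistEditor {
    // Replaces the whole "key = ..." line with the new value.
    public static void updateNamelist(Path namelist, String key, String value)
            throws IOException {
        String text = Files.readString(namelist);
        String edited = text.replaceAll(
                "(?m)^\\s*" + key + "\\s*=.*$",
                " " + key + " = " + value + ",");
        Files.writeString(namelist, edited);
    }

    public static void main(String[] args) throws IOException {
        Path wps = Path.of("WPS/namelist.wps");       // assumed location
        updateNamelist(wps, "max_dom", "2");          // number of domains
        updateNamelist(wps, "geog_data_path", "'/data/geog'");
    }
}
```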
WRF Portal: Ungrib run snapshot
WRF Portal: Ungrib run
The user enters the values required for running ungrib
After pressing Execute, the user is directed to a page showing the running status of ungrib, which is likewise submitted through the scheduler
Running ungrib produces the following outputs:
The intermediate files generated by ungrib
The log generated while running ungrib
The processing log of ungrib
Any errors generated while running ungrib
After successful execution of ungrib, the link for executing metgrid is enabled (a status-gating sketch follows)
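Enabling the metgrid link only after success implies that the portal checks the scheduler for the job's state. A minimal sketch follows, assuming a Torque/PBS-style qstat -f output containing a job_state field; the method name and parsing are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical check of a scheduler job's state via PBS-style qstat.
public class JobStatus {
    // Returns the job_state character (e.g. Q, R, C) or '?' if not found.
    public static char queryState(String jobId) throws IOException {
        Process p = new ProcessBuilder("qstat", "-f", jobId).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String t = line.trim();
                if (t.startsWith("job_state")) {
                    return t.charAt(t.length() - 1);
                }
            }
        }
        return '?';
    }

    public static void main(String[] args) throws IOException {
        // The metgrid link would be enabled only once ungrib's job completes.
        System.out.println("ungrib job state: " + queryState("1234.headnode"));
    }
}
```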
WRF Portal: Metgrid run snapshot
WRF Portal: Metgrid run
The user is asked to enter the run directory path of WRF
The user then runs metgrid
After metgrid executes, the following output files are created:
The met files generated by metgrid
The log generated by metgrid
The processing log of metgrid
Any errors generated while running metgrid
The user can either re-run metgrid or proceed to the next step of running real.exe (an output-listing sketch follows)
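For the output-listing step, here is a sketch of collecting metgrid's products from the run directory. The met_em.d0*.nc pattern matches WPS's standard metgrid output names; the directory path is an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical collection of metgrid outputs for display in the portal.
public class MetgridOutputs {
    public static void main(String[] args) throws IOException {
        Path runDir = Path.of("/home/user/WPS");  // assumed run directory
        // met_em.d0*.nc is the standard naming of metgrid's NetCDF output.
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(runDir, "met_em.d0*.nc")) {
            for (Path f : files) {
                System.out.println("metgrid output: " + f.getFileName());
            }
        }
        // Logs such as metgrid.log would be listed the same way.
    }
}
```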
WRF Portal: Run real.exe snapshot
WRF Portal: Run wrf.exe snapshot
WRF Portal: Run WRF Model
The user can select the number of processors for running real.exe (a job-script sketch follows this list)
After pressing Run, the user is redirected to a page where the status of the real.exe run can be checked
Running real.exe creates the following outputs:
The files generated by real.exe
The output log generated as the job is submitted to the scheduler
The error log generated while running real.exe
The namelist.input file is then updated to run wrf.exe
The user can re-run wrf.exe by pressing the Re-run wrf.exe button
The user can store the namelist.wps and namelist.input files after running the entire model for future reference
At any time, the user can load another workflow to run the WRF model with the entries specific to that workflow
Users can delete only the workflows they created themselves
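The selected processor count ultimately turns into scheduler directives and an mpirun command. Here is a minimal sketch that writes a Torque/PBS-style job script; the nodes:ppn directive syntax and the paths are assumptions about a typical deployment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical generation of a PBS job script from the user's processor count.
public class WrfJobScript {
    public static Path write(int nodes, int coresPerNode) throws IOException {
        int np = nodes * coresPerNode;
        String script = String.join("\n",
                "#!/bin/bash",
                "#PBS -N wrf_run",
                "#PBS -l nodes=" + nodes + ":ppn=" + coresPerNode,
                "cd $PBS_O_WORKDIR",
                "mpirun -np " + np + " ./wrf.exe", // same pattern for real.exe
                "");
        Path file = Path.of("wrf_job.sh");
        Files.writeString(file, script);
        return file; // submitted with: qsub wrf_job.sh
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Wrote " + write(2, 8)); // 2 nodes x 8 cores = 16 ranks
    }
}
```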
Advantages of CHReME
Scientific users can now concentrate on their respective domains without having to worry about the computational aspects of supercomputing facilities
The portal scales to environments with thousands of users accessing it across various clusters
Impact on the existing network and cluster infrastructure is minimal:
The HPC access portal server can be completely separate from the cluster, since it uses SSH to communicate with the HPC cluster scheduler (an SSH sketch follows this list)
Built using JSP/Struts, a specification that makes the server installation portable and scalable
New parallel applications and additional tools can be integrated with those provided by the HPC resource access and management portal
The HPC portal facilitates submission and monitoring of jobs, freeing users from the strenuous tasks of command-line submission and monitoring
It also helps the administrator monitor the entire cluster: its users, the jobs submitted by those users, the cluster resources and the various reservations in use
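To illustrate the SSH link between the portal server and the cluster, here is a sketch using the JSch library. The host name, key path and the qsub command are assumptions about a typical deployment, not CHReME's configuration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

// Hypothetical remote job submission: the portal host runs qsub on the
// cluster head node over SSH, so no portal software lives on the cluster.
public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity("/home/portal/.ssh/id_rsa");     // key-based login
        Session session = jsch.getSession("portal", "cluster-head.example.org", 22);
        session.setConfig("StrictHostKeyChecking", "no"); // demo only
        session.connect();

        ChannelExec ch = (ChannelExec) session.openChannel("exec");
        ch.setCommand("qsub /home/portal/jobs/wrf_job.sh");
        BufferedReader out = new BufferedReader(
                new InputStreamReader(ch.getInputStream()));
        ch.connect();
        System.out.println("Scheduler returned job id: " + out.readLine());
        ch.disconnect();
        session.disconnect();
    }
}
```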
Future Work
Development of an application integration module for various other domains such as Bioinformatics, Oceanography and Computational Chemistry, with customised GUIs.
Support for the advance-reservation policies of the resource manager.
Application-level checkpointing feature.