CNGrid GOS*: China National Grid Software
Abstract
CNGrid GOS is a suite of grid software with independent intellectual property rights that supports the China National Grid runtime environment. It is an important achievement of the China National Grid software research and development project, which is supported by the Hi-Tech Research and Development (863) Program of China. This paper introduces the architecture, functionality, and innovations of the parts of CNGrid GOS: the system software, a CA certificate management system, a testing environment, three business-version subsystems (high performance computing gateway, data grid, and grid workflow), and a monitoring system.
Keywords
China National Grid (CNGrid), Grid Operating System (GOS), High Performance Computing Gateway (HPCG), Data Grid (CORSAIR), Grid Workflow, Grid Monitoring System (CNGridEye).
China National Grid (CNGrid) is a major project supported by the Hi-Tech Research and Development (863) Program of China. CNGrid is a new-generation test-bed for information infrastructure that aggregates high-performance computing and transaction processing capabilities. Through resource sharing, coordinated work, and service mechanisms, CNGrid supports applications in areas such as scientific research, resources and environment, advanced manufacturing, and information services. Through technological innovation, CNGrid promotes the construction of the national information industry and the development of related industries.
China National Grid Software, named CNGrid GOS, is a suite of grid software with independent intellectual property, developed by the CNGrid software R&D project team. The relationship between the CNGrid environment and the GOS software is illustrated in Figure 1.

Figure 1. The relationship of CNGrid GOS software and CNGrid environment
CNGrid GOS mainly includes the system software, a CA certificate management system, a testing environment, three business-version subsystems (high performance computing gateway, data grid, and grid workflow), and a monitoring system. The project is undertaken by seven organizations: the Institute of Computing Technology of the Chinese Academy of Sciences, Jiangnan Institute of Computing Technology, Tsinghua University, National University of Defense Technology, Beihang University, the Computer Network Information Center of the Chinese Academy of Sciences, and Shanghai Supercomputing Center.
The CNGrid GOS system software (VegaGOS) provides global naming management, VO management, user management, resource management, application runtime management, and related functions. VegaGOS introduces important innovations in global naming management, distributed resource management, the virtual organization (agora), grid process (grip) technology, the grid security mechanism, and support for a variety of domain applications.
(1) Naming. Naming is a decentralized, name-stable management system for global objects (Gnodes). It supports locating an object by its globally unique identifier with low latency and a high success ratio, and searching for objects by attribute matching with low latency and a high recall ratio. Naming is a fundamental component from which the rest of VegaGOS is built. As a reusable component, it forms a global layer of virtual names that addresses two problems: unstable physical addresses and tight coupling between applications and resources.
(2) Resource Management. Resources in VegaGOS take various forms and are accessed in different ways, which makes these heterogeneous resources difficult to describe and manage. The resource controller mechanism (RController) was introduced to import and manage heterogeneous resources in a unified way. RController provides operations on resources such as creation, destruction, access control, access, and reading and writing of properties.
(3) VO Management. The virtual organization in VegaGOS, called Agora, provides management of distributed resources, users, and access control policies, and offers single sign-on and a single system image. Acting as a common trusted third-party super-organization, Agora achieves unified cross-domain access control while preserving local autonomy.
(4) Grid Application Runtime Management. Grid applications need to maintain user identities at runtime so that access control can be enforced. In VegaGOS, Grid Process technology, abbreviated as grip, not only maintains user identities and other application runtime context, but also manages the resources occupied by an application and supports collaboration among multiple applications.
Figure 2 shows the runtime architecture of VegaGOS, which illustrates the runtime interaction of the above key innovations.

Figure 2. VegaGOS application runtime management architecture
(5) Application Level Tools. VegaGOS provides a range of application-level tools (Portal, GShell, VegaSSH, and GOSClient) that support the traditional command-line mode of high-performance computing while giving it grid characteristics. Portal provides a user-friendly web-based interface that makes VegaGOS easier to use. GShell is a grid shell, similar to a GNU bash environment, that supports running applications within a grip. VegaSSH provides single sign-on to any grid node so that users can use the back-end high performance computing resources. GOSClient is a set of client tools, including GShell, that can be installed independently to use the VegaGOS system.
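The idea behind the Naming component in item (1) above, a stable global identifier that stays valid while the physical address changes, plus lookup by attribute match, can be illustrated with a minimal in-memory sketch. It is not the VegaGOS implementation: the real Naming system is decentralized, and every name here (`GNode`, `NameRegistry`, `register`, `rebind`, `locate`, `search`, the example URLs) is a hypothetical chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GNode:
    """Hypothetical record for a global object: stable ID, changeable address."""
    gid: str
    address: str
    attributes: Dict[str, str] = field(default_factory=dict)


class NameRegistry:
    """Single-process stand-in for a naming layer (assumption: the real one is distributed)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, GNode] = {}

    def register(self, address: str, **attributes: str) -> str:
        """Create a global object and return its stable identifier."""
        gid = uuid.uuid4().hex
        self._nodes[gid] = GNode(gid, address, dict(attributes))
        return gid

    def rebind(self, gid: str, new_address: str) -> None:
        """Move the object to a new physical address; the identifier is unchanged."""
        self._nodes[gid].address = new_address

    def locate(self, gid: str) -> Optional[str]:
        """Resolve a global identifier to the current physical address."""
        node = self._nodes.get(gid)
        return node.address if node else None

    def search(self, **wanted: str) -> List[str]:
        """Return identifiers of all objects whose attributes match every given pair."""
        return sorted(
            gid for gid, node in self._nodes.items()
            if all(node.attributes.get(k) == v for k, v in wanted.items())
        )


registry = NameRegistry()
gid = registry.register("https://node-a.example/blast", kind="service", app="blast")
registry.rebind(gid, "https://node-b.example/blast")  # resource moved, name stays valid
```

An application that stores only `gid` keeps working after the `rebind` call, which is the decoupling between applications and resources that the virtual name layer is meant to provide.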
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences
Address: No.6 Kexueyuan South Road Zhongguancun, Haidian District Beijing, 100190
Telephone: 010-62600969 Fax: 010-62600900
Contact: Li ZHA Copywriter: Xiaoyi LU
1. CA Certificate Management System
The CNGrid CA provides digital certificate services for all users, resources, and applications in the Grid. It issues certificates for both the testing and the production environments, and supports certificate revocation and status query.
The CNGrid CA system consists of Certification Authorities and Registration Authorities at different levels. The system adopts a multi-certificate architecture in which the top-level Certification Authority signs the certificates of the lower-level Certification Authorities.
The servers in the CA system are organized in three layers: the Web server, the Function server, and the DB server. All servers run Linux, and PCs running Windows serve as the client management platform.

Figure 3. CNGrid CA software architecture
CNGrid CA software architecture is illustrated in Figure 3.
The CA system provides its full functionality through a web portal, including certificate application, generation, distribution, revocation, status query, and management.
(1) Analyzing certificate application information, and generating, distributing, and revoking digital certificates.
(2) Combination of RA distributed inspection and RS centralized inspection.
(3) Certificate store management.
(4) User management, log management, security audit and security management.
(5) Data backup and recovery.
(6) System key management.
(7) Certificates download, status query and validation.
(8) Standardized certificates, including extension items for user requirements.
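The hierarchical arrangement described above, in which a top-level Certification Authority signs the certificates of lower-level ones, implies that validating a certificate means walking its issuer chain up to a trusted root and checking the status of every certificate on the way. The sketch below models only that structure. It performs no cryptographic signature verification, and the class names, fields, and serial numbers are hypothetical rather than taken from the CNGrid CA software.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Cert:
    serial: str
    subject: str
    issuer_serial: Optional[str]  # None marks a self-signed top-level CA


class CertStore:
    """Toy certificate store with status query and chain validation (no real crypto)."""

    def __init__(self) -> None:
        self._certs: Dict[str, Cert] = {}
        self._revoked: Set[str] = set()

    def add(self, cert: Cert) -> None:
        self._certs[cert.serial] = cert

    def revoke(self, serial: str) -> None:
        self._revoked.add(serial)

    def status(self, serial: str) -> str:
        """Status query: 'unknown', 'revoked', or 'valid'."""
        if serial not in self._certs:
            return "unknown"
        return "revoked" if serial in self._revoked else "valid"

    def chain_is_valid(self, serial: str, trusted_root: str) -> bool:
        """Follow issuer links upward; fail on any revoked, unknown, or looping link."""
        seen: Set[str] = set()
        current: Optional[str] = serial
        while current is not None:
            if current in seen or self.status(current) != "valid":
                return False
            seen.add(current)
            if current == trusted_root:
                return True
            current = self._certs[current].issuer_serial
        return False  # reached a self-signed certificate that is not the trusted root


store = CertStore()
store.add(Cert("01", "Top-level CA", None))
store.add(Cert("02", "Low-level CA", "01"))
store.add(Cert("03", "grid-user", "02"))
```

With this store, `store.chain_is_valid("03", "01")` holds until `store.revoke("02")` is called, after which the user certificate is rejected even though its own status is still valid.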
2. Testing Environment
Widely used testing concepts and modern test management tools have been applied in the CNGrid GOS integration and testing environment. By means of a rigorous software re-engineering procedure, users get a full-featured, high-performance, and reliable GOS system. To achieve these goals, all aspects of GOS have been thoroughly tested.
(1) GUI testing ensures that GOS meets the requirements specification, especially the uniformity, usability, and effectiveness of the GUI. GOS should also provide users with online help and operation tips.
(2) Functionality testing focuses on verifying the functional requirements of GOS. Most of the test cases are automated.
(3) Performance testing determines the response time, the throughput, and the level of concurrency of GOS. The performance analysis results help the system developers improve system performance.
(4) Reliability testing evaluates how long GOS can run properly under a heavy workload (≥90%). It ensures GOS can provide reliable grid services.
(5) Compatibility testing evaluates GOS compatibility with the OS environments, host environments and the client environments. It ensures GOS runs smoothly in specified environments.
(6) Usability testing promotes GOS as an easy-to-use and attractive software product.
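The three quantities named under performance testing in item (3), response time, throughput, and concurrency, can be measured with a small load-generation harness. The following is a minimal sketch using only the Python standard library. It is not the CNGrid test tooling, and `fake_grid_request` is a hypothetical placeholder for a real call into GOS.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict


def timed_call(operation: Callable[[], None]) -> float:
    """Run one operation and return its response time in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def run_load(operation: Callable[[], None], requests: int, concurrency: int) -> Dict[str, float]:
    """Issue `requests` calls using `concurrency` worker threads and summarize the run."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_call(operation), range(requests)))
    wall = time.perf_counter() - wall_start
    return {
        "requests": requests,
        "concurrency": concurrency,
        "mean_response_s": mean(latencies),
        "max_response_s": max(latencies),
        "throughput_rps": requests / wall,  # completed requests per wall-clock second
    }


def fake_grid_request() -> None:
    """Hypothetical stand-in for a request to a grid service."""
    time.sleep(0.01)


report = run_load(fake_grid_request, requests=20, concurrency=5)
```

Repeating the run with increasing `concurrency` values shows where throughput stops growing and response time starts to climb, which is the kind of result developers can use to locate bottlenecks.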

Figure 4. CNGrid GOS integration and testing environment
Affiliation: Jiangnan Institute of Computing Technology
Address: No. 031 of P.O.Box 33, Wuxi, Jiangsu, 214083
Telephone: 0510-85155200 Fax: 0510-85155197
Contact: Hailiang WEI Copywriter: Hailiang WEI
High Performance Computing Gateway (HPCG) is a set of system services and application software built on VegaGOS to support high performance computing. HPCG integrates the computing and storage resources of more than ten computing centers in CNGrid. It aims to give non-professional users a "professional" scientific computing environment. HPCG is composed of a number of related system services plus user interfaces, including a web portal, command line interfaces, and APIs. The system services include a batch job service, a file management service, a message service, a user-mapping service, and an accounting service. By composing these services in different ways, HPCG meets a variety of high performance computing requirements. The characteristics of HPCG are as follows.
1. Full-featured
(1) Batch Job Service
It enables transparent job submission to multiple high-performance computing centers and provides a flexible and efficient mechanism for obtaining job status.
(2) File Management Service
It enables remote file management and online editing of small files. It also supports reliable synchronous and asynchronous file transfer that adapts to firewall settings.
(3) Accounting Service
It provides efficient resource usage accounting statistics and supports global accounting.
2. Facilitated Integration
(1) APIs. Based on rich libraries, high-performance computing applications can be easily customized.
(2) Job Template. With the HPCG template technique, high-performance computing software resources can be imported and shared by modifying only a few XML-based template files.
3. Friendly User Interface
(1) It provides both a grid portal and a grid shell for scientific computing users and resource providers.
HPCG aims to address the grid batch job requirements of enterprise intranet users, and to provide a feature-rich, user-friendly, and stable scientific computing environment. Figure 5 shows the HPCG deployment diagram in enterprises and computing centers.
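The job template idea under "Facilitated Integration", describing an application once in an XML file and filling in the per-job values, can be sketched as follows. The XML layout, element names, and parameters are hypothetical, since the actual HPCG template schema is not given here. The sketch uses only `string.Template` and `xml.etree.ElementTree` from the Python standard library.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template layout; the real HPCG template schema may differ.
JOB_TEMPLATE = Template("""\
<job>
  <application>$app</application>
  <executable>$executable</executable>
  <arguments>$arguments</arguments>
  <resources cpus="$cpus" walltime="$walltime"/>
</job>
""")


def render_job(**params: str) -> ET.Element:
    """Fill the template with XML-escaped values and parse the result."""
    safe = {key: escape(value, {'"': "&quot;"}) for key, value in params.items()}
    # substitute() raises KeyError if a placeholder has no value, so an
    # incomplete job description fails early instead of being submitted.
    return ET.fromstring(JOB_TEMPLATE.substitute(safe))


job = render_job(
    app="blast",
    executable="/opt/blast/bin/blastn",
    arguments="-query in.fa -out out.txt",
    cpus="16",
    walltime="01:00:00",
)
```

Under this scheme, sharing a new application amounts to writing one more template; the submission code that calls `render_job` does not change.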

Figure 5. HPCG deployment diagram in enterprises and computing center
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences
Address: No.6 Kexueyuan South Road Zhongguancun, Haidian District Beijing, 100190
Telephone: 010-62600966 Fax: 010-62600900
Contact: Boqun CHENG Copywriter: Boqun CHENG
CORSAIR is a virtual file system manager that solves the stage-in, stage-out, and data sharing problems in the Grid. CORSAIR provides data access and sharing services to users transparently: users can use data resources without knowing their physical locations and can share resources without complicated configuration. Storage resources and access control are handled by CORSAIR. Its features are as follows.
(1) Local and remote resources are integrated and presented in a unified view.
(2) Parallel file transfer, transfer resuming and third-party transfer are supported.
(3) Resource management can be performed in a unified way (e.g., copy, paste, and sharing).
(4) Keyword search is provided for resources in CORSAIR.
(5) Web-based community management is supported (e.g., creating and dissolving communities, and adding and removing users).
CORSAIR provides public resources for all users, private storage for registered users, and shared community storage for communities. It also provides convenient management tools, with which users can manage data resources in CORSAIR as simply as local files.
CORSAIR is composed of storage services, mapping services, a management portal, and two client tools: a GUI management tool (GUI Man) and a command line management tool (CMD Man). The system deployment is shown in Figure 6.

Figure 6. Deployment of CORSAIR
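The transparency described above, where users work with one unified view and never see where data is physically stored, rests on a mapping from virtual paths to storage locations. The sketch below shows one plausible form of such a mapping using longest-prefix matching. The class, the method names, and the `store://` addresses are hypothetical and are not the CORSAIR mapping service interface.

```python
from typing import Dict, Optional


class MappingService:
    """Hypothetical mapping from virtual path prefixes to physical storage locations."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, virtual_prefix: str, storage_url: str) -> None:
        """Attach a storage location under a virtual directory."""
        self._mounts[virtual_prefix.rstrip("/")] = storage_url.rstrip("/")

    def resolve(self, virtual_path: str) -> str:
        """Translate a virtual path to a physical location (longest prefix wins)."""
        best: Optional[str] = None
        for prefix in self._mounts:
            if virtual_path == prefix or virtual_path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no storage mapped for {virtual_path}")
        return self._mounts[best] + virtual_path[len(best):]


mapping = MappingService()
# Public, private, and community areas presented in one virtual namespace.
mapping.mount("/public", "store://site-a.example/pub")
mapping.mount("/home/alice", "store://site-b.example/users/alice")
mapping.mount("/community/bio", "store://site-c.example/bio")

location = mapping.resolve("/home/alice/run1/out.dat")
```

Because clients only ever handle virtual paths, a storage service can be moved or replaced by changing one `mount` entry.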
Affiliation: Tsinghua University
Address: Department of Computer Science, Tsinghua University, Beijing, 100084
Telephone: 010-62796341 Fax: 010-62797141
Contact: Yongwei WU Copywriter: Xiaomeng HUANG
Affiliation: National University of Defense Technology
Address: 601 Department of Computer Science, National University of Defense Technology, Changsha, 410073
Telephone: 0731-4573639 Fax: 0731-4556089
Contact: Nong XIAO Copywriter: Nong XIAO
CNGrid GOS workflow provides a service-based, graphical workflow modeling and execution environment. It enables users to orchestrate services from distributed CNGrid nodes as workflows in a visual development environment and to monitor the execution state in a browser. The main features are as follows.
(1) Powerful Workflow Modeling Capability. By supporting two workflow language standards, WS-BPEL and XPDL, the workflow modeler can describe both automated scientific computing processes and workflows that involve human activities, for scientific and business computing. In the latter case, people can participate in activities, observe outputs, and intervene if necessary.
(2) Easy Access to Grid Services. With a configurable service adapter, the workflow modeler and workflow portal can connect to distributed grid nodes and provide a personalized service directory in which users can view, assemble, or execute services.
(3) Process as a Reusable Service. A process deployed on a server can be reused as a service in other processes.
(4) Pluggable, Extendable Workflow Management Console. As a kind of distributed management mechanism based on web plug-ins, the console can provide unified monitoring and management for different workflow engines, with functions ranging from process definition category management to system configuration.
(5) Extendable Workflow Modeler and Engine. New activities can easily be added to the workflow model, with the corresponding interpretation and execution modules added to the engine as plug-ins.

Figure 7. CNGrid workflow modeling and executing environment
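The "process as a reusable service" feature in item (3) means that a process exposes the same interface as the services it orchestrates, so it can appear as a step inside another process. The sketch below shows that composition pattern in plain Python. It does not use WS-BPEL or XPDL, and all names (`Process`, `preprocess`, `simulate`, `report`) are hypothetical.

```python
from typing import Any, Callable, Dict, List

# A service takes a dictionary of inputs and returns a dictionary of outputs.
Service = Callable[[Dict[str, Any]], Dict[str, Any]]


class Process:
    """A sequence of services that is itself callable as a service."""

    def __init__(self, name: str, steps: List[Service]) -> None:
        self.name = name
        self.steps = steps

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for step in self.steps:
            data = step(dict(data))  # pass a copy so steps cannot alias each other's state
        return data


def preprocess(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "cleaned": True}


def simulate(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "result": data["input"] * 2}


def report(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "summary": f"result={data['result']}"}


compute = Process("compute", [preprocess, simulate])
pipeline = Process("pipeline", [compute, report])  # 'compute' reused as a single step
output = pipeline({"input": 21})
```

Since `Process` instances and plain functions share one call signature, nesting can go to any depth without the outer process knowing whether a step is a single service or a whole sub-process.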
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences
Address: No.6 Kexueyuan South Road Zhongguancun, Haidian District Beijing, 100190
Telephone: 010-62600957 Fax: 010-62600900
Contact: Houfu LI Copywriter: Houfu LI
Affiliation: Beihang University
Address: P.O.Box 7-28, No.37 Xueyuan Road, Haidian District, Beijing, 100191
Telephone: 010-82339679 Fax: 010-82339679
Contact: Chunming HU Copywriter: Chunming HU
CNGridEye is a system offering resource monitoring and accounting services for China National Grid (CNGrid). CNGridEye collects the status of distributed, heterogeneous, and dynamic resources inside CNGrid and uses the collected information to support upper-layer processing such as job scheduling and failure detection. CNGridEye also offers powerful accounting functions that use accurate records of resource usage to support the daily operation and QoS enhancement of CNGrid. The architecture of CNGridEye is shown in Figure 8.

Figure 8. CNGridEye architecture
CNGridEye has the following features.
(1) Using an integrated architecture to monitor cross-domain and distributed resources.
(2) Supporting several different info-models to offer complete monitoring metrics from the host, cluster, node, and grid views.
(3) Supporting many different kinds of resources for monitoring, such as hardware, networks, and services/applications, and different job management systems, such as OpenPBS, LSF, and OAR.
(4) Offering powerful failure detection and processing functions.
(5) Monitoring the Grid Operating System (GOS) and helping to ensure its stable operation.
(6) Monitoring network status between CNGrid nodes to find possible bottlenecks or failures.
(7) Offering a powerful user interface and allowing users to customize different kinds of charts.
(8) Supporting distributed accounting and flexible billing strategy.
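The combination of accounting and a flexible billing strategy in item (8) can be pictured as usage records collected from many nodes and charged through an interchangeable pricing function. The sketch below is a minimal illustration under that assumption. The record fields, the strategy functions, and all rates are hypothetical and do not come from CNGridEye.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    user: str
    node: str
    cpu_hours: float


# A billing strategy turns one usage record into a charge.
BillingStrategy = Callable[[UsageRecord], float]


def flat_rate(rate: float) -> BillingStrategy:
    """Same price per CPU hour on every node."""
    return lambda rec: rec.cpu_hours * rate


def per_node_rate(rates: Dict[str, float], default: float) -> BillingStrategy:
    """Price per CPU hour depends on the node that ran the job."""
    return lambda rec: rec.cpu_hours * rates.get(rec.node, default)


def bill(records: Iterable[UsageRecord], strategy: BillingStrategy) -> Dict[str, float]:
    """Aggregate charges per user across records from all nodes."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec.user] += strategy(rec)
    return dict(totals)


records = [
    UsageRecord("alice", "node-a", 10.0),
    UsageRecord("alice", "node-b", 4.0),
    UsageRecord("bob", "node-a", 2.5),
]
flat = bill(records, flat_rate(2.0))
tiered = bill(records, per_node_rate({"node-b": 3.0}, default=1.0))
```

Keeping the strategy separate from the aggregation lets the same usage records be re-billed under a different policy without touching the collection side.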
Affiliation: Beihang University
Address: No.37 Xueyuan Road, Haidian District, Beijing, 100191
Telephone: 010-82315908 Fax: 010-82328077
Contact: Zhongzhi LUAN Copywriter: Zhongzhi LUAN