Engineering Seminar Topics :: Seminar Paper: Internet Distributed Computing

ABSTRACT

The confluence of Web services, Peer-to-Peer systems, and grid computing provides the foundation for internet distributed computing- allowing applications to scale from proximity ad hoc networks to planetary-scale distributed systems. This concept will make internet as a application hosting platform. To make the internet as a application hosting platform we have to provide additional functional layer to the internet that can allocate and manage the resources necessary for application execution. Given such a hosting environment, software developers could create network applications without having to know at design time the type or number of systems the application will execute on.

The foundation of our proposed approach is to disaggregate and virtualize individual system resources as services that can be described, discovered, and dynamically configured at runtime to execute an application. Potential of distributed computing are

a. Resource sharing and load balancing

b. Information sharing

c. Incremental growth

d. Reliability, availability, and fault tolerance

e. Enhanced performance potential

INTRODUCTION

The World Wide Web’s current implementation is designed predominantly for information retrieval and display in a human readable form. Its data formats and protocols are neither intended nor suitable for machine-to-machine interaction without humans in the loop. Emergent Internet uses – including peer- to- peer and grid computing – provide both a glimpse and impetus for evolving the Internet into a distributed computing platform.

What would be needed to make the Internet into a application-hosting platform. This would be a networked, distributed counterpart of the hosting environment that traditional operating system provides to application in a single node. Creating this platform requires additional functional layer to the Internet that can allocate and manage resources necessary for application execution.

Given such a hosting environment, software designers could create network application without having to know at design time the type or the number of nodes the application will execute on. With proper support, the system could allocate and bind software components to the resources they require at runtime, based on resource requirement, availability, connectivity and system state at actual time of execution. In contrast, early bindings tend to result in static allocations that cannot adapt well to resource, load and availability variations, thus the software components tend to be less efficient and have difficulty recovering from failures. The foundation of proposed approach is to disaggregate and virtualize.

System resources as services that can be described, discovered and dynamically configured at runtime to execute a application. Such a system can be built as a combination and extension of Web services, peer-to-peer computing, and grid computing standards and technologies, It thus follows the successful internet model of adding minimal and relatively simple functional layers to meet requirements while atop already available technologies.

But it does not advocate an “Internet OS” approach that would provide some form of uniform or centralized global-resources management. Several theoretical and practical reasons makes such an approach undesirable, including its inability to scale and the need to provide and manage supporting software on every participating platform. Instead, we advocate a mechanism that supports spontaneous, dynamic, and voluntary collaboration among entities with their contributing resources.

DISTRIBUTED COMPUTING

Distributed computing is a science You can define different ways. Various vendors have created and marketed distributed computing systems for years, and have developed numerous initiatives and architectures to permit distributed processing of data and objects across a network of connected systems.

One flavor of distributed computing has received a lot of attention lately, and it will be a primary focus of this story--an environment where you can harness idle CPU cycles and storage space of tens, hundreds, or thousands of networked systems to work together on a particularly processing-intensive problem. The growth of such processing models has been limited, however, due to a lack of compelling applications and by bandwidth bottlenecks, combined with significant security, management, and standardization challenges. But the last year has seen a new interest in the idea as the technology has ridden the coattails of the peer-to-peer craze started by Napster. A number of new vendors have appeared to take advantage of the nascent market; including heavy hitters like Intel, Microsoft, Sun, and Compaq that have validated the importance of the concept. Also, an innovative worldwide distributed computing project whose goal is to find intelligent life in the universe--SETI@Home--has captured the imaginations, and desktop processing cycles of millions of users and desktops.

DISTRIBUTED COMPUTING’S POTENTIAL

System designers have known for years that distributed computing offers many benefits including the following: -

1. Resource sharing and load balancing: providing efficient and responsive resource utilization.

2. Information sharing: permitting remote data access.

3. Incremental growth: delivery aided capacity when and where needed.

4. Reliability, availability, and fault tolerance: achieved through redundancy and dynamic allocation.

5. Enhanced performance potential: derived from parallel operation.

ENVIRONMENT AND DRIVERS

The Internet’s maturity and unprecedented popularity provide nearly ubiquitous computer connectivity in terms of both physical connection and platform independence, interoperable communication protocols. The confluence of several technological developments and emerging usage modes provides motivation for evolving the Internet into a distributed computing platform.

Streamlined e-business, with its increasing reliance on automation, depends on direct machine-to-machine transaction. Moore’s law promises a continuing supply of compute capacity that will form the foundation for distributing intelligence throughout the Internet.

Other factors motivating Internet distributed computing design includes a desire to use computing resources efficiently and hide the complexity inherent in managing heterogeneous distributed systems. These factors lead to a reduced total cost of ownership. The emergence of utility computing, autonomic computing and massively parallel applications – such a peer-to-peer grid-like formations that aggregate resources from millions of personal computers – demonstrates the need for and feasibility of large scale resource sharing.

The evolution of Internet distributed computing will likely include pervasive and proactive computing. The pervasive computing vision postulates an order-of-magnitude greater-scale Internet composed of such diverse entities as sensors, cars, and home appliances. Increased scale will require ad-hoc opportunistic configurations, as well as expanded naming and addressing schemes, such as IPv6. it will also false application designs that consider intermittent node availability a default operating assumption rather than a failure mode.

REQUIRMENTS

Key requirements for distributed computing in the environment we describe include support for heterogeneity and the ability to scale from a proximity-area network’s relatively few devices to many devices up to a global scale. The explosive developments in wireless technology make support for mobility another important requirement. These developments will continue to result in a significant increase in the number of devices and the form factors that need to connect to each other and to service and data.

PROPOSED APPROACH

Our approach attempts to identify key design principles and layers of abstraction that must be provided to give the Internet network-application-hosting capability. Following that, our implementation strategy builds on and reuses as much of the work in related standards and emerging technologies as possible.

Architecturally, the central idea is to virtualize resources that a node may want to share, such as data, computation cycles, network connections, or storage. The system adds available resources to the networked pool from which resources needed to complete the given task-such as executing an application-can be aggregated dynamically.

An aggregated collection of resources operates as an assembly only for the period required to complete the collective task, after which the resources return to the network pool. The network can be either public or private, as long as it uses internet-standards-compliant protocols and mechanisms for internodes communication and other services.

We assume that nodes willing to share a certain sub-set of their resources use mechanisms to announce their availability, possibly stating the terms of usage, to the rest of the distributed system or systems in which their willing to collaborate. Individual nodes may be motivated to share resources for several reasons, such as to provide or gain access to unique data or to trade temporally under utilized resources for profit or for the benefit of being able to draw upon the collective resources to handle their on peak loads. In addition, a company or an organization can mandate a certain modality of resource sharing to improve its over all asset utilization and realize higher effective aggregate computation power.

Even though this nascent area has only a few real life implementations, several benefits have already been observed in practice. For example researchers performed the largest computation on records as of this ratting-with 1.87*10^21 floating point operation-by aggregating the resources of millions of PCs whose owners donated their machines unused cycles to a scientific project.

IDC DESIGN PRINCIPLES

We believe that two key design principles go far towards meeting the requirements of Internet distributed computing: embedding intelligence in the network and creating self- configuring, self- organizing network structures.

Other researchers experience and our own experiments indicate that distributing intelligence throughout the network tends to improve scalability. The self-organizing and self-configuring aspects contribute to both the scalability and resiliency by creating ad hoc networks of currently available nodes and resources. Complex and rigid fixed configuration networks are neither scalable nor manageable enough for use in linking wireless devices that move in and out of range and must power down intermittently to conserve energy.

Improved scalability and performance result largely from the preference for use of local resources. Using local resources tends to shorten communication between the data source, its point of processing, and optional presentation to the user. This, in turn, results in reduced latency and bias towards consuming edge bandwidth rather than the backbone bandwidth needed to communicate with remote centralized servers-a behavior typical in client-server systems. Further, this trend is especially valuable in a short- range wireless network configuration such as Blue tooth, 802.11x, and the emerging ultra wideband technologies. These configurations can benefit from high-bandwidth, low-latency communication within a cell or in a close proximity to it. More over, infusing intelligence into the network tends to distribute the load across many processing points, as opposed to creating congestion at a few heavily used hot spots.

Resource virtualization

Although resource virtualization is a generally useful concept, we concentrate primarily on its use for resource sharing in distributed systems. In that context, resource virtualization can be thought of as an abstraction of some defined functionality and its public exposure as a service through an interface that applications and resource managers can invoke remotely.

We consider a service to be a virtualized software functional component. Services can be advertised and discovered using directories and inspection. Once discovered, an invoking entity can bind to the selected service and start communicating with its externally visible functions, preferably via platform-independent protocols. Each such virtualized component can be abstracted, discovered, and bound to.

These concepts can be extended to virtualize hardware resources, such as compute cycles, storage, and devices. This deceptively simple extension transforms the software component model into a distributed component model of immense power that may not be immediately obvious. Carrying its application to a logical extreme, we can envision a planetary-sized pool of compassable hardware and software resources connected via the Internet that is described and accessible using service abstractions.

Resource discovery

Discovery is a fundamental IDC component, simply because the system must find a service before it can use it. Traditional static systems resolve discovery implicitly or fix it at configuration or compile time.

To accommodate both mobile and fixed-location but dynamically aggregated environments, IDC must support a flexible service advertisement and discovery mechanism. Applications can discover services based on their functionality, characteristics, cost, or location. Dynamic discovery enables devices to adaptively co-operate and extended functionality so that the whole becomes, ideally, greater then the sum of its parts.

Dynamic configuration and runtime binding

Dynamic configuration depends upon the capability to bind components at runtime, as opposed to design or runtime. Deferred or runtime binding can be implemented with the assistance of service directory mechanism. Its primary benefit is decoupling application design from the detailed awareness of the under laying system configuration and physical connectivity. In effect, dynamic configuration facilitates application portability across a wide range of platforms and network configurations. It also allows decoupling of development of the service user-code from the service- provider code, unlike the tight, tandem development with the distributed shared logic in client-server systems.

Runtime binding also enables desirable system capabilities such as

1. Load balancing: By binding to the least-loaded services from a functionality equivalent group

2. Improved reliability: By binding to service explicitly known to be available at invocation.

In ad hoc and self-configuring networks, runtime binding facilitates adaptive peer configurations in settings with high node-fluctuation rates, such as 802.x hot spots or clusters of wireless sensors that power down intermittently to conserve power.

Resource aggregation and orchestration

We define resource orchestration as the control and management of an aggregated set of resources for completing a task. The term also includes the communication and synchronization necessary for co-ordination and collation of partial results. Once a task completes, the system can release resources back to the pool for allocation to other users. Operating systems commonly used this approach, which IDC extends to the network scale.

Using run time resource aggregation to meet application requirements implies that applications must be designed so that they can state their resource requirements explicitly- or atleast provide hints to the execution system that let it generate a reasonably efficient estimate.

IDC BUILDING BLOCKS

Mindful of the successful Internet model that builds on existing standards and available technologies, we have sought areas that could provide ideas, technology or standards to re-use or adapt for IDC implementation. A combination of web services, Peer-to-Peer systems and Grid Computing and a useful and powerful collection of building blocks.

Web Services

Web services are self contained, loosely coupled software components that define and use Internet protocols to describe, publish, discover and invoke each other. They can dynamically locate and interact with each other web services on the Internet to build complex machine-to-machine programmatic services. These services can be advertised and discovered using directories and registries such as the Universal Description, Discovery, and Integration (UDDI) specification or inspected using the Web Services Inspection Language.

To be useful, each discovered component must be described in a well-structured manner. The Web Services Description Language (WSDL) provides this capability, although deriving semantic meanings from a service presents some practical difficulties. The semantic web initiative addresses this problem. Using WSDL, an invoking entity can to the selected services and start communication with its externally visible functions via advertised protocols such as SOAP.

XML, which Web services uses extensively, supplies standards type system and wire format for communication, an essential for platform independent data representation and interoperability. These concepts can be extended to hardware to virtualize resources such as compute cycles, storage, and devices.

Web services can provide some key IDC requirements such as service description, Discovery and Platform independent invocation using either remote procedure calls or message based mechanisms. This also provides support for runtime binding.

Peer-to-Peer Computing

Providing both a design approach and a set of requirements, peer-to-peer computing exploits the aggregate compute power that millions of networked personal computers and high-end personal digital assistants provide. P2P has been used for applications such as file backup, content distribution, and collaboration- all without requiring central control and management. Decentralization and access to the vast resources distributed at the edge of the Internet usually motivate P2P use. In today’s P2P wide-area network, data and services are replicated widely for availability, durability, and locality, and thus provide suitable building blocks for IDC.

P2P solutions must deal problems inherent in vastly heterogeneous collection of personal machines. These include intermittent node availability, frequent lack of Domain Name System IP addressability stemming from dynamically assigned and translated IP addresses, and difficulties with bi-directional communication through network address translation (NAT) units and firewalls.

P2P provides a distributed computing paradigm in which participating machines usually act as equals that both contribute to and draw from the resources shared among a group of collaborating nodes. Often, these peers participate from the network’s edge instead of its center, where specialized or dedicated compute serves will reside. These fringe end-user machines can dynamically discover each other and forms an ad hoc collaborative environment. Thus, many P2P systems address and solve naming, discovery, intermittent connectivity, and NAT and firewall traversal issue, but do so in proprietary ways that preclude them from direct interconnection into IDC.

An IDC system can mimic P2P technique to form overlay networks that provide location-independent routing of message directly to the qualifying object or service, bypassing centralized resources and issuing only point-to-point links. This overly networks can distribute digital content to the large Internet user population more cost-effectively than can simple client-server techniques alone.

Peer-to-peer (P2P) computing promises to be the paradigm with mind share sufficient to push a number of interesting distributed computing technologies from the shadows into the spotlight. P2P computing is a subset of distributed computing. not all distributed computing is peer-to-peer computing. The name "peer-to-peer" suggests an egalitarian relationship between peers and, more importantly, suggests direct interactions between peers. P2P applications consist of a number of peers, each performing a specific role in the P2P network, in communication with each other. Typically, the number of peers is large and the number of different roles is small. These two factors explain why most P2P applications are characterized by massive parallelization in function. The problems to be solved in P2P computing overlap to a considerable degree with the problems faced in distributed computing -- coordinating and monitoring the activities of independent nodes and ensuring robust, reliable communication between nodes. But not all distributed computing is P2P computing. Distributed applications like SETI@home and the various distributed.net projects exhibit little interesting peer-to-peer interaction, and are therefore not really P2P

Grid computing

The term grid comes from the notation as a utility, and it derives from the analogy to a power grid as a pool of resources aggregated to meet variations in load demand without user awareness of or interest in the details of the grid’s operation. Grid computing extends conventional distributed computing by facilitating large scale sharing of computational and storage resources among a dynamic collection of individuals and instructions. Such settings have unique scale, security, authentication, and resource access and discovery requirements.

Grid computing origin and most of its current applications lie in the area of high performance computing. Initially grid technologies primary beneficiaries were scientists who wanted to access to large data sets or unique data sources or which wanted to aggregate mass-produced compute power to form comparatively inexpensive virtual supercomputers.

As grid technologies matured, it became evident that the problems being addressed and techniques being developed could apply to a broader range of computing problems. This subsequent inclusion of web services exemplified by the Open Grid Services Architecture (OGSA) provides a sound basis for cross-organizational and heterogeneous computing.

Grid computing provides both architectural solutions and middleware for resource virtualization, aggregation, and related abstractions. Higher layers of grid computing provide mechanisms and tools for application distribution and execution across a collection of machines.

Applications of Grid computing

Distributed Supercomputing

Distributed Supercomputing applications couple multiple computational resources - supercomputers and/or workstations

Distributed supercomputing applications include SFExpress (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation), Climate Modeling (modeling of climate behavior using complex models and long time-scales)

High-Throughput Applications

Grid used to schedule large numbers of independent or loosely coupled tasks with the goal of putting unused cycles to work

High-throughput applications include RSA key cracking, SETI@home (detection of extra-terrestrial communication)

Data-Intensive Applications

Focus is on synthesizing new information from large amounts of physically distributed data

Examples include NILE (distributed system for high energy physics experiments using data from CLEO), SAR/SRB applications, and digital library applications.

The term, grid computing, has become one of the latest buzzwords in the IT industry. Grid computing is an innovative approach that leverages existing IT infrastructure to optimize compute resources and manage data and computing workloads. A grid is a collection of resources owned by multiple organizations that is coordinated to allow them to solve a common problem. Three commonly recognized forms of grid:

Computing grid: - multiple computers to solve one application problem

Data grid: - multiple storage systems to host one very large data set

Collaboration grid: - multiple collaboration systems for collaborating on a common issue.

Grid computing is not a new concept but one that has gained recent renewed interest and activity for a couple of main reasons:

IT budgets have been cut, and grid computing offers a much less expensive alternative to purchasing new, larger server platforms.

Computing problems in several industries involve processing large volumes of data and/or performing repetitive computations to the extent that the workload requirements exceed existing server platform capabilities.

Some of the industries that are interested in grid computing include:- life sciences, Computer manufacturing, industrial manufacturing, financier services, and government.

PROTOTYPES AND PROOF OF CONCEPT

As the earlier and related work in distributed computing sidebar describes, grid and web services concentrate on large scale systems with fixed or slowly changing node populations, while P2P systems deal with somewhat more intermittent membership. Both areas provide several prototypes and concept of proof.

We designed and implemented a prototype system to test the applicability and downward scalability of these design principles to similar, proximity area wireless networks with mobile clients.

MOBILE CHALLENGES

Web services, grid computing, and P2P computing share certain assumptions about the compute and communications environment. Each assumes strong Internet access: fast, low latency, reliable, and durable connections to the network infrastructure. They also rely on predominantly static configurations of, for example, pre-configured grid node candidates or well-known UDDI servers and highly available web servers.

Currently, mobile devices can participate in this constituent community only by mimicking a traditional network entity for the duration of communication session. Mobile devices participate primarily by establishing, breaking, and reestablishing these well-behaved sessions, which relegates them to a second-class status

Dynamic discovery requirements

Mobile computing’s inherently nomadic nature- which includes moving within and between environments-poses some unique requirements, foremost among them the ability to locate available services and resources in or near a new location. Additionally, devices must be able to efficiently register their willingness to offer services to the local region, include their hardware constraints, and specify how long they will be available in that region.

Two simple but greatly differing usage scenarios involving users with wireless mobile devices highlight the challenges to dynamic discovery mechanism that mobile computing presents:

a) Sue several other numismatics at flea market to exchange coin inventories stored on their respective devices and look for possible trading opportunities.

b) Sue and other researchers meet at a working session in a conference room to review a reference design. They share a single projector and laptop documents and launches last-minute performance simulations into the surrounding desktop PC head.

The first scenario assumes no surrounding infrastructure. The ad-hoc networking takes places among devices through direct discovery and inspection, without the benefit of structured networks, routers, servers or even power outlets. The second scenario combines a wireless networking infrastructure and a fixed enterprise infrastructure including a desktop PC Grid. The group works behind a firewall within a more secure and trusted environment.

From the scenarios we conclude that Sue’s mobile devices and telescoping security- she may not want to share documents from work with random people at the flea market, and she may want to reveal only certain parts of her personal data or coin collection database to trusted individuals she meets there. In contrast, in the office environment, the discovery mechanism must let workers spontaneously collaborate while finding and using local, fixed resource such as disarticulated I/O and remote compute resources.

In both the cases, discovery must be resource friendly and constructed to minimize power drains and network traffic for participating devices. Other obvious issues include service advertisement freshness, replication and federation of discovery directories, and service description and invocation of semantics. A variety of methods – such as SLP, Salutation, UPnP, Jini, Bluetooth, SDP, UDDI – attest to the varied issues and needs that discovery addresses.

To avoid repeating earlier work, we investigated the applicability of several existing service discovery protocols to IDC. Our evaluation included creating a requirements set from IDC design calls. We then compared each service discovery protocols against those requirements.

We first required that the discovery mechanism support both discovery oriented and discovery free operation modes. Directory free methods tend to be more efficient in the ad-hoc or mobile environments, while directory oriented solutions tend to be more efficient in a static environment.

Second, we required that the mechanism be scalable in the extreme, from small ad-hoc networks to the complete enterprise and beyond. This, combined with the need for directory-oriented operation, implies the need for arranging the available directory servers hierarchically.

Third, we required that discovery be platform independent: it must usable from any language and OS available in the IDC environment.

Fourth, we required support for unreliable networks and transport protocols while allowing use of a reliable transport when available and appropriate. We also required that service registration to directory machines use soft-state principles to prevent those directories from filling up with outdated information.

Fifth and finally we required that the system permit discovery of both a service’s attributes and invocation interface.

IDC Dynamic discovery prototype

We prototyped a solution using Web service standards, expanding them where needed to meet their requirements for mobility support. Each device includes a description of the services it will share, in a format suitable for posting in the UDDI registry. To meet the need of very small ad-hoc networks devices can inspect each other’s service offering after establishing network level discovery and connectivity. We use WSDL and UDDI formats for convenience because most Web service clients interpret them.

For larger ad-hoc hybrid fixed or wireless networks, our prototype dynamically creates a UDDI like local directory with proxy entries for all local, currently available services, which it stores in a local node. Each device has local UDDI registry and server that contain the services it offers. The system filters the service advertisements based on environmental parameters such as network media, location and security. Although bootstrap discovery of the UDDI service on a given device can be accomplished in several ways, our prototype broadcasts queries on a well-known transmission control protocol port. For smaller devices that offer fewer services, custom streamlined version of the UDDI can be maintained in a list instead of a database. Devices then query or inspect each other’s UDDI service directories

Simple inspection does not scale well to larger concentration of portable devices, since each arrival of new device require all others to consume power and bandwidth that the new comer needs foe inspection of their service description. To accommodate power constrained devices, whose battery life depletion rate depends on network traffic intensity, our architecture permits the election or re-designation of a Local Master Directory (LMD), usually hosted by a node that has the best combination of durability, power and connectivity.

The LMD node aggregates service advertisements and can perform query resolution and service matching. An LMD might, for e.g.: - serve as a service lookup aggregator or proxy for fixed devices and infrastructure in the vicinity, such as the area around one or more Wi-Fi access points. To ensure scalability the architecture comprehends a hierarchical LMD federation, an organization that might be used within an enterprise or campus. To accommodate localized scaling and long distance discovery, we incorporated support for hierarchical searches, resolving queries at the point most local to the requestor.

CONCLUSION

IDC is poised to accelerate distributed computing on internet by providing an environment to aggregate, discover, and dynamically assemble computing structures on demand to perform a given task. Its underlying principles can be re-incremented at different scales in sensor, home, area, enterprise, and wide area networks. Indeed IDC provides a foundation for pervasive computing from small-scale area networks to virtual, planetary-scale systems.

Pervasive computing is often associated with the creation of smart environments that contain embedded computing resources. In such environments, mobile users will be able to carry a subset of physical computer resources, augmenting their computational, storage, UI capabilities as required by dynamically aggregating resources found in the environment.

Security and authentication are necessary for most practical application. Resource sharing and aggregation across potentially distinct security domains and levels of trust necessitate protection of both the host and guest application.

Engineering Seminar Topics :: Seminar Paper

Internet Distributed Computing - Seminar Report

2 comments:

Seminar Topics