Intermediaries on the WWW




Intermediary systems

Motivation

The World Wide Web, as a ubiquitous environment for finding information, exchanging ideas and transacting business, is now globally recognized as the universe of information accessible from any networked computer or device. Its explosive success, the growing amount of information available on the Internet, the increasingly dynamic nature of this information, the spread of third-generation wireless technologies and, finally, the need for collaboration and cooperation have changed both the way Internet services are offered to end users and the vision of the Web itself. That vision is now well characterized by Reed’s law (the value of a network grows exponentially with its interconnectivity) and by the established idea that online communities will play an increasingly decisive role in the next-generation Internet.

This evolution is all the more important because it is combined with the growing number of personalized services offered to users. Tailoring Web resources to user preferences, context and location, and to the capabilities of heterogeneous client devices, requires on-the-fly content transformation that, as we advocate, can be efficiently provided by intermediary systems.

HTTP proxies, or intermediaries, are a powerful mechanism for developing Web applications that require transparency, real-time processing, ease of use, a certain degree of anonymity and access shared among several Web users. Being in the path of requests and responses, proxies can act as filters and may modify the requests and responses that flow through them according to user preferences and needs and to the capabilities of the accessing devices. Users can configure a proxy by defining personal profiles that tailor its functionalities to their preferences.
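The idea of a proxy acting as a profile-driven filter can be illustrated with a minimal Python sketch. It is not the implementation of any particular proxy: the Message type, the two filters, the PROFILES table and the through_proxy function are all hypothetical names chosen for this example, and a real intermediary would operate on actual HTTP traffic rather than on in-memory objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Message:
    """A simplified HTTP request or response: headers plus a text body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""

# A filter receives a message and returns a (possibly modified) copy.
Filter = Callable[[Message], Message]

def hide_referer(msg: Message) -> Message:
    """Request filter: drop the Referer header for a degree of anonymity."""
    kept = {k: v for k, v in msg.headers.items() if k.lower() != "referer"}
    return Message(kept, msg.body)

def strip_popups(msg: Message) -> Message:
    """Response filter: naive neutralization of window.open() calls."""
    return Message(dict(msg.headers), msg.body.replace("window.open(", "void("))

# Hypothetical registry of the filters the proxy offers.
FILTERS: Dict[str, Filter] = {
    "hide_referer": hide_referer,
    "strip_popups": strip_popups,
}

# Hypothetical personal profiles: which filters each user enabled,
# separately for requests and for responses.
PROFILES = {
    "alice": {"request": ["hide_referer"], "response": ["strip_popups"]},
}

def through_proxy(user: str, direction: str, msg: Message) -> Message:
    """Apply, in order, the filters that the user's profile enables."""
    for name in PROFILES.get(user, {}).get(direction, []):
        msg = FILTERS[name](msg)
    return msg
```

A user without a profile receives the message unchanged, which reflects the transparency property: the proxy only intervenes where a profile asks it to.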

Furthermore, to ensure the efficient provision of complex services, other challenges must be addressed: scalability, that is, network services that incrementally support growth in both the number of users and the number of services; accessibility, that is, applications available on demand according to the requirements of the client devices and their system resources (CPU, memory, storage, etc.); high availability and robustness, that is, services that remain available 24×7 despite interruptions and failures; and, finally, cost-effectiveness, that is, services developed by leveraging existing software systems, to save the cost of re-implementing part of the software.

Functionalities

Transcoding and Content Adaptation

The proliferation of rich, entertaining and interactive applications on the Web, together with the variety of terminal devices (pagers, personal digital assistants (PDAs), hand-held computers, Web phones, TV browsers, etc.), has raised new challenges for both the delivery and the presentation of complex personalized services. Meeting these challenges requires new mechanisms for content adaptation. Content adaptation, which converts Web content from one form to another, can be carried out efficiently by intermediary systems at the edge of the network.

In particular, content adaptation covers two different abstractions: personalization, the adaptation of Web content to user preferences, locations and contexts; and transcoding, the process of tailoring Web content to the capabilities of the client device and the network connection.
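The difference between the two abstractions can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: DeviceProfile, ImageResource, transcode_image and personalize are hypothetical names, and the sketch computes what should be delivered rather than performing any real image processing.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical description of a client device's capabilities."""
    max_image_width: int   # widest image, in pixels, the display can show
    supports_color: bool

@dataclass(frozen=True)
class ImageResource:
    """Hypothetical description of an image (metadata only, no pixels)."""
    width: int
    height: int
    color: bool

def transcode_image(img: ImageResource, device: DeviceProfile) -> ImageResource:
    """Transcoding: fit the image to what the device can display."""
    width, height = img.width, img.height
    if width > device.max_image_width:
        scale = device.max_image_width / width
        width = device.max_image_width
        height = max(1, round(height * scale))  # keep the aspect ratio
    return ImageResource(width, height, img.color and device.supports_color)

def personalize(variants: Dict[str, str], preferred: List[str], fallback: str) -> str:
    """Personalization: pick the content variant matching the user's
    language preferences, in order, or the fallback variant."""
    for language in preferred:
        if language in variants:
            return variants[language]
    return variants[fallback]
```

Transcoding depends only on the device description, while personalization depends only on the user's stated preferences, which is why an intermediary can treat them as two independent steps.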

Accessibility

Web accessibility means that people with disabilities can easily navigate and interact with the Web. At present, however, most Web sites have accessibility barriers that make it difficult or impossible for many people with disabilities to access and use Web content. Using a keyboard and mouse, listening to or watching audio and video files, and browsing intrusive Web pages (e.g. pages with pop-ups, advertisements, etc.) may be ordinary activities for people without disabilities, but they can be far from simple for users with some type of disability.

The World Wide Web Consortium (W3C) is investing considerable effort and leading many activities to make the Web accessible to everyone, so that people with disabilities can actively perceive, understand, navigate and interact with it. Nevertheless, the documents available on the Web continue to grow in complexity, which is a particular problem for people with visual disabilities, who are often unable to access, summarize and distill the information on a Web page or group of pages. For people with motor disabilities, on the other hand, the WWW is a very important source of information, a familiar and ubiquitous environment in which to find information about any aspect of life: education, employment, shopping, business, government and more. Hence, as the Internet continues to evolve with increasing diversity and heterogeneity, there is a growing demand for technological solutions that enable universal access to Web content.

Cooperation and Collaboration

Due to its popularity and ease of use, the Web is an attractive platform for supporting distributed cooperative work. Cooperative systems can help users quickly find information among the growing amount available on the WWW, and allow users within a workgroup to share information and cooperate toward a common goal. The major usability problem that users experience when surfing the Web is disorientation caused by its complex and ever-growing structure (often referred to as the "lost in hyperspace" phenomenon): users cannot successfully navigate Web sites and, thus, cannot find useful information. Disorientation is often related to cognitive overhead: on arriving at a specific point in a document, people forget what they have to do there, what they are looking for, whether to follow a link or not, etc. In this state of confusion, many users continue navigating by choosing the first link that appears to suit their requirements, even if it is not the right choice. Solutions that attempt to reduce this cognitive overhead should provide a compact overview of the navigated hyperspace or augment Web documents by enhancing link capabilities.

Filtering

Internet Web sites serve a wide variety of content (differing in size, format, message and so on), and different technologies are required to verify that the content delivered to end users is appropriate. Often this content includes material that is not suitable, for example, for children, who increasingly have access to the Web at an early age. Content filtering mechanisms give control over the Web sites that users can access: they can make some content accessible only to authorized users, and prevent children from seeing inappropriate documents by blocking access to them or hiding them. In addition, they can regulate when users can connect to the Internet, how long they can spend on a PC, which programs can be run and which folders can be explored. The functionalities provided by content filtering mechanisms include virus scanning, cookie analysis, text analysis, image analysis (e.g. to detect pornography), etc. Content filtering usually works by specifying character strings that, if matched, indicate undesirable content to be filtered out. Content is typically filtered for pornography and for violence- or hate-oriented material. The main criticism of these filtering mechanisms is that they can unintentionally exclude desirable content. Content filtering systems also block requests for destination sites that are included in blacklists because they represent unwanted content. Examples are diversion sites (e.g. comics, games or pornographic sites), auction sites and job search sites.
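The two mechanisms described above, string matching and blacklists, can be sketched in a few lines of Python. This is a minimal sketch, not the method of any specific filtering product: the host names, the blocked strings and the function names are hypothetical, chosen only for the example.

```python
from typing import List
from urllib.parse import urlsplit

# Hypothetical blacklist of destination sites (diversion and auction sites).
BLOCKED_HOSTS = {"games.example", "auctions.example"}

# Hypothetical character strings that mark content as undesirable.
BLOCKED_STRINGS = ("sex", "violence")

def is_host_blocked(url: str) -> bool:
    """True if the URL's host is a blacklisted site or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in BLOCKED_HOSTS)

def matched_strings(text: str) -> List[str]:
    """Return the blocked strings that occur anywhere in the text."""
    lowered = text.lower()
    return [s for s in BLOCKED_STRINGS if s in lowered]

def allow(url: str, body: str) -> bool:
    """Allow a page only if neither the blacklist nor the string filter fires."""
    return not is_host_blocked(url) and not matched_strings(body)
```

Plain substring matching also shows why such filters are criticized: a harmless page that mentions "Essex" is rejected because the word contains one of the blocked strings, so desirable content is excluded unintentionally.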

Privacy Protection

The increased dependence on the Internet for a wide variety of daily transactions causes access trails to be left in many locations, with a corresponding loss of privacy for most users. Virtually all popular Web sites gather, either directly or indirectly, data about the identity of their users. Growing concern about identity theft has led users to worry about who has access to information about their Web navigation. A variety of techniques, applied at a user’s browser or at a proxy on behalf of an organization, can be used to protect privacy-related information. Browser-based approaches are attractive because they can be applied at the source of a user’s requests and customized to the individual preferences of the user. Proxy-based approaches are independent of the different browsers employed by the users behind the proxy. An anonymizing proxy can be used to hide the source of requests and limit per-user or even per-organization tracking.

These techniques include: disabling cookies (for all servers or only for third-party servers), the privacy technique most commonly provided by browsers such as Internet Explorer (IE) and Firefox; disabling or filtering out script execution; filtering all third-party objects (this can eliminate all object retrievals that third-party servers could use to aggregate information about a user’s page retrievals, although it may also alter content that a page needs); filtering requests with identifying URLs; header filtering; filtering objects from top aggregation servers; removing invisible Web bugs; and image analysis.
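Three of the techniques listed above (disabling third-party cookies, filtering third-party objects and header filtering) can be sketched in Python as follows. This is a hedged illustration only: the function names and the set of headers to remove are assumptions made for this example, and the first-party test is deliberately naive.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Assumption for this sketch: request headers treated as identifying.
IDENTIFYING_HEADERS = {"referer", "from"}

def site_of(url: str) -> str:
    """Naive site name: the last two labels of the host name.
    (Wrong for suffixes such as co.uk; a real filter needs a suffix list.)"""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])

def is_third_party(page_url: str, object_url: str) -> bool:
    """An object is third-party if it comes from a different site than the page."""
    return site_of(page_url) != site_of(object_url)

def filter_request(page_url: str, object_url: str, headers: Dict[str, str],
                   block_third_party: bool = False) -> Optional[Dict[str, str]]:
    """Return the headers to forward, or None if the request is dropped."""
    third_party = is_third_party(page_url, object_url)
    if third_party and block_third_party:
        return None  # filter all third-party objects
    cleaned = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered in IDENTIFYING_HEADERS:
            continue  # header filtering
        if lowered == "cookie" and third_party:
            continue  # disable cookies for third-party servers
        cleaned[name] = value
    return cleaned
```

Dropping third-party requests entirely is the most aggressive option and, as noted above, may remove content that the page needs, so the sketch leaves it off by default.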

Our contribution

SISI: Scalable Intermediary Software Infrastructure (for edge services)

The Scalable Intermediary Software Infrastructure for edge services (SISI) [HPCCC05] is a flexible and programmable intermediary infrastructure that enables universal access to Web content. The framework has been designed with the goal of guaranteeing efficient and scalable delivery of personalized services at intermediate edge servers on the WWW.

Programmability is a crucial characteristic of SISI, since it allows adaptation services that enhance the quality of service perceived by users during their navigation to be easily implemented and assembled. To this end, the SISI framework provides a programming model and a set of APIs that can be used for quick prototyping and easy development of new services that improve navigation on the Web. Services can be assembled and configured to extend the set of pre-defined functionalities. The architecture is innovative in that it provides programmability without giving up efficiency, and offers user profile management primitives as well as deployment/un-deployment and authentication/authorization mechanisms.

SISI is built on top of existing open-source software, namely the Apache Web server and mod_perl, chosen for their quality and wide popularity. The framework is composed of several modules, written entirely in Perl, each acting in a specific phase of the Apache HTTP request life cycle.

References

[SOUPS07] "Measuring privacy loss and the impact of privacy protection in web browsing". Balachander Krishnamurthy, Delfina Malandrino, and Craig E. Wills. In Proceedings of the Symposium on Usable Privacy and Security, pages 52-63, Pittsburgh, PA USA, July 2007. ACM International Conference Proceedings Series.

[UAIS07] "Personalizable Edge Services for Web Accessibility". Ugo Erra, Gennaro Iaccarino, Delfina Malandrino and Vittorio Scarano. In Universal Access in the Information Society, vol. 6, n. 3, pp. 285-306, November 2007. Springer Berlin, Heidelberg.

[WWW06] "Efficient Edge-Services for Colorblind Users". Gennaro Iaccarino, Delfina Malandrino, Marco Del Percio and Vittorio Scarano. In Poster Proc. of the 15th World Wide Web Conference 2006, May 23-26, 2006, Edinburgh, Scotland.

[W4A06] "Personalizable Edge-Services for Web Accessibility". Gennaro Iaccarino, Delfina Malandrino and Vittorio Scarano. In Proc. of the International Cross-Disciplinary Workshop on Web Accessibility. 22-23 May 2006, Edinburgh, Scotland, UK.

[WWWJ] "A Scalable cluster-based infrastructure for Edge-computing Services". Raffaella Grieco, Delfina Malandrino, Vittorio Scarano. World Wide Web Journal.

[CompNetw06] "Tackling Web Dynamics by Programmable Proxies". Delfina Malandrino, Vittorio Scarano. Computer Networks, Vol. 50, Issue 14, October 2006.

[Percom06] "Context-aware Provision of Advanced Internet Services". Raffaella Grieco, Delfina Malandrino, Francesca Mazzoni, Daniele Riboni. In Proceedings of the Fourth Annual IEEE International Conference on Pervasive Computing and Communications (PerCom 2006).

[Mobis05] "Mobile-Web Services via Programmable Proxies". Raffaella Grieco, Delfina Malandrino, Francesca Mazzoni, Vittorio Scarano. In Proceedings of the IFIP TC8 Working Conference on Mobile Information Systems – 2005 (MOBIS). Leeds, UK, December 2005.

[HPCCC05] "A scalable framework for the support of advanced edge services". Michele Colajanni, Raffaella Grieco, Delfina Malandrino, Francesca Mazzoni, Vittorio Scarano. In Proceedings of the 2005 International Conference on High Performance Computing and Communications (HPCC’05). Sorrento (Naples), Italy, September 2005.

[SIUMI05] "An Intermediary Software Infrastructure for Edge Services". Raffaella Grieco, Delfina Malandrino, Francesca Mazzoni, Vittorio Scarano, Francesco Varriale. Proc. of The first International Workshop on Services and Infrastructure for the Ubiquitous and Mobile Internet (SIUMI’05). Columbus, Ohio, June 2005.

[Webist05] "A Taxonomy of Programmable HTTP Proxies for Advanced Edge Services". Delfina Malandrino, Vittorio Scarano. Proc. of the International Conference on Web Information Systems and Technologies, Miami, Florida, USA, May 2005.

[SAC05] "SEcS: Scalable Edge-computing Services". Raffaella Grieco, Delfina Malandrino, Vittorio Scarano. Proc. of 20th ACM Symposium on Applied Computing (SAC 2005).