CAAI Transactions on Intelligent Systems, Vol. 3, No. 2, Apr. 2008

Dispatching mobile Agents for DDM applications

LI Xi-ning 1,2, Guillaume Autran 1
(1. Department of Computing and Information Science, University of Guelph, Guelph, Ontario, N1G 2W1, Canada; 2. State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing 210093, China)

Abstract: Techniques for mining information from distributed data sources accessible over the Internet are a growing area of research. The mobile Agent paradigm opens a new door for distributed data mining and knowledge discovery applications. In this paper we present the design of a mobile agent system which couples service discovery, using a logic-language-based application programming interface, with database access. Combining mobility with database access provides a means to create more efficient data mining applications. The processing of data is moved to network-wide data locations instead of the traditional approach of bringing huge amounts of data to the processing location. Our proposal aims at implementing system tools that will enable intelligent mobile Agents to roam the Internet searching for distributed data services. Agents access the data, discover patterns, extract useful information from facts recorded in the databases, then communicate local results back to the user. The user then generates a global data model through the aggregation of results provided by all Agents. This overcomes barriers posed by network congestion, poor security, and unreliability.

Keywords: mobile Agent; distributed data mining; service discovery; database service
CLC number: TN253  Document code: A  Article ID: 1673-4785(2008)02-0181-07

Received: 2007-07-04.
Corresponding author: LI Xi-ning. E-mail: Xli@cis.uoguelph.ca.

Mobile Agent systems bring forward the creative idea of dispatching user-defined computations (agents) towards network resources, and provide a whole new architecture for designing distributed systems. Distributed data mining (DDM) is one of the important application areas for the intelligent mobile agent paradigm [1-2]. Most existing DDM projects focus on approaches that apply various machine learning algorithms to compute descriptive models of physically distributed data sources. Although these approaches provide numerous algorithms, ranging from statistical models to symbolic/logic models, they typically consider homogeneous data sites and require the support of distributed databases. As the number and size of databases and data warehouses grow at phenomenal rates, one of the main challenges in DDM is the design and implementation of a system infrastructure that scales up to large, dynamic and remote data sources. The objective of our research is to equip mobile agents with system tools such that those agents can search for data sites, move from host to host, gather information and access databases, carry out complex data mining algorithms, and generate a global data model or pattern through the aggregation of the local results.

To deploy mobile agents in DDM, a mobile agent system must provide languages and various programming interfaces for fast and easy development of applications. Different languages, such as C and Java, have been chosen as agent-programming languages for a variety of reasons. Among them, logic-programming languages prove to be an alternative tool for building mobile agents. Benefiting from their powerful deductive abilities, complex calculations can often be represented by a set of compact logic predicates, which makes agents more suitable for migrating around the network.
In addition, mobile agents must interact with their hosts in order to use data services or to negotiate services with service providers. The need to discover services for mobile agents comes from two considerations. First, the agents possess only local knowledge of the network and have limited functionality, since only agents of limited size and complexity can migrate efficiently in a network with little overhead. Hence specific services are required which aim at deploying mobile agents efficiently in system and network management. Secondly, mobile agents are subject to strong security restrictions, which are enforced by the system security mechanism. Thus, mobile agents should find services that help to complete security-critical tasks, rather than execute code that might jeopardize remote servers. Following this trend, it becomes increasingly important to give agents the ability to find and make use of data services that are available in a network [3]. A variety of Service Discovery Protocols (SDPs) are currently under development by companies and research groups. The most well-known schemes are Sun's Java-based Jini [4], Salutation [5], Microsoft's UPnP [6], IETF's draft Service Location Protocol [7] and OASIS UDDI [8]. Some of these SDPs have been extended and applied by several mobile agent systems to solve the service discovery problem.

In a DDM environment, data may be stored among physically distributed sites and may be artificially partitioned among different sites for better scalability. Therefore, endowing mobile agents with the ability to access remote databases is the basis for DDM applications. This encourages us to explore strategies for coupling a mobile agent programming language with database access facilities. In recent years numerous approaches have been proposed for designing a coupled system that integrates a relational database and a logic programming language.
They all enable programmers to access large amounts of shared data for knowledge processing, to manage data efficiently, and to process data intelligently. Generally speaking, systems that couple a logic programming language with a database access facility can be roughly divided into two categories. A loosely coupled system embeds a non-logic language, such as the Structured Query Language (SQL), within the logic-programming context, whereas a closely coupled system provides a database query interface as a subset or an extension of the language, featuring dynamic query formulation and view access. For example, SWI-Prolog and KB-Prolog [9] belong to the first category, and CGW [10] and Quintus [11] fall into the second.

In this paper we present the design and implementation of a mobile agent system which provides a logic-language-based application programming interface for DDM. Two important system modules, namely service discovery and database access, have been encapsulated and installed inside the existing IMAGO (intelligent mobile Agent gliding on-line) system. The IMAGO system is an infrastructure for mobile agent applications. It includes code for the IMAGO server (a multi-threading logic virtual machine), IMAGO-Prolog (a Prolog-like programming language extended with a rich API for implementing mobile agent and DDM applications), and the IMAGO IDE (a Java GUI-based program from which users can edit, compile, and invoke an agent application). The remainder of the paper is organized as follows. Section 1 gives an overview of the design of the service discovery module and its integration with the IMAGO system. In Section 2, we discuss the definitions of the database predicates and the principle of the data-driven implementation. In Section 3, we present a simple database access example and briefly explain how to use the database interface presented by the IMAGO system.
Finally, we give concluding remarks and outline future work.

1 Data service discovery

The general idea of distributed data services is that a DDM application may be separated from the data sites needed to fulfill a task. These data sites are mostly modeled as local and centralized databases, which are independent of, or sometimes unknown to, the application. A commonly used DDM approach is to apply traditional data mining algorithms to an aggregation of data retrieved from physically distributed data sources. However, this approach may be impractical for large-scale data
sets distributed over the Internet. Deploying the mobile agent paradigm in DDM offers a possible solution, because the application may decompose data mining problems to scale up to large distributed data sources and dispatch agents to carry out distributed data processing. This in turn leads us to the data service discovery problem, that is, how to find the data sites available to a DDM application.

Clearly, the number of services that will become available on the Internet is expected to grow enormously. Examples are information access via the Internet, multimedia on demand, Web services, and services that use computational infrastructure, such as P2P and Grid computing. In general, the service usage model is role-based. An entity providing services that can be utilized by other requesting entities acts as a provider. Conversely, an entity requesting the provision of a service is called a requester. To be able to offer services, a provider can in turn act as a requester making use of other services. In a distributed system, requesters and providers usually live on physically separate hosts. Providers should from time to time advertise their services by broadcasting to requesters or by registering the services on third-party servers.

In the IMAGO system, we have implemented a new data service discovery model for mobile agents, DSSEM (Discovery Service via Search Engine Model) [13]. DSSEM is based on a search engine, a global Web search tool with a centralized index and fuzzy retrieval. This model aims especially at solving the database service location problem and is integrated with the IMAGO system. Data service providers manually register their services in a service discovery server. A mobile agent locates a specific service by migrating to the service discovery server and subsequently submitting requests with the required data description.
The design goal of DSSEM is to provide a flexible and efficient service discovery protocol in a mobile-agent-based DDM environment.

Before a service can be discovered, it should make itself public. This process is called service advertisement. The work can be done when services are initialized, or every time they change their states, via broadcasting to anyone who is listening. A service advertisement should consist of the service identifier, plus a simple string describing what the service is, or a set of strings for specifications and attributes.

The most significant feature of DSSEM is that we enrich the service description by using a web page's URL to replace the traditional string-set service description used in mobile agent systems. That is, service providers use web pages to advertise their services. Because of their specific characteristics, such as containing rich media information (text, sound, image, etc.), working with the standard HTTP protocol, and being able to reference each other, web pages may play a key role as the template of the service description. On the other hand, since the search engine is a mature technology and offers an automated indexing tool that can provide a highly efficient ranking mechanism for the collected information, it is useful as the directory server in our model. Of course, DSSEM also benefits from previous service discovery research in selected areas, but it is endowed with a new concept by combining some special features of mobile agents and by integrating the service discovery tool with agent servers.

Fig. 1 An example of service discovery and data mining process

In principle, data service providers register the URLs of their websites that advertise all the information concerning services.
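
The registration and lookup flow described above can be pictured with a small Python sketch. It is not the DSSEM implementation: the class name `ServiceDirectory`, its method names, the in-memory index, the example host names, and the scoring rule (number of matching keywords) are all assumptions made for illustration. In particular, the advertisement text is passed in directly here, whereas a real search engine would fetch it from the registered URL and rank it far more carefully.

```python
from collections import defaultdict


class ServiceDirectory:
    """Toy stand-in for a search-engine directory (hypothetical, illustration only)."""

    def __init__(self):
        # keyword -> set of provider host addresses whose page mentions it
        self._index = defaultdict(set)

    def register(self, host, page_text):
        """Index the words of a provider's advertisement page under its host."""
        for word in page_text.lower().split():
            self._index[word].add(host)

    def search(self, keywords, limit):
        """Return up to `limit` hosts, ranked by number of matching keywords."""
        scores = defaultdict(int)
        for word in keywords:
            for host in self._index.get(word.lower(), ()):
                scores[host] += 1
        # Highest score first; ties are broken by host name for determinism.
        ranked = sorted(scores, key=lambda h: (-scores[h], h))
        return ranked[:limit]


directory = ServiceDirectory()
directory.register("db1.example.org", "food customer transaction database")
directory.register("db2.example.org", "food recipes archive")

# The ranked host list plays the role of the agent's itinerary.
itinerary = directory.search(["food", "customer", "transaction"], limit=10)
print(itinerary)
```

The returned list is ordered from best to worst match, which is the property an agent relies on when it visits hosts in itinerary order.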
As a middleware on the service discovery server, the search engine will periodically retrieve the web pages indicated by the URLs and all the documents they reference, parse the tags and words in those documents, and set up the relationship between the keywords and the host addresses of these service providers. On the other
hand, a mobile agent can move to a service discovery server, utilize the system interface to access the search engine's database, and obtain an itinerary that includes a list of ranked host addresses of the data service providers. Based on the given itinerary, the mobile agent may travel from host to host to carry out a DDM application. Figure 1 gives an example of the service discovery and data mining process.

The application programming interface of data service discovery for mobile agents is a built-in predicate, namely web_search(Query, Number, Result), where Query is a compound term specifying characteristics of the search, Number is an integer indicating the maximum number of results expected, and Result is a variable to hold the return values. For example, suppose a food company wants to analyze customer transaction records in order to quickly develop successful business strategies. Its DDM agent may move to a known IMAGO service discovery server and then issue a query predicate requesting up to 10 possible food industry database locations:

    web_search(locate("food", "customer transaction", "imago data server"), 10, R)

The agent is blocked and control is transferred to the service discovery module of the hosting IMAGO server. The discovery module will communicate with the searcher, wait for search results, and resume the execution of the blocked agent. Search results are delivered to the requesting agent in the form of a list, whose entries are ranked by priority from high to low.

2 Data mining facilities

Obviously, representing the content of a database in logic terms may be achieved by using different methods. As a Prolog program itself can be thought of as a database, one natural approach is fact-driven, namely, to append the content of a table as a set of fact clauses to the end of the program.
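
As a rough illustration of the fact-driven scheme just described, the following Python sketch renders table rows as Prolog fact clauses that could be appended to a program. The helper names `prolog_term` and `rows_to_facts`, the table name `purchase`, and the sample rows are hypothetical; the paper does not specify how IMAGO generates such clauses, so this is only one plausible rendering.

```python
def prolog_term(value):
    """Render a Python value as a Prolog term: numbers bare, anything else a quoted atom."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    # Escape backslashes and single quotes so the quoted atom stays well-formed.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def rows_to_facts(table_name, rows):
    """Turn each row into one fact clause; table_name is assumed to be a valid lowercase atom."""
    return [
        f"{table_name}({', '.join(prolog_term(v) for v in row)})."
        for row in rows
    ]


rows = [("Ann", "bread", 2), ("Bob", "milk", 1)]
facts = rows_to_facts("purchase", rows)
for clause in facts:
    print(clause)
```

Each row becomes one clause, so a table with many rows that the agent never consults still enlarges the program.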
The advantages of such an approach are simplicity and ease of use, but the disadvantages are the dynamic modification of the original program and the creation of garbage code, because part of the appended facts might be useless. In the IMAGO system, we intend to provide two data representation schemes, i.e., fact-driven and data-driven, which would broaden the range of choices available to the application programmer. The data-driven approach converts the content of a table to a list of rows, where each row is a sub-list of fields. In other words, the results obtained from a database query will be automatically transformed into IMAGO Prolog internal data structures.

Having received a list of database addresses through the service discovery module, an agent may move from host to host to access these databases, or clone multiple agents with assigned database addresses, to start a DDM application. In order to bridge logic-based mobile agents with database systems, the IMAGO system provides a set of database predicates, which enable agents to establish connections with data sources and make requests for desired information.

To be able to access a database, an agent has to find out which database drivers are supported by the underlying system. For this purpose, the built-in predicate db_drivers(?list) can be invoked to collect a list of all supported database drivers. A driver identifier must be unique and is commonly named after the database implementation, such as mysql, oracle, and so on.

A database connection is established by issuing a predicate of the form db_connect(-conn, +db_URL).
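
The data-driven representation described earlier in this section (a table as a list of rows, each row a sub-list of fields) can be pictured with a short Python sketch. It uses Python's standard sqlite3 module with an in-memory database as a stand-in for a remote data service; the function name `fetch_as_lists`, the table, and the sample data are assumptions for illustration, and none of the IMAGO database predicates are modeled here.

```python
import sqlite3


def fetch_as_lists(connection, sql):
    """Run a query and return the result as a list of rows, each a list of fields."""
    cursor = connection.execute(sql)
    return [list(row) for row in cursor.fetchall()]


# In-memory SQLite database standing in for a remote data service.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (customer TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [("Ann", "bread", 2), ("Bob", "milk", 1)],
)

result = fetch_as_lists(
    conn, "SELECT customer, item, qty FROM purchase ORDER BY customer"
)
print(result)
conn.close()
```

Unlike the fact-driven scheme, nothing is added to the program itself: the query result is an ordinary nested list that can be discarded once processed.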
A valid db_URL looks like:

    [driver://][username[:password]@]hostname[:port][/database]

where driver indicates the specific database implementation, username and password are used for authentication, hostname gives the network location of the database server, port is used to establish a TCP connection if required, and database specifies the actual database name which the agent wants to access. Upon success, the variable conn is instantiated with the database connection handler, which will be used in subsequent database accesses. To disconnect from a database, the agent may issue a predicate db_disconnection(+conn), where conn represents a previously established connection. Once a database has been disconnected, all resources allocated for this connection are released and the connection handler is no longer valid.

To facilitate DDM applications, the IMAGO system provides two different kinds of database access operations, namely, set retrieval and tuple retrieval. The former returns the entire matching data set to the requesting agent, whereas the latter allows the requesting agent to consume the matching data on a tuple-by-tuple basis. For example, to issue a database query, a simple predicate db_query(+conn, +sql) can be invoked, where the argument sql represents a standard SQL statement. If the query succeeds, the user may either issue a subsequent call to db_store_result(+conn, ?rest, +options) to save the entire result set into the agent's memory as a two-dimensional list, or call db_use_data(+conn, -rest) to initiate the result set, then issue a sequence of calls to db_next_row(+rest, ?row, +options) for row-wise data processing, and finally invoke db_free_result(+rest) when all rows have been exhausted.

In a DDM application, agents do not work alone; they need to communicate with each other to cooperate and to generate a global data aggregation for further analysis. Most existing mobile agent systems adopt some kind of communication model or protocol from traditional distributed systems. However, the IMAGO system adopts a different strategy to cope with this issue. The idea is to deploy intelligent mobile messengers for inter-agent communication [13]. Messengers are thin agents dedicated to delivering messages. Like normal agents, a messenger can move, clone, and make decisions. Unlike normal agents, a messenger is anonymous, and its special task is to track down the receiving agent and reliably deliver messages in a dynamic, changing environment.
The IMAGO system provides a set of built-in messengers as a part of its programming interface. A data-mining agent at any remote site and at any time may dispatch messengers to deliver data to designated receivers. For example, suppose that a mobile agent has completed its data mining work at a remote database server. It can either call move('home') to carry the results and migrate back to its stationary server, or invoke dispatch($oneway_messenger, 'home', Results) to create a messenger which is responsible for delivering Results to the home server.

Communication among agents takes place by means of an Agent Communication Language (ACL). The essence of an ACL is to make data mining agents understand the purpose and meaning of their exchanged data. In general, a message consists of two aspects, namely, performative and content. The performative specifies the purpose of a message and the content gives a concrete description for achieving the purpose. In order to facilitate open standards for ACLs, the IMAGO agent-based communication model complies with the FIPA ACL message structure specification [14]. Of course, the performative and content of a message often determine the reaction of the receiver. In addition to the various types of system built-in messengers for sending agents, the IMAGO system provides a set of predicates for receiving agents. The predicate that is similar to a non-blocking receive is accept(Sender, Msg). An invocation of this procedure succeeds if a matching messenger is found, or fails if either the caller's messenger queue is empty or there is no matching messenger in the queue. Likewise, the predicate that implements the concept of a blocking receive is wait_accept(Sender, Msg). A call to this procedure succeeds immediately if a matching messenger is found.
However, it will cause the caller to be blocked if either the caller's messenger queue is empty or no matching messenger can be found. In this case, the call is automatically re-executed when a new messenger attaches to the caller's messenger queue. Pragmatically, the semantics of matching messengers is implemented by logic unification.

3 A simple example of DDM application

In order to illustrate the usage of the IMAGO database facilities, we show a very simple example in this section. Although not very practical, this example provides a typical sample of how to deploy mobile agents in DDM applications. With it as a reference framework, more complex DDM algorithms can be easily coded.

    :- home_agent(a_DDM_agent).
    a_DDM_agent(_) :-
        % Create two workers, each given a database server and a db_URL.
        create("worker1", worker1, location("perseus.imago_lab.cis.uoguelph.ca", "mysql://localhost/ImagoDB1")),
        create("worker2", worker2, location("orion.imago_lab.cis.uoguelph.ca", "mysql://localhost/ImagoDB2")),
        % Block until each worker's messenger delivers its table.
        wait_accept(worker1, Table1),
        wait_accept(worker2, Table2),
        result_compare(Table1, Table2),
        terminate.
    % The first clause matches only when both tables unify.
    result_compare(T, T) :- !, write("Two tables are identical."), nl.
    result_compare(_, _) :- write("Two tables are not identical."), nl.
    :- end_home_agent(a_DDM_agent).

    :- mobile_agent(worker1).
    worker1(location(Database_Server, DB_URL)) :-
        move(Database_Server),
        db_connect(C, DB_URL),
        db_query(C, "SELECT * FROM LOG"),
        % Set retrieval: fetch the entire result set at once.
        db_store_result(C, Table, [name(true), type(true), length(true)]),
        db_disconnect(C),
        dispatch($oneway_messenger, home, Table),
        dispose.
    :- end_mobile_agent(worker1).

    :- mobile_agent(worker2).
    worker2(location(Database_Server, DB_URL)) :-
        move(Database_Server),
        db_connect(C, DB_URL),
        db_query(C, "SELECT * FROM LOG"),
        % Tuple retrieval: consume the result set row by row.
        db_use_result(C, R),
        get_rows(R, Table),
        db_disconnect(C),
        dispatch($oneway_messenger, home, Table),
        dispose.
    get_rows(R, [Row|T]) :-
        db_next_row(R, Row, [name(true), type(true), length(true)]),
        get_rows(R, T).
    get_rows(R, []) :-
        db_free_result(R).
    :- end_mobile_agent(worker2).

The code of a_DDM_agent defines the home agent residing at the stationary server. When the home agent starts execution, it creates two mobile agents called worker1 and worker2 along with their initial arguments and then waits for results. When a worker is loaded, it migrates to the IMAGO database server specified in its argument. Having arrived at the database server, the worker continues execution by connecting to the given local database and making an SQL query for data access.
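The set and tuple retrieval modes used in the listing can be illustrated outside IMAGO. The Python sketch below uses the standard sqlite3 module purely as a stand-in for the IMAGO database predicates: fetchall plays the role of set retrieval, and repeated fetchone calls followed by closing the cursor play the role of tuple retrieval. The in-memory table, its columns, and the function names are assumptions made for this example and are not part of the IMAGO interface.

```python
import sqlite3


def set_retrieval(conn, sql):
    """Fetch the whole result set at once as a list of rows."""
    return [list(row) for row in conn.execute(sql).fetchall()]


def tuple_retrieval(conn, sql):
    """Consume the result set one row at a time, then release the cursor."""
    cursor = conn.execute(sql)
    table = []
    try:
        while True:
            row = cursor.fetchone()
            if row is None:  # all rows have been exhausted
                break
            table.append(list(row))
    finally:
        cursor.close()
    return table


# Hypothetical in-memory table standing in for the LOG table of the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER, item TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?)", [(1, "bread"), (2, "milk")])

table1 = set_retrieval(conn, "SELECT * FROM log ORDER BY id")
table2 = tuple_retrieval(conn, "SELECT * FROM log ORDER BY id")
print("identical" if table1 == table2 else "not identical")
conn.close()
```

Both functions build the same list of rows; the difference is that the row-wise version never needs the whole result set at once, which may matter when an agent only aggregates over the rows.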
The two worker agents retrieve data in different ways: worker1 uses set retrieval and worker2 adopts tuple retrieval. Having collected all required data, both workers separately dispatch a $oneway_messenger to deliver results back to the home agent for further data mining analysis.

4 Conclusion

In this paper, we discussed a scheme for deploying mobile agents in DDM applications. The advantage of adopting mobile agents for DDM is the ability to scale up to large, dynamic and remote data sources, such as various databases distributed over the Internet. We presented the design of a data service discovery module and a database management module. The programming interface of these modules is a set of system built-in predicates capable of coupling a logic programming language with the functionalities of locating data services and accessing remote databases. Equipped with those system tools, mobile agents may search for suitable data sites, roam the Internet to collect useful information, and communicate with each other to generate a global view of data through the aggregation of distributed computations. In order to verify the feasibility and efficiency of the mobile agent based DDM proposal, an experimental service discovery module and database management module have been implemented and integrated with the IMAGO system. The service discovery module is based on search engine technology and concentrates on locating database services. It uses web pages as a medium to advertise services, and runs an independent search engine to gather and index service providers' information, such as service types, database names, URLs, access modes, as well as possible verification information. The database management module not only provides a flexible interface for accessing data, but also manipulates database connections efficiently. At the current stage, the database model in the IMAGO system is MySQL, the most popular open source DBMS in the world [15].

Research on agent-based DDM involves further extensions of the IMAGO system. First, the current implementation of the service discovery module deals with only a limited number of logical relationships. To offer a more precise discovery service, this module could be enhanced to parse more complex search criteria, such as conditional expressions and sub-string matching. Secondly, since databases may contain multi-dimensional data, retrieving such information from flat web pages is a pending problem. We are looking to use XML meta-data to solve the database dimensional problem. In addition, we are investigating adding more programming languages to the IMAGO system, as well as introducing more flexible and efficient communication tools, such as mobile sockets, to facilitate DDM applications.

Acknowledgments. We would like to express our appreciation to the Natural Science and Engineering Council of Canada and the State Key Laboratory of Novel Software Technology (Nanjing University) for supporting this research.

References:
[1] KLUSCH M, LODI S, MORO G. The role of Agents in distributed data mining: issues and benefits[C]//IEEE WIC International Conference on Intelligent Agent Technology. Halifax, Canada, 2003: 211-217.
[2] PARK B, KARGUPTA H. Distributed data mining: algorithms, systems, and applications[M]. [S.l.]: Lawrence Erlbaum Associates, 2003: 341-358.
[3] BETTSTETTER C, RENNER C.
A comparison of service discovery protocols and implementation of the service location protocol[C]//Proc of EUNICE 2000. Twente, Netherlands, 2000: 13-15.
[4] HASHMAN S, KNUDSEN S. The application of jini technology to enhance the delivery of mobile services. [2001-12-01]. http://www.sun.com.
[5] Salutation consortium, salutation architecture overview[EB/OL]. [1999-06-23]. http://www.salutation.org/whitepaper.
[6] Universal plug and play forum, UPnP device architecture, version 1.0.1[EB/OL]. [2006-9-11]. http://www.upnp-ic.org/resources/UPnP_device_architecture_docs/.
[7] GUTTMAN E, PERKINS C, VEIZADES J, et al. Service Location Protocol, Version 2[EB/OL]. [1999-07-21]. http://www.ietf.org/rfc/rfc2608.txt.
[8] OASIS[EB/OL]. [2005-10-27]. http://www.uddi.org.
[9] BOCCA J, DAHMEN M, FREESTON M, MACARTNEY G, PEARSON P J. A Prolog for very large knowledge bases[C]//Proceedings of the Seventh British National Conference on Databases. Heriot-Watt University, England, 1990: 163-184.
[10] CERI S, GOTTLOB G, WIEDERHOLD G. Efficient database access from prolog[J]. IEEE Transactions on Software Engineering, 1989, 15(2): 153-164.
[11] Quintus Inc. ProDBI, ODBC Interface for Quintus Prolog Database, V.4.0[EB/OL]. [1997-06-18]. http://www.sics.se/.
[12] SONG L, LI X, NI J. A database service discovery model for mobile Agents[J]. International Journal of Intelligent Information Technologies, 2006, 2(2): 16-29.
[13] LI X, AUTRAN G. Inter-agent communication in IMAGO prolog[J]. Lecture Notes in Artificial Intelligence, 2005: 3346, 163-180.
[14] FIPA ACL, Agent Communication Language Specifications, FIPA, 2005: http://www.fipa.org.
[15] MySQL AB: MySQL 5.0 Reference Manual, MySQL Documentation Library[EB/OL]. [2008-01-03]. http://dev.mysql.com/doc/mysql/en/.

About the author:
LI Xi-ning was born in 1952.
He is a professor in the Department of Computing and Information Science at the University of Guelph and the director of the IMAGO Lab. His research interests include mobile agent systems, logic programming, and virtual machine implementation. He received a PhD in computer science from the University of Calgary in 1989. He has published 1 book, 2 book chapters, and 70 journal and conference papers.