
Reading for the course "E-business": Component Search and Reuse: An Ontology-based Approach



Component Search and Reuse: An Ontology-based Approach

Awny Alnusair and Tian Zhao
Department of Computer Science
University of Wisconsin-Milwaukee, USA
{alnusair, tzhao}@uwm.edu

Abstract

In order to realize the full potential of software reuse, effective search techniques are essential. In this paper, we propose a semantic-based approach for identifying and retrieving relevant components from a reuse repository. This approach relies on building a knowledge base according to an ontology model that includes a source-code ontology, a component ontology, and a domain-specific ontology. Due to the indexing and knowledge population mechanisms we used, our proof-of-concept supports various kinds of search techniques. However, our experiments show evidence that only pure semantic search that exploits domain knowledge tends to improve precision. Based on a usability case study, we argue that semantic search is indeed usable and practical.

1. Introduction

Systematic software reuse enables developers to use existing software components for constructing new quality software systems, thus speeding up the development process and reducing costs and risks. Failure modes analysis of the reuse process shows that in order to be reused, a software component must be findable and understandable [5]. On one hand, understanding the component's functionality as well as its relationships with other components in a reuse library is usually hampered by the lack of quality descriptions of library services. On the other hand, finding suitable components is still a significant barrier to exploiting systematic software reuse.

In this paper, we present an approach for describing, retrieving, and exploring the various relationships among software components in object-oriented reuse libraries. In order to provide a formal and precise representation of library code, our approach relies on ontologizing software knowledge.

Ontologies provide means to explicitly describe concepts, objects, properties and other entities in a given application domain, and to represent the relationships that hold among these concepts. Due to their solid formal and reasoning foundation, ontologies can play an important role in domain engineering reuse; they can be used to structure and build a source-code knowledge base that can be used by software agents (e.g., a search engine) and serve as a basis for semantic queries [9]. Ontologies have therefore been used successfully by various researchers to improve many aspects of the software engineering process [12].

Towards this end, we have developed an ontology model that includes an enhanced software representation ontology. This ontology is further extended with additional component-specific knowledge and automatically populated with ontological instances representing various program elements of a given library. These instances are further annotated with respect to concepts from a domain-specific ontology.

Ontology-based component search is thus performed by the semantic matching of user requests expressed using terms from the domain ontology with component descriptions found in the populated knowledge base. Furthermore, the knowledge population and indexing mechanisms used in our approach still allow searching the knowledge base using the more familiar keyword search. This is particularly useful when domain ontologies or semantic component annotations are lacking or incomplete. Users are thus able to retrieve components using purely semantic-based queries, keyword queries, type-based queries, or a mixture of all three.

2. Ontology model for component reuse

At the core of our ontology model for object-oriented component retrieval is a Source Code Representation Ontology (referred to hereafter as SCRO). This ontology provides a base model for capturing the relationships and dependencies among source-code artifacts.
It models major concepts and features of object-oriented programs, including encapsulation, class and interface inheritance, method overloading, method overriding, and method signature information. SCRO's knowledge is represented using the OWL-DL ontology language (footnote 1). OWL is a web-based conceptual modelling language used for capturing relationship semantics among domain concepts; OWL-DL is a subset of OWL based on Description Logic (DL) and has desirable computational properties for automated reasoning systems. OWL-DL's reasoning support enables inferring additional knowledge and computing the classification hierarchy (subsumption reasoning). SCRO defines various OWL concepts that map directly to source-code elements and collectively represent the most important concepts found in object-oriented programs. Furthermore, SCRO defines various OWL object properties, datatype properties, and ontological axioms to represent the various relationships among ontological concepts. SCRO is precise, well documented, and designed with ontology reuse in mind. The availability of SCRO online [1] allows researchers to reuse or extend its representational components to support any semantic-based application that requires source-code knowledge.

2.1. Domain-specific ontology

Domain ontologies describe concepts and structures related to a particular domain (e.g., finance, shopping, medicine, graphics, or the object-oriented programs domain as specified in SCRO). In our approach to component retrieval, we use a domain ontology that conceptualizes each software library we need to reuse. This ontology provides a common vocabulary with unambiguous and conceptually sound terms that can be used to annotate software components. Annotations in this context serve two key purposes. Firstly, both software providers and users can communicate using a shared and common vocabulary provided by this ontology, thus enabling precise retrieval of API components. Secondly, it brings different perspectives to typical program comprehension tasks: users can familiarize themselves with terminology and conceptual knowledge that is typically implicit in the problem domain.
To this end, we have developed a mini-ontology for data retrieval in the Semantic Web applications domain. This ontology (referred to hereafter as SWONTO) has been used during the evaluation of our approach (cf. Section 4) and serves as a proof of concept that component search can be significantly enhanced through the use of domain ontologies. A small fragment of the ontology's taxonomy is shown in the upper right pane of Figure 1, and the complete ontology is available online [1].

Footnote 1: http://www.w3.org/2004/OWL/

2.2. Component-specific ontology

In the context of component retrieval, we need a thorough semantic description of the component's inner working structure and its interrelationships with other components in a given domain. Therefore, we extend SCRO's semantic representations of API structures and enrich them with additional component-specific descriptions required to uniquely identify and retrieve an API component. The result is a COMPonent REpresentation ontology (referred to hereafter as COMPRE). In addition to the concepts, axioms, and properties inherited from SCRO, COMPRE defines its own class hierarchy and relations for semantic component descriptions. For instance, the Component concept represents software components in general and subsumes other OWL classes that represent Java-specific components such as Method, ClassType, and InterfaceType. A fragment of the ontology's taxonomy is shown in the left pane of Figure 1, and the complete ontology can be examined online [1].

Moreover, COMPRE defines various ontological axioms and object properties that represent relationships among software components. These properties link components with their corresponding semantic descriptions specified in the domain-specific ontology.
For example, hasDomainInput and hasDomainOutput and their corresponding inverse properties are used to annotate individual software components with domain terms representing the expected inputs and outputs of the component. Moreover, COMPRE defines the dependsOn symmetric object property, which captures dependency relationships between components. describedBy is also defined to link a component to a domain concept that best describes the purpose or the nature of the component.

In addition to supporting pure semantic search that is based on component annotations with respect to a domain ontology, COMPRE also defines various datatype properties to model other metadata about components. These properties enable metadata keyword queries that can be used when semantic annotations are lacking or incomplete. For instance, the hasInputTerms and hasOutputTerms properties are used to assign meaningful keywords describing the component's input and its expected output.

2.3. Knowledge population

Once the ontology structure is specified, the knowledge base must be populated with ontological instances that represent various ontological concepts and their corresponding relationships. We have therefore built a knowledge extractor subsystem for the Java programming language. Our subsystem performs a comprehensive parsing of the Java bytecode, captures every ontology concept that represents a source-code element, and generates instances of all ontological properties defined in our ontologies for those program elements. The generated semantic instances are serialized using RDF (footnote 2). RDF is a web-based language suitable for describing resources and provides an extensible data model for representing machine-processable semantics of data. For each application framework parsed, we thus generate an RDF ontology that represents the instantiated knowledge base for the framework at hand. This knowledge base is managed by Jena [8], an open source Java framework for building Semantic Web applications.

The process of generating semantic instances for the concepts and relations specified in SCRO is completely automatic. However, the process of annotating components according to COMPRE's object properties is currently manual, as is the case for semantic annotations in general. Our tool, though, provides means for inserting these annotations directly into the knowledge base, thus gradually building semantic descriptions for a particular API that can be shared, evolved, and reused by a community of users. On the other hand, metadata modelled by COMPRE's datatype properties is generated automatically via direct parsing of the source code. We thus capture and normalize method signatures, identifier names, source-code comments, and available Java annotations in order to obtain meaningful keyword descriptions of components. These descriptions are lexically analyzed, stored, and indexed using the tokenization and indexing mechanisms provided by Apache Lucene (footnote 3), an open-source full-featured text search engine.

In the next section, we show how the knowledge generated using this knowledge extractor subsystem can be used for component search. For an extended discussion of our ontologies and complete knowledge population samples, we refer the reader to our ontologies website [1].

3. Ontological search

Listing 1 shows a partial RDF description obtained during the knowledge population phase for a Jena API method, the create method.
This method belongs to the QueryFactory class and usually used to create a Query object given the specified input. This RDF description clearly captures the component’s metadata at the semantic and syntactic level. The underlying data structure of RDF is a labeled di￾rected graph. Each node-arc-node in this graph represents a triple that consists of three parts, subject, predicate and ob￾ject. Consider Listing 1 for example, the described method in this snippet, create[..], is always the subject, ontol￾ogy properties are predicates, and objects are either a re￾source, unlabeled node (blank node) or a literal value. For example, the first triple below uses a property from SCRO to assert that the method has an input parameter of type String. The second triple associates this parameter with 2http://www.w3.org/TR/rdf-primer 3http://lucene.apache.org/ few terms describing its purpose. The third triple, however, tags the same input parameter with a meaningful concept (QueryText) from the domain ontology. Thus, giving the parameter an agreed-upon and meaningful description other than terms or the semantically vague String type. 1. create[..] scro:hasInputType String 2. create[..] compre:hasInputTerms "query string" 3. create[..] compre:hasDomainInput [ a swonto:QueryText] @base PREFIX scro: PREFIX compre: PREFIX swonto: PREFIX dc: a scro:StaticMethod ; scro:hasInputType ; scro:hasInputType ; scro:hasInputType ; scro:hasOutputType ; scro:invokesMethod ; scro:hasSignature "create[String,String,Syntax]"; compre:describedBy [ a swonto:QueryCreation]; compre:hasDomainInput [ a swonto:QueryText]; compre:hasDomainInput [ a swonto:URI]; compre:hasDomainInput [ a swonto:QueryLanguageSy￾ntax]; compre:hasDomainOutput[ a swonto:ExtendedQuery]; compre:hasInputTerms "query string ..."; compre:hasInputTerms "base URI ..."; compre:hasInputTerms "query syntax URI ..."; compre:hasOutputTerms "query ..."; dc:description "create query ..."; .... Listing 1. 
RDF descriptor for an API method This multi-faceted description of components enables four different types of queries against the knowledge base: a) type or signature-based queries; b) metadata keyword queries; c) pure semantic-based queries; or d) blended queries of the previous three types. However, we focus the discussion on pure semantic￾based queries that rely on domain-specific knowledge. Primarily, search techniques that rely on variations of keyword-based search suffer from synonymity and poly￾semic ambiguity that often lead to low recall and precision. On the other hand, signature matching techniques cannot distinguish between components that have the same signa￾ture but serve different purposes, e.g. using Jena API to create a new query vs read a query from a file. In semantic search, however, these limitations are completely dealt with since the semantics of each of the types in sig- 3

natures are encoded and processed during search. Besides are used to create, read or even parse a semantic addressing knowledge representation effectively, semantic query. Nevertheless, this search mechanism is flexible since search offers extensible solutions to component retrieval. it allows a wide range of queries to run against the knowl- Since we are focusing on API usage and reuse, the de- edge base. In fact, the expressive power provided by our scriptions shown in Listing I capture the component's inter- ontologies allows users to express their queries in more de- face and its relationships with other components. However, tails than would otherwise be expressed with any altena these description can be easily extended to capture other tive method. For instance, assume that the user was able to facets(e.g, component's environment) via introducing ad- obtain a Query object as described in the previous exam- ditional ontological properties. ple. The next natural step is to find a component that can Reasoning is one of the primary added benefits in seman- take this query as input, execute it, and return the required tic search. In addition of classifying and checking the con- results. Browsing the Jena API looking for such a compo- sistency of our ontologies, a DL reasoner can also be used nent or even querying using typical keyword-based queries to inferring and thus enriching the knowledge base with ad- would not return an answer since there is an intermediate ditional knowledge that is not explicitly stated. Thus, play- query execution object that must be obtained to complete ing a vital role in improving search precision and recall in the task. This appears to be a dead end. However, using comparison with other search techniques. DL Subsumption semantic search, this request can be expressed fairly easily reasoning, for example, is typically used to establish sub- as follows set inclusion relationships between different concepts an the ontology. 
Consider the descriptions Query E compre: Component n Listing I for example, when pure semantic-based queries (compre: has DomainInput are used, users need only to provide domain concepts de- swonto: SemanticQuery) n scribing the components interface. Therefore, if the user Compre: has DomainOutput 3 compre provides SemanticQuery as a domain output of the re- has DomainOutput swonto: Result Set quested component, the method shown in the listing would This query expresses the fact that we are looking for a com- still match this request since SemanticQuery suDs ponent that takes a semantic query as inpu returns an- ExtendedQuery as specified in the subsumption hierar- oth chy of our domain ontology. Thus, automatically enablin other component that returns a query solution. Thus, query ing for multiple components at the sam an implicit form of query expansion It is notable that semantic-based retrieval alleviates many 3. 1. Implementation and ranking mechanisms problems typically faced by tools that rely on exact key word or type matching. One of the strengths of our ap- proach, however, is the ability to utilize our various ontolo- We have implemented this approach in a tool called gies in order to perform blended search against the knowl- CompRE, conveniently named after the main ontology in edge base. In particular, this is helpful when components in our modeL. CompRE is deployed as a plug-in for the the knowledge base are not completely annotated or when clipse Integrated Development Environment (IDE). Fi users are still in the process of becoming familiar with ure I shows a snapshot of CompREs main views in the the ontology. Consider for example a user who wishes Eclipse workbench. When loaded for the first time, Com- pRE processes the library code and the component ontology type(SemanticQuery)and one of the actual input types in order to generate the initial knowledge base as described ( Syntax) are known. 
Furthermore, since the user is not about the other input types, she wants to provide a fer ficient(it only took 4.5 seconds for par terms to filter out the results. This request can be expressed the Jena framework). CompRE also includes a module that using the following query in DL-like syntax: allow users to tag components with semantic references that corresponds to concepts from the domain ontology. These Query E compre: Component n dra (compre: hasDomain Output tured by the storage module, and stored automatically in swonto: SemanticQuery)n knowledge base Upon the conclusion of the knowledge Escro: hasInputType kb: Syntax)n population process, a knowledge repository is created and (compre: hasInputTerms value"base uri) becomes ready for answering user requests CompRe provides two separate views for formulating As expected, executing this query returns not only the queries. The first view is provided as a simple data entry method shown in Listing I but also other unrelated meth- form as shown in the figure. In each entry box, users need ods. It turns out that the input terms specified in the query to provide search restrictions that are either prefixed with are very popular and are used to describe API methods that an ontology name or provided as plain keywords enclosed

natures are encoded and processed during search. Besides addressing knowledge representation effectively, semantic search offers extensible solutions to component retrieval. Since we are focusing on API usage and reuse, the descriptions shown in Listing 1 capture the component's interface and its relationships with other components. However, these descriptions can easily be extended to capture other facets (e.g., the component's environment) by introducing additional ontological properties.

Reasoning is one of the primary added benefits of semantic search. In addition to classifying and checking the consistency of our ontologies, a DL reasoner can also be used to infer, and thus enrich the knowledge base with, additional knowledge that is not explicitly stated. Reasoning thus plays a vital role in improving search precision and recall in comparison with other search techniques. DL subsumption reasoning, for example, is typically used to establish subset inclusion relationships between different concepts and properties in the ontology. Consider the descriptions in Listing 1: when pure semantic-based queries are used, users need only provide domain concepts describing the component's interface. Therefore, if the user provides SemanticQuery as a domain output of the requested component, the method shown in the listing would still match this request, since SemanticQuery subsumes ExtendedQuery as specified in the subsumption hierarchy of our domain ontology. Subsumption reasoning thus automatically enables an implicit form of query expansion.

It is notable that semantic-based retrieval alleviates many problems typically faced by tools that rely on exact keyword or type matching. One of the strengths of our approach, however, is the ability to utilize our various ontologies in order to perform blended search against the knowledge base.
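To make the subsumption-based query expansion described above more concrete, the following is a minimal sketch rather than CompRE's actual implementation, which relies on a DL reasoner. It assumes a hand-written single-parent class hierarchy; only the concept names SemanticQuery, ExtendedQuery, and ResultSet come from the text, while the component names and the helper functions are hypothetical.

```python
# Hypothetical fragment of a domain-ontology hierarchy: child -> parent.
# A real system would obtain this from a DL reasoner, not a dictionary.
SUBCLASS_OF = {
    "ExtendedQuery": "SemanticQuery",
    "SemanticQuery": None,
    "ResultSet": None,
}

# Hypothetical components, each annotated with the domain concept it returns.
DOMAIN_OUTPUT = {
    "buildExtendedQuery": "ExtendedQuery",
    "runQuery": "ResultSet",
}


def is_subsumed_by(concept, ancestor):
    """Return True if `ancestor` subsumes `concept` (reflexive and transitive)."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def matching_components(requested_output):
    """Find components whose annotated output is subsumed by the requested concept."""
    return sorted(
        name
        for name, output in DOMAIN_OUTPUT.items()
        if is_subsumed_by(output, requested_output)
    )


if __name__ == "__main__":
    # A request for SemanticQuery also matches a component annotated
    # with the more specific ExtendedQuery.
    print(matching_components("SemanticQuery"))
```

The request never mentions ExtendedQuery, yet the component annotated with it is retrieved, which is the implicit query expansion the text refers to.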
In particular, this is helpful when components in the knowledge base are not completely annotated or when users are still in the process of becoming familiar with the ontology. Consider, for example, a user who wishes to find a component for which the domain output type (SemanticQuery) and one of the actual input types (Syntax) are known. Furthermore, since the user is not sure about the other input types, she wants to provide a few terms to filter the results. This request can be expressed using the following query in DL-like syntax:

    Query ≡ compre:Component ⊓
            (∃compre:hasDomainOutput.swonto:SemanticQuery) ⊓
            (∃scro:hasInputType.kb:Syntax) ⊓
            (∃compre:hasInputTerms value "base uri")

As expected, executing this query returns not only the method shown in Listing 1 but also other unrelated methods. It turns out that the input terms specified in the query are very popular and are used to describe API methods that create, read, or even parse a semantic query.

Nevertheless, this search mechanism is flexible since it allows a wide range of queries to run against the knowledge base. In fact, the expressive power provided by our ontologies allows users to express their queries in more detail than would otherwise be possible with any alternative method. For instance, assume that the user was able to obtain a Query object as described in the previous example. The next natural step is to find a component that can take this query as input, execute it, and return the required results. Browsing the Jena API looking for such a component, or even querying using typical keyword-based queries, would not return an answer since there is an intermediate query execution object that must be obtained to complete the task. This appears to be a dead end. However, using semantic search, this request can be expressed fairly easily as follows:

    Query ≡ compre:Component ⊓
            (∃compre:hasDomainInput.swonto:SemanticQuery) ⊓
            (∃compre:hasDomainOutput.∃compre:hasDomainOutput.swonto:ResultSet)

This query expresses the fact that we are looking for a component that takes a semantic query as input and returns another component that returns a query solution. The query thus searches for multiple components at the same time.

3.1. Implementation and ranking mechanisms

We have implemented this approach in a tool called CompRE, conveniently named after the main ontology in our model. CompRE is deployed as a plug-in for the Eclipse Integrated Development Environment (IDE). Figure 1 shows a snapshot of CompRE's main views in the Eclipse workbench. When loaded for the first time, CompRE processes the library code and the component ontology in order to generate the initial knowledge base as described in Section 2.3. This process is completely automatic and efficient (it took only 4.5 seconds to parse and process the Jena framework). CompRE also includes a module that allows users to tag components with semantic references that correspond to concepts from the domain ontology. These annotations are entered via drag-and-drop mechanisms, captured by the storage module, and stored automatically in the knowledge base. Upon the conclusion of the knowledge population process, a knowledge repository is created and becomes ready for answering user requests.

CompRE provides two separate views for formulating queries. The first view is provided as a simple data entry form as shown in the figure. In each entry box, users need to provide search restrictions that are either prefixed with an ontology name or provided as plain keywords enclosed


Figure 1. CompRE: showing the component ontology, domain ontology, and the main search view

within quotes. As described in the previous section, using the compre-kb prefix tells the system that this is in fact an actual API type specified in the knowledge base. In contrast, the swonto prefix refers to a concept from the currently active domain ontology. Since the domain ontology can be different for each API, its name is provided as an external configuration parameter. Free-text requirements in each query may optionally utilize all fuzzy extensions supported by Lucene's query parser, thus allowing a full-featured keyword-based search. Once the form is filled, CompRE collects the search requirements and automatically generates a query using the SPARQL⁴ query language; it then executes this query against the knowledge base. CompRE also provides a query answering view for advanced users who wish to edit their own SPARQL queries directly, thereby gaining full control over various aspects of the component ontology. For instance, users may wish to specify that the desired component extends a particular component, or is perhaps usedBy a certain number of components as a measure of its popularity in the target library. Regardless of the data entry mechanism used, CompRE executes the query, ranks the retrieved instances, and presents the result in a viewer that enables further exploration of each recommended component.

Ranking the retrieved candidates according to their relevance to the user's needs saves time and effort. While allowing blended queries in our approach ensures flexibility and robustness, it complicates the ranking process. When blended or pure syntactic queries are submitted, we initially rely on the traditional, yet solidly proven, scoring mechanisms supported by Lucene.
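The translation of a filled search form into a SPARQL query, described earlier in this section, might look roughly like the sketch below. This is an illustration under stated assumptions, not CompRE's generator: the namespace URIs are placeholders, the function name and its parameters are hypothetical, and the mapping of each form field onto triple patterns is one plausible reading of the DL-like queries shown previously. Only the property names (compre:hasDomainOutput, scro:hasInputType, compre:hasInputTerms) are taken from the text.

```python
# Placeholder namespace URIs; the real ontology URIs are not given in the text.
PREFIXES = {
    "compre": "http://example.org/compre#",
    "scro": "http://example.org/scro#",
    "swonto": "http://example.org/swonto#",
    "kb": "http://example.org/kb#",
}


def build_sparql(domain_outputs=(), input_types=(), input_terms=()):
    """Assemble a SPARQL SELECT query string from form restrictions.

    domain_outputs: prefixed domain concepts, e.g. "swonto:SemanticQuery"
    input_types:    prefixed API types from the knowledge base, e.g. "kb:Syntax"
    input_terms:    plain keyword strings, e.g. "base uri"
    """
    lines = [f"PREFIX {prefix}: <{uri}>" for prefix, uri in PREFIXES.items()]
    lines.append("SELECT DISTINCT ?c WHERE {")
    lines.append("  ?c a compre:Component .")
    for i, concept in enumerate(domain_outputs):
        # The output is some individual typed by the requested concept;
        # matching subclasses as well requires a reasoner-backed store.
        lines.append(f"  ?c compre:hasDomainOutput ?out{i} .")
        lines.append(f"  ?out{i} a {concept} .")
    for api_type in input_types:
        # Assumption: API types are individuals in the knowledge base.
        lines.append(f"  ?c scro:hasInputType {api_type} .")
    for term in input_terms:
        # Escape backslashes and quotes so the literal stays well-formed.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?c compre:hasInputTerms "{escaped}" .')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sparql(["swonto:SemanticQuery"], ["kb:Syntax"], ["base uri"]))
```

The resulting string could then be handed to any SPARQL engine; executing it and handling the fuzzy Lucene extensions are outside the scope of this sketch.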
In order to improve this initial ranking, we refine the order based on suitability measures that consider the current user context. We thus parse the code that is currently being developed and create a profile that includes all visible types that are either declared by the programmer or inherited in the user's context. We further analyze each retrieved candidate's signature in terms of the new input types that this candidate will introduce into the current context if selected by the user. Naturally, candidates that introduce more types should be assigned a lower rank value. However, an input type that is already present in the context profile does not count against the candidate's score.

In the absence of keywords in user queries, we apply only context-based heuristics such that candidates with exact matches are put at the top of the list while other candidates are ranked based on the number and type of their input and output types. For example, consider a user who is searching for a component that requires two particular input types, namely I1 and I2. Assume that the repository contains three components, namely C1, C2, and C3. Let us also assume that C1 is an exact match, C2 has only one input type (I1), and C3 requires three input types (I1, I2, and I3). The system then ranks C1 first and C2 second; C3 is the least desirable candidate since it would introduce a new type into the user's context. It is thus included in the result set to improve recall, but ranked last. These heuristics are simple, easy to implement, and work surprisingly well.

⁴ http://www.w3.org/TR/rdf-sparql-query/

4. Experiments and results

Due to the lack of independent and standard benchmark test data, the evaluation of search tools is, to some degree, subjective. However, we designed our experiments such that it


increases our confidence level of a fair evaluation. We have selected the Jena framework for testing CompRE; the domain ontology described in Section 2.1 fits naturally in Jena's application domain.

4.1. Experiment: component search

This experiment is designed to reveal the overlap between the various search methods that are supported by CompRE. The fundamental guiding hypothesis we test in this experiment is that pure semantic-based representation and annotation of library components improves search precision when compared with other techniques. Precision is defined as the ratio of the number of relevant component instances that are recommended by the tool to the total number of recommended instances. Recall, the other commonly used metric in evaluating search systems, is defined as the ratio of the number of relevant component instances that are recommended to the total number of relevant components in the repository. However, in these experiments we fix recall since we are searching for distinct components, i.e., the component we are searching for is either found or not found.

We have selected twelve programming tasks; six of these tasks were carefully designed by us and the remaining tasks were collected from the Jena developers forum⁵. Each of these tasks requires a query to be fired in order to search for a component that is required to complete the task. These tasks are diverse enough to cover various aspects of the problem domain. Due to space limitations, we do not include these tasks here; rather, an extended discussion of the tasks and results can be found online [1].

⁵ http://tech.groups.yahoo.com/group/jena-dev/

We then prepared the necessary coding environment and formulated four search queries for each task, i.e., one query for each kind of search supported by CompRE. A precision summary graph for running these queries is shown in Figure 2.

Figure 2. Precision graph for Jena queries

As expected, semantic search tends to perform poorly when the components in the knowledge base are incompletely or incorrectly annotated. In Q11, for example, a required Jena method, makeRewindable, was not completely annotated with the proper return type. Thus, search produced spurious components since only the input type was used during search. However, in most cases when proper tags exist, semantic search can precisely describe the needed component and improve overall precision values, as seen in Figure 2. Metadata keyword-based search performs poorly due to the two well-known fundamental issues of polysemy and synonymy. These two problems become even more evident when searching for software components. This is due in part to inconsistent and often incomplete API descriptions of library code. Nevertheless, keyword search tends to return an exact match when a particular keyword is used to describe only a single component in the library (e.g., clone in Q4).

Signature-based queries tend to yield low precision in cases where the component signature includes one or more semantically vague types such as the Java String type. The best example to illustrate this notion is Q3. This task requires accessing a query service over an HTTP connection; therefore, one needs to provide, among other things, the URL of this service and the text representation of the query, both of which are specified as String objects. Unless there are clear semantic descriptions for such parameters, the matchmaking process would return many false positives (low precision). Blended search, however, performed surprisingly well. We believe that it is often the case that the user is certain about a single API type that is used in the component's signature or a certain keyword that precisely describes some aspect of the component. These descriptions can also be coupled with semantic annotations to produce higher precision. These results indicate that blended search needs to be formally investigated in more detail.

Ranking and running time analysis have been computed as well. On average, semantic search achieved a rank of 1.75 over twelve queries, i.e., the desired component was ranked either at the first or at most the second position. This ranking score is comparable with those of the other search schemes, which achieved 2.27, 1.8, and 1.66 for keyword, signature, and blended search, respectively. However, the average observed response times were 15.5, 9, and 2.5 for semantic, blended, and signature and keyword searches, respectively. All experiments were performed on a Windows XP machine with a 1.8 GHz Intel processor and 1 GB of memory. This relatively lower performance for semantic queries is a result of having our queries run through the reasoner. In general, search time for all search methods is likely to increase as the size of the knowledge base increases; in the case of semantic search, speed is continuously improving as reasoners evolve. Achieving perfection in component search is nearly impossible; however, we believe that CompRE's internal mechanisms proved effective and, in the majority of cases, show clear support for our hypothesis.
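The metrics used in this experiment, precision and recall as defined at the start of this section plus the average rank of the desired component, can be computed as in the following sketch. The function names are hypothetical, and every value in the usage example is an illustrative placeholder rather than data from the experiment; only the method name makeRewindable appears in the text.

```python
def precision(recommended, relevant):
    """Relevant recommended instances divided by all recommended instances."""
    if not recommended:
        return 0.0
    hits = sum(1 for component in recommended if component in relevant)
    return hits / len(recommended)


def recall(recommended, relevant):
    """Relevant recommended instances divided by all relevant instances."""
    if not relevant:
        return 0.0
    hits = sum(1 for component in relevant if component in recommended)
    return hits / len(relevant)


def mean_rank(ranks):
    """Average 1-based position of the desired component across queries."""
    return sum(ranks) / len(ranks)


if __name__ == "__main__":
    # Placeholder result list: one desired method among four recommendations.
    recommended = ["makeRewindable", "otherA", "otherB", "otherC"]
    relevant = {"makeRewindable"}
    print(precision(recommended, relevant))
    print(recall(recommended, relevant))
    # Placeholder ranks for three queries, not the paper's measurements.
    print(mean_rank([1, 2, 3]))
```

Because each task targets a single distinct component, recall is either 0 or 1 for a given query, which is why the experiment holds it fixed and compares precision instead.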


4.2. User study

This experiment is designed to assess the usability of semantic search and to understand the possible difficulties faced by end users in learning and using domain ontologies for successfully completing a particular search task against an unfamiliar API.

Six graduate-level MIS students from Northwestern University voluntarily agreed to participate in this study. Although all students have at least seven months of Java experience and a good working knowledge of Semantic Web technologies, no student had been directly exposed to the Jena API. We delivered a one-hour tutorial that included a brief introduction to the Jena API, a brief introduction to the domain ontology, and a sample training task that explains CompRE's semantic search features. We then charged each student with four other independent Jena API programming tasks that vary in scope: T1) data set creation and handling of multiple RDF graphs; T2) query construction and execution over a given ontology model; T3) result manipulation of query solutions; and T4) access and treatment of remote services. On average, two distinct components are needed to successfully complete a given task. An environment was set up for each task with skeleton code; each student was then asked to finish three consecutive tasks using only CompRE's pure semantic search, while the last task in the sequence had to be completed using alternative methods of the student's choice.

Table 1 shows task completion time measured from the time at which the task is presented to the student until the final correct answer is submitted. Numbers in boldface represent tasks completed using semantic search, while the other numbers are underlined.

Table 1. User study statistics

                       Time (Minutes)
                       T1      T2      T3      T4
    S1                 35      22      27      13
    S2                 33      35      20      11
    S3                 45      43      31      33
    S4                 32      40      22      15
    S5                 52      12      43      8
    S6                 21      16      18      40
    Avg (Semantic)     37.5    21.25   23.6    16
    Avg (Alternative)  34      41.5    43      40

The most significant conclusion we can draw from these numbers is the correlation between the time taken by students to complete the first semantic task and the last one in the sequence. In most cases, there was a significant reduction in time as students became more familiar with the domain ontology and thus better able to construct precise queries. This is confirmed by the responses we obtained upon the conclusion of the experiment: four out of six students indicated that there is a small initial learning curve that was reduced fairly quickly as they became more comfortable with the API vocabulary represented in the domain ontology. Since the last task had to be done without CompRE's assistance, most students argued that this task could have been completed faster had CompRE's assistance been allowed. A domain ontology provides a concise description of API content and vocabulary. This knowledge can also be used to successfully finish a coding task that may require more than one component. Consider task T2, for example: this task requires instantiating an intermediate object of type QueryExecution. We suspect that the coherent representation of ontology concepts and axioms aids users in arriving at such conclusions during the initial time invested in understanding and learning the taxonomy.

One may conclude that completing the last task (the alternative task) should be relatively easier. After all, students have been using the same API; thus, the knowledge gained about this API after completing the first three tasks can be helpful. However, when examining the results, there is no dramatic improvement in response time for using alternative methods. We suspect that these alternative methods (e.g., exploring documentation, searching for code on the Web, etc.) do not provide a systematic and focused learning experience for programmers. In this study, however, we did not intend to make a systematic comparison between semantic search mechanisms and the normal search practices used by programmers. Nevertheless, the obtained results clearly support our hypothesis and show that semantic search, in most cases, yields a better API learning experience and can certainly increase programmers' productivity.

Overall, students provided positive comments about CompRE and semantic search. Two students indicated that the SPARQL query view was indeed helpful and was used in formulating more complex queries. However, these students requested a thorough integration of the domain ontology, including its object properties, into CompRE's domain ontology view. Only one student reported relative difficulty adapting to a new search approach after being familiar with other alternative methods. This user also requested a tooltip feature such that when the user hovers over an ontology concept in the domain ontology view, a hover box appears with the class description. Based on this sound and helpful feedback, we are currently adding new features as well as modifying CompRE's interface so it becomes more expressive.

5. Related work

Due to the benefits acquired by systematic reuse, many researchers have proposed solutions and tackled the reuse

problem from various perspectives. Many approaches (e.g., [3]) employ traditional knowledge representation and variations of signature matching or keyword-based retrieval. Similar to CompRE, other tools (e.g., [11]) leverage software understanding by being embedded in the development environment. However, unlike CompRE, these tools rely on a local repository of sample client code to search for components. CodeBroker [11], for example, uses a combination of free-text and signature matching techniques. In order to retrieve appropriate matches, the user must write high-quality doc comments that precisely describe functionality. If the user's comments do not retrieve satisfactory results, the system considers the signature of the method immediately following the comments. Finding well-documented code with which to populate the repository is highly unlikely, especially in open-source and legacy software.

Other component retrieval approaches (e.g., [6], [7]) apply automated testing techniques to analyze a corpus of client code harvested from the Web. Code Conjurer [6], for example, helps agile development users find suitable components on the basis of unit test cases. Therefore, users of the system have to write such test cases in order to invoke the system. Once invoked, the system contacts a remote server that finds suitable candidates based on the component’s interface specified in the test case.

Other semantic-based approaches have also been proposed. However, the full potential of utilizing domain knowledge was not explored. Sugumaran and Storey [10] proposed an approach that utilizes domain models: a domain ontology is used mainly for term disambiguation and basic query refinement for keyword-based queries, and the keywords are then mapped against the ontology to ensure that correct terms are being used. However, no semantic-based descriptions of components have been used.
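
The keyword-to-ontology mapping attributed to [10] can be pictured with a small sketch. The Python below is illustrative only: the concept names, the synonym table, and the functions `map_keywords` and `refine_query` are hypothetical and are not taken from [10] or from CompRE. It simply shows keywords being checked against an ontology vocabulary so that canonical terms are used and unknown or ambiguous terms are flagged.

```python
# Hypothetical mini-vocabulary: each ontology concept lists the surface terms
# (synonyms) a user might type. None of these names come from [10].
ONTOLOGY_TERMS = {
    "Query": {"query", "search", "lookup"},
    "Model": {"model", "graph", "dataset"},
    "Parser": {"parser", "reader", "loader"},
}


def map_keywords(keywords):
    """Map free-text keywords to ontology concepts.

    Returns (resolved, unresolved): `resolved` maps each keyword that matches
    exactly one concept to that concept; `unresolved` lists keywords that
    match no concept or more than one.
    """
    resolved, unresolved = {}, []
    for raw in keywords:
        term = raw.strip().lower()
        matches = sorted(
            concept for concept, terms in ONTOLOGY_TERMS.items() if term in terms
        )
        if len(matches) == 1:
            resolved[raw] = matches[0]
        else:
            unresolved.append(raw)
    return resolved, unresolved


def refine_query(keywords):
    """Rewrite a keyword query using canonical concept names where possible."""
    resolved, unresolved = map_keywords(keywords)
    refined = [resolved.get(keyword, keyword) for keyword in keywords]
    return refined, unresolved
```

For instance, `refine_query(["lookup", "graph", "widget"])` rewrites the first two keywords to the concepts `Query` and `Model` and reports `widget` as unresolved. A mapping of this kind only normalizes query terms; it attaches no semantic description to the components themselves, which is the limitation noted above.
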
Other proposals ([2] and [4]) employ ontologies to address the knowledge representation problem found in previous approaches. In [4], software assets are classified into domain categories (I/O, GUI, Security, etc.) and indexed with a domain field as well as other bookkeeping fields to facilitate free-text search. Although the SRS [2] proposal uses the same indexing mechanism, it maintains two separate ontologies: an ontology for describing software assets and a domain ontology for classifying these assets. However, the structure of the source code assets and the semantic relationships between those assets via axioms and role restrictions were not fully utilized.

6. Conclusions and future work

We proposed an approach for component reuse. In addition to supporting pure semantic-based search, our approach also supports other kinds of search techniques. However, our studies showed evidence that pure semantic search that utilizes domain knowledge is not only usable and achievable but also improves the precision of search results. Our results also showed that blended search has great potential; we are currently conducting more case studies to assess the value of blended search. There are also two other directions for future work. First, ranking reuse candidates has always been a challenge; therefore, we are currently investigating how ranking could be improved using semantic technologies. Second, we have not yet investigated how one could motivate library providers to ship domain ontologies with their software, or how individually created ontologies could be shared by a community of users.

References

[1] A. Alnusair and T. Zhao. Ontology models, framework ontologies, and CompRE evaluation. Available online at: http://www.cs.uwm.edu/~alnusair/compre.
[2] B. Antunes, P. Gomez, and N. Seco. SRS: A software reuse system based on the semantic web. In Proceedings of the 3rd International Workshop on Semantic Web Enabled Software Engineering (SWESE), 2007.
[3] S. Bajracharya, O. Ossher, and C. Lopes. Sourcerer: An internet-scale software repository. In First International Workshop on Search-driven Development: Users, Infrastructure, Tools and Evaluation (SUITE’09), 2009.
[4] F. A. Durao, T. A. Vanderlei, E. S. Almeida, and S. R. Meira. Applying a semantic layer in a source code search tool. In Proceedings of the 23rd ACM Symposium on Applied Computing, pages 1151–1157, Fortaleza, Ceara, Brazil, 2008.
[5] W. B. Frakes and K. Kang. Software reuse research: Status and future. IEEE Transactions on Software Engineering, 31(7):529–536, July 2005.
[6] O. Hummel, W. Janjic, and C. Atkinson. Code Conjurer: Pulling reusable software out of thin air. IEEE Software, 25(5):45–52, 2008.
[7] O. Hummel and C. Atkinson. Extreme Harvesting: Test driven discovery and reuse of software components. In Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI’04), 2004.
[8] B. McBride. Jena: a semantic web toolkit. IEEE Internet Computing, 6(6):55–59, 2002.
[9] N. F. Noy and D. L. McGuinness. Ontology development 101: A guide to creating your first ontology. Stanford Knowledge System Technical Report KSL-01-05, 2001.
[10] V. Sugumaran and V. C. Storey. A semantic-based approach to component retrieval. ACM SIGMIS DATABASE, 34(3):8–24, 2003.
[11] Y. Ye and G. Fischer. Reuse-conducive development environments. The International Journal of Automated Software Engineering, 12(2):199–235, 2005.
[12] Y. Zhao, J. Dong, and T. Peng. Ontology classification for semantic web based software engineering. IEEE Transactions on Services Computing, 2(4):303–317, 2009.
