
Internet Working Group                                      YP. Chen
Internet-Draft                                                H. Xia

Intended status: Informational                              ZM. Wang
Expires: May 29, 2018                                        P. Yang
                                                            CW. Tang
       Shaanxi Key Laboratory of Network Data Intelligent Processing
                    Xi'an University of Posts and Telecommunications
                                                   November 25, 2017
INTERNET-DRAFT
A Unified Description Method for Data Service   
draft-chen-ds-description-00
   
Status of this Memo   
This Internet-Draft is submitted in full conformance with the 
provisions of BCP 78 and BCP 79.
   
Internet-Drafts are working documents of the Internet Engineering 
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
   
Internet-Drafts are draft documents valid for a maximum of six 
months and may be updated, replaced, or obsoleted by other documents 
at any time. It is inappropriate to use Internet-Drafts as reference 
material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 29, 2018.

Copyright Notice   

Copyright (c) 2017 IETF Trust and the persons identified as the 
document authors. All rights reserved.
   
This document is subject to BCP 78 and the IETF Trust's Legal    
Provisions Relating to IETF Documents    
(http://trustee.ietf.org/license-info) in effect on the date of 
publication of this document. Please review these documents 
carefully, as they describe your rights and restrictions with respect 
to this document. Code Components extracted from this document must






Chen                    Expires May 29, 2018             [Page 1]
Internet-Draft    Data Service Unified Description     November 2017 

include Simplified BSD License text as described in Section 4.e of 
the Trust Legal Provisions and are provided without warranty as 
described in the Simplified BSD License.

Abstract   
The rapid development of the Internet has driven more and more 
enterprises and individuals to encapsulate operations on key data 
entities as what we call data services (DS). Because these 
enterprises and individuals work in different fields, the 
descriptions of their data services are semantically heterogeneous. 
In this document, we propose a more principled approach to the 
problem of heterogeneous data services on the Web. We start with 
pre-processing of the data service description documents. We then 
propose a unified description language model for data services, the 
Unified Description Language for Data Service (UDL4DS).

Table of Contents

1. Introduction  . . . . . . . . . . . . . . . . . . . . . . . .  2
   1.1. Background  . . . . . . . . . . . . . . . . . . . . . . . 2 
2. Conventions Used in This Document  . . . . . . . . . . . . . . 3
3. Data Service Description  . . . . . . . . . . . . . . . . . .  4 
   3.1. Data Service Overview  . . . . . . . . . . . . . . . . .  4
   3.2. Data Service Preprocessing . . . . . . . . . . . . . . .  5
      3.2.1. Data Service Acquisition  . . . . . . . . . . . . .  6
      3.2.2. Feature Word Extraction for Data Service  . . . . .  6
   3.3. Data Service Classification  . . . . . . . . . . . . . .  7
   3.4. Data Service Description Language Design  . . . . . . .  8
      3.4.1. Semantic Annotation of Data Service  . . . . . . . . 8
   3.5. Data Service Description Model  . . . . . . . . . . . . . 9
4. Security Considerations  . . . . . . . . . . . . . . . . . .  10
5. IANA Considerations  . . . . . . . . . . . . . . . . . . . .  10
6. Conclusions  . . . . . . . . . . . . . . . . . . . . . . . .  10
7. References  . . . . . . . . . . . . . . . . . . . . . . . . . 10
   7.1. Normative References  . . . . . . . . . . . . . . . . .  10
   7.2. Informative References  . . . . . . . . . . . . . . . .  10
8. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . .  10
Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . .  11

1. Introduction
1.1.  Background
With the development of the Internet and cloud computing, various 
forms of data have been generated. Because data services on the Web 
use different description standards and technologies, there is no 
common data model or access method, which makes it difficult to 
share information among heterogeneous data sources. To address this 
problem, a large amount of heterogeneous data is published on the 
Internet in the form of services, providing data services to service 
users.

The essence of a data service is to use network service protocols 
and standards such as the Hypertext Transfer Protocol (HTTP), Web 
Services Description Language (WSDL), Extensible Markup Language 
(XML), Simple Object Access Protocol (SOAP), and Universal 
Description, Discovery and Integration (UDDI) to encapsulate 
heterogeneous data sources on the Internet by exposing an agent or 
access interface and providing data services to users. However, as 
data in various fields is continuously encapsulated as services, 
data services are used more and more frequently, which places 
increasingly high requirements on them. In the process of publishing 
and invoking data services, the following critical problems of data 
service description arise:

Existing publishers of data services come from different industries 
or fields, so there is no unified data standard or norm, which 
results in semantically heterogeneous data service descriptions.

With the development of data services and the increasing complexity 
of the demands of service consumers, a single service cannot satisfy 
complex demands accurately and quickly. How to integrate data 
services effectively to meet the actual demands of customers has 
become an urgent problem.

Existing methods for classifying and semantically annotating data 
services are inadequate.

In this document, we propose a data service description language 
model named UDL4DS, based on XML Schema. It covers the 
classification of data services, the construction of a domain 
ontology, and semantic annotation, in order to resolve the semantic 
heterogeneity between data services in different fields. In 
addition, an XML Schema description of the key elements of the 
language model is designed to form a common specification that 
achieves a unified description of data services.

2. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
document are to be interpreted as described in RFC 2119 [RFC2119].
In this document, these words will appear with that interpretation 
only when in ALL CAPS. Lower case uses of these words are not to be 
interpreted as carrying significance described in RFC 2119.

3. Data Service Description
At present, data service descriptions are generally based on the XML 
specification and describe the access interface and other 
information of a data service. As user needs constantly change, the 
description of data services is moving from the syntactic level to 
the semantic level. This addresses the difficulty computers have in 
understanding the semantics of a description and allows the best 
data service to be provided more quickly and intelligently. However, 
data service providers come from different industries, each with its 
own standards for the services it publishes, so their services are 
neither shared nor interoperable with one another.

In this document, by classifying data services and addressing data 
services published in different fields, we propose a unified 
description language for data service (UDL4DS) based on the XML 
Schema specification. We achieve a unified description in two ways. 
On the one hand, we propose a new semantic annotation method for 
data services, based on a domain ontology library, to resolve the 
semantic heterogeneity between data services. On the other hand, we 
design a unified description language model and describe data 
services according to the designed description language.

3.1. Data Service Overview
The meaning of "data service" differs considerably between fields. 
Manu MR and Richard Manning hold that the data service layer applies 
a service-oriented architecture (SOA) and plays an important role in 
data integration. Carey M J holds that a data service is a software 
service that provides a unified data model and various access 
operations on data resources. WS. Zhang holds that a data service is 
a Web Service with an XML access interface that can access a 
database and return the result set in XML format. Zhang Peng holds 
that a data service only encapsulates the data resources in an 
information system. Before and after invocation, a data service does 
not change the state of the outside world, and it has no logic of 
its own for handling business, following the principles of Web 
architecture [W3C.REC-webarch-20041215].

A data service directly encapsulates the data of the underlying data 
source and opens an access interface for data service requesters to 
invoke, which reduces the cost of updating and maintaining the 
system. In addition, it lets users easily discover and transparently 
access the data in the data source. As a result, data services are 
becoming an increasingly popular way of encapsulating data.

3.2. Data Service Preprocessing
Data services exist on the Web in the form of XML-based description 
documents. A service requester accesses a published data service by 
calling the open interface of the data service publisher. However, 
data service publishers come from different industries or fields, 
and their service descriptions are semantically heterogeneous, so 
data service requesters cannot accurately and quickly access the 
data service that best satisfies their needs.

In order to discover and invoke data services more effectively, we 
preprocess them. By analyzing the basic information in a data 
service description document and extracting the attribute values of 
its key tags, we obtain a feature word text that represents the data 
service. We then classify data services by their feature word text, 
assign each to the field to which it belongs, and provide keywords 
that represent the data service. Figure 1 illustrates the 
preprocessing of data services.

+----------------------------------+           
|             Web                  |           
|                                  |           
+----------------------------------+ <-------- 
|                                  |    |      
| WSDL Document for Data Service   |  Obtain   
|                                  |    |      
+----------------------------------+ <-------- 
| Feature Vector of WSDL Document  |   Extract 
|                                  |    |      
+----------------------------------+ <-------- 
|                                  |    |      
|    Domain Ontology Library       | Construct 
|                                  |    |      
+----------------------------------+ <-------- 
|                                  |    |      
|       Subset of Data Service     | Classify  
|      (Weather) (News)            |    |      
+----------------------------------+ <--------

              Figure 1: Data Service preprocessing

3.2.1. Data Service Acquisition
In this document, we mainly study data services described by WSDL. 
Based on how WSDL description documents appear on the Web, we find 
that they exist in two forms: WSDL and ASMX. We obtain these data 
services by writing a crawler. First, we define matching rules 
according to our needs. Second, starting from a given URL, we crawl 
the Web for documents that match the rules. Finally, the crawl ends 
when the number of crawled documents reaches a set threshold. 
Figure 2 shows the crawling process.

      +------------------------+
      |           URL          |
      +------------------------+
      |                        |
      |     Regular Expression |
      |                        |
+---> +------------------------+ <--+
|     |       Extract Web link |    |
|     |                        |    |
|     +------------------------+    |
|     |                        |    +-----------+
|     |        Link queue      |    |    WSDL   |
|     |                        |    |Description|
|     +------------------------+    |Document   |
|     |                        |    |  (Match)  |
|     |       Length of link   |    +-----------+
|Less |                        |
+---> +------------------------+
      |          End           |
      |         (more)         |
      +------------------------+

               Figure 2: The process of Crawling
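
The crawling process in Figure 2 can be sketched in Python as 
follows. This is a minimal illustration under stated assumptions, 
not the crawler used in this study. The names crawl_wsdl, fetch, 
LINK_RE and WSDL_RE are hypothetical, as is the rule that a matching 
link ends in ".wsdl", ".asmx" or "?wsdl". The page fetcher is passed 
in as a function so that the sketch does not depend on any 
particular HTTP library.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical rules: how links are found and which ones count as WSDL.
LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
WSDL_RE = re.compile(r'(\.wsdl|\.asmx|\?wsdl)$', re.IGNORECASE)


def crawl_wsdl(start_url, fetch, max_documents):
    """Breadth-first crawl from start_url until max_documents matches.

    fetch is any callable that maps a URL to the page text (assumed).
    """
    queue = deque([start_url])      # the link queue of Figure 2
    seen = {start_url}
    matched = []                    # URLs of WSDL description documents
    while queue and len(matched) < max_documents:
        url = queue.popleft()
        for href in LINK_RE.findall(fetch(url)):
            link = urljoin(url, href)
            if link in seen:
                continue
            seen.add(link)
            if WSDL_RE.search(link):
                matched.append(link)        # rule matched: keep it
                if len(matched) >= max_documents:
                    break                   # threshold reached
            else:
                queue.append(link)          # ordinary page: visit later
    return matched
```

Calling crawl_wsdl(url, fetch, 100) with a fetch function built on 
an HTTP client of choice returns at most 100 matching document URLs.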

3.2.2. Feature Word Extraction for Data Service
Each data service corresponds to a WSDL description document that 
describes basic information about the data service, such as what the 
data service does, where it is located, and how to invoke it. To 
represent a data service more simply, we extract some of the more 
representative tags in the description document as attributes of the 
document. For example, (WSDL: service) gives the name of the data 
service, and (WSDL: operation) describes the functions the data 
service can perform. For instance, a data service "Weather Service" 
with the method name "Get Weather By IP" clearly indicates a service 
that obtains the weather information of the city or region 
identified by an IP address.

Each element in a WSDL description document carries a specific 
meaning. To extract the attributes that uniquely represent the data 
service, the elements in the document need to be parsed. In this 
document, the contents of the name attribute of the (WSDL: service) 
and (WSDL: operation) tags are extracted as the document's unique 
attribute values.
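
The extraction of the name attributes can be sketched as follows. 
This is an illustration only, assuming WSDL 1.1 documents and the 
Python standard library. The helper names split_words and 
extract_feature_words, and the choice to split names such as 
"GetWeatherByIP" into separate lower-case words, are assumptions 
made for the example.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"   # WSDL 1.1 namespace


def split_words(identifier):
    """Split a name such as 'GetWeatherByIP' into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def extract_feature_words(wsdl_text):
    """Return the words found in the name attributes of the service
    and operation elements, service names first."""
    root = ET.fromstring(wsdl_text)
    words = []
    for tag in ("service", "operation"):
        for element in root.iter(f"{{{WSDL_NS}}}{tag}"):
            name = element.get("name")
            if name:
                words.extend(split_words(name))
    return words
```

Note that an operation usually appears under both portType and 
binding in a WSDL document, so its words may be counted more than 
once unless duplicates are removed.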

3.3. Data Service Classification
At present, ontology construction generally consists of requirements 
analysis, information collection, terminology recognition, formal 
coding, and assessment, as shown in Figure 3.4. Many ontology 
libraries have been built following these steps, but because fields 
and projects differ, constructing an ontology base requires 
combining the general process with the actual situation.

In order to construct a domain ontology suitable for this study, we 
cluster the feature words of the WSDL description documents of the 
obtained data services and construct a Vector Space Model (VSM) over 
all feature words. That is, the feature words of each WSDL 
description document form one column of a word-document matrix D, 
which represents N WSDL documents, so that the weight of each 
feature word in any feature word document can be calculated easily.

Based on the prototype model of the domain ontology, the ontology is 
modeled in the OWL ontology description language, combining the 
result of clustering the feature words of the WSDL documents with 
the K-center algorithm, domain information, and the tool developed 
by Stanford University.
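
A word-document matrix of this kind can be sketched as follows. The 
weighting formula is not given in this document, so the TF-IDF 
weight used here is an assumption, as is the function name 
word_document_matrix. Each row corresponds to a feature word and 
each column to one WSDL document.

```python
import math


def word_document_matrix(documents):
    """Build D with one row per feature word, one column per document.

    documents is a list of feature word lists, one per WSDL document.
    Returns (vocabulary, matrix), where matrix[i][j] is the TF-IDF
    weight of vocabulary[i] in document j (weighting is an assumption).
    """
    vocabulary = sorted({w for doc in documents for w in doc})
    n = len(documents)
    matrix = []
    for word in vocabulary:
        # Document frequency: number of documents containing the word.
        df = sum(1 for doc in documents if word in doc)
        idf = math.log(n / df)
        row = [doc.count(word) / len(doc) * idf if doc else 0.0
               for doc in documents]
        matrix.append(row)
    return vocabulary, matrix
```

With this weighting, a word that occurs in every document receives 
weight zero, so only words that distinguish documents contribute to 
later similarity calculations.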

We classify data services based on the domain ontology in three 
steps. First, we parse the obtained WSDL description documents, 
extract the feature word document that represents the basic 
information of each data service, and construct the feature word 
vector according to the vector space model. Second, we use WordNet 
to calculate the semantic distance between the feature word vector 
and the vector formed from the domain ontology. Finally, we select 
an appropriate dividing line (threshold) to assign the document to 
its field.

Extracting the feature words of a data service and constructing the 
feature word vector space model generates a data service feature 
vector (SFV). To better calculate the similarity between the feature 
word vector of a data service and a domain, the domain ontology can 
be generalized into a domain vector (DV). We can then determine 
which field a data service belongs to according to the similarity 
between the two vectors.
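
The comparison of an SFV with the DVs can be sketched as follows. 
This document uses a WordNet-based semantic distance, which is not 
reproduced here. As a stand-in, the sketch uses plain cosine 
similarity over word weights. The function names, the example 
threshold of 0.3, and the representation of vectors as dictionaries 
are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (word -> weight)."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(feature_words, domain_vectors, threshold=0.3):
    """Return the best-matching domain, or None if below the threshold."""
    sfv = Counter(feature_words)                # service feature vector
    best_domain, best_score = None, 0.0
    for domain, dv in domain_vectors.items():   # dv: domain vector
        score = cosine_similarity(sfv, dv)
        if score > best_score:
            best_domain, best_score = domain, score
    return best_domain if best_score >= threshold else None
```

A service whose best score falls below the threshold is left 
unclassified rather than being forced into the nearest field.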

3.4. Data Service Description Language Design
In this section, we first improve the formula for calculating the 
similarity of feature words in WSDL description documents. Then we 
present an approach to calculating similarity based on the domain 
ontology in order to complete the semantic processing of data 
services. On the basis of semantic annotation, we propose a unified 
description language model for data services and complete the design 
of the description language.

3.4.1. Semantic Annotation of Data Service
To describe data services uniformly, it is necessary to resolve the 
semantic differences between heterogeneous data services. In this 
document, we propose a new semantic annotation method for data 
services that uses the domain ontology library constructed above. 
Semantic annotation of data services resolves the semantic 
differences between heterogeneous data services.

The idea of this method is as follows. First, we extract feature 
words from the WSDL description document of a data service to form a 
feature word set that represents the description document. Second, 
we cluster the feature word set using the K-center algorithm and 
construct the domain ontology library by combining the result with 
domain information. Finally, we calculate the weight of each feature 
word in combination with the domain ontology, and store the feature 
words and their weights according to the ontology vector space model 
(VSM). Each WSDL document containing these feature words is 
associated with the corresponding feature words, which forms the 
mapping between the data service description document and the domain 
ontology concepts.

Because an ontology is a detailed description of the related 
concepts, the concept attributes, and the constraints among concepts 
at various levels of the hierarchy in a field, semantic annotation 
of data services based on a domain ontology not only reflects the 
relationship between service description documents and the semantic 
relevance of categories, but also exposes the implicit semantic 
information of data service description documents. In this way, data 
service description documents become semantically related to one 
another, which addresses the problem of heterogeneous data services, 
provides more accurate and comprehensive data services, and lays the 
foundation for a unified description of data services.

3.5. Data Service Description Model
At present, the description methods and standards of the data 
services published on the Web differ. To enable the sharing of 
heterogeneous service resources, it is necessary to resolve the 
semantic heterogeneity between data service resources, so that they 
are given a unified semantic description and the service access 
mechanism can be determined automatically when a service is 
executed.

In this document, we present a unified data service description 
language model (UDL4DS). Figure 3 illustrates the UDL4DS model.

      +--> +-----------+ <--+-------------------------+
      |    | Execution |    |                         |
      |    |Information|    |Execute                  |
      |    +-----------+    | DSExecute   /DSExecute  |
+--+--+                     | JDSExecute  /JDSExecute |
|  |                        |                         |
|  |                        |/Execute                 |
|  |                        +-------------------------+
|DS|+----> +-----------+ <--+--------------------+
|  |       |   Basic   |    |BaseInfo            |
|  |       |Information|    | DSID   /DSID       |
|  |       +-----------+    | DSName   /DSName   |
|  |                        |                    |
+--+--+                     |/BaseInfo           |
      |                     +--------------------+
      |
      |
+-----+ +-----------+ <-+-------------------------------+
|       | Semantic  |   |Semantic                       |
|       |Information|   |ClassifyName   /ClassifyName   |
+-----> +-----------+   |ClassifyMethod /ClassifyMethod |
                        |ClassifyTime   /ClassifyTime   |
                        |/Semantic                      |
                        +-------------------------------+

              Figure 3: UDL4DS Language Model
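
A description that follows the structure of Figure 3 could be built 
as sketched below. The element names are taken from Figure 3, with 
the classification time element written as ClassifyTime. The root 
element name "DS", the function name build_description, and all 
example values are assumptions. Figure 3 does not define the meaning 
of DSExecute and JDSExecute, so the values shown for them are 
placeholders only.

```python
import xml.etree.ElementTree as ET


def build_description(ds_id, ds_name, ds_execute, jds_execute,
                      classify_name, classify_method, classify_time):
    """Build a UDL4DS-style description as an ElementTree element."""
    root = ET.Element("DS")                      # hypothetical root name
    # Execution information
    execute = ET.SubElement(root, "Execute")
    ET.SubElement(execute, "DSExecute").text = ds_execute
    ET.SubElement(execute, "JDSExecute").text = jds_execute
    # Basic information
    base = ET.SubElement(root, "BaseInfo")
    ET.SubElement(base, "DSID").text = ds_id
    ET.SubElement(base, "DSName").text = ds_name
    # Semantic information
    semantic = ET.SubElement(root, "Semantic")
    ET.SubElement(semantic, "ClassifyName").text = classify_name
    ET.SubElement(semantic, "ClassifyMethod").text = classify_method
    ET.SubElement(semantic, "ClassifyTime").text = classify_time
    return root


description = build_description(
    ds_id="ds-0001",                             # all values are made up
    ds_name="WeatherService",
    ds_execute="http://example.com/weather.asmx?wsdl",
    jds_execute="http://example.com/weather.json",
    classify_name="Weather",
    classify_method="domain-ontology",
    classify_time="2017-11-25",
)
print(ET.tostring(description, encoding="unicode"))
```

Building the description through an XML library rather than string 
concatenation keeps the output well-formed and escapes special 
characters in the values.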

4. Security Considerations
In this document, we mainly focus on the unified description of 
heterogeneous data services described in existing WSDL. However, the 
study is not yet comprehensive with respect to heterogeneous data 
sources such as text or web page data and other forms of data 
services.

5. IANA Considerations
There are no IANA considerations related to this document.

6. Conclusions
This document proposes a unified description method for 
heterogeneous data services, which allows data services to be shared 
to meet the complex needs of users. First, we pre-process the data 
service description documents. Second, we propose a unified 
description language model for data services, the Unified 
Description Language for Data Service (UDL4DS). Finally, we 
implement a Web-based description system for data services.

7. References

7.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 
Requirement Levels", BCP 14, RFC 2119, 
DOI 10.17487/RFC2119, March 1997.

7.2. Informative References

[W3C.REC-webarch-20041215]
  Jacobs, I. and N. Walsh, "Architecture of the World Wide Web,
  Volume One", World Wide Web Consortium Recommendation
  REC-webarch-20041215, December 2004.

8. Acknowledgments
Thanks to H. Wang for comments and suggestions.

This document was prepared using 2-Word-v2.0.template.dot.
 
Authors' Addresses
YP Chen
Shaanxi Key Laboratory of Network Data Intelligent Processing
Xi'an University of Posts and Telecommunications
China
	
Email: CHENYP@XUPT.edu.cn

H Xia
Shaanxi Key Laboratory of Network Data Intelligent Processing
Xi'an University of Posts and Telecommunications
China
	
Email: XIAHONG@XUPT.edu.cn

ZM Wang
Shaanxi Key Laboratory of Network Data Intelligent Processing
Xi'an University of Posts and Telecommunications
China
	
Email: ZMWANG@XUPT.edu.cn

P Yang
Xi'an University of Posts and Telecommunications
China
	
Email: YANGPING@163.com

CW Tang
Xi'an University of Posts and Telecommunications
China
	
Email: 1316904833@qq.com