
Managing XML and Semistructured Data: Teaching Resources (PPT Lecture Slides), Part 04: Compressing XML Data


Managing XML and Semistructured Data
Part 4: Compressing XML Data

In this section
▪ XML Compression
  • Motivation
  • The State-of-the-Art
    ▪ Queriable compressors
    ▪ Non-queriable compressors
Resources
▪ XMILL: An Efficient Compressor for XML Data by Liefke and Suciu, in SIGMOD'2001
▪ Others: XGrind, XPress, XQuec, XMLzip, …
▪ XCQ: From my publications
▪ XQZip: From my publications
▪ MQX: From my publications

Introduction
▪ More and more XML data is created
  • Duplicate structures (tags, paths, …)
  • Data inflation: data in XML is much larger than raw data
  • Compression: storage and data transfer
▪ General-purpose compressor (e.g. gzip)
  • Characteristics of XML data not utilized
  • Unqueriable

Compression: The Problem
▪ XML for exchange (space or time)
▪ But XML is verbose and inflated due to
  • Duplicated tags and paths
▪ Users prefer application-specific formats:
  • E.g. Web Server Logs
▪ Is XML doomed to fail?
▪ Solution: XML-specific compressor
  • Non-queriable: XMill
  • Queriable: XQzip

XML-Specific Compressors
▪ Unqueriable Compression (e.g. XMill):
  • Full-chunked: data commonalities eliminated
  • Very good compression ratio
▪ Queriable Compression (e.g. XGrind, XPRESS):
  • Fine-grained: data commonalities ignored
  • Inadequate compression ratio and time
  • Support simple path queries with atomic predicates

Issues in XML Compression
▪ Compression ratios, compression time, query coverage, memory usage … (see my survey paper in WWWJ)
[Table: Comparison of existing technologies. Legible column headings: Compression Scheme, Compression Time (compared with gzip), Memory Usage (for compression), Technologies Used. Legible scheme names: XPRESS, XMLZip. The cell contents did not survive extraction.]

An Example: Web Server Logs
ASCII file, 15.9 MB (gzipped 1.6 MB):
202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478|-|-|http://www.net.jp/|Mozilla/3.1[ja](I)
XML-ized Apache web log inflates to 24.2 MB (gzipped 2.1 MB). The element tags were lost in extraction; the element values of the same entry are:
202.239.238.16 GET / HTTP/1.0 text/html 200 1997/10/01-00:00:02 4478 http://www.net.jp/ Mozilla/3.1$[$ja$]$(I)
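The inflation described on this slide can be reproduced in miniature. The sketch below is only an illustration: the element names in `FIELDS` are hypothetical (the tags used on the slide are not recoverable here), fields whose value is `-` are omitted as in the slide's XML-ized entry, and the functions `xmlize` and `sizes` are names introduced for this example.

```python
import gzip
from xml.sax.saxutils import escape

# Hypothetical element names for the 11 pipe-separated fields of a log line.
FIELDS = ["host", "request", "content_type", "status", "date", "ident",
          "bytes", "field8", "field9", "referer", "user_agent"]


def xmlize(line: str) -> str:
    """Wrap each non-empty field of one pipe-separated log line in an element."""
    values = line.rstrip("\n").split("|")
    parts = [f"<{name}>{escape(value)}</{name}>"
             for name, value in zip(FIELDS, values)
             if value != "-"]          # "-" marks a missing field; skip it
    return "<entry>" + "".join(parts) + "</entry>"


def sizes(lines):
    """Return byte sizes of the raw and XML-ized logs, plain and gzipped."""
    raw = "\n".join(lines).encode("utf-8")
    xml = ("<log>" + "".join(xmlize(l) for l in lines) + "</log>").encode("utf-8")
    return {
        "raw": len(raw),
        "raw_gz": len(gzip.compress(raw)),
        "xml": len(xml),
        "xml_gz": len(gzip.compress(xml)),
    }


if __name__ == "__main__":
    sample = ("202.239.238.16|GET / HTTP/1.0|text/html|200|"
              "1997/10/01-00:00:02|-|4478|-|-|http://www.net.jp/|Mozilla/3.1[ja](I)")
    print(xmlize(sample))
    print(sizes([sample] * 1000))
```

Run on a real log, comparing the `xml_gz` and `raw_gz` entries would show the kind of gap the slide reports (2.1 MB versus 1.6 MB); the sketch makes no claim about the exact figures.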

XMill
▪ First specialized compressor for XML data
  • SAX parser for parsing XML data
  • Still using gzip as its underlying compressor
  • Clever grouping of data into containers for compression
▪ Compress XML via three basic techniques
  • Compress the structure separately from the data
  • Group the data values according to their types
  • Apply semantic (specialized) compressors
▪ Downloadable:
  • www.cs.washington.edu/homes/suciu/XMILL
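The second and third techniques (grouping values into containers, then applying a semantic compressor per container) can be sketched as follows. This is an illustration under stated assumptions, not XMill's actual code: containers are keyed by element name only, attributes and mixed content are ignored, `zlib` stands in for gzip, and the `SEMANTIC` table with its `host` entry is a hypothetical example of a specialized compressor that packs dotted IPv4 addresses into 4 bytes each.

```python
import socket
import xml.sax
import zlib
from collections import defaultdict


class ContainerHandler(xml.sax.ContentHandler):
    """Collect the text value of each leaf element into a per-name container."""

    def __init__(self):
        super().__init__()
        self.containers = defaultdict(list)  # element name -> list of values
        self._text = []                      # text pieces of the current element

    def startElement(self, name, attrs):
        self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several pieces, so buffer them.
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if value:
            self.containers[name].append(value)
        self._text = []


def pack_ips(values):
    """Semantic compressor: each dotted IPv4 address becomes 4 bytes."""
    return b"".join(socket.inet_aton(v) for v in values)


def pack_text(values):
    """Default: newline-separated UTF-8 text."""
    return "\n".join(values).encode("utf-8")


# Hypothetical mapping from container name to semantic compressor.
SEMANTIC = {"host": pack_ips}


def compress_containers(xml_text):
    """Return {container name: compressed bytes} for one XML document."""
    handler = ContainerHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return {name: zlib.compress(SEMANTIC.get(name, pack_text)(values))
            for name, values in handler.containers.items()}
```

Values of the same kind end up next to each other, which is what gives the general-purpose compressor more redundancy to exploit than it finds in the interleaved document.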

XMill Architecture
[Figure 4: Architecture of the Compressor. Components shown: input file (XML); command line (container expressions); SAX parser; path processor; semantic compressors 1 … k; main memory holding the structure container and data containers 1 … k; output file (compressed XML).]

How XMill Works: Three Ideas
Compress the structure separately from the data:
[Figure: the example log entry (202.239.238.16, GET / HTTP/1.0, text/html, 200, …) is split into a structure part and a data part, each compressed with gzip; gzip Structure + gzip Data = 1.75 MB]
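A minimal sketch of the first idea, splitting a document into a structure stream and a data stream before compressing each. The token encoding here (`T<n>` for an opening tag, `/` for a closing tag, `#` for a data slot) is an assumption made for illustration and is not XMill's on-disk format; attributes are ignored, the tag dictionary is not counted in the sizes, and `zlib` stands in for gzip.

```python
import xml.sax
import zlib


class Splitter(xml.sax.ContentHandler):
    """Separate an XML document into a structure token stream and a data stream."""

    def __init__(self):
        super().__init__()
        self.tags = {}        # tag name -> small integer code
        self.structure = []   # "T<code>" = open tag, "/" = close tag, "#" = data slot
        self.data = []        # text values in document order
        self._buf = []

    def _flush(self):
        # Emit buffered text (if any) as one data value plus a slot marker.
        value = "".join(self._buf).strip()
        if value:
            self.structure.append("#")
            self.data.append(value)
        self._buf = []

    def startElement(self, name, attrs):
        self._flush()
        code = self.tags.setdefault(name, len(self.tags))
        self.structure.append(f"T{code}")

    def endElement(self, name):
        self._flush()
        self.structure.append("/")

    def characters(self, content):
        self._buf.append(content)


def split(xml_text):
    """Parse xml_text and return the populated Splitter."""
    splitter = Splitter()
    xml.sax.parseString(xml_text.encode("utf-8"), splitter)
    return splitter


def compressed_sizes(xml_text):
    """Compare compressing the whole document with compressing the two streams."""
    s = split(xml_text)
    structure = " ".join(s.structure).encode("utf-8")
    data = "\n".join(s.data).encode("utf-8")
    return {
        "whole": len(zlib.compress(xml_text.encode("utf-8"))),
        "structure": len(zlib.compress(structure)),
        "data": len(zlib.compress(data)),
    }
```

For a document with many repeated entries, `compressed_sizes` lets one compare `structure + data` against `whole`; how large the difference is depends on the input, so no figure is claimed here.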
