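Written by experienced instructors and drawn from material actually taught in the classroom, this book is suitable as a textbook for computer science and engineering students. It surveys multimedia authoring tools and the taxonomy of the field, covering data representations for images, video, and audio; the important properties of color; data compression; multimedia communication and retrieval; and content-based retrieval in digital libraries.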
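Key Features of the Book
● Introduces multimedia authoring tools, for example music sequencers, image and video editors, popular languages such as XML and SMIL, and programs such as Director, Flash, and VRML.
● Graphics/image/video/audio data representations, including color models, HDTV, MIDI, and audio coding.
● Compression formats and standards for data, images, video, and audio, covering both lossless and lossy compression.
● Multimedia networking, with coverage of QoS, VoIP, real-time media-on-demand, and multimedia over wireless networks.
● Content-based retrieval in digital libraries.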
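The book has a companion web site, http://www.cs.sfu.ca/mmbook, which provides instructors and students with complete materials and topics for a multimedia course, along with a large collection of resources.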
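About the Authors:
Professor Ze-Nian Li is currently the head of the School of Computing Science at Simon Fraser University in Vancouver, Canada, and is also the director of the university's Vision and Media Laboratory.
Mark S. Drew is currently an associate professor in the School of Computing Science at Simon Fraser University.
Each has published more than 80 papers on multimedia and related areas.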
Preface xvi
I Multimedia Authoring and Data Representations 1
1 Introduction to Multimedia 3
1.1 What is Multimedia? 3
1.1.1 Components of Multimedia 3
1.1.2 Multimedia Research Topics and Projects 4
1.2 Multimedia and Hypermedia 5
1.2.1 History of Multimedia 5
1.2.2 Hypermedia and Multimedia 7
1.3 World Wide Web 8
1.3.1 History of the WWW 8
1.3.2 HyperText Transfer Protocol (HTTP) 9
1.3.3 HyperText Markup Language (HTML) 10
1.3.4 Extensible Markup Language (XML) 11
1.3.5 Synchronized Multimedia Integration Language (SMIL) 12
1.4 Overview of Multimedia Software Tools 14
1.4.1 Music Sequencing and Notation 14
1.4.2 Digital Audio 15
1.4.3 Graphics and Image Editing 15
1.4.4 Video Editing 15
1.4.5 Animation 16
1.4.6 Multimedia Authoring 17
1.5 Further Exploration 17
1.6 Exercises 18
1.7 References 19
2 Multimedia Authoring and Tools 20
2.1 Multimedia Authoring 20
2.1.1 Multimedia Authoring Metaphors 21
2.1.2 Multimedia Production 23
2.1.3 Multimedia Presentation 25
2.1.4 Automatic Authoring 33
2.2 Some Useful Editing and Authoring Tools 37
2.2.1 Adobe Premiere 37
2.2.2 Macromedia Director 40
2.2.3 Macromedia Flash 46
2.2.4 Dreamweaver 51
2.3 VRML 51
2.3.1 Overview 51
2.3.2 Animation and Interactions 54
2.3.3 VRML Specifics 54
2.4 Further Exploration 55
2.5 Exercises 56
2.6 References 59
3 Graphics and Image Data Representations 60
3.1 Graphics/Image Data Types 60
3.1.1 1-Bit Images 61
3.1.2 8-Bit Gray-Level Images 61
3.1.3 Image Data Types 64
3.1.4 24-Bit Color Images 64
3.1.5 8-Bit Color Images 65
3.1.6 Color Lookup Tables (LUTs) 67
3.2 Popular File Formats 71
3.2.1 GIF 71
3.2.2 JPEG 75
3.2.3 PNG 76
3.2.4 TIFF 77
3.2.5 EXIF 77
3.2.6 Graphics Animation Files 77
3.2.7 PS and PDF 78
3.2.8 Windows WMF 78
3.2.9 Windows BMP 78
3.2.10 Macintosh PAINT and PICT 78
3.2.11 X Windows PPM 79
3.3 Further Exploration 79
3.4 Exercises 79
3.5 References 81
4 Color in Image and Video 82
4.1 Color Science 82
4.1.1 Light and Spectra 82
4.1.2 Human Vision 84
4.1.3 Spectral Sensitivity of the Eye 84
4.1.4 Image Formation 85
4.1.5 Camera Systems 86
4.1.6 Gamma Correction 87
4.1.7 Color-Matching Functions 89
4.1.8 CIE Chromaticity Diagram 91
4.1.9 Color Monitor Specifications 94
4.1.10 Out-of-Gamut Colors 95
4.1.11 White-Point Correction 96
4.1.12 XYZ to RGB Transform 97
4.1.13 Transform with Gamma Correction 97
4.1.14 L*a*b* (CIELAB) Color Model 98
4.1.15 More Color-Coordinate Schemes 100
4.1.16 Munsell Color Naming System 100
4.2 Color Models in Images 100
4.2.1 RGB Color Model for CRT Displays 100
4.2.2 Subtractive Color: CMY Color Model 101
4.2.3 Transformation from RGB to CMY 101
4.2.4 Undercolor Removal: CMYK System 102
4.2.5 Printer Gamuts 102
4.3 Color Models in Video 104
4.3.1 Video Color Transforms 104
4.3.2 YUV Color Model 104
4.3.3 YIQ Color Model 105
4.3.4 YCbCr Color Model 107
4.4 Further Exploration 107
4.5 Exercises 108
4.6 References 111
5 Fundamental Concepts in Video 112
5.1 Types of Video Signals 112
5.1.1 Component Video 112
5.1.2 Composite Video 113
5.1.3 S-Video 113
5.2 Analog Video 113
5.2.1 NTSC Video 116
5.2.2 PAL Video 119
5.2.3 SECAM Video 119
5.3 Digital Video 119
5.3.1 Chroma Subsampling 120
5.3.2 CCIR Standards for Digital Video 120
5.3.3 High Definition TV (HDTV) 122
5.4 Further Exploration 124
5.5 Exercises 124
5.6 References 125
6 Basics of Digital Audio 126
6.1 Digitization of Sound 126
6.1.1 What Is Sound? 126
6.1.2 Digitization 127
6.1.3 Nyquist Theorem 128
6.1.4 Signal-to-Noise Ratio (SNR) 131
6.1.5 Signal-to-Quantization-Noise Ratio (SQNR) 131
6.1.6 Linear and Nonlinear Quantization 133
6.1.7 Audio Filtering 136
6.1.8 Audio Quality versus Data Rate 136
6.1.9 Synthetic Sounds 137
6.2 MIDI: Musical Instrument Digital Interface 139
6.2.1 MIDI Overview 139
6.2.2 Hardware Aspects of MIDI 142
6.2.3 Structure of MIDI Messages 143
6.2.4 General MIDI 147
6.2.5 MIDI-to-WAV Conversion 147
6.3 Quantization and Transmission of Audio 147
6.3.1 Coding of Audio 147
6.3.2 Pulse Code Modulation 148
6.3.3 Differential Coding of Audio 150
6.3.4 Lossless Predictive Coding 151
6.3.5 DPCM 154
6.3.6 DM 157
6.3.7 ADPCM 158
6.4 Further Exploration 159
6.5 Exercises 160
6.6 References 163
II Multimedia Data Compression 165
7 Lossless Compression Algorithms 167
7.1 Introduction 167
7.2 Basics of Information Theory 168
7.3 Run-Length Coding 171
7.4 Variable-Length Coding (VLC) 171
7.4.1 Shannon-Fano Algorithm 171
7.4.2 Huffman Coding 173
7.4.3 Adaptive Huffman Coding 176
7.5 Dictionary-Based Coding 181
7.6 Arithmetic Coding 187
7.7 Lossless Image Compression 191
7.7.1 Differential Coding of Images 191
7.7.2 Lossless JPEG 193
7.8 Further Exploration 194
7.9 Exercises 195
7.10 References 197
8 Lossy Compression Algorithms 199
8.1 Introduction 199
8.2 Distortion Measures 199
8.3 The Rate-Distortion Theory 200
8.4 Quantization 200
8.4.1 Uniform Scalar Quantization 201
8.4.2 Nonuniform Scalar Quantization 204
8.4.3 Vector Quantization* 206
8.5 Transform Coding 207
8.5.1 Discrete Cosine Transform (DCT) 207
8.5.2 Karhunen-Loeve Transform* 220
8.6 Wavelet-Based Coding 222
8.6.1 Introduction 222
8.6.2 Continuous Wavelet Transform* 227
8.6.3 Discrete Wavelet Transform* 230
8.7 Wavelet Packets 240
8.8 Embedded Zerotree of Wavelet Coefficients 241
8.8.1 The Zerotree Data Structure 242
8.8.2 Successive Approximation Quantization 244
8.8.3 EZW Example 244
8.9 Set Partitioning in Hierarchical Trees (SPIHT) 247
8.10 Further Exploration 248
8.11 Exercises 249
8.12 References 252
9 Image Compression Standards 253
9.1 The JPEG Standard 253
9.1.1 Main Steps in JPEG Image Compression 253
9.1.2 JPEG Modes 262
9.1.3 A Glance at the JPEG Bitstream 265
9.2 The JPEG2000 Standard 265
9.2.1 Main Steps of JPEG2000 Image Compression* 267
9.2.2 Adapting EBCOT to JPEG2000 275
9.2.3 Region-of-Interest Coding 275
9.2.4 Comparison of JPEG and JPEG2000 Performance 277
9.3 The JPEG-LS Standard 277
9.3.1 Prediction 280
9.3.2 Context Determination 281
9.3.3 Residual Coding 281
9.3.4 Near-Lossless Mode 281
9.4 Bilevel Image Compression Standards 282
9.4.1 The JBIG Standard 282
9.4.2 The JBIG2 Standard 282
9.5 Further Exploration 284
9.6 Exercises 285
9.7 References 287
10 Basic Video Compression Techniques 288
10.1 Introduction to Video Compression 288
10.2 Video Compression Based on Motion Compensation 288
10.3 Search for Motion Vectors 290
10.3.1 Sequential Search 290
10.3.2 2D Logarithmic Search 291
10.3.3 Hierarchical Search 293
10.4 H.261 295
10.4.1 Intra-Frame (I-Frame) Coding 297
10.4.2 Inter-Frame (P-Frame) Predictive Coding 297
10.4.3 Quantization in H.261 297
10.4.4 H.261 Encoder and Decoder 298
10.4.5 A Glance at the H.261 Video Bitstream Syntax 301
10.5 H.263 303
10.5.1 Motion Compensation in H.263 304
10.5.2 Optional H.263 Coding Modes 305
10.5.3 H.263+ and H.263++ 307
10.6 Further Exploration 308
10.7 Exercises 309
10.8 References 310
11 MPEG Video Coding I -- MPEG-1 and 2 312
11.1 Overview 312
11.2 MPEG-1 312
11.2.1 Motion Compensation in MPEG-1 313
11.2.2 Other Major Differences from H.261 315
11.2.3 MPEG-1 Video Bitstream 318
11.3 MPEG-2 319
11.3.1 Supporting Interlaced Video 320
11.3.2 MPEG-2 Scalabilities 323
11.3.3 Other Major Differences from MPEG-1 329
11.4 Further Exploration 330
11.5 Exercises 330
11.6 References 331
12 MPEG Video Coding II -- MPEG-4, 7, and Beyond 332
12.1 Overview of MPEG-4 332
12.2 Object-Based Visual Coding in MPEG-4 335
12.2.1 VOP-Based Coding vs. Frame-Based Coding 335
12.2.2 Motion Compensation 337
12.2.3 Texture Coding 341
12.2.4 Shape Coding 343
12.2.5 Static Texture Coding 346
12.2.6 Sprite Coding 347
12.2.7 Global Motion Compensation 348
12.3 Synthetic Object Coding in MPEG-4 349
12.3.1 2D Mesh Object Coding 349
12.3.2 3D Model-based Coding 354
12.4 MPEG-4 Object types, Profiles and Levels 356
12.5 MPEG-4 Part 10/H.264 357
12.5.1 Core Features 358
12.5.2 Baseline Profile Features 360
12.5.3 Main Profile Features 360
12.5.4 Extended Profile Features 361
12.6 MPEG-7 361
12.6.1 Descriptor (D) 363
12.6.2 Description Scheme (DS) 365
12.6.3 Description Definition Language (DDL) 368
12.7 MPEG-21 369
12.8 Further Exploration 370
12.9 Exercises 370
12.10 References 371
13 Basic Audio Compression Techniques 374
13.1 ADPCM in Speech Coding 374
13.1.1 ADPCM 374
13.2 G.726 ADPCM 376
13.3 Vocoders 378
13.3.1 Phase Insensitivity 378
13.3.2 Channel Vocoder 378
13.3.3 Formant Vocoder 380
13.3.4 Linear Predictive Coding 380
13.3.5 CELP 383
13.3.6 Hybrid Excitation Vocoders* 389
13.4 Further Exploration 392
13.5 Exercises 392
13.6 References 393
14 MPEG Audio Compression 395
14.1 Psychoacoustics 395
14.1.1 Equal-Loudness Relations 396
14.1.2 Frequency Masking 398
14.1.3 Temporal Masking 403
14.2 MPEG Audio 405
14.2.1 MPEG Layers 405
14.2.2 MPEG Audio Strategy 406
14.2.3 MPEG Audio Compression Algorithm 407
14.2.4 MPEG-2 AAC (Advanced Audio Coding) 412
14.2.5 MPEG-4 Audio 414
14.3 Other Commercial Audio Codecs 415
14.4 The Future: MPEG-7 and MPEG-21 415
14.5 Further Exploration 416
14.6 Exercises 416
14.7 References 417
III Multimedia Communication and Retrieval 419
15 Computer and Multimedia Networks 421
15.1 Basics of Computer and Multimedia Networks 421
15.1.1 OSI Network Layers 421
15.1.2 TCP/IP Protocols 422
15.2 Multiplexing Technologies 425
15.2.1 Basics of Multiplexing 425
15.2.2 Integrated Services Digital Network (ISDN) 427
15.2.3 Synchronous Optical NETwork (SONET) 428
15.2.4 Asymmetric Digital Subscriber Line (ADSL) 429
15.3 LAN and WAN 430
15.3.1 Local Area Networks (LANs) 431
15.3.2 Wide Area Networks (WANs) 434
15.3.3 Asynchronous Transfer Mode (ATM) 435
15.3.4 Gigabit and 10-Gigabit Ethernets 438
15.4 Access Networks 439
15.5 Common Peripheral Interfaces 441
15.6 Further Exploration 441
15.7 Exercises 442
15.8 References 442
16 Multimedia Network Communications and Applications 443
16.1 Quality of Multimedia Data Transmission 443
16.1.1 Quality of Service (QoS) 443
16.1.2 QoS for IP Protocols 446
16.1.3 Prioritized Delivery 447
16.2 Multimedia over IP 447
16.2.1 IP-Multicast 447
16.2.2 RTP (Real-time Transport Protocol) 449
16.2.3 Real Time Control Protocol (RTCP) 451
16.2.4 Resource ReSerVation Protocol (RSVP) 451
16.2.5 Real-Time Streaming Protocol (RTSP) 453
16.2.6 Internet Telephony 455
16.3 Multimedia over ATM Networks 459
16.3.1 Video Bitrates over ATM 459
16.3.2 ATM Adaptation Layer (AAL) 460
16.3.3 MPEG-2 Convergence to ATM 461
16.3.4 Multicast over ATM 462
16.4 Transport of MPEG-4 462
16.4.1 DMIF in MPEG-4 462
16.4.2 MPEG-4 over IP 463
16.5 Media-on-Demand (MOD) 464
16.5.1 Interactive TV (ITV) and Set-Top Box (STB) 464
16.5.2 Broadcast Schemes for Video-on-Demand 465
16.5.3 Buffer Management 472
16.6 Further Exploration 475
16.7 Exercises 476
16.8 References 477
17 Wireless Networks 479
17.1 Wireless Networks 479
17.1.1 Analog Wireless Networks 480
17.1.2 Digital Wireless Networks 481
17.1.3 TDMA and GSM 481
17.1.4 Spread Spectrum and CDMA 483
17.1.5 Analysis of CDMA 486
17.1.6 3G Digital Wireless Networks 488
17.1.7 Wireless LAN (WLAN) 492
17.2 Radio Propagation Models 493
17.2.1 Multipath Fading 494
17.2.2 Path Loss 496
17.3 Multimedia over Wireless Networks 496
17.3.1 Synchronization Loss 497
17.3.2 Error Resilient Entropy Coding 499
17.3.3 Error Concealment 501
17.3.4 Forward Error Correction (FEC) 503
17.3.5 Trends in Wireless Interactive Multimedia 506
17.4 Further Exploration 508
17.5 Exercises 508
17.6 References 510
18 Content-Based Retrieval in Digital Libraries 511
18.1 How Should We Retrieve Images? 511
18.2 C-BIRD--A Case Study 513
18.2.1 C-BIRD GUI 514
18.2.2 Color Histogram 514
18.2.3 Color Density 516
18.2.4 Color Layout 516
18.2.5 Texture Layout 517
18.2.6 Search by Illumination Invariance 519
18.2.7 Search by Object Model 520
18.3 Synopsis of Current Image Search Systems 533
18.3.1 QBIC 535
18.3.2 UC Santa Barbara Search Engines 536
18.3.3 Berkeley Digital Library Project 536
18.3.4 Chabot 536
18.3.5 Blobworld 537
18.3.6 Columbia University Image Seekers 537
18.3.7 Informedia 537
18.3.8 MetaSEEk 537
18.3.9 Photobook and FourEyes 538
18.3.10 MARS 538
18.3.11 Virage 538
18.3.12 Viper 538
18.3.13 Visual RetrievalWare 538
18.4 Relevance Feedback 539
18.4.1 MARS 539
18.4.2 iFind 541
18.5 Quantifying Results 541
18.6 Querying on Videos 542
18.7 Querying on Other Formats 544
18.8 Outlook for Content-Based Retrieval 544
18.9 Further Exploration 545
18.10 Exercises 546
18.11 References 547
Index 551
A course in multimedia is rapidly becoming a necessity in computer science and engineering curricula, especially now that multimedia touches most aspects of these fields. Multimedia was originally seen as a vertical application area; that is, a niche application with methods that belong only to itself. However, like pervasive computing, multimedia is now essentially a horizontal application area and forms an important component of the study of computer graphics, image processing, databases, real-time systems, operating systems, information retrieval, computer networks, computer vision, and so on. Multimedia is no longer just a toy but forms part of the technological environment in which we work and think. This book fills the need for a university-level text that examines a good deal of the core agenda computer science sees as belonging to this subject area. Multimedia has become associated with a certain set of issues in computer science and engineering, and we address those here.
The book is not an introduction to simple design issues--it serves a more advanced audience than that. On the other hand, it is not a reference work -- it is more a traditional textbook. While we perforce discuss multimedia tools, we would like to give a sense of the underlying principles in the tasks those tools carry out. Students who undertake and succeed in a course based on this text can be said to really understand fundamental matters in regard to this material; hence the title of the text.
In conjunction with this text, a full-fledged course should also allow students to make use of this knowledge to carry out interesting or even wonderful practical projects in multimedia, interactive projects that engage and sometimes amuse and, perhaps, even teach these same concepts.
Who Should Read This Book?
This text aims at introducing the basic ideas in multimedia to an audience comfortable with technical applications--that is, computer science and engineering students. It aims to cover an upper-level undergraduate multimedia course but could also be used in more advanced courses and would be a good reference for anyone, including those in industry, interested in current multimedia technologies. Graduate students needing a solid grounding in materials they may not have seen before would undoubtedly benefit from reading it.
The text mainly presents concepts, not applications. A multimedia course, on the other hand, teaches these concepts and tests them but also allows students to use coding and presentation skills they already know to address problems in multimedia. The accompanying web site shows some of the code for multimedia applications, along with some of the better projects students have developed in such a course and other useful materials best presented electronically.
The ideas in the text drive the results shown in student projects. We assume the reader knows how to program and is also completely comfortable learning yet another tool. Instead of concentrating on tools, however, we emphasize what students do not already know. Using the methods and ideas collected here, students are also able to learn more themselves, sometimes in a job setting. It is not unusual for students who take the type of multimedia course this text aims at to go on to jobs in a multimedia-related industry immediately after their senior year, and sometimes before.
The selection of material in the text addresses real issues these learners will face as soon as they show up in the workplace. Some topics are simple but new to the students; some are more complex but unavoidable in this emerging area.
Have the Authors Used This Material in a Real Class?
Since 1996, we have taught a third-year undergraduate course in multimedia systems based on the introductory materials set out in this book. A one-semester course could very likely not include all the material covered in this text, but we have usually managed to consider a good many of the topics addressed and to mention a select number of issues in Part III within that time frame.
Over the same time period, as an introduction to more advanced materials, we have also taught a one-semester graduate-level course using notes covering topics similar to the ground covered by this text. A fourth-year or graduate course would do well to consider material from Parts I and II of the book and then some material from Part III, perhaps in conjunction with some of the original research references included here and results presented at topical conferences.
We have attempted to fill both needs, concentrating on an undergraduate audience but including more advanced material as well. Sections that can safely be omitted on a first reading are marked with an asterisk.
What is Covered in This Text?
In Part I, Multimedia Authoring and Data Representations, we introduce some of the notions included in the term multimedia and look at its history as well as its present. Practically speaking, we carry out multimedia projects using software tools, so in addition to an overview of these tools, we get down to some of the nuts and bolts of multimedia authoring. Representing data is critical in multimedia, and we look at the most important data representations for multimedia applications, examining image data, video data, and audio data in detail. Since color is vitally important in multimedia programs, we see how this important area impacts multimedia issues.
In Part II, Multimedia Data Compression, we consider how we can make all this data fly onto the screen and speakers. Data compression turns out to be an important enabling technology that makes modern multimedia systems possible, so we look at lossless and lossy compression methods. For the latter category, JPEG still-image compression standards, including JPEG2000, are arguably the most important, so we consider these in detail. But since a picture is worth a thousand words and video is worth more than a million words per minute, we examine the ideas behind MPEG standards MPEG-1, MPEG-2, MPEG-4, MPEG-7, and beyond. Separately, we consider some basic audio compression techniques and take a look at MPEG Audio, including MP3.
In Part III, Multimedia Communication and Retrieval, we consider the great demands multimedia places on networks and systems. We go on to consider network technologies and protocols that make interactive multimedia possible. Some of the applications discussed include multimedia on demand, multimedia over IP, multimedia over ATM, and multimedia over wireless networks. Content-based retrieval is a particularly important issue in digital libraries and interactive multimedia, so we examine ideas and systems for this application in some detail.
Textbook Web Site
The book's web site is www.cs.sfu.ca/mmbook. There, you will find copies of figures from the book, an errata sheet updated regularly, programs that help demonstrate concepts in the text, and a dynamic set of links for the Further Exploration section of each chapter. Since these links are regularly updated (and of course URLs change often) they are mostly online rather than in the text.
Instructors' Resources
The main text web site has no ID and password, but access to sample student projects is at the instructor's discretion and is password-protected. Prentice Hall also hosts a web site containing Course Instructor resources for adopters of the text. These include an extensive collection of online course notes, a one-semester course syllabus and calendar of events, solutions for the exercises in the text, sample assignments and solutions, sample exams, and extra exam questions.
Acknowledgements
We are most grateful to colleagues who generously gave of their time to review this text, and we wish to express our thanks to Shu-Ching Chen, Edward Chang, Qianping Gu, Rachelle S. Heller, Gongzhu Hu, S. N. Jayaram, Tiko Kameda, Xiaobo Li, Siwei Lu, Dennis Richards, and Jacques Vaisey.
The writing of this text has been greatly aided by a number of suggestions from present and former colleagues and students. We would like to thank James Au, Chad Ciavarro, Hao Jiang, Steven Kilthau, Michael King, Cheng Lu, Yi Sun, Dominic Szopa, Zinovi Tauber, Malte von Ruden, Jian Wang, Jie Wei, Edward Yan, Yingchen Yang, Osmar Zaiane, Wenbiao Zhang, and William Zhong for their assistance. As well, Mr. Ye Lu made great contributions to Chapters 8 and 9, and his valiant efforts are particularly appreciated. We are also most grateful to the students who generously made their course projects available for instructional use for this book.