Crowne Plaza Hotel, Seattle, USA
We propose a novel delivery mechanism called 2-Phase Service Model to deliver video data to home users connected to the Internet through a low-bandwidth device such as a modem. In our scheme, non-adjacent fragments of the requested video file are first downloaded to the client during Initialization Phase. The missing fragments are transmitted to the client as the video is being played out, using a novel pipelining technique. This scheme offers several benefits as follows. First, it allows the user to perform a quick preview through the video with minimal delay. Second, it naturally supports VCR functionality with almost no delay as demonstrated by the simulation results shown in the paper. Finally, our mathematical analysis shows that despite the desirable features it offers, 2-Phase Service Model does not incur any more initialization delay than that of the conventional pipelining technique.
Video player, VCR functions, home users, World-Wide-Web, digital libraries, pipelining
World-Wide-Web has become extremely popular during the recent years. It enables a wide variety of information to be disseminated to a wide range of users including home users who typically access the Web from home via modems. Though modem technology has advanced to operate at a higher speed (e.g, 57600 bps), its bandwidth is still significantly low compared to that obtained in Local Area Networks (LANs) or Wide Area Networks (WANs).
The modem bandwidth is barely enough to support the retrieval of texts and graphics which are the primitive media types offered on the Web. As the Web progresses to provide continuous media (e.g., audios and videos), the narrow bandwidth of the modem implicitly limits the access to these media to those home users who can tolerate either long wait time or highly jittering pictures.
Figure 1 depicts a simplified on-demand video streaming over the Web. A home user visits the Web site which provides meta information about video clips (i.e., video titles, descriptions, and playback rates). Once the user selects a video, the web server returns a video script corresponding to the selected video to the user's web browser (Label 1 in Figure 1). The web browser then invokes a local video player, which interprets the script, and makes a connection to the video server specified in the script (Label 2 in Figure 1). After some negotiation between the video server and the player, the video object is streamed to the player.
Two approaches are available for streaming a video object to a remote player:
A drawback of pipelining is that it can only be done in the forward direction making it unsuitable for implementing VCR-style operations.
Despite the employment of the pipelining technique, a significant delay is still unavoidable in the low bandwidth environment if decent video quality is insisted. To encourage the home users to wait for the service, we propose a framework for a novel delivery mechanism called 2-Phase Service Model (2PSM) which offers several desirable features over the conventional pipelining technique as follows:
The remainder of this paper is organized as follows. In Section 2, we review the conventional pipelining scheme to make the paper self-contained. In Section 3, we present our proposed technique in details. Implementation effort is discussed in Section 4. We develop a simulation model and measure the performance of our technique compared to that of the pipelining approach in Section 5. In Section 6, we analyze the initialization delay for both techniques. Finally, we give our concluding remarks and discuss our on-going and future research in Section 7.
In this section, we review the concept of the pipelining mechanism, and summarize some analyses relevant to this paper. Interested readers are referred to [Ghand94, Wang96, Hua97] for more detail.
A video file is logically divided into a sequence of slices , where the playback time of must eclipse the time required to materialize , . This time is dominated by the transmission time of for systems whose communication bandwidth is a bottleneck. As soon as the entire first slice ( ) has been downloaded, the playback can begin. Figure 2 illustrates the pipelining mechanism.
Let S represent a video file, and |S| be its size (in bits). We define a Production Consumption Rate (pcr) as a ratio between the average download rate (m) (i.e., the speed of the modem in bps) and the average playback rate (p) in bps. Thus, we have
pcr = .
Since the time to download must eclipse the time to playback , the following holds:
Since and ,
Since the time to download the sequence of is equal to the time to playback the sequence of . We have
We choose the size of the last slice to be that of a minimum retrieval unit (i.e., a group of frames (GOF)) to reduce the initialization delay (i.e., needs to be as small as possible). GOF represents a decodable unit whose frames in the unit can be decoded independently from other units. For example, GOF for MPEG [Mpeg92] videos contains a sequence of I-, P-, and B-frames.
Since the average size (in bits) of GOF ( ) is much smaller than |S|, we can approximate the size of the first slice as
Pipelining allows one to reduce the initialization latency. However, it is not suitable for search operations such as fast-forward or fast-reverse. To provide the preview feature and allow quick search operations, 2PSM logically organizes video files stored at the server as illustrated in Figure 3.
Each video file is divided into a sequence of playback units ( ). Each unit consists of a pair of L-fragment and R-fragment. Thus, the two types of fragments interleave each other throughout the video file. All L-fragments are first downloaded before the R-fragments are downloaded. The L-fragment and the R-fragment contain l and r number of GOFs, respectively.
2PSM consists of two phases as follows:
We note that once the playback is initiated, a playback buffer is used to assemble the playback units using the R-fragments from the server and the corresponding L-fragments from the local L-file.
To ensure continuous play, the time required to download the R-fragment must eclipse the time to playback the playback unit , for .
Let and be the sizes (in bits) of the R-fragment ( ) and a playback unit , respectively. The continuous requirement can now be expressed mathematically as follows:
Unlike the conventional pipelining technique, in which the sizes of the slices vary significantly, the sizes of the R-fragments and the L-fragments are reasonably uniform in 2PSM. Let , , and be the average sizes (in bits) of an L-fragment, an R-fragment, and a playback unit, respectively. Equation (1) can be estimated in terms of the average values as follows:
Substitute the Production Consumption Rate (pcr) for ; the following holds:
After some number of GOFs of the L-fragments, say v, are downloaded in the Initialization Phase, the user is allowed to preview the video. That is, the player plays the already downloaded GOFs, and skips those that have not been downloaded. v is a tuning parameter, which depends on several factors such as the types of videos and amount of wait time users are willing to tolerate. For example, if the video is of head-talking type, which has few changes in the scene, v could be small. However, if the video is of action type, where one frame is much different from neighboring frames, v could be larger. Nevertheless, v should not incur more wait time than what users can tolerate. As the Initialization Phase proceeds, the quality of the preview is gradually increased as more L-fragments are downloaded.
We present three different strategies to download the L-fragments: Linear strategy, Spreading strategy, and Binary-tree strategy. These strategies show our effort to improve the preview quality defined in this paper as how well the preview covers the entire video. We note that if the slice is used to allow previewing in the conventional pipelining technique, the preview quality is very poor because all L-fragments are clustered at the begining of the file. As a result, the preview covers only the beginning of the video.
Recall that 2PSM divides the video file into a number of playback units, say n. Each unit contains an L-fragment and an R-fragment consisting of l and r GOFs, respectively. v is the number of GOFs required before a preview is allowed.
Figure 6 shows 4 steps of downloading. The L-fragments can be seen as forming a binary tree. These fragments are downloaded in the following order: 90, 42, 138, 18, 66, 114, 162, and so forth. The preview can begin as soon as Step 3 is finished if at least 7 GOFs are enough for previewing. During the preview, the player plays GOFs 18, 42, 66, 90, 114, 138, and 162, respectively.
Since L-fragments are downloaded first and are spread throughout the video file, fast-forward is implemented simply by playing these L-fragments in a forward direction. Fast-reverse is achieved by playing them in a reverse direction. When the normal play is resumed, the corresponding R-fragment of the the playback unit at the resumption point needs to be downloaded if it is not cached locally. The normal play can begin as soon as the R-fragment has been downloaded.
We are currently implementing 2-Phase Service Model using the Continuous Media Toolkit (CMT) [Smith96]. The system is functioning, but not very stable to perform fruitful experiments. Nevertheless, it gave us more insight on our technique when adapted to a practical environment.
CMT provides a flexible and complete programming environment for developing continuous media applications on several platforms. CMT extends Tcl/Tk and Tcl-DP to support media in continuous domain. It supports several video file formats such as MPEG, Motion JPEG, and H.261. Transport protocols such as RTP and UDP can be used to deliver the media to the client.
The current prototype implements the linear downloading strategy for L-fragments on a Linux system. The GOFs are stored in the server in the same order they are downloaded to facilitate sequential access. To minimize disk seek overhead, we reorganize the video file at the server in advance according to the transmission order of the GOFs. Once the re-organization is done, the server could send the media data in the order as described in the previous section, yet retrieve the data from its disk sequentially.
We modify cmplayer [Patel96] to accept an additional parameter called preview in a CM script file. Cmplayer interprets the script file for several types of information such as hostnames of the media servers, locations of the media files, etc. Preview parameter indicates the wait time required (in secs) before a preview is allowed. We note that the script file can be saved on a local disk and possibly be changed by the user to some unreasonable value. In such cases, the player must ensure that enough L-fragments are downloaded, and ignore the value of the preview.
At the current stage, we focus on supporting MPEG-1 videos. Figure 7 shows CMT objects related to our prototype. At the server, MPEGSegment object periodically prepares MPEG frames for PacketSource object. In each cycle, MPEGSegment object determines the number of frames, the frames to be transmitted, and their transmission order. MPEGSegment object relies on MPEGDataFile object to retrieve the needed MPEG frames from the server disk. PacketSource object reorgarnizes these frames into packets and transmits them to PacketDest object residing on the client. PacketDest object assembles the packets and passes them to MPEGPlay object. PacketSource and PacketDest objects employ UDP transport protocol and handle all low level communication including packet loss and retransmission of the lost packets. MPEGPlay object reassembles the frames, and MPEGSoftwareDevice object renders these frames on to the screen. We modify MPEGDataFile, MPEGSegment, and MPEGPlay objects to function as described in Section 3.
Under UDP protocol [Kochan89], media data could arrive at the player in different order. It also could get lost while traveling through the network. Several protocols have been developed to reduce the lost rate [Jeffay92, Tally94]. When experiencing frame loss or unexpected out-of-sequence frames, 2PSM adjusts itself as follows:
The user can perform fast-forward and fast-reverse as soon as the Playback Phase begins. The player plays the L-fragments and skips the R-fragments during the fast-forward and the fast-reverse.
In this section, we evaluate the performance of 2PSM by comparing it to the conventional pipelining approach. We develop a simulation model and measure the performance of the two techniques based on the following metrics.
We recall that in 2PSM, the playback phase cannot start until the first R-fragment ( ) has been downloaded to ensure a continuous playback. The same requirement applies when a fast-forward takes the playback to some future point in the video. The R-fragment of the playback unit, which contains the desired resumption point, must first be downloaded before the normal play can resume. Fortunately, this delay should be very brief. In the case of conventional pipelining, however, the delay can be very significant since many of the data slices are large. These bigger slices must be downloaded before the resumption. Therefore, a longer delay is expected.
It first seems that fast-reverse would not incur any communication overhead due to the caching of the downloaded data. However, a fast-forward can cause the player to skip forward to a future point. Some time later, a fast-reverse may take the playback to a video frame skipped earlier. In order for the video player to resume the normal play starting from this frame, one or more missing frames must first be downloaded causing a delay in the resumption.
Since communication bandwidth is the bottleneck in our environment, the metrics mentioned above are measured based solely on the communication overhead as it dominates other costs.
The model focuses on the user's behavior during the playback period. The simulator maintains 2 pointers: display pointer and download pointer. The display pointer records the retrieval unit being displayed, while the download pointer keeps track of the retrieval unit being downloaded. The time to display one retrieval unit (GOF) is one second. That means, at every second, the display pointer is advanced. The download pointer is advanced only when the time to download the entire retrieval unit has passed (i.e., depending on the modem transmission rate). Initially, the display pointer and the download pointer are set to the first retrieval unit. The simulator proceeds as follows:
We use the following parameters and their values in our study.
We fixed m and p at 57600 and 288000 bps, respectively, and varied f between 60 and 300. In other words, the video length was varied between 1 and 5 minutes. The results are shown in Figure 10 and 11.
Figure 10 demonstrates that 2PSM requires much less time to resume after playing in fast-forward mode than that required in the pipelining scheme. For example, when the video length is 5 minutes, 2PSM takes under 10 seconds to resume, while pipelining takes about 43 seconds. Moreover, 2PSM is invulnerable to the increase in the video length, while the pipelining scheme is severely affected. This is because 2PSM needs to download only the R-fragment of the desirable playback unit, while the pipelining scheme needs to download a much bigger slice to ensure the continuous play after resumption. The fact that the resumption times of 2PSM is insensitive to the video size is a very desirable property.
Figure 11 shows that the pipelining scheme requires less time to resume than that of 2PSM after playing in fast-reverse mode. However, the difference is under 5 seconds. Fast-reverse mostly jumps back to play already displayed frames, which are cached locally. These cases result in no communication overhead. Nevertheless, there are cases where fast-forward skips some frames, and fast-reverse jumps back to play those frames. This causes 2PSM to download an R-fragment before it resumes while most of the skipped frames are already downloaded before the fast-forward occurs for the pipelining scheme. As a result, the pipelining scheme incurs a very small resumption time.
When compared Figure 10 to Figure 11, we notice that the average resumption times in either direction (i.e., forwarding or reversing) are consistent for 2PSM, while they are so much different in the pipelining scheme. Small and consistent resumption times is another important property to enhance users' satisfaction for the service.
We set up the same parameters as in Section 5.2.1, and measured the preview delay. The results are plotted in Figure 12.
We allow the user to preview as soon as Step 1 of 2PSM has finished. The preview is about 10% of the total retrieval units. Our preliminary experiments with our video player discussed in Section 4 suggest that a preview of 10% of the entire video is good enough to provide general ideas about the videos we experimented with.
Figure 12 demonstrates that the preview delay is much lower than the initialization delay. For example, when the video length is of 3 minutes, users are able to watch the preview after about 1.5 minutes of waiting, while the normal play can begin only after about 12 minutes. Thus, the user has the option to cancel the video after 1.5 minutes if it is not the desirable one, instead of having to waste the next 10.5 minutes for the wrong video. The figure also confirms our analysis that the initialization delay of both techniques are comparable.
In this experiment, we varied PCR (pcr) by varying the playback rate between 96000 and 480000 bps, and we fixed f at 180. Varying playback rate has the same effect to PCR as varying the modem transmission rate with fixed playback rate. Thus, the results of this experiment also explain the behavior of both techniques when the modem rate is lower than 57600 bps as we typically experience in many situations. The resumption time after fast-forward and fast-reverse for both techniques are shown in Figure 13.
We observe that the resumption times for 2PSM are under 10 seconds and are in between those of the pipelining technique. Resumption times after playing in fast-reverse mode of both techniques are small and not affected by the increases in PCR. However, the resumption time after playing in fast-forward mode is much higher for pipelining when the PCR is small. That is, in most situations where the modem transmission rate is insufficient, the pipelining scheme incurs much more wait time than 2PSM.
In this section, we demonstrate our mathematical analysis that 2PSM incurs no worse initialization delay than that of the pipelining technique. We formally define the initialization delay as follows:
We recall the following notations:
We recall from Section 2 that
Since the playback can begin as soon as the entire first slice is downloaded, the initialization delay for this scheme can be derived as follows:
Since , we derive in terms of |S| as shown below:
Substitute for (from Section 3), we have
The initialization delay of 2PSM is then estimated as follows.
Since, is much less than |S|,
We can conclude that 2PSM does not incur any more wait time than that of the conventional pipelining technique.
Video players playing videos from remote sites have not been popular in the homes due to the following reasons. They either have to sacrifice the playback quality (conventional data streaming technique), or they require long delay to initialize the client's buffer with a significant portion of the video (pipelining techniques). Furthermore, during the playback period, the user is only allowed to skip forward or reverse to some random position in the video instead of playing in fast-forward or fast-reverse mode.
2-Phase Service Model is designed to address these drawbacks. It offers a major step forward in making playback of remote videos from home more interesting. In particular, it allows the user to quickly preview a video to avoid waiting for an unintended one. The novel data fragmentation technique and the unique data streaming strategy provide a natural environment for implementing VCR-style operations. The efficient search makes it an excellent candidate for applications such as digital libraries and distance learning systems.
We are currently implementing a video player based on this technique. The system is currently functioning but not very stable. The preliminary observation of our prototype shows that it supports the features discussed very well. We are working to incorporate the audio capability and to further reduce the service delay.