Linux Kernel Hacking free course - 2003
Transcript of Lecture 09 by Alex Miocinovic
(Blue text are comments added by Alex)
In the previous lessons we have been developing the driver for our sound card, starting from the part now contained in als4000_main.c. This is a framework, common to all PCI devices, that takes care of registering a number of device files corresponding to the devices on our sound card (DSP, mixer, MIDI, codec, etc.). We did this by using the functions of a kernel subsystem called "Soundcore". This kernel component allows us to register a number of devices that share the same major number (common to all sound card drivers) but have different minor numbers.
In Lesson 08 we have also implemented the code of the open() and release() system calls for the DSP device. The task of the als4000_dsp_open() call is to find the device descriptor data structure, given the minor number of the device, and this was done with a call to get_dsp_descriptor().
This function walks the list of all the descriptors related to the DSP device of each als4000 board installed in the system. As each descriptor is encountered, its stored minor number is checked, and a pointer to the wanted descriptor is returned if a match is found. The function also checks that only one process is accessing the device in each of the possible modes (reading and writing). Furthermore, if the user mode process is opening the device for writing, the function initializes the DMA controller with a call to init_dma(). This allocates a buffer in RAM into which the user mode process will write the data to be played by the DSP; the sound card reads the same data (using its DMA controller as a bus master) before transferring it to the digital to analog converter that's connected to the loudspeakers.
We have also implemented the corresponding release() system call in which we undo all that was done in open(), in particular we release (with caution) the DMA resources.
The sound card hardware programming
In this lesson we will implement the write() system call, which translates to als4000_dsp_write(). This function has two tasks: one is the management of the DMA buffer, and the other is the control of the hardware that's responsible for sound playback on our card. In other words, we must have a way to tell the DMA controller on the sound card:
"read sound samples of a particular format (8 bit, 16 bit, mono or stereo, etc.) from the buffer, starting at address *dma_start and ending at address *dma_end"
The way to do this is described in the board's technical manual. This type of document is generally difficult to read, since it's not written as a tutorial but rather as a bare list of data with very few comments. Nevertheless we can see that our card supports several formats for the sound samples, in particular:
8 bit unsigned (0 , 255) mono
8 bit unsigned (0 , 255) stereo
8 bit signed (-128 , 127) mono
8 bit signed (-128 , 127) stereo
The card also supports the same combinations of formats for 16 bit samples.
All this translates into a number of parameter definitions that we collect in the als4000.h file (e.g. PCM_DATA_US, PCM_DATA_LS, PCM_DATA_MS and others).
As a PCI device, our card does not have a fixed IO address space; rather, it has an address space of fixed length and structure, relative to an initial address that's assigned at configuration time (PCI plug-and-play). So we can define a series of macros that, given the starting address A of the IO address space, return the address of the wanted IO register, as in
#define MIXER_DATA(A) ((A) + 0x14)
which states that the mixer data IO register is at 0x14 bytes of offset from A.
Another thing we learn from the manual is that, to read a value from the mixer device (see static inline unsigned char mixer_read(int base, unsigned char reg) ), we have to write to the MIXER_INDEX(A) register a value that corresponds to the channel we want to read, wait a little, and then read our data from the MIXER_DATA(A) register.
So we define this procedure as an "inline" function in als4000.h and in so doing we avoid the overhead of a function call, since the code of an "inline" function is substituted by the compiler at each occurrence of the call.
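The index/data access pattern can be tried out in user space. The following is only a sketch under stated assumptions: the port I/O is replaced by a small fake device (fake_outb(), fake_inb(), fake_mixer and fake_base are invented for the illustration), and the MIXER_INDEX offset is a placeholder, since only the MIXER_DATA offset is given above.

```c
#include <assert.h>
#include <stdint.h>

/* MIXER_DATA comes from the lecture; the MIXER_INDEX offset is a
 * placeholder chosen only for this illustration. */
#define MIXER_INDEX(A) ((A) + 0x15)   /* hypothetical offset */
#define MIXER_DATA(A)  ((A) + 0x14)

/* Fake device: 256 mixer channels plus the currently selected index. */
static uint8_t fake_mixer[256];
static uint8_t fake_index;
static const int fake_base = 0xd000;  /* hypothetical I/O base address */

/* User-space stand-ins for the kernel port I/O helpers. */
static void fake_outb(uint8_t value, int port)
{
    if (port == MIXER_INDEX(fake_base))
        fake_index = value;              /* select a channel */
    else if (port == MIXER_DATA(fake_base))
        fake_mixer[fake_index] = value;  /* write the selected channel */
}

static uint8_t fake_inb(int port)
{
    assert(port == MIXER_DATA(fake_base)); /* only the data port is readable here */
    return fake_mixer[fake_index];
}

/* Select the channel through the index register, then read the data register. */
static inline unsigned char mixer_read(int base, unsigned char reg)
{
    fake_outb(reg, MIXER_INDEX(base));
    /* real hardware needs a short delay at this point */
    return fake_inb(MIXER_DATA(base));
}
```

Because the function is declared static inline in a header, every file that includes it gets its own copy and the compiler is free to expand it at the call site.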
We do the same for all the procedures related to the management of the hardware in our sound card (sound card programming), such as writing to the mixer, reading and writing to the ESP (the DSP device) and other actions.
In our card there are a few registers known as GCR (Global Control Registers) that are common to all the devices in the card and that are related to the control of DMA transfers. We then write a function to read and write these registers and so on.
Recalling fig. 1 of lesson 08, we see that in the circular buffer the zone between *dma_start and *dma_end is used by the card to transfer data. This means that the driver has previously programmed the card to do a DMA transfer starting from *dma_start and running up to *dma_end. So the card is "playing back" the data and the driver has no right to access this area of the buffer.
In the "filled" zone we have sound samples to be played, but for which no data transfer has been initiated by the card yet. This will happen when the driver instructs the card to do so, but until then we have no right to touch this area either.
The free zone is the only one on which the driver could write.
Because of the way the card's DMA controller works, we can only have a situation where *dma_start comes before *dma_end in the address ordering of the buffer. As described in fig. 1, we can have a situation in which the DMA area comes both before and after the "filled" area, but we would never have the situation of fig. 3, in which the DMA area folds back from the end of the buffer to the beginning.
So in case of fig. 1 we shall first program a DMA transfer that reaches the end of the buffer, followed by a second DMA transfer that starts from the beginning of the buffer.
One of the advantages of the circular buffer is that the als4000_dsp_write() system call only has to modify the *free_start pointer, shrinking the free area by the amount of data written to the buffer. Meanwhile the DMA controller, on finishing its task of transferring data to the DSP, generates an interrupt, and the interrupt handler only modifies *dma_start and *dma_end and never touches *free_start.
Fig. 3: this is not permitted by the sound card DMA controller hardware.
So each of the driver's software components modifies its own pointer (or pointers) to the circular buffer areas and never touches the pointers that are managed by the other components. Since everything moves from left to right, the final result is that the circular buffer data structure is self-synchronizing.
To see this with an example, suppose that als4000_dsp_write() reads *dma_start and *dma_end, then writes to the buffer and modifies *free_start. Suppose also that, between reading *dma_start and *dma_end and modifying *free_start, an interrupt occurs asynchronously and modifies *dma_start and *dma_end. The result is that als4000_dsp_write() now has a stale view of *dma_start and *dma_end, but a conservative one (i.e. it believes the free area is shorter than what's really available). This doesn't interfere with the correct functioning of the driver.
Implementation of the als4000_dsp_write( ) system call
Let's now consider how we implement the als4000_dsp_write() system call.
The first thing we do is retrieve the driver's peripheral descriptor; recall that within the "open()" call we stored a reference to it in file->private_data .
Then we must find what's the starting address of the buffer, how long the buffer is, and what is being done at the moment of the call.
We decided to store this information in the struct als4000_dma dma_dac element of the driver's peripheral descriptor (struct als4000_descr { } ). The name dma_dac refers to the DMA transfer in the digital to analog direction, that is, "playback mode".
All the information we need is in
dma_dac->buflen
dma_dac->buffer
dma_dac->dma_start
dma_dac->dma_end
dma_dac->free_start
When we want the DMA to start we need to set the format of the data to be transferred (i.e. whether it's stereo, 16 bit, signed, at a 44.1 kHz sample rate). We store these values in
dma_dac->rate
dma_dac->format
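Putting the fields listed above together, the playback descriptor might look roughly like the sketch below. The field names are the ones used in the lecture; the types, the helper dma_descr_reset() and its arguments are assumptions made for the illustration, and the real struct als4000_dma in als4000.h contains more (for instance the wait queue and the DMA handle discussed later).

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the playback DMA descriptor; field types are assumptions. */
struct als4000_dma {
    char    *buffer;      /* first byte of the DMA buffer */
    size_t   buflen;      /* buffer length in bytes */
    char    *dma_start;   /* start of the "DMA active" area */
    char    *dma_end;     /* one byte past the "DMA active" area */
    char    *free_start;  /* first byte of the free area */
    unsigned rate;        /* sample rate in Hz, e.g. 44100 */
    unsigned format;      /* sample format (8/16 bit, mono/stereo, signedness) */
};

/* Hypothetical helper: put the descriptor in its initial, empty state,
 * with all three pointers at the first byte of the buffer. */
static void dma_descr_reset(struct als4000_dma *d, char *buf, size_t len,
                            unsigned rate, unsigned format)
{
    assert(buf != NULL && len > 0);
    d->buffer = buf;
    d->buflen = len;
    d->dma_start = d->dma_end = d->free_start = buf;
    d->rate = rate;
    d->format = format;
}
```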
So we begin our als4000_dsp_write() system call by initializing some variables with
startbuf = d->buffer ;
endbuf = startbuf + d->buflen - 1 ;
so that endbuf points to the last byte in the buffer.
Our als4000_dsp_write() system call receives information concerning the buffer from the process in user mode. Before we manipulate data, the address of which we get from a user mode process, it's a good rule to check that the memory allocated to the buffer really belongs to the linear address space of the process.
The memory address space is divided into areas of two types. One is dedicated to the kernel, and on addresses in this area the kernel performs no checks. The other type is dedicated to user processes, with different areas for different processes. A user mode process normally can't use the memory that's reserved for the kernel, but a malicious way to reach these addresses anyway would be to pass them to a system call, since a system call can naturally access kernel mode addresses.
So a normal check in a system call implementation concerns the validity of the addresses passed in (i.e. that they belong to the user mode address space). This is the role of the access_ok() function.
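The idea behind this check can be illustrated in user space. This is not the kernel's access_ok(), only a sketch of the same kind of test: USER_LIMIT and range_is_user() are names invented here, and the 3 GB limit is the classic 32 bit x86 user/kernel split, used only as an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical top of the user address space (classic 3 GB split on 32 bit x86). */
#define USER_LIMIT UINT64_C(0xC0000000)

/* Return nonzero if [addr, addr + size) lies entirely below USER_LIMIT.
 * The comparison is arranged so that addr + size is never computed,
 * which avoids being fooled by arithmetic overflow. */
static int range_is_user(uint64_t addr, uint64_t size)
{
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}
```

A buffer that starts in user space but extends past the limit is rejected just like one that lies entirely in kernel space, which is why the size has to be part of the test.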
Another check we do is to see if the struct file *file we receive as the first argument of the call is a "pipe" (i.e. an object on which lseek operations are not allowed).
The als4000_dsp_write() system call receives an address of a buffer in the user memory area (containing bytes to be transferred to our device file), and its dimension. Within the call we should pass all these bytes from the user buffer to the DMA buffer. For a number of reasons this can't be done all at once, the main reason being that, in our DMA buffer, we may have less room than the dimension of the entire user space buffer. So we must transfer data in chunks from user space to the DMA buffer.
We then enter a while loop in which rest represents the size of the part of the user space buffer still to be transferred; it is initially set to count, the full size of the user space buffer.
The main task within the while loop is to understand how much free area there is in our circular DMA buffer. So we start with reading the free_start element of our DMA descriptor and setting the *free_start pointer with its value
char *free_start = d->free_start ;
If the buffer has no free space, our function shall block the process since we are in a situation in which the process has written to the buffer all the data that it could, and now it must stop and wait for the DMA controller to empty the buffer (while a sound is being played back through the loudspeakers).
To put the process in a sleeping state we use the function
wait_event_interruptible(d->wait, buffer_free(d));
that puts our user mode process in a wait queue (d->wait), sleeping but capable of receiving signals. Normally the process leaves the sleeping state when the condition buffer_free(d) evaluates to TRUE.
We know that the DMA controller is active when the area of the buffer between *dma_start and *dma_end is not null. To indicate that there is no DMA transfer ongoing we can simply set the *dma_start and *dma_end pointers to the same value. This is the reason we implement the dma_running() function as in als4000.h; the function evaluates to TRUE or FALSE depending on the pointers being different or equal.
We know that there is free space in the DMA buffer when *free_start is not equal to *dma_start (see fig. 1), but we also have to consider a special case in which there is free space but *free_start and *dma_end are equal. In fact this is exactly the case when we initialize the DMA buffer with
*free_start = *dma_start = *dma_end
and all the pointers point to the initial address of the buffer.
Note that the condition *dma_start = *dma_end can only be interpreted as an empty "DMA active" area, never as a "DMA active" area that fills the buffer. In fact the *dma_end pointer always points to the address that's one position (i.e. one byte) after the last address in the "DMA active" area. When the "DMA active" area reaches the end of the buffer *dma_end points to the first byte following the end of our buffer and we have to "manually" reset its value to point again to the first byte in the buffer.
Since *free_start is "automatically" wrapped around to the beginning of the buffer after it reaches the end (i.e. the value of the address of the last byte in the DMA buffer), this ensures that, if *free_start == *dma_end , then :
1- we have an empty "DMA active" area and
2- there certainly is free space in the buffer.
To better understand this point note that, with reference to Fig.1, the "free" area of the DMA buffer must come before the "DMA active" area, while the "filled" area always starts at the first byte position after the "DMA active" area. If this byte position is pointed to by *free_start , the "filled" area must be empty so the "free" area of the DMA buffer must extend to the full DMA buffer length, less the size of the "DMA active" area.
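The two conditions discussed above can be written down as small predicates. This is a reconstruction from the description, not the code in als4000.h: struct ring is a cut-down stand-in for the DMA descriptor, and buffer_free() here simply encodes "free_start differs from dma_start, or no DMA area is active".

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the DMA descriptor (names follow the lecture). */
struct ring {
    char  *buffer;
    size_t buflen;
    char  *dma_start;
    char  *dma_end;
    char  *free_start;
};

/* The "DMA active" area is non-empty when the two pointers differ. */
static int dma_running(const struct ring *d)
{
    return d->dma_start != d->dma_end;
}

/* There is room to write unless free_start has caught up with the
 * start of a non-empty "DMA active" area. */
static int buffer_free(const struct ring *d)
{
    assert(d->buffer != NULL);
    return d->free_start != d->dma_start || !dma_running(d);
}
```

With all three pointers equal, as after initialization, dma_running() is false and buffer_free() is true, which is the special case described above.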
So we return from the wait_event_interruptible() call when the buffer_free(d) condition becomes TRUE, and we need to evaluate how big the "free" area in the buffer is. We start with
free_end = d->dma_start - 1;
since the free area ends at the byte before *dma_start. A particular case we have to handle is when the end of the free area comes before its beginning (in the sequential ordering of the addresses in our DMA buffer). In this case we simply set the end of the free area to the last address in the buffer:
if (free_end < free_start) free_end = endbuf;
Then we can use the c variable to store the length of the free area:
c = free_end - free_start + 1;
and this represents the number of bytes we can write in the present iteration of the while loop. If this number is greater than the number of bytes remaining in the user space buffer (rest), we limit ourselves to writing only the bytes that are needed:
if (rest < c) c = rest;
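The chunk size computation can be checked with a few numbers. The sketch below repeats the steps above using byte offsets from the start of the buffer instead of pointers (an assumption made to keep the example well defined when *dma_start is the first byte); free_chunk() is a name invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* How many bytes can be written in this iteration?
 * Offsets are counted from the first byte of the DMA buffer.
 * The caller is assumed to have already checked that the buffer
 * is not full (the buffer_free() condition). */
static size_t free_chunk(size_t buflen, size_t free_off, size_t dma_off,
                         size_t rest)
{
    long free_start = (long)free_off;
    long free_end = (long)dma_off - 1;  /* byte just before the DMA area */
    size_t c;

    assert(free_off < buflen && dma_off < buflen);

    if (free_end < free_start)          /* free area runs to the end of the buffer */
        free_end = (long)buflen - 1;
    c = (size_t)(free_end - free_start + 1);
    if (rest < c)                       /* never copy more than the user supplied */
        c = rest;
    return c;
}
```

For an 8192 byte buffer with free_start at offset 6000 and dma_start at offset 1000, the function returns 2192: only the part up to the end of the buffer is used in this iteration, and the part at the beginning is picked up by the next one.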
To transfer data from a buffer in user space to the DMA buffer we use the function
__copy_from_user(free_start, p, c)
where p points to the current position in the buffer in user space.
Having transferred c bytes, we decrement rest by c, so that once rest reaches zero the while loop ends. We also advance the pointer into the user space buffer by the same amount c. Since the write() system call returns the number of bytes that have been written, we must also keep track of this value and return it.
At this point we must update the value in *free_start since the free area of the DMA buffer has shrunk, and this is simply done with d->free_start = free_start + c , but we must handle the case when, in doing this, we reach the end of the DMA buffer. If we do reach it we say d->free_start = startbuf so that next time we'll start writing from the beginning of the buffer.
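The copy and the pointer update can be sketched together in user space. Here memcpy() stands in for __copy_from_user(), and struct ring and ring_put() are names made up for the example; the wraparound test mirrors the one described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for the DMA descriptor. */
struct ring {
    char  *buffer;
    size_t buflen;
    char  *free_start;
};

/* Copy c bytes to free_start, then advance it, wrapping at the end. */
static void ring_put(struct ring *d, const char *src, size_t c)
{
    char *endbuf = d->buffer + d->buflen - 1;  /* last byte of the buffer */

    /* the chunk must fit between free_start and the end of the buffer */
    assert(c > 0 && c <= (size_t)(endbuf - d->free_start) + 1);

    memcpy(d->free_start, src, c);  /* user space stand-in for __copy_from_user() */
    d->free_start += c;
    if (d->free_start > endbuf)     /* reached the end: restart from the beginning */
        d->free_start = d->buffer;
}
```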
Now that we have copied some data from the user space buffer to the DMA buffer, we need to activate the DMA transfer in case the sound card is not already playing back some sound. In any case, during the first call to als4000_dsp_write() we will certainly have to activate the DMA transfer. Then, if everything goes normally, the DMA transfer will feed itself automatically, since the interrupt handler will reset and update the DMA active area for as long as it finds some data in the "filled" area.
If we need to start the DMA transfer one or more times after the first call to als4000_dsp_write(), this means that there have been problems during the playback of our sound file. In fact the CPU may not have been able to provide sound data fast enough, so that the DMA controller had all the time it needed to empty the DMA buffer.
After the instruction
if (!dma_running(d)) check_and_start_dma(descr, d);
the while loop continues, asking if there is some more data in the user space buffer to be copied to the DMA buffer, and going to sleep if there is no free area in the DMA buffer.
The check_and_start_dma() function is implemented in als4000_dsp.c and takes as arguments the driver's peripheral descriptor and the DMA buffer descriptor. It starts by calculating the pointer at the end of the buffer
char * endbuf = d->buffer + d->buflen;
Then the function evaluates and sets the *dma_start pointer. Note that, while this is a pointer that's normally managed by the interrupt handler, here we can make the assumption that this check_and_start_dma() function is invoked when there can't be interrupts running, since we are invoking it after a check on !dma_running(d). So we can touch the *dma_start pointer without fear of messing things up.
Suppose that, for some reason, the DMA transfer blocks as in fig.4 (i.e. the CPU was slower than the DMA controller).
This is the general situation of a blocked DMA transfer and we would set d->dma_start to the value of d->dma_end since this is the address we shall use to begin a new DMA transfer, but we have also to consider the special case corresponding to a blocked DMA transfer with
d->dma_start = d->dma_end = endbuf
Fig. 4 DMA transfer is blocked
In this case we would set d->dma_start to the value of *buffer. So we write
d->dma_start = (d->dma_end >= endbuf ? d->buffer
: d->dma_end);
which implements the exact wraparound of the *dma_start pointer since the *dma_end pointer always points to the address that's one position (i.e. one byte) after the last address in the "DMA active" area.
Proceeding with the implementation of the check_and_start_dma() function we now must calculate the extension of the DMA area that we want to activate. This area can potentially go from *dma_start to *free_start. So if *dma_start is less than *free_start we say
dmalen = d->free_start - d->dma_start;
Fig. 5 DMA transfer is blocked
otherwise, if we are in the situation of Fig. 5 with *dma_start greater than or equal to *free_start, we say
dmalen = endbuf - d->dma_start;
and no more than that, since we have reached the wraparound point and we must break the DMA transfer into two smaller transfers.
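The choice of the next DMA area can be summarized in a few lines. The sketch uses offsets from the start of the buffer rather than pointers, and struct xfer and next_transfer() are hypothetical names; it assumes the caller already knows that there is something to play.

```c
#include <assert.h>
#include <stddef.h>

/* Start offset and length of the next DMA transfer. */
struct xfer {
    size_t start;
    size_t len;
};

/* dma_end may equal buflen (one past the last byte); free_start may not. */
static struct xfer next_transfer(size_t buflen, size_t dma_end,
                                 size_t free_start)
{
    struct xfer x;

    assert(dma_end <= buflen && free_start < buflen);

    /* wrap the start pointer if the previous area ended at the buffer end */
    x.start = (dma_end >= buflen) ? 0 : dma_end;

    if (x.start < free_start)
        x.len = free_start - x.start;   /* filled data lies ahead of us */
    else
        x.len = buflen - x.start;       /* go only as far as the buffer end */
    return x;
}
```

When the filled data wraps around, the function deliberately stops at the end of the buffer; the remainder at the beginning is picked up by the following transfer.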
In calculating dmalen we must take care to play back a number of bytes that's compatible with the format we are using for our sound samples. We do this with a call to round_to_sample() (defined in als4000_dsp.c). In this function, if we are playing 16 bit samples, we divide the number of samples to be transferred by two. Furthermore, if we are playing stereo samples, we divide it by two again.
So the number of bytes that gets transferred equals an exact number of sound samples and we are protected from the user process transferring a number of bytes not compatible with the used format of the sound samples. If there are bytes left in the buffer, they will be transferred with added bytes during the following iteration, but always in a consistent way.
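The body of round_to_sample() is not shown in the lecture. One plausible reading of the description is that the byte count is rounded down to a whole number of sample frames; the sketch below does just that, under a different name (round_to_frame()) to make clear that it is a guess and not the driver's function.

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte count down to a whole number of sample frames.
 * A frame is one sample for every channel: 1, 2 or 4 bytes here. */
static size_t round_to_frame(size_t nbytes, unsigned bits, unsigned channels)
{
    size_t frame;

    assert((bits == 8 || bits == 16) && (channels == 1 || channels == 2));

    frame = (size_t)(bits / 8) * channels;  /* bytes per frame */
    return nbytes - nbytes % frame;         /* leftover bytes wait for the next transfer */
}
```

With 16 bit stereo samples, 1001 bytes round down to 1000; the odd byte stays in the "filled" area and is played together with the bytes that arrive later.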
Were we using two or more buffers , instead of our circular buffer, this operation would have been more complex, since we should have copied the remaining uneven bytes from one buffer to the other at each iteration.
Before starting the DMA transfer we check whether the number of bytes we are ready to transfer (dmalen) is greater than a threshold ( DMA_START_THRESHOLD ). Since starting a DMA transfer has a cost in terms of CPU cycles, it's pointless to start a DMA for, say, only 8 bytes. If we allowed many small DMA transfers, we would risk overloading the CPU with DMA setup activities and we could get a very unpleasant sound.
Even though the technical manual says nothing about such a threshold, it's easy to find that trying to transfer less than 64 bytes causes the sound card to block in such a way that the only means of recovery is a hardware reset. So we set DMA_START_THRESHOLD to 64, and if we try to transfer fewer bytes we return 0, as if the DMA transfer did not succeed. The bytes not transferred will be part of the next DMA.
Finally, if we do have to transfer a sufficient number of bytes, we invoke
start_dma(p, d, dmalen);
passing the driver's peripheral descriptor, the DMA buffer descriptor, and the length of the transfer. Returning from this call we update the *dma_end pointer to signal that the "DMA active" zone is not empty.
Within the start_dma(p, d, dmalen) function we do some hardware programming for our sound card, with particular reference to the on-board DMA controller. First of all, since we are using a PCI card, we call the pci_map_single() function. Remember from Lesson 07 that a buffer used for DMA transfers has to be associated both with a linear address (used by the CPU) and with a PCI bus address (used by the peripheral's DMA circuitry). To this end the kernel defines an architecture independent data type to store a bus address: dma_addr_t .
When a DMA mapping has to be used for a limited time interval (as for sound cards DMA buffers) we use the "Streaming DMA mapping", which is one of the two types of mapping offered by the kernel (the other is the "Consistent DMA mapping", used whenever a DMA mapping has to be used "forever", as in the case of SCSI command mailboxes).
In our case ("Streaming DMA mapping") the DMA buffer space in RAM was allocated by our driver ( init_dma() in als4000_dsp.c ), since the kernel functions that manage this type of DMA mapping do not allocate memory for the buffer. They only create a mapping between the logical addresses of the memory area that the driver allocated for the buffer and the bus addresses used by the device.
This is done by pci_map_single(), in which the first argument is the pci descriptor for our device, the second argument is the address of the first byte in the DMA active area, the third argument is the number of bytes to be transferred, and the last argument indicates the direction of the transfer (PCI_DMA_TODEVICE).
Remember also that, to transfer data to the device, the driver must first fill the buffer and then create the mapping since, once the mapping is created, the device driver can't access the buffer anymore, until either the mapping is released or the pci_dma_sync_single() function is invoked, and neither can be done until the DMA data transfer on the buffer completes.
The pci_map_single() function returns an address of type dma_addr_t. On an Intel x86 CPU this address corresponds to a physical memory address, but in general this may not be the case, since its meaning is architecture dependent. Whatever its meaning, we must store the value somewhere, since we'll need it when we "unmap" the buffer at the end of the DMA transfer. We store it in the dma_start_handle element of the dma_dac data structure (see the definition of struct als4000_dma { } in als4000.h).
In general, before touching the IO ports of our device, we must be sure that nobody else is doing a similar operation on them. For instance we have a "mixer_write()" function, defined in als4000.h, where we write a value to an IO port that we selected beforehand. If another function writes to a third port in the middle of these two operations, the board will malfunction.
So in start_dma() we invoke spin_lock_irqsave() on the hwlock field of the driver's peripheral descriptor. Remember that the spinlock protects a resource from concurrent access only in systems with multiple CPUs (SMP). On a single processor the spinlock is not even compiled in, since it could cause the machine to block; neither is it needed, since we have only one thread of execution and the type of concurrent access we are worried about is impossible.
In any case the spin_lock_irqsave() function not only acquires the spinlock but also disables interrupts (even if we are confident that an interrupt can't happen, since we have not yet started the DMA transfer). This is needed on SMP systems, by the same reasoning we made about concurrent writes to IO ports.
Having acquired the control of DMA we must tell the card what's the rate at which the sound samples are to be played back.
From the Technical Manual we know that to do this we must write 0x41 to the index register of the DSP, then write the 8 least significant bits of our rate to another IO port of the DSP, followed by a second write to the same register for the 8 most significant bits. This is the implementation of set_hw_dma_rate(base, d->rate); in als4000.h, and with a call to this function we inform the board that it shall play the samples at the given rate. By the way, this is a fairly common solution for writing 16 bit data into 8 bit ports.
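Splitting a 16 bit value into two 8 bit writes looks like this. The sketch only builds the sequence of bytes in the order described above (command, low byte, high byte) and does not touch any IO port; DSP_CMD_SET_RATE and rate_write_sequence() are names invented for the example.

```c
#include <assert.h>

#define DSP_CMD_SET_RATE 0x41   /* command value quoted in the lecture */

/* Fill seq[] with the three bytes to be written, in order. */
static void rate_write_sequence(unsigned rate, unsigned char seq[3])
{
    assert(rate <= 0xFFFF);             /* the rate must fit in 16 bits */

    seq[0] = DSP_CMD_SET_RATE;
    seq[1] = rate & 0xFF;               /* 8 least significant bits */
    seq[2] = (rate >> 8) & 0xFF;        /* 8 most significant bits */
}
```

For a rate of 44100 Hz (0xAC44) the sequence is 0x41, 0x44, 0xAC.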
Another thing to do, within the hardware programming phase, before starting the DMA transfer to our card is to call set_playback_dma(), also implemented in als4000.h .
This function receives as a second argument the address of the first sample to be played (dmabuf) and it writes this address into an IO register (in particular the GCR91 register) of the sound card, as detailed in the technical manual. As a third argument this function receives the amount of bytes to be played back. This data is written in register GCR92 after computing the bitwise OR with a bit pattern (0x00180000) that causes two things to happen:
1- enables the auto-init feature on the board's DMA controller
2- establishes a RAM->DSP transfer (10 in bit 19 and 18)
The result of enabling the "auto-init" feature is that the DMA starts immediately upon writing this configuration on register GCR92.
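The value written to GCR92 can be computed separately from the register access itself. The sketch below follows the description above literally (byte count ORed with the pattern); GCR92_PLAYBACK_BITS and gcr92_value() are names made up for the illustration, and the exact meaning of the length field should be checked against the technical manual.

```c
#include <assert.h>
#include <stdint.h>

#define GCR92_PLAYBACK_BITS 0x00180000u  /* bit pattern quoted in the lecture */

/* Combine the transfer length with the control bits for playback. */
static uint32_t gcr92_value(uint32_t nbytes)
{
    /* the length must not spill into the control bits */
    assert((nbytes & GCR92_PLAYBACK_BITS) == 0);

    return nbytes | GCR92_PLAYBACK_BITS;
}
```

For a 4096 byte transfer the result is 0x00181000, and bits 19 and 18 of the result read 10, as stated in the list above.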
Returning from the set_playback_dma() call we have a running DMA controller that's transferring bytes from the DMA buffer in RAM to the local memory on the sound card, from which the digital to analog converter will read them. Our task is not yet completed, though, since we still have to tell the digital to analog converter to read the data and play them back.
We do this by calling the start_sb_dma() function, also implemented in als4000.h .
This function consists of a series of calls to esp_write() that cause some data to be written to the relevant IO ports. The needed data are:
d->hw_cmd
d->hw_fmt
d->hw_on
d->hw_off
and their combinations determine the type of reproduction we want from the digital to analog converter (DSP ) and in particular :
- the format of the samples (8 bit or 16 bit)
- the mono or stereo type of reproduction
- the total number of samples (that's different from the amount of bytes since it depends on the format of the samples).
As an example, from the technical manual we read that, to playback 16 bit samples, we need to set :
d->hw_cmd = 0xb2;
d->hw_on = 0xd6;
d->hw_off = 0xd5;
Other values are required for playing back 8 bit samples, and to select mono/stereo and signed/unsigned reproduction we need to define d->hw_fmt. All these settings are made by calling set_dac_format() (defined in the als4000_dsp.c file) from within the als4000_probe() function.
So within the start_sb_dma() function we inform the digital to analog converter on how to handle the data received through the DMA transfer.
Returning from this function we are again within the start_dma() function and now we are finished with the transfer and reproduction of samples and we can relinquish the spin lock with spin_unlock_irqrestore().
Our card is playing back sound samples and the chain of functions closes backwards.
We return from start_dma() into check_and_start_dma(), where we increment d->dma_end before returning at the end of an iteration of the while(rest){} loop, in the als4000_dsp_write() call.
From here we go on to wait, in the wait_event_interruptible() call, for the free area of the DMA buffer to grow again.
Implementation of the interrupt handler
As the DMA active area gets transferred to the sound card and the DSP plays, an interrupt is generated at each end of a transfer. On our sound card we have other devices in addition to the DSP (the MIDI port, the Mixer, the codec ) and each generates interrupts for various reasons. So when we implement the interrupt handler as with als4000_inthandl() in als4000_main.c , we write one function that's capable of handling interrupts from various devices on the same board. Remember that the interrupt handler is registered to the PCI layer of the kernel within the "probe()" system call (als4000_probe() ), and that our sound card is seen by the PCI layer of the kernel as one peripheral to which only one PCI identifier is assigned.
So within the registered als4000_inthandl() function we check to see if the interrupt comes from the DSP device and, if this is the case, we call the appropriate handler :
if (status & 0x80) als4000_sb_playback_interrupt(p);
This handler is implemented in als4000_dsp.c and is quite simple.
We begin by verifying that the DMA is running. If it isn't, there must be an error condition, so we print a warning and return.
If the DMA is running we must first call the stop_dma() function. The task of this function is to program the sound card to stop playing. In fact the DMA controller has stopped before generating the interrupt, but the digital to analog converter is still functioning. So within the stop_dma() function we call stop_sb_dma() (implemented in als4000.h), and this halts the DSP. If we didn't do this, even though the DSP would stop after playing back nsamples samples, it might reproduce the next samples with a format that is not the one we want.
Then we must calculate the length of the DMA transfer that has been done (dmalen) and call pci_unmap_single() on p->pcidev for the dmalen length, starting from d->dma_start_handle. This will undo the DMA mapping.
Returning from the stop_dma() function into the als4000_sb_playback_interrupt() function we must now check if the DMA buffer is empty. We do this by checking if the *dma_end is different from *free_start and we also have to manage the "wraparound" condition.
If there is indeed something to be played back, we invoke the check_and_start_dma() function again. If the pointers of the DMA buffer say that there is nothing left to be played (because the buffer is completely empty, or because what's left in the buffer is too little to be played, due to DMA_START_THRESHOLD), then we must record that the DMA has stopped, and we do this by setting
d->dma_end = d->dma_start
with the usual condition that, if we reached the end of the buffer, d->dma_start has to be set to d->buffer.
Then we reach the crucial point where we have to call
wake_up(&d->wait);
Remember that our user mode process that invoked the "write()" system call did go to sleep (on the &d->wait wait queue) if there wasn't any free area in the DMA buffer, and now we must wake it up, since there certainly is free space in the DMA buffer (i.e. the space that has just been played).
This is basically our driver, and we can use it just to do a simple "cat" shell command on our /dev/dsp device file.
The driver has a number of printk() calls that make it possible to follow the sequence of events and the corresponding function calls. The "jiffies" variable is a counter that's incremented every 10 ms.
Final remarks
For the correct operation of our driver we must take care that, with each "write()" system call, the user mode process transfers a number of bytes that's less than the full length of the DMA buffer.
Let's say that we have a buffer of 8192 bytes; if we allow the application to fill all the buffer with one write() then, when the DMA transfer ends, the DMA controller has no more bytes to transfer. So the DMA controller stops and before it can transfer other data it must wait for the process to be scheduled again. The time it takes for a sleeping process to be scheduled again is proportional to the CPU load, and in general is not small.
The solution is to allow the application to write only, say, 4096 bytes at each invocation of the "write()" system call. As we will see in the next lesson, the driver can tell the application that the largest amount of data it accepts is 4096 bytes. In this case the user process sleeps on writing half the DMA buffer and the driver starts the DMA transfer. Having written 4096 bytes, our "write()" system call quickly ends, and the user process is awakened immediately and calls write() again. Since the DMA buffer is half free, it finds room to transfer another 4096 bytes immediately, while the first 4096 bytes are being played and new free buffer space is being created (at the rate of about 50 ms for each block of 4096 bytes).
In this situation the DMA is always functioning since, when it empties a "DMA active" area, it can immediately start working on a new "DMA active" area. Starting and stopping the DMA is managed by the interrupt handler so, even when the CPU is heavily loaded, there is no user process to awake (i.e. no need for a context switch in the CPU thread of operations).
In this situation the only limit is that the CPU should not become so heavily loaded that the user mode process can't fill the DMA buffer at a rate at least equal to the depletion rate due to the digital to analog converter.
As an example, consider that at 44.1 kHz (audio CD sound quality) a 4096 byte block of a 16 bit stereo stream takes roughly 23 ms to play, which is quite a long time compared with the timings within a kernel that's not heavily loaded. So the user mode process will generally be scheduled well in time to fill the part of the buffer that was freed by the DSP (i.e. before the end of the playback of the previous block of samples).
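The playing time of a block follows directly from the sample format, and can be checked with a short calculation. The function block_ms() below is only a helper written for this note, not part of the driver.

```c
#include <assert.h>

/* Playing time, in milliseconds, of nbytes of PCM data. */
static double block_ms(unsigned nbytes, unsigned rate_hz, unsigned bits,
                       unsigned channels)
{
    unsigned bytes_per_second;

    assert(rate_hz > 0 && bits > 0 && bits % 8 == 0 && channels > 0);

    /* e.g. 44100 * 2 * 2 = 176400 bytes per second for CD quality */
    bytes_per_second = rate_hz * (bits / 8) * channels;
    return 1000.0 * nbytes / bytes_per_second;
}
```

By this estimate a 4096 byte block of 16 bit stereo samples at 44.1 kHz lasts a little over 23 ms, and a full 8192 byte buffer a little over 46 ms, the same order of magnitude as the timings quoted in this section.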
So our driver will work well if we take care to fill only part of the DMA buffer with each invocation of the write() system call.