Thanks to everyone who answered my question. My application receives audio data from the network and plays it with the ALSA library. The target is an ARM board, not a PC, and the same code does not perform as well on ARM as it does on the PC. I have not looked at the implementation of snd_pcm_writei(); I have only read the ALSA documentation on how to use the function. My guess is that snd_pcm_writei() creates a thread that writes data into the sound card's DMA buffer once per period. That would explain why the duration of a snd_pcm_writei() call is irregular, sometimes long and sometimes short (the long calls may be the ones that create the thread), and why the ALSA library can do mixing through plain function calls without a mixing server.
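There is another way to picture the irregular call duration that does not need a thread at all. As far as I understand the ALSA documentation, a blocking snd_pcm_writei() returns once the frames have been placed in the playback ring buffer, so a call would be short when the buffer has room and long when it has to wait for the device to play something first. Below is a toy model of that idea in C. It makes no ALSA calls, and `ring_model` and `blocking_write_ticks` are names I made up for illustration; the "tick" is an arbitrary time unit.

```c
#include <assert.h>

/* Toy model of a playback ring buffer; this is NOT ALSA code. */
struct ring_model {
    long capacity;  /* buffer size in frames */
    long filled;    /* frames currently queued for playback */
    long drain;     /* frames the device plays per tick */
};

/*
 * Models a blocking write of `frames` frames (hypothetical helper).
 * Returns how many ticks the caller had to wait before the buffer
 * had enough free space to accept all the frames.
 */
long blocking_write_ticks(struct ring_model *rb, long frames)
{
    assert(rb->drain > 0 && frames >= 0 && frames <= rb->capacity);

    long waited = 0;
    while (rb->capacity - rb->filled < frames) {
        /* Buffer too full: wait one tick while the device plays. */
        rb->filled -= rb->drain;
        if (rb->filled < 0)
            rb->filled = 0;
        waited++;
    }
    rb->filled += frames;  /* enough room now: queue the frames */
    return waited;
}
```

With a 4096-frame buffer, a drain of 256 frames per tick and 1024-frame writes, the first four writes return without waiting and the fifth waits 4 ticks. The same call is fast or slow depending only on how full the buffer is.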
I think this thread plays the role of the mixing server and terminates once all the periods in the buffer have been played, at which point the PCM enters the XRUN state. In my environment, audio data arrives from the network more slowly than snd_pcm_writei() plays it. At the beginning, the PCM enters the XRUN state because not enough audio data has arrived from the network yet. I now use a larger buffer to receive data and pass less data to each snd_pcm_writei() call, so the moment when the audio buffer runs out is delayed. This delay gives the thread in snd_pcm_writei() enough headroom to keep playing: when my buffer runs out, the thread may still have data left to play, so the XRUN does not happen.
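The effect of receiving more data before playing can be sketched with a small simulation. It is only a toy model under my own assumptions, not ALSA code: data arrives at a fixed `in_rate` frames per tick, the device consumes a fixed `out_rate` frames per tick, and playback starts after `prebuffer` frames have been collected. The function name `first_underrun_tick` and all parameters are hypothetical.

```c
#include <assert.h>

/*
 * Toy model, NOT ALSA code. Returns the tick (counted from the start of
 * playback) at which the first underrun happens, or -1 if there is no
 * underrun within `max_ticks`.
 */
long first_underrun_tick(long in_rate, long out_rate,
                         long prebuffer, long max_ticks)
{
    assert(in_rate >= 0 && out_rate > 0 && prebuffer >= 0);

    long queued = prebuffer;      /* frames waiting when playback starts */
    for (long t = 0; t < max_ticks; t++) {
        queued += in_rate;        /* network delivers this tick's frames */
        if (queued < out_rate)
            return t;             /* not enough data to feed the device */
        queued -= out_rate;       /* device plays one tick of audio */
    }
    return -1;
}
```

With in_rate = 90 and out_rate = 100, a prebuffer of 0 underruns at tick 0, a prebuffer of 1000 frames pushes the first underrun to tick 100, and 2000 frames pushes it to tick 200. So while the network is slower than playback, a larger prebuffer only delays the underrun; it does not remove it unless the stream ends first.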
I don't know whether I am right. Once I have read the implementation of snd_pcm_writei(), I may be able to give a more accurate explanation.