POSIX Threads

Introduction

This article describes some aspects of using pthreads.

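POSIX (Portable Operating System Interface) threads are a powerful way to improve the responsiveness and performance of code.

Threads are similar to processes. Like processes, threads are scheduled by the kernel in time slices. On a uniprocessor system the kernel uses time slicing to simulate concurrent execution of threads, just as it does for processes. On a multiprocessor system, threads, like multiple processes, can actually run in parallel.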

Table of Contents
API
man-page
futex
Thread synchronization
- Semaphores
- POSIX named semaphores
Mutex
References

API

#include <pthread.h>
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                  void *(*start_routine) (void *), void *arg);
int pthread_join(pthread_t thread, void **retval);
int pthread_detach(pthread_t thread);
int pthread_cancel(pthread_t thread);
void pthread_testcancel(void);
void pthread_cleanup_push(void (*routine)(void *), void *arg);
void pthread_cleanup_pop(int execute);

Note

Either pthread_join(3) or pthread_detach() should be called for each thread that an application creates, so that system resources for the thread can be released. (But note that the resources of all threads are freed when the process terminates.)

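A second question is what happens when a new thread finishes. The thread stops and then, as part of its cleanup, waits to be merged, or "joined", with another thread. A thread that is never joined still counts against the system's limit on the number of threads, so failing to clean up threads properly will eventually cause pthread_create() to fail.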
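As a minimal sketch of the two cleanup options (not taken from the original article), the helper below joins one thread to collect its result and detaches another so that it needs no join. The names square_worker, noop_worker, and run_join_detach_demo are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker: squares the integer smuggled in through arg. */
static void *square_worker(void *arg) {
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);
}

/* Hypothetical worker that produces no result. */
static void *noop_worker(void *arg) {
    (void)arg;
    return NULL;
}

/* Returns the joined thread's result, or -1 if a pthreads call fails. */
long run_join_detach_demo(long n) {
    pthread_t joined, detached;
    void *retval = NULL;
    int rc;

    rc = pthread_create(&joined, NULL, square_worker, (void *)(intptr_t)n);
    if (rc != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return -1;
    }
    /* Joining waits for the thread and releases its resources. */
    rc = pthread_join(joined, &retval);
    if (rc != 0) {
        fprintf(stderr, "pthread_join: %s\n", strerror(rc));
        return -1;
    }

    /* A detached thread releases its resources by itself when it ends;
       it must not be joined afterwards. */
    rc = pthread_create(&detached, NULL, noop_worker, NULL);
    if (rc == 0)
        rc = pthread_detach(detached);
    if (rc != 0)
        return -1;

    return (long)(intptr_t)retval;
}
```

Calling run_join_detach_demo(7) from main() should hand back the value computed in the joined thread.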

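So why is multithreading preferable to multiple independent processes for most cooperative tasks?

The reason is that threads share the same memory space. Different threads can access the same variable in memory, so every thread in a program can read or write the declared global variables. If you have ever written non-trivial code with fork(), you will appreciate how important this is. Although fork() lets you create multiple processes, it raises a communication problem: how do processes communicate when each has its own separate memory space? There is no simple answer. There are many kinds of local IPC (inter-process communication), but they all run into two significant obstacles:

They impose some form of additional kernel overhead, which reduces performance.
In most cases IPC is not a "natural" extension of the code, and it usually adds considerable complexity to the program.

Both are bad: neither overhead nor complexity is desirable. If you have ever had to restructure a program heavily to support IPC, you will appreciate the simple shared-memory model that threads provide. Because all threads live in the same memory space, POSIX threads do not need expensive and complicated long-distance calls. With simple synchronization mechanisms, all the threads in a program can read and modify existing data structures, with no need to dump data through a file descriptor or squeeze it into a cramped shared memory region. This reason alone is enough to consider a single-process, multi-threaded model rather than a multi-process, single-threaded one.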

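Threads are fast

That is not all. Threads are also fast. Compared with a standard fork(), threads carry very little overhead: the kernel does not need to make a separate copy of the process's memory space, file descriptors, and so on. This saves a great deal of CPU time and makes thread creation roughly ten to a hundred times faster than process creation. Because of this, you can use threads liberally without worrying too much about running short of CPU or memory. The heavy CPU usage that comes with fork() also goes away. In short, you can usually create a thread wherever it makes sense in the program.

Of course, like processes, threads take advantage of multiple CPUs. This is a major benefit if the software is designed for multiprocessor systems (and open-source software may well end up running on quite a few platforms). The performance of certain kinds of threaded programs, CPU-bound ones in particular, scales almost linearly with the number of processors. If you are writing a very CPU-intensive program, you will definitely want to find a way to use multiple threads. Once you are comfortable with threaded code, you can solve programming problems in new and creative ways without resorting to cumbersome IPC and other complicated communication mechanisms. Together, these features make multithreaded programming more fun, faster, and more flexible.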

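Threads are portable

If you are familiar with Linux programming, you may know the __clone() system call. __clone() is similar to fork() but also has many thread-like features. For example, with __clone() the new child process can selectively share the parent's execution environment (memory space, file descriptors, and so on). That is the good part. But __clone() also has a drawback. As the __clone() man page points out:

"The __clone call is Linux-specific and should not be used in programs intended to be portable. For programming threaded applications (multiple threads of control in the same memory space), it is better to use a library implementing the POSIX 1003.1c thread API, such as the LinuxThreads library. See pthread_create(3thr)."

Although __clone() has many of the features of threads, it is not portable. That does not mean you cannot use it, but you should weigh this fact when considering it for your software. Fortunately, as the __clone() man page notes, there is a better alternative: POSIX threads. If you want to write portable multithreaded code that runs on Solaris, FreeBSD, Linux, and other platforms, POSIX threads are the natural choice.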

Example

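// thread2.c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <err.h>

int global_var; // shared by both threads and deliberately left unprotected

// Copies global_var, sleeps, then writes the stale copy back. The write
// overwrites any increment main() made in the meantime (a lost update).
void *thread_function(void *arg) {
    int i,j;
    (void)arg;
    for (i=0; i < 20; i++) {
        j = global_var;
        ++j;
        printf(".");
        fflush(stdout);
        sleep(1);
        global_var = j;
    }
    return NULL;
}
int main(void) {
    pthread_t tid;
    int i, rc;
    // pthreads functions return the error number; they do not set errno
    if ((rc = pthread_create(&tid, NULL, thread_function, NULL)) != 0)
        errx(1, "pthread_create: %s", strerror(rc));
    for (i=0; i < 20; i++) {
        global_var = global_var + 1;
        printf("o");
        fflush(stdout);
        sleep(1);
    }
    if ((rc = pthread_join(tid, NULL)) != 0)
        errx(1, "pthread_join: %s", strerror(rc));
    printf("\nglobal_var = %d\n", global_var);
    exit(0);
}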

$ gcc thread2.c -o thread2 -pthread
$ ./thread2
..o.o.o.o.oo.o.o.o.o.o.o.o.o.o..o.o.o.o.o
global_var = 21
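The two threads above each add 20 to the counter, yet the result is 21 rather than 40, because the unsynchronized read, sleep, and write-back lose updates. One way to avoid this, sketched here under the assumption that a mutex is acceptable for the workload, is to make each read-modify-write a critical section. The names counter, counter_lock, incrementer, and run_locked_counter are illustrative and do not come from the original example.

```c
#include <pthread.h>

static int counter; /* shared by both threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Adds 1 to counter `iterations` times, holding the lock for each update. */
static void *incrementer(void *arg) {
    int iterations = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter = counter + 1; /* no other thread can interleave here */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs the same loop in a new thread and in the calling thread.
   Returns the final counter value, or -1 if a pthreads call fails. */
int run_locked_counter(int iterations) {
    pthread_t tid;

    counter = 0;
    if (pthread_create(&tid, NULL, incrementer, &iterations) != 0)
        return -1;
    incrementer(&iterations); /* the calling thread does its share */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return counter;
}
```

With the lock in place, run_locked_counter(20) yields the full 40 regardless of how the two threads interleave.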

man-page

PTHREADS(7)               Linux Programmer's Manual              PTHREADS(7)

NAME

       pthreads - POSIX threads

DESCRIPTION

       POSIX.1 specifies a set of interfaces (functions, header files) for
       threaded programming commonly known as POSIX threads, or Pthreads.  A
       single process can contain multiple threads, all of which are
       executing the same program.  These threads share the same global
       memory (data and heap segments), but each thread has its own stack
       (automatic variables).

       POSIX.1 also requires that threads share a range of other attributes
       (i.e., these attributes are process-wide rather than per-thread):

       -  process ID
       -  parent process ID
       -  process group ID and session ID
       -  controlling terminal
       -  user and group IDs
       -  open file descriptors
       -  record locks (see fcntl(2))
       -  signal dispositions
       -  file mode creation mask (umask(2))
       -  current directory (chdir(2)) and root directory (chroot(2))
       -  interval timers (setitimer(2)) and POSIX timers (timer_create(2))
       -  nice value (setpriority(2))
       -  resource limits (setrlimit(2))
       -  measurements of the consumption of CPU time (times(2)) and
          resources (getrusage(2))

       As well as the stack, POSIX.1 specifies that various other attributes
       are distinct for each thread, including:

       -  thread ID (the pthread_t data type)
       -  signal mask (pthread_sigmask(3))
       -  the errno variable
       -  alternate signal stack (sigaltstack(2))
       -  real-time scheduling policy and priority (sched_setscheduler(2)
          and sched_setparam(2))

       The following Linux-specific features are also per-thread:

       -  capabilities (see capabilities(7))
       -  CPU affinity (sched_setaffinity(2))

   Pthreads function return values
       Most pthreads functions return 0 on success, and an error number on
       failure.  Note that the pthreads functions do not set errno.  For
       each of the pthreads functions that can return an error, POSIX.1-2001
       specifies that the function can never fail with the error EINTR.
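The return-value convention described above can be illustrated with a small sketch: the error comes back as the function result and is passed to strerror() directly, rather than being read from errno. The function name demo_error_number is hypothetical; it relies on pthread_mutex_trylock() returning EBUSY for a mutex that is already locked.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Provokes a pthreads error on purpose and returns the error number. */
int demo_error_number(void) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int rc;

    pthread_mutex_lock(&m);
    /* The mutex is already held, so trylock reports EBUSY as its
       return value; errno is not the channel for this error. */
    rc = pthread_mutex_trylock(&m);
    if (rc != 0)
        fprintf(stderr, "pthread_mutex_trylock: %s\n", strerror(rc));
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

This is why helpers such as perror() or err(), which consult errno, are a poor fit for reporting pthreads failures.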

   Thread IDs
       Each of the threads in a process has a unique thread identifier
       (stored in the type pthread_t).  This identifier is returned to the
       caller of pthread_create(3), and a thread can obtain its own thread
       identifier using pthread_self(3).  Thread IDs are guaranteed to be
       unique only within a process.  A thread ID may be reused after a
       terminated thread has been joined, or a detached thread has
       terminated.  In all pthreads functions that accept a thread ID as an
       argument, that ID by definition refers to a thread in the same
       process as the caller.
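A brief sketch of working with thread IDs follows. Because pthread_t is an opaque type, IDs are compared with pthread_equal() rather than ==. The names compare_with_creator and new_thread_differs_from_creator are made up for this illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Runs in the new thread: compares its own ID with the creator's ID.
   pthread_equal() returns nonzero when both IDs name the same thread. */
static void *compare_with_creator(void *arg) {
    pthread_t creator = *(pthread_t *)arg;
    int same = pthread_equal(pthread_self(), creator);
    return (void *)(intptr_t)same;
}

/* Returns 1 if the new thread's ID differs from the caller's ID and the
   caller's ID equals itself, 0 otherwise, and -1 on a pthreads error. */
int new_thread_differs_from_creator(void) {
    pthread_t self = pthread_self();
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, compare_with_creator, &self) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    return (intptr_t)ret == 0 && pthread_equal(self, pthread_self()) != 0;
}
```

The comparison is done while both threads are alive, since an ID may be reused once its thread has been joined.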

   Thread-safe functions
       A thread-safe function is one that can be safely (i.e., it will
       deliver the same results regardless of whether it is) called from
       multiple threads at the same time.

       POSIX.1-2001 and POSIX.1-2008 require that all functions specified in
       the standard shall be thread-safe, except for the following
       functions:

           asctime()
           basename()
           catgets()
           crypt()
           ctermid() if passed a non-NULL argument
           ctime()
           dbm_clearerr()
           dbm_close()
           dbm_delete()
           dbm_error()
           dbm_fetch()
           dbm_firstkey()
           dbm_nextkey()
           dbm_open()
           dbm_store()
           dirname()
           dlerror()
           drand48()
           ecvt() [POSIX.1-2001 only (function removed in POSIX.1-2008)]
           encrypt()
           endgrent()
           endpwent()
           endutxent()
           fcvt() [POSIX.1-2001 only (function removed in POSIX.1-2008)]
           ftw()
           gcvt() [POSIX.1-2001 only (function removed in POSIX.1-2008)]
           getc_unlocked()
           getchar_unlocked()
           getdate()
           getenv()
           getgrent()
           getgrgid()
           getgrnam()
           gethostbyaddr() [POSIX.1-2001 only (function removed in POSIX.1-2008)]
           gethostbyname() [POSIX.1-2001 only (function removed in POSIX.1-2008)]
           gethostent()
           getlogin()
           getnetbyaddr()
           getnetbyname()
           getnetent()
           getopt()
           getprotobyname()
           getprotobynumber()
           getprotoent()
           getpwent()
           getpwnam()
           getpwuid()
           getservbyname()
           getservbyport()
           getservent()
           getutxent()
           getutxid()
           getutxline()
           gmtime()
           hcreate()
           hdestroy()
           hsearch()
           inet_ntoa()
           l64a()
           lgamma()
           lgammaf()
           lgammal()
           localeconv()
           localtime()
           lrand48()
           mrand48()
           nftw()
           nl_langinfo()
           ptsname()
           putc_unlocked()
           putchar_unlocked()
           putenv()
           pututxline()
           rand()
           readdir()
           setenv()
           setgrent()
           setkey()
           setpwent()
           setutxent()
           strerror()
           strsignal() [Added in POSIX.1-2008]
           strtok()
           system() [Added in POSIX.1-2008]
           tmpnam() if passed a non-NULL argument
           ttyname()
           unsetenv()
           wcrtomb() if its final argument is NULL
           wcsrtombs() if its final argument is NULL
           wcstombs()
           wctomb()

   Async-cancel-safe functions
       An async-cancel-safe function is one that can be safely called in an
       application where asynchronous cancelability is enabled (see
       pthread_setcancelstate(3)).

       Only the following functions are required to be async-cancel-safe by
       POSIX.1-2001 and POSIX.1-2008:

           pthread_cancel()
           pthread_setcancelstate()
           pthread_setcanceltype()

   Cancellation points
       POSIX.1 specifies that certain functions must, and certain other
       functions may, be cancellation points.  If a thread is cancelable,
       its cancelability type is deferred, and a cancellation request is
       pending for the thread, then the thread is canceled when it calls a
       function that is a cancellation point.
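As an illustration of deferred cancellation, the following sketch cancels a thread that loops on sleep(), which is one of the required cancellation points listed in this page. A cleanup handler records that it ran. The names cleanup_ran, on_cancel, sleeper, and run_cancel_demo are assumptions made for the example.

```c
#include <pthread.h>
#include <unistd.h>

static int cleanup_ran; /* set to 1 by the cleanup handler */

/* Cleanup handler: invoked when the thread is cancelled. */
static void on_cancel(void *arg) {
    *(int *)arg = 1;
}

/* Loops forever; can only end by being cancelled inside sleep(). */
static void *sleeper(void *arg) {
    (void)arg;
    pthread_cleanup_push(on_cancel, &cleanup_ran);
    for (;;)
        sleep(1); /* the pending cancellation request is acted on here */
    pthread_cleanup_pop(0); /* never reached; lexically pairs with the push */
    return NULL;
}

/* Returns 1 if the thread was cancelled and its handler ran,
   0 if not, and -1 if a pthreads call fails. */
int run_cancel_demo(void) {
    pthread_t tid;
    void *retval = NULL;

    cleanup_ran = 0;
    if (pthread_create(&tid, NULL, sleeper, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &retval) != 0)
        return -1;
    /* A cancelled thread's exit status is PTHREAD_CANCELED. */
    return retval == PTHREAD_CANCELED && cleanup_ran;
}
```

The thread uses the default cancelability state (enabled) and type (deferred), so no call to pthread_setcancelstate() or pthread_setcanceltype() is needed.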

       The following functions are required to be cancellation points by
       POSIX.1-2001 and/or POSIX.1-2008:

           accept()
           aio_suspend()
           clock_nanosleep()
           close()
           connect()
           creat()
           fcntl() F_SETLKW
           fdatasync()
           fsync()
           getmsg()
           getpmsg()
           lockf() F_LOCK
           mq_receive()
           mq_send()
           mq_timedreceive()
           mq_timedsend()
           msgrcv()
           msgsnd()
           msync()
           nanosleep()
           open()
           openat() [Added in POSIX.1-2008]
           pause()
           poll()
           pread()
           pselect()
           pthread_cond_timedwait()
           pthread_cond_wait()
           pthread_join()
           pthread_testcancel()
           putmsg()
           putpmsg()
           pwrite()
           read()
           readv()
           recv()
           recvfrom()
           recvmsg()
           select()
           sem_timedwait()
           sem_wait()
           send()
           sendmsg()
           sendto()
           sigpause() [POSIX.1-2001 only (moves to "may" list in POSIX.1-2008)]
           sigsuspend()
           sigtimedwait()
           sigwait()
           sigwaitinfo()
           sleep()
           system()
           tcdrain()
           usleep() [POSIX.1-2001 only (function removed in POSIX.1-2008)]
           wait()
           waitid()
           waitpid()
           write()
           writev()

       The following functions may be cancellation points according to
       POSIX.1-2001 and/or POSIX.1-2008:

           access()
           asctime()
           asctime_r()
           catclose()
           catgets()
           catopen()
           chmod() [Added in POSIX.1-2008]
           chown() [Added in POSIX.1-2008]
           closedir()
           closelog()
           ctermid()
           ctime()
           ctime_r()
           dbm_close()
           dbm_delete()
           dbm_fetch()
           dbm_nextkey()
           dbm_open()
           dbm_store()
           dlclose()
           dlopen()
           dprintf() [Added in POSIX.1-2008]
           endgrent()
           endhostent()
           endnetent()
           endprotoent()
           endpwent()
           endservent()
           endutxent()
           faccessat() [Added in POSIX.1-2008]
           fchmod() [Added in POSIX.1-2008]
           fchmodat() [Added in POSIX.1-2008]
           fchown() [Added in POSIX.1-2008]
           fchownat() [Added in POSIX.1-2008]
           fclose()
           fcntl() (for any value of cmd argument)
           fflush()
           fgetc()
           fgetpos()
           fgets()
           fgetwc()
           fgetws()
           fmtmsg()
           fopen()
           fpathconf()
           fprintf()
           fputc()
           fputs()
           fputwc()
           fputws()
           fread()
           freopen()
           fscanf()
           fseek()
           fseeko()
           fsetpos()
           fstat()
           fstatat() [Added in POSIX.1-2008]
           ftell()
           ftello()
           ftw()
           futimens() [Added in POSIX.1-2008]
           fwprintf()
           fwrite()
           fwscanf()
           getaddrinfo()
           getc()
           getc_unlocked()
           getchar()
           getchar_unlocked()
           getcwd()
           getdate()
           getdelim() [Added in POSIX.1-2008]
           getgrent()
           getgrgid()
           getgrgid_r()
           getgrnam()
           getgrnam_r()
           gethostbyaddr() [SUSv3 only (function removed in POSIX.1-2008)]
           gethostbyname() [SUSv3 only (function removed in POSIX.1-2008)]
           gethostent()
           gethostid()
           gethostname()
           getline() [Added in POSIX.1-2008]
           getlogin()
           getlogin_r()
           getnameinfo()
           getnetbyaddr()
           getnetbyname()
           getnetent()
           getopt() (if opterr is nonzero)
           getprotobyname()
           getprotobynumber()
           getprotoent()
           getpwent()
           getpwnam()
           getpwnam_r()
           getpwuid()
           getpwuid_r()
           gets()
           getservbyname()
           getservbyport()
           getservent()
           getutxent()
           getutxid()
           getutxline()
           getwc()
           getwchar()
           getwd() [SUSv3 only (function removed in POSIX.1-2008)]
           glob()
           iconv_close()
           iconv_open()
           ioctl()
           link()
           linkat() [Added in POSIX.1-2008]
           lio_listio() [Added in POSIX.1-2008]
           localtime()
           localtime_r()
           lockf() [Added in POSIX.1-2008]
           lseek()
           lstat()
           mkdir() [Added in POSIX.1-2008]
           mkdirat() [Added in POSIX.1-2008]
           mkdtemp() [Added in POSIX.1-2008]
           mkfifo() [Added in POSIX.1-2008]
           mkfifoat() [Added in POSIX.1-2008]
           mknod() [Added in POSIX.1-2008]
           mknodat() [Added in POSIX.1-2008]
           mkstemp()
           mktime()
           nftw()
           opendir()
           openlog()
           pathconf()
           pclose()
           perror()
           popen()
           posix_fadvise()
           posix_fallocate()
           posix_madvise()
           posix_openpt()
           posix_spawn()
           posix_spawnp()
           posix_trace_clear()
           posix_trace_close()
           posix_trace_create()
           posix_trace_create_withlog()
           posix_trace_eventtypelist_getnext_id()
           posix_trace_eventtypelist_rewind()
           posix_trace_flush()
           posix_trace_get_attr()
           posix_trace_get_filter()
           posix_trace_get_status()
           posix_trace_getnext_event()
           posix_trace_open()
           posix_trace_rewind()
           posix_trace_set_filter()
           posix_trace_shutdown()
           posix_trace_timedgetnext_event()
           posix_typed_mem_open()
           printf()
           psiginfo() [Added in POSIX.1-2008]
           psignal() [Added in POSIX.1-2008]
           pthread_rwlock_rdlock()
           pthread_rwlock_timedrdlock()
           pthread_rwlock_timedwrlock()
           pthread_rwlock_wrlock()
           putc()
           putc_unlocked()
           putchar()
           putchar_unlocked()
           puts()
           pututxline()
           putwc()
           putwchar()
           readdir()
           readdir_r()
           readlink() [Added in POSIX.1-2008]
           readlinkat() [Added in POSIX.1-2008]
           remove()
           rename()
           renameat() [Added in POSIX.1-2008]
           rewind()
           rewinddir()
           scandir() [Added in POSIX.1-2008]
           scanf()
           seekdir()
           semop()
           setgrent()
           sethostent()
           setnetent()
           setprotoent()
           setpwent()
           setservent()
           setutxent()
           sigpause() [Added in POSIX.1-2008]
           stat()
           strerror()
           strerror_r()
           strftime()
           symlink()
           symlinkat() [Added in POSIX.1-2008]
           sync()
           syslog()
           tmpfile()
           tmpnam()
           ttyname()
           ttyname_r()
           tzset()
           ungetc()
           ungetwc()
           unlink()
           unlinkat() [Added in POSIX.1-2008]
           utime() [Added in POSIX.1-2008]
           utimensat() [Added in POSIX.1-2008]
           utimes() [Added in POSIX.1-2008]
           vdprintf() [Added in POSIX.1-2008]
           vfprintf()
           vfwprintf()
           vprintf()
           vwprintf()
           wcsftime()
           wordexp()
           wprintf()
           wscanf()

       An implementation may also mark other functions not specified in the
       standard as cancellation points.  In particular, an implementation is
       likely to mark any nonstandard function that may block as a
       cancellation point.  (This includes most functions that can touch
       files.)

   Compiling on Linux
       On Linux, programs that use the Pthreads API should be compiled using
       cc -pthread.

   Linux implementations of POSIX threads
       Over time, two threading implementations have been provided by the
       GNU C library on Linux:

       LinuxThreads
              This is the original Pthreads implementation.  Since glibc
              2.4, this implementation is no longer supported.

       NPTL (Native POSIX Threads Library)
              This is the modern Pthreads implementation.  By comparison
              with LinuxThreads, NPTL provides closer conformance to the
              requirements of the POSIX.1 specification and better
              performance when creating large numbers of threads.  NPTL is
              available since glibc 2.3.2, and requires features that are
              present in the Linux 2.6 kernel.

       Both of these are so-called 1:1 implementations, meaning that each
       thread maps to a kernel scheduling entity.  Both threading
       implementations employ the Linux clone(2) system call.  In NPTL,
       thread synchronization primitives (mutexes, thread joining, and so
       on) are implemented using the Linux futex(2) system call.

   LinuxThreads
       The notable features of this implementation are the following:

       -  In addition to the main (initial) thread, and the threads that the
          program creates using pthread_create(3), the implementation
          creates a "manager" thread.  This thread handles thread creation
          and termination.  (Problems can result if this thread is
          inadvertently killed.)

       -  Signals are used internally by the implementation.  On Linux 2.2
          and later, the first three real-time signals are used (see also
          signal(7)).  On older Linux kernels, SIGUSR1 and SIGUSR2 are used.
          Applications must avoid the use of whichever set of signals is
          employed by the implementation.

       -  Threads do not share process IDs.  (In effect, LinuxThreads
          threads are implemented as processes which share more information
          than usual, but which do not share a common process ID.)
          LinuxThreads threads (including the manager thread) are visible as
          separate processes using ps(1).

       The LinuxThreads implementation deviates from the POSIX.1
       specification in a number of ways, including the following:

       -  Calls to getpid(2) return a different value in each thread.
       -  Calls to getppid(2) in threads other than the main thread return
          the process ID of the manager thread; instead getppid(2) in these
          threads should return the same value as getppid(2) in the main
          thread.
       -  When one thread creates a new child process using fork(2), any
          thread should be able to wait(2) on the child.  However, the
          implementation only allows the thread that created the child to
          wait(2) on it.
       -  When a thread calls execve(2), all other threads are terminated
          (as required by POSIX.1).  However, the resulting process has the
          same PID as the thread that called execve(2): it should have the
          same PID as the main thread.
       -  Threads do not share user and group IDs.  This can cause
          complications with set-user-ID programs and can cause failures in
          Pthreads functions if an application changes its credentials using
          seteuid(2) or similar.

       -  Threads do not share a common session ID and process group ID.

       -  Threads do not share record locks created using fcntl(2).

       -  The information returned by times(2) and getrusage(2) is per-
          thread rather than process-wide.

       -  Threads do not share semaphore undo values (see semop(2)).

       -  Threads do not share interval timers.

       -  Threads do not share a common nice value.

       -  POSIX.1 distinguishes the notions of signals that are directed to
          the process as a whole and signals that are directed to individual
          threads.  According to POSIX.1, a process-directed signal (sent
          using kill(2), for example) should be handled by a single,
          arbitrarily selected thread within the process.  LinuxThreads does
          not support the notion of process-directed signals: signals may be
          sent only to specific threads.

       -  Threads have distinct alternate signal stack settings.  However, a
          new thread's alternate signal stack settings are copied from the
          thread that created it, so that the threads initially share an
          alternate signal stack.  (A new thread should start with no
          alternate signal stack defined.  If two threads handle signals on
          their shared alternate signal stack at the same time,
          unpredictable program failures are likely to occur.)

   NPTL
       With NPTL, all of the threads in a process are placed in the same
       thread group; all members of a thread group share the same PID.  NPTL
       does not employ a manager thread.  NPTL makes internal use of the
       first two real-time signals (see also signal(7)); these signals
       cannot be used in applications.

       NPTL still has at least one nonconformance with POSIX.1:

       -  Threads do not share a common nice value.

       Some NPTL nonconformances occur only with older kernels:

       -  The information returned by times(2) and getrusage(2) is per-
          thread rather than process-wide (fixed in kernel 2.6.9).

       -  Threads do not share resource limits (fixed in kernel 2.6.10).

       -  Threads do not share interval timers (fixed in kernel 2.6.12).

       -  Only the main thread is permitted to start a new session using
          setsid(2) (fixed in kernel 2.6.16).

       -  Only the main thread is permitted to make the process into a
          process group leader using setpgid(2) (fixed in kernel 2.6.16).

       -  Threads have distinct alternate signal stack settings.  However, a
          new thread's alternate signal stack settings are copied from the
          thread that created it, so that the threads initially share an
          alternate signal stack (fixed in kernel 2.6.16).

       Note the following further points about the NPTL implementation:

       -  If the stack size soft resource limit (see the description of
          RLIMIT_STACK in setrlimit(2)) is set to a value other than
          unlimited, then this value defines the default stack size for new
          threads.  To be effective, this limit must be set before the
          program is executed, perhaps using the ulimit -s shell built-in
          command (limit stacksize in the C shell).
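The default stack size can also be inspected and overridden per thread from within a program. The sketch below reads the size reported by a freshly initialized attributes object (on glibc this reports the default, which is an implementation detail rather than a POSIX guarantee) and then starts a thread with an explicit size. The names touch_stack and run_with_stack_size are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body that uses a little stack space. */
static void *touch_stack(void *arg) {
    volatile char buf[4096];
    (void)arg;
    buf[0] = 1;
    buf[sizeof buf - 1] = 1;
    return NULL;
}

/* Stores the stack size reported for default attributes in *default_out,
   then runs a thread with a stack of `bytes` bytes.
   Returns 0 on success or a pthreads error number. */
int run_with_stack_size(size_t bytes, size_t *default_out) {
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    if ((rc = pthread_attr_init(&attr)) != 0)
        return rc;
    rc = pthread_attr_getstacksize(&attr, default_out);
    if (rc == 0)
        rc = pthread_attr_setstacksize(&attr, bytes); /* EINVAL if too small */
    if (rc == 0)
        rc = pthread_create(&tid, &attr, touch_stack, NULL);
    pthread_attr_destroy(&attr); /* the thread keeps its own copy */
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

A call such as run_with_stack_size(1u << 20, &size) requests a 1 MiB stack for the new thread only and leaves the process-wide limit untouched.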

   Determining the threading implementation
       Since glibc 2.3.2, the getconf(1) command can be used to determine
       the system's threading implementation, for example:

           bash$ getconf GNU_LIBPTHREAD_VERSION
           NPTL 2.3.4

       With older glibc versions, a command such as the following should be
       sufficient to determine the default threading implementation:

           bash$ $( ldd /bin/ls | grep libc.so | awk '{print $3}' ) |
                           egrep -i 'threads|nptl'
                   Native POSIX Threads Library by Ulrich Drepper et al

   Selecting the threading implementation: LD_ASSUME_KERNEL
       On systems with a glibc that supports both LinuxThreads and NPTL
       (i.e., glibc 2.3.x), the LD_ASSUME_KERNEL environment variable can be
       used to override the dynamic linker's default choice of threading
       implementation.  This variable tells the dynamic linker to assume
       that it is running on top of a particular kernel version.  By
       specifying a kernel version that does not provide the support
       required by NPTL, we can force the use of LinuxThreads.  (The most
       likely reason for doing this is to run a (broken) application that
       depends on some nonconformant behavior in LinuxThreads.)  For
       example:

           bash$ $( LD_ASSUME_KERNEL=2.2.5 ldd /bin/ls | grep libc.so |
                           awk '{print $3}' ) | egrep -i 'threads|nptl'
                   linuxthreads-0.10 by Xavier Leroy

COLOPHON

       This page is part of release 3.65 of the Linux man-pages project.  A
       description of the project, and information about reporting bugs, can
       be found at http://www.kernel.org/doc/man-pages/.

Linux                            2010-11-14                      PTHREADS(7)

futex

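Quoted from: "Linux Futex的设计与实现" (The Design and Implementation of Linux Futex)

1. What is a futex
Futex is short for Fast Userspace muTexes. It was designed by Hubertus Franke, Matthew Kirkwood, Ingo Molnar, and Rusty Russell, all Linux experts. Ingo Molnar is probably the most familiar name, being the implementer of the O(1) scheduler and CFS.

The idea behind the design is not hard to follow. In traditional Unix systems, inter-process synchronization mechanisms such as System V IPC (semaphores, message queues), sockets, and file locking (flock()) all work by operating on a kernel object. That object is visible to every process being synchronized and provides the shared state and the atomic operations. Whenever processes synchronize, they must do so in the kernel through a system call (semop(), for example). Studies found, however, that much synchronization is uncontended: between the time a process enters a critical section and the time it leaves, there is often no other process trying to enter that section or requesting the same synchronization variable. Even so, the process has to trap into the kernel to check whether anyone is competing with it, and trap again on exit to check whether any process is waiting on the same variable. These unnecessary system calls (kernel traps) add up to a large performance cost.

Futex was created to solve this problem. It is a synchronization mechanism that mixes user mode and kernel mode. First, the processes being synchronized share a piece of memory through mmap; the futex variable lives in that shared memory and is operated on atomically. When a process tries to enter or leave a critical section, it first examines the futex variable in shared memory. If there is no contention, it only modifies the futex and makes no system call. If the futex variable indicates contention, the process still has to make a system call to do the necessary work (wait or wake up). In short, a futex checks in user mode first and avoids trapping into the kernel when there is no contention, which greatly improves efficiency under low contention. Linux has supported futexes since 2.5.7.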

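2. The futex system call
Because futex is a hybrid user-mode and kernel-mode mechanism, the two parts must cooperate. Linux provides the sys_futex system call to support synchronization when processes contend.
Its prototype is:
#include <linux/futex.h>
#include <sys/time.h>

int futex(int *uaddr, int op, int val, const struct timespec *timeout,
          int *uaddr2, int val3);
The parameter list is long, but usually only the first three are used. timeout is self-explanatory, and the others are often ignored.
uaddr is the address of the shared memory in user space; it holds an aligned integer counter.
op is the operation type. Five are defined; only two are introduced here, and futex(2) covers the rest.
FUTEX_WAIT: atomically checks whether the counter at uaddr equals val and, if so, puts the process to sleep until a FUTEX_WAKE or a timeout. In other words, the process is placed on the wait queue associated with uaddr.
FUTEX_WAKE: wakes at most val processes waiting on uaddr.

As you can see, FUTEX_WAIT and FUTEX_WAKE only suspend or wake processes, and that work can only be done in kernel mode. Some people try to synchronize processes with the futex system call alone and expect to get the futex performance benefit; that does not work. The futex synchronization mechanism must be distinguished from the futex system call: the mechanism also includes user-mode operations, which the next section covers.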

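3. The futex synchronization mechanism
Every futex synchronization operation should start in user space. First create a futex synchronization variable, that is, an integer counter located in shared memory, which holds 1 while the lock is free.
When a process tries to take the lock or enter the critical section, it performs a "down" operation on the futex, atomically decrementing the variable by 1. If the variable becomes 0, there is no contention and the process carries on. If it becomes negative, there is contention, and the process calls the futex system call with the FUTEX_WAIT operation to put itself to sleep.
When a process releases the lock or leaves the critical section, it performs an "up" operation, atomically incrementing the variable by 1. If the variable goes from 0 to 1, there is no contention and the process carries on. If the variable was negative before the increment, there is contention, and the process calls the futex system call with the FUTEX_WAKE operation to wake one or more waiting processes.

The atomic increment and decrement are usually implemented with CAS (compare and swap), which is platform-dependent. The basic form is CAS(addr, old, new): if the value stored at addr equals old, it is replaced with new. x86 has a dedicated instruction for this: cmpxchg.

In summary, a futex operation starts in user mode and is completed by user mode and kernel mode working together.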
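To make the division of labor concrete, here is a hedged sketch of a futex-based lock for Linux. It does not use the up/down counter described above. Instead it uses a three-state scheme (0 = free, 1 = held, 2 = held with possible waiters), a variant that is commonly used in practice. glibc provides no futex() wrapper, so the sketch calls syscall(SYS_futex, ...) directly. All type and function names are hypothetical, and the cast from atomic_int * to int * assumes both have the same representation, which holds on common Linux targets.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = free, 1 = held, 2 = held and someone may be sleeping. */
typedef struct { atomic_int state; } futex_lock;

/* Thin wrapper around the raw system call (timeout, uaddr2, val3 unused). */
static long futex_call(atomic_int *uaddr, int op, int val) {
    return syscall(SYS_futex, (int *)uaddr, op, val, NULL, NULL, 0);
}

static void futex_lock_acquire(futex_lock *l) {
    int c = 0;
    /* Fast path: an uncontended 0 -> 1 transition stays in user space. */
    if (atomic_compare_exchange_strong(&l->state, &c, 1))
        return;
    /* Slow path: mark the lock contended, then sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        futex_call(&l->state, FUTEX_WAIT, 2); /* sleeps only if still 2 */
        c = atomic_exchange(&l->state, 2);
    }
}

static void futex_lock_release(futex_lock *l) {
    /* 1 -> 0 means nobody waited; otherwise reset and wake one sleeper. */
    if (atomic_fetch_sub(&l->state, 1) != 1) {
        atomic_store(&l->state, 0);
        futex_call(&l->state, FUTEX_WAKE, 1);
    }
}

static futex_lock demo_lock;
static long demo_total; /* protected by demo_lock */

static void *add_many(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        futex_lock_acquire(&demo_lock);
        demo_total++;
        futex_lock_release(&demo_lock);
    }
    return NULL;
}

/* Starts nthreads (1..16) threads that each add per_thread to the total.
   Returns the final total, or -1 on a bad argument or pthreads error. */
long run_futex_lock_demo(int nthreads, long per_thread) {
    pthread_t tids[16];
    int started = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    atomic_store(&demo_lock.state, 0);
    demo_total = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, add_many, &per_thread) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == nthreads ? demo_total : -1;
}
```

Only the slow paths reach the kernel: an uncontended acquire and release pair costs two atomic instructions and no system call, which is the low-contention benefit the text describes.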

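4. Synchronizing processes and threads with futexes
Both processes and threads can use futexes to synchronize.
For threads the situation is simple: threads share a virtual address space, so a virtual address uniquely identifies a futex variable, and every thread uses the same virtual address to access it.
For processes it is more involved. Each process has its own virtual address space, so processes can only use a futex variable by sharing a region of memory through mmap(). Each process may access the futex at a different virtual address; what matters is that the system knows all of those virtual addresses map to the same physical memory, and it uses the physical address to identify the futex variable uniquely.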
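The process case can be sketched as follows, assuming Linux and a shared anonymous mapping created before fork(). The child changes the futex word and issues FUTEX_WAKE; the parent sleeps with FUTEX_WAIT while the word is still 0. The non-private futex operations are used so that the kernel matches waiters by the underlying page rather than by virtual address. The function name run_shared_futex_demo is hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent observes in the shared futex word
   (1 on success), or -1 if mmap() or fork() fails. */
int run_shared_futex_demo(void) {
    /* One int in memory that parent and child both map. */
    atomic_int *flag = mmap(NULL, sizeof *flag, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (flag == MAP_FAILED)
        return -1;
    atomic_store(flag, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(flag, sizeof *flag);
        return -1;
    }
    if (pid == 0) {
        /* Child: publish the new value, then wake a sleeping waiter. */
        atomic_store(flag, 1);
        syscall(SYS_futex, (int *)flag, FUTEX_WAKE, 1, NULL, NULL, 0);
        _exit(0);
    }

    /* Parent: sleep while the word is still 0. FUTEX_WAIT returns at
       once if the value has already changed, so no wakeup is lost. */
    while (atomic_load(flag) == 0)
        syscall(SYS_futex, (int *)flag, FUTEX_WAIT, 0, NULL, NULL, 0);

    int seen = atomic_load(flag);
    waitpid(pid, NULL, 0);
    munmap(flag, sizeof *flag);
    return seen;
}
```

Here both processes happen to use the same virtual address because the mapping is inherited across fork(); two unrelated processes mapping the same file or shared memory object could see it at different addresses and the scheme would still work.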

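Summary:

A futex variable has these properties: 1) it lives in shared user-space memory; 2) it is a 32-bit integer; 3) operations on it are atomic.

Futexes perform better than traditional synchronization mechanisms when the program has low contention.

Do not use the futex system call directly.

The futex synchronization mechanism works both between processes and between threads.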

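Thread synchronization

Synchronization is unavoidable in multithreaded development on Linux. The POSIX standard defines three thread synchronization mechanisms: mutexes, condition variables, and POSIX semaphores. NPTL largely implements POSIX, and glibc uses NPTL as its thread library, so glibc contains implementations of all three (along with others, such as the read-write locks covered in APUE).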
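Of the three mechanisms, condition variables are the only one without an example in this article, so here is a minimal sketch. One thread publishes a value and signals; the caller waits on the condition variable until the value is ready. The names ready_lock, ready_cond, producer, and wait_for_payload are illustrative only.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static bool ready;  /* the predicate, protected by ready_lock */
static int payload; /* the data being handed over */

/* Publishes the value and wakes the waiter. */
static void *producer(void *arg) {
    pthread_mutex_lock(&ready_lock);
    payload = *(int *)arg;
    ready = true;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Starts a producer and blocks until its value has been published.
   Returns the value, or -1 if the thread cannot be created. */
int wait_for_payload(int value) {
    pthread_t tid;
    int result;

    ready = false; /* no other thread is running at this point */
    if (pthread_create(&tid, NULL, producer, &value) != 0)
        return -1;

    pthread_mutex_lock(&ready_lock);
    while (!ready) /* the loop guards against spurious wakeups */
        pthread_cond_wait(&ready_cond, &ready_lock);
    result = payload;
    pthread_mutex_unlock(&ready_lock);

    pthread_join(tid, NULL);
    return result;
}
```

The wait is always wrapped in a loop that rechecks the predicate, because pthread_cond_wait() may return without a matching signal.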

Semaphores

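#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// errpro is not defined in the original listing. This definition is an
// assumption: print msg with the errno text and exit when cond is true.
#define errpro(cond, msg) \
    do { if (cond) { perror(msg); exit(EXIT_FAILURE); } } while (0)

sem_t *sem;

void *taskfunc(void *arg) {
    (void)arg;
    // pthread_self() avoids reading ptids[i] before pthread_create() stores it
    pthread_t ptid = pthread_self();
    errpro(-1 == sem_wait(sem), "sem_wait"); // request
    printf("task of thread [%lu] wait sem\n", (unsigned long)ptid);
    sleep(2);
    int sv = 0;
    sem_getvalue(sem, &sv);
    printf("sem value = %d\n", sv);
    sem_post(sem); // release
    printf("task of thread [%lu] post sem\n", (unsigned long)ptid);
    return NULL;
}

#define TCNT 5U // thread count

int main(void) {
    sem_t s;
    sem = &s;
    // initial value 2: at most two threads hold the semaphore at once
    errpro(-1 == sem_init(sem, 0, 2), "sem_init");
    pthread_t ptids[TCNT];

    for (unsigned i = 0; i < TCNT; ++i) { // create TCNT threads
        // pthread_create() returns an error number and does not set errno
        int rc = pthread_create(&ptids[i], NULL, taskfunc, NULL);
        if (rc != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            exit(EXIT_FAILURE);
        }
    }
    for (unsigned i = 0; i < TCNT; ++i) {
        pthread_join(ptids[i], NULL);
    }

    errpro(-1 == sem_destroy(sem), "sem_destroy");

    return 0;
}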
// Output:
% ./test
task of thread [4422524928] wait sem
task of thread [4423061504] wait sem
sem value = 0
task of thread [4422524928] post sem
task of thread [4423598080] wait sem
sem value = 0
task of thread [4423061504] post sem
task of thread [4424134656] wait sem
sem value = 0
task of thread [4423598080] post sem
task of thread [4424671232] wait sem
sem value = 0
task of thread [4424134656] post sem
sem value = 0
task of thread [4424671232] post sem

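POSIX named semaphores

POSIX named semaphores have kernel persistence: one process creates the semaphore, and other processes can access it through its external name (the name used when it was created). The persistence of a POSIX memory-based (unnamed) semaphore varies. If it is shared only among the threads of a single process, it has process persistence and disappears when the process terminates. If it is used to synchronize different processes, it must be placed in a shared memory region, and it then exists for as long as that region exists.

Named semaphore example: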

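#include <fcntl.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// errpro is not defined in the original listing. This definition is an
// assumption: print msg with the errno text and exit when cond is true.
#define errpro(cond, msg) \
    do { if (cond) { perror(msg); exit(EXIT_FAILURE); } } while (0)

sem_t *sem;

void *taskfunc(void *arg) {
    (void)arg;
    // pthread_self() avoids reading ptids[i] before pthread_create() stores it
    pthread_t ptid = pthread_self();
    errpro(-1 == sem_wait(sem), "sem_wait"); // request
    printf("task of thread [%lu] wait sem\n", (unsigned long)ptid);
    sleep(2);
    int sv = 0;
    sem_getvalue(sem, &sv);
    printf("sem value = %d\n", sv);
    sem_post(sem); // release
    printf("task of thread [%lu] post sem\n", (unsigned long)ptid);
    return NULL;
}

#define TCNT 5U // thread count
// A portable name starts with '/' and contains no other slash;
// the original "./sem" is rejected on Linux.
#define SEMPATH "/sem" // sem name

int main(void) {
    sem = sem_open(SEMPATH, O_CREAT|O_EXCL, 0644, 2);
    errpro(SEM_FAILED == sem, "sem_open");
    pthread_t ptids[TCNT];

    for (unsigned i = 0; i < TCNT; ++i) { // create TCNT threads
        // pthread_create() returns an error number and does not set errno
        int rc = pthread_create(&ptids[i], NULL, taskfunc, NULL);
        if (rc != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            exit(EXIT_FAILURE);
        }
    }
    for (unsigned i = 0; i < TCNT; ++i) {
        pthread_join(ptids[i], NULL);
    }

    errpro(-1 == sem_close(sem), "sem_close");
    errpro(-1 == sem_unlink(SEMPATH), "sem_unlink"); // remove the name

    return 0;
}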

Mutex

Name

pthread_mutex_destroy, pthread_mutex_init - destroy and initialize a mutex

Synopsis

#include <pthread.h>

int pthread_mutex_destroy(pthread_mutex_t *mutex);
int pthread_mutex_init(pthread_mutex_t *restrict mutex,
const pthread_mutexattr_t *restrict attr);
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

Description

The pthread_mutex_destroy() function shall destroy the mutex object referenced by mutex; the mutex object becomes, in effect, uninitialized. An implementation may cause pthread_mutex_destroy() to set the object referenced by mutex to an invalid value. A destroyed mutex object can be reinitialized using pthread_mutex_init(); the results of otherwise referencing the object after it has been destroyed are undefined.

It shall be safe to destroy an initialized mutex that is unlocked. Attempting to destroy a locked mutex results in undefined behavior.

The pthread_mutex_init() function shall initialize the mutex referenced by mutex with attributes specified by attr. If attr is NULL, the default mutex attributes are used; the effect shall be the same as passing the address of a default mutex attributes object. Upon successful initialization, the state of the mutex becomes initialized and unlocked.

Only mutex itself may be used for performing synchronization. The result of referring to copies of mutex in calls to pthread_mutex_lock(),pthread_mutex_trylock(), pthread_mutex_unlock(), and pthread_mutex_destroy() is undefined.

Attempting to initialize an already initialized mutex results in undefined behavior.

In cases where default mutex attributes are appropriate, the macro PTHREAD_MUTEX_INITIALIZER can be used to initialize mutexes that are statically allocated. The effect shall be equivalent to dynamic initialization by a call to pthread_mutex_init() with parameter attr specified as NULL, except that no error checks are performed.

Return Value

If successful, the pthread_mutex_destroy() and pthread_mutex_init() functions shall return zero; otherwise, an error number shall be returned to indicate the error.

The [EBUSY] and [EINVAL] error checks, if implemented, act as if they were performed immediately at the beginning of processing for the function and shall cause an error return prior to modifying the state of the mutex specified by mutex.
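The initialization and destruction rules above can be sketched with a mutex that is embedded in a structure and therefore initialized dynamically rather than with PTHREAD_MUTEX_INITIALIZER. The account type and the account_* and mutex_lifecycle_demo functions are hypothetical names invented for this illustration.

```c
#include <pthread.h>

/* Hypothetical structure guarded by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account;

/* Initializes the mutex with default attributes (attr == NULL). */
int account_init(account *a) {
    a->balance = 0;
    return pthread_mutex_init(&a->lock, NULL);
}

/* Updates the balance inside the critical section. */
int account_deposit(account *a, long amount) {
    int rc = pthread_mutex_lock(&a->lock);
    if (rc != 0)
        return rc;
    a->balance += amount;
    return pthread_mutex_unlock(&a->lock);
}

/* Destroys the mutex; it must be unlocked at this point. */
int account_destroy(account *a) {
    return pthread_mutex_destroy(&a->lock);
}

/* Walks through init, use, destroy, and re-init of the same object.
   Returns 0 on success or the first pthreads error number. */
int mutex_lifecycle_demo(long *balance_out) {
    account a;
    int rc;

    if ((rc = account_init(&a)) != 0) return rc;
    if ((rc = account_deposit(&a, 5)) != 0) return rc;
    if ((rc = account_deposit(&a, 7)) != 0) return rc;
    *balance_out = a.balance;
    if ((rc = account_destroy(&a)) != 0) return rc;

    /* A destroyed mutex may be initialized again before reuse. */
    if ((rc = account_init(&a)) != 0) return rc;
    return account_destroy(&a);
}
```

The mutex is always used through its original address (&a->lock); copying the account structure and locking the copy would fall under the undefined behavior described above.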

Example program:

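#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int val; // global value shared by all threads, protected by mutex

void *taskfunc(void *arg) {
    (void)arg;
    // pthread_self() avoids reading ptids[i] before pthread_create() stores it
    pthread_t ptid = pthread_self();
    for (int i = 0; i < 2; ++i) {
        pthread_mutex_lock(&mutex);
        int v = val + 1;
        printf("task of thread [%lu] with val = [%d]\n", (unsigned long)ptid,
                v);
        val = v;
        sleep(2); // hold the lock so that the other threads visibly wait
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

#define TCNT 5U // thread count

int main(void) {
    pthread_t ptids[TCNT];

    for (unsigned i = 0; i < TCNT; ++i) { // create TCNT threads
        // pthread_create() returns an error number and does not set errno
        int rc = pthread_create(&ptids[i], NULL, taskfunc, NULL);
        if (rc != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            exit(EXIT_FAILURE);
        }
    }
    for (unsigned i = 0; i < TCNT; ++i) {
        pthread_join(ptids[i], NULL);
    }

    return 0;
}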

output:

% ./test
task of thread [4523405312] with val = [1]
task of thread [4523941888] with val = [2]
task of thread [4524478464] with val = [3]
task of thread [4525015040] with val = [4]
task of thread [4525551616] with val = [5]
task of thread [4523405312] with val = [6]
task of thread [4523941888] with val = [7]
task of thread [4524478464] with val = [8]
task of thread [4525015040] with val = [9]
task of thread [4525551616] with val = [10]
