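Files are a central concept in Linux. The CPU, memory, disks, network cards, and various input/output devices are all resources managed by the operating system. Apart from the CPU and memory, which are used for computation, every other resource can be treated as an input/output device. The disk is the core device for persisting data, and the data on it is generally managed in the form of files. Linux files rest on an important abstraction: the virtual file system (VFS). The VFS provides a uniform set of file operations that hides the differences between the underlying hardware devices. As long as a device's driver implements the VFS interface, the operating system can read and write that device's data (input/output) through the same file operations.
Files are also an important form of inter-process communication: processes share all kinds of data through files. File locks, in turn, are an important inter-process synchronization mechanism. Because a file lock's lifetime is bound to the process that holds it, managing file locks, and error handling in particular, is greatly simplified.
This article starts with the kernel's file data structures, then explains how to use flock and fcntl, covers NFS file locking, and closes with some alternative file-locking mechanisms.
The motivation was a project that required designing a general-purpose file read/write lock usable on both local file systems and NFS.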
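Kernel file data structures
Three kernel data structures are directly related to files: the file descriptor table, the file table, and the i-node table. The file descriptor table is private to each process; the file table and the i-node table are independent of any process. A file descriptor (fd) is meaningful only inside the process that owns it. A file table entry can be shared within a process and also between processes. The i-node table is shared system-wide.
A file descriptor can be created by open, dup, fork, and similar operations; a file table entry can only be created by open; an i-node table entry is created when the actual file is created.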
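Figure: from APUE, Section 3.12
Figure: from APUE, Section 14.3
Figure: from TLPI, Section 5.4
Notes on the figure from TLPI Section 5.4:
1) In process A, fd 1 and fd 20 share one file table entry, so fd 20 can be regarded as a duplicate of fd 1 (or the other way round).
2) fd 2 of A and fd 2 of B share one file table entry, so B can be regarded as forked from A, with B having redirected fd 0 and fd 1.
3) File table entries 0 and 86 point to the same file, which means both were produced by open calls on that same file.
Understanding how these three structures relate gives a good overall picture of how Linux manages files. stat and fcntl can query or modify the associated file information, and experimenting with these two interfaces helps in understanding the three structures.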
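The relationship between the three structures can be observed from user space. The following is a minimal sketch, not taken from the original article: same_inode() and shares_offset() are illustrative helper names. It assumes regular files, where two descriptors that share a file table entry also share the file offset.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if both descriptors refer to the same i-node,
 * 0 if they do not, -1 on error. */
int same_inode(int fd_a, int fd_b)
{
    struct stat sa, sb;

    if (fstat(fd_a, &sa) < 0 || fstat(fd_b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Returns 1 if both descriptors share one file table entry, 0 if not,
 * -1 on error. The offset lives in the file table entry, so moving it
 * through fd_a is visible through fd_b only when the entry is shared. */
int shares_offset(int fd_a, int fd_b)
{
    off_t saved_a = lseek(fd_a, 0, SEEK_CUR);
    off_t saved_b = lseek(fd_b, 0, SEEK_CUR);
    off_t probe;
    int shared;

    if (saved_a < 0 || saved_b < 0)
        return -1;
    probe = saved_b + 1;                  /* differs from fd_b's offset */
    if (lseek(fd_a, probe, SEEK_SET) < 0)
        return -1;
    shared = (lseek(fd_b, 0, SEEK_CUR) == probe);
    lseek(fd_a, saved_a, SEEK_SET);       /* restore the original offset */
    return shared;
}
```

With fd2 = dup(fd) both helpers return 1; with a second open() of the same path, same_inode() returns 1 but shares_offset() returns 0, which mirrors cases 1) and 3) in the notes above.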
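File record locks: fcntl
Record locks live in the i-node table structure and use the PID to identify the lock owner. This gives them the following properties:
1) A record lock is identified by the triple (PID, start, end). A file can carry many record locks, but one process holds at most one lock on any given byte range (a new request replaces the old one); different processes may hold read locks on the same range at the same time.
2) When a process terminates, normally or abnormally, all record locks it holds are released.
3) Within one process, any fd that refers to the same file (i-node) can operate on the record locks of that file, for example release or modify them. Both an explicit F_UNLCK and close(fd) release locks; close releases every record lock the process holds on the whole file.
4) Record locks are not inherited by a child created with fork (the PID differs).
5) Converting a record lock's type or changing its range is atomic.
6) If FD_CLOEXEC is not set, record locks are inherited by the program run by exec (the PID stays the same).
7) Record locks depend on the open mode: a read lock requires an fd opened for reading; a write lock requires an fd opened for writing.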
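Property 3) is the one that most often surprises people, so here is a small sketch of it. The helper names (locked_by_other, write_lock_all, close_drops_lock) are illustrative and not part of the original listing. Because F_GETLK never reports the caller's own locks, the probe runs in a forked child, which by property 4) does not own the parent's lock.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Asks, from a forked child, whether another process holds a record lock
 * anywhere on the file. Returns 1 if locked, 0 if free, -1 on error. */
int locked_by_other(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = {0};
        int fd = open(path, O_RDWR);

        if (fd < 0)
            _exit(2);
        fl.l_type = F_WRLCK;      /* would a whole-file write lock succeed? */
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}

/* Takes a non-blocking write lock on the whole file. */
int write_lock_all(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;       /* l_start = 0, l_len = 0: to EOF */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns 1 if closing a second, unrelated descriptor of the same file
 * dropped the lock taken through the first one; -1 on setup failure. */
int close_drops_lock(const char *path)
{
    int before, after, fd2;
    int fd1 = open(path, O_RDWR);

    if (fd1 < 0)
        return -1;
    if (write_lock_all(fd1) < 0) {
        close(fd1);
        return -1;
    }
    before = locked_by_other(path);   /* lock is visible to others */
    fd2 = open(path, O_RDONLY);
    if (fd2 >= 0)
        close(fd2);                   /* releases ALL our locks on the file */
    after = locked_by_other(path);
    close(fd1);
    return before == 1 && after == 0;
}
```

On an existing file close_drops_lock() is expected to return 1: the lock was taken through fd1, yet closing fd2 released it. This is why library code that opens and closes a file behind the caller's back can silently destroy fcntl locks.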
fcntl usage example and test code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>

int lock_reg(int, int, int, off_t, int, off_t);

#define read_lock(fd, offset, whence, len) \
    lock_reg((fd), F_SETLK, F_RDLCK, (offset), (whence), (len))
#define readw_lock(fd, offset, whence, len) \
    lock_reg((fd), F_SETLKW, F_RDLCK, (offset), (whence), (len))
#define write_lock(fd, offset, whence, len) \
    lock_reg((fd), F_SETLK, F_WRLCK, (offset), (whence), (len))
#define writew_lock(fd, offset, whence, len) \
    lock_reg((fd), F_SETLKW, F_WRLCK, (offset), (whence), (len))
#define un_lock(fd, offset, whence, len) \
    lock_reg((fd), F_SETLK, F_UNLCK, (offset), (whence), (len))

pid_t lock_test(int, int, off_t, int, off_t);

#define is_read_lockable(fd, offset, whence, len) \
    (lock_test((fd), F_RDLCK, (offset), (whence), (len)) == 0)
#define is_write_lockable(fd, offset, whence, len) \
    (lock_test((fd), F_WRLCK, (offset), (whence), (len)) == 0)

/* Set or clear a record lock; returns whatever fcntl() returns. */
int lock_reg(int fd, int cmd, int type, off_t offset, int whence, off_t len)
{
    struct flock lock;

    lock.l_type = type;     /* F_RDLCK, F_WRLCK, F_UNLCK */
    lock.l_start = offset;  /* byte offset, relative to l_whence */
    lock.l_whence = whence; /* SEEK_SET, SEEK_CUR, SEEK_END */
    lock.l_len = len;       /* #bytes (0 means to EOF) */
    return fcntl(fd, cmd, &lock);
}

/* Note: F_GETLK never reports locks held by the calling process itself,
 * so lock_test() always returns 0 in the process that owns the lock. */
pid_t lock_test(int fd, int type, off_t offset, int whence, off_t len)
{
    struct flock lock;

    lock.l_type = type;     /* F_RDLCK or F_WRLCK */
    lock.l_start = offset;  /* byte offset, relative to l_whence */
    lock.l_whence = whence; /* SEEK_SET, SEEK_CUR, SEEK_END */
    lock.l_len = len;       /* #bytes (0 means to EOF) */
    if (fcntl(fd, F_GETLK, &lock) < 0) {
        perror("fcntl error");
        return -1;          /* lock contents are not valid on error */
    }
    /* printf("F_RDLCK=%d, F_WRLCK=%d, F_UNLCK=%d\n", F_RDLCK, F_WRLCK, F_UNLCK); */
    printf(" l_type=%d, l_start=%ld, l_whence=%d, l_len=%ld, l_pid=%d\n",
           lock.l_type, (long)lock.l_start, lock.l_whence,
           (long)lock.l_len, (int)lock.l_pid);
    if (lock.l_type == F_UNLCK)
        return 0;           /* false, region isn't locked by another proc */
    return lock.l_pid;      /* true, return pid of lock owner */
}
/* ======================================================== */
void Usage(const char* program)
{
    fprintf(stderr, "Usage: %s read/read2/write/rd-wr/wr-rd/fork/dup file time\n", program);
    exit(-1);
}

int main(int argc, char *argv[])
{
    if (argc < 3) {
        Usage(argv[0]);
    }
    int interval = 10;      /* seconds to hold the lock before exiting */
    if (argc > 3) {
        interval = atoi(argv[3]);
    }
    printf("PID=%d\n", getpid());
    char* path = argv[2];
    if (strcmp(argv[1], "read") == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        printf("read lock try\n");
        /* read lock entire file */
        if (read_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("read_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("read lock success.\n");
        printf("Read Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "read2") == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        printf("read lock try\n");
        /* read lock entire file */
        if (read_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("read_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("read lock success.\n");
        printf("Read Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        /* read lock the first byte again from the same process */
        if (read_lock(fd, 0, SEEK_SET, 1) < 0) {
            perror("read_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("read lock 2 success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "write") == 0) {
        int fd = open(path, O_WRONLY);
        printf("file: %s, fd=%d\n", path, fd);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        printf("write lock try\n");
        /* write lock entire file */
        if (write_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("write_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            printf("Read Lock by pid=%d\n", lock_test(fd, F_RDLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("write lock success.\n");
        printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        ssize_t ret = write(fd, path, strlen(path));
        if (ret != (ssize_t)strlen(path)) {
            perror("write");
        }
        sleep(interval);
        close(fd);
        exit(0);
    }
    else if (strcmp(argv[1], "rd-wr") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* read lock entire file */
        if (read_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("read_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("read lock success.\n");
        printf("Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        /* if( un_lock(fd, 0, SEEK_SET, 0) < 0){
            perror("un-lock:");
        } */
        /* convert the first byte from a read lock to a write lock */
        if (write_lock(fd, 0, SEEK_SET, 1) < 0) {
            perror("write_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("write lock success.\n");
        printf("Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "wr-rd") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* write lock entire file */
        if (write_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("write_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("write lock success.\n");
        printf("Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        /* if( un_lock(fd, 0, SEEK_SET, 0) < 0){
            perror("un-lock:");
        } */
        /* convert the whole-file write lock to a read lock */
        if (read_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("read_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("read lock success.\n");
        printf("Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "fork") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* write lock entire file */
        if (write_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("write_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("write lock success.\n");
        printf("Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        pid_t pid;
        if ((pid = fork()) < 0) {
            perror("fork");
            exit(1);
        }
        else if (pid == 0) {
            /* child: the parent's record lock is not inherited */
            sleep(1);
            printf("New PID=%d\n", getpid());
            if (read_lock(fd, 0, SEEK_SET, 0) < 0) {
                perror("read_lock: ");
                printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
                exit(1);
            }
            printf("read lock success.\n");
            printf("Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            sleep(interval);
            exit(0);
        }
        else {
            /* parent exits at once, which releases its write lock */
            exit(1);
        }
    }
    else if (strcmp(argv[1], "dup") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* write lock entire file */
        if (write_lock(fd, 0, SEEK_SET, 0) < 0) {
            perror("write_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("write lock success.\n");
        printf("Lock by pid=%d\n", lock_test(fd, F_WRLCK, 0, SEEK_SET, 0));
        int fd2 = 0;
        if ((fd2 = dup(fd)) < 0) {
            perror("dup:");
            exit(1);
        }
        /* closing any fd of the file drops all record locks of this process */
        close(fd);
        if (read_lock(fd2, 0, SEEK_SET, 0) < 0) {
            perror("read_lock: ");
            printf("Write Lock by pid=%d\n", lock_test(fd2, F_WRLCK, 0, SEEK_SET, 0));
            exit(1);
        }
        printf("read lock success.\n");
        printf("Lock by pid=%d\n", lock_test(fd2, F_WRLCK, 0, SEEK_SET, 0));
        sleep(interval);
        exit(0);
    } else {
        printf("Unknown action!\n");
    }
    return 0;
}
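File locks (flock) live in the file table structure. A file can have several file table entries (one per open), and each entry carries exactly one file lock (shared or exclusive). File locks have the following properties:
1) Each file table entry has exactly one file lock. Descriptors produced by fork and dup point to the same file table entry and therefore own the same lock. This means a file lock is inherited by a child created with fork.
2) An explicit LOCK_UN releases the lock of that file table entry; once every fd pointing to the same file table entry has been closed, the entry is freed and its lock is released with it.
3) Opening the same file several times in one process creates several file table entries, so the locks of those entries are independent of each other.
4) A flock lock always covers the whole file. Its type can be converted, but flock does not guarantee that the conversion is atomic.
5) If FD_CLOEXEC is not set, the file lock is inherited by the program run by exec (the file table entry is not freed).
6) File locks place no requirement on the read/write mode the file was opened with.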
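Properties 1) to 3) can be checked inside a single process. The following is a hedged sketch with illustrative function names that are not part of the original listing; it assumes Linux flock semantics on a local file system.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 1 if a second open() of the same file, in the same process, is
 * refused an exclusive lock while the first entry holds one; -1 on error. */
int flock_conflicts_across_opens(const char *path)
{
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);   /* second, independent entry */
    int result = -1;

    if (fd1 >= 0 && fd2 >= 0 && flock(fd1, LOCK_EX | LOCK_NB) == 0) {
        int rc = flock(fd2, LOCK_EX | LOCK_NB);
        result = (rc < 0 && errno == EWOULDBLOCK);
    }
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return result;
}

/* Returns 1 if a dup()ed descriptor acts on the very same lock as the
 * original descriptor; -1 on error. */
int flock_shared_across_dup(const char *path)
{
    int fd1 = open(path, O_RDONLY);
    int fd2 = (fd1 >= 0) ? dup(fd1) : -1;   /* same file table entry */
    int fd3 = open(path, O_RDONLY);         /* different entry */
    int result = -1;

    if (fd2 >= 0 && fd3 >= 0 && flock(fd1, LOCK_EX | LOCK_NB) == 0) {
        int relock = flock(fd2, LOCK_EX | LOCK_NB); /* same entry: allowed */
        int unlock = flock(fd2, LOCK_UN);   /* drops the lock taken via fd1 */
        int other  = flock(fd3, LOCK_EX | LOCK_NB); /* file is free again */
        result = (relock == 0 && unlock == 0 && other == 0);
    }
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    if (fd3 >= 0) close(fd3);
    return result;
}
```

Both helpers are expected to return 1 on an existing file. Compare this with fcntl record locks, where two opens in the same process never conflict because the owner is the process, not the file table entry.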
flock usage example and test code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>      /* open(), O_* flags */
#include <sys/file.h>   /* flock(), LOCK_* */

#define read_lock(fd) \
    flock((fd), LOCK_SH|LOCK_NB)
#define readw_lock(fd) \
    flock((fd), LOCK_SH)
#define write_lock(fd) \
    flock((fd), LOCK_EX|LOCK_NB)
#define writew_lock(fd) \
    flock((fd), LOCK_EX)
#define un_lock(fd) \
    flock((fd), LOCK_UN)
/* ======================================================== */
void Usage(const char* program)
{
    fprintf(stderr, "Usage: %s read/read2/write/rd-wr/wr-rd/fork/dup/open/open2/unlink file time\n", program);
    exit(-1);
}

int main(int argc, char *argv[])
{
    if (argc < 3) {
        Usage(argv[0]);
    }
    int interval = 10;      /* seconds to hold the lock before exiting */
    if (argc > 3) {
        interval = atoi(argv[3]);
    }
    printf("PID=%d\n", getpid());
    char* path = argv[2];
    if (strcmp(argv[1], "read") == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        printf("read lock try...\n");
        /* read lock entire file */
        if (read_lock(fd) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "read2") == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        printf("read lock try...\n");
        /* read lock entire file */
        if (read_lock(fd) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        /* lock again through the same file table entry */
        if (read_lock(fd) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock 2 success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "write") == 0) {
        int fd = open(path, O_WRONLY);
        printf("file: %s, fd=%d\n", path, fd);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        printf("write lock try...\n");
        /* write lock entire file */
        if (write_lock(fd) < 0) {
            perror("write_lock: ");
            exit(1);
        }
        printf("write lock success.\n");
        ssize_t ret = write(fd, path, strlen(path));
        if (ret != (ssize_t)strlen(path)) {
            perror("write");
        }
        sleep(interval);
        close(fd);
        exit(0);
    }
    else if (strcmp(argv[1], "rd-wr") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* read lock entire file */
        if (read_lock(fd) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        /* if( un_lock(fd) < 0){
            perror("un-lock:");
        } */
        /* convert the shared lock to an exclusive lock (not atomic) */
        if (write_lock(fd) < 0) {
            perror("write_lock: ");
            exit(1);
        }
        printf("write lock success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "wr-rd") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* write lock entire file */
        if (write_lock(fd) < 0) {
            perror("write_lock: ");
            exit(1);
        }
        printf("write lock success.\n");
        /* if( un_lock(fd) < 0){
            perror("un-lock:");
        } */
        /* convert the exclusive lock to a shared lock (not atomic) */
        if (read_lock(fd) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "fork") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* write lock entire file */
        if (write_lock(fd) < 0) {
            perror("write_lock: ");
            exit(1);
        }
        printf("write lock success.\n");
        pid_t pid;
        if ((pid = fork()) < 0) {
            perror("fork:");
            exit(1);
        }
        else if (pid == 0) {
            /* child: fd shares the parent's file table entry and its lock */
            sleep(1);
            printf("New PID=%d\n", getpid());
            if (read_lock(fd) < 0) {
                perror("read_lock: ");
                exit(1);
            }
            printf("read lock success.\n");
            /* reopening creates a new file table entry; the locked entry
             * stays open, so its lock is kept */
            fd = open(path, O_RDWR); /* O_RDWR */
            sleep(interval);
            exit(0);
        }
        else {
            sleep(interval);
            exit(1);
        }
    }
    else if (strcmp(argv[1], "dup") == 0) {
        int fd = open(path, O_RDWR); /* O_RDWR */
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* write lock entire file */
        if (write_lock(fd) < 0) {
            perror("write_lock: ");
            exit(1);
        }
        printf("write lock success.\n");
        int fd2 = 0;
        if ((fd2 = dup(fd)) < 0) {
            perror("dup:");
            exit(1);
        }
        /* fd2 still refers to the entry, so closing fd keeps the lock */
        close(fd);
        if (read_lock(fd2) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "open") == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* write lock entire file */
        if (write_lock(fd) < 0) {
            perror("write_lock: ");
            exit(1);
        }
        printf("write lock success.\n");
        /* a second open creates an independent file table entry */
        int fd2 = open(path, O_WRONLY);
        if (fd2 < 0) {
            perror("open:");
            exit(1);
        }
        if (read_lock(fd2) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "open2") == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* read lock entire file */
        if (read_lock(fd) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        /* a second open creates an independent file table entry */
        int fd2 = open(path, O_WRONLY);
        if (fd2 < 0) {
            perror("open:");
            exit(1);
        }
        if (write_lock(fd2) < 0) {
            perror("write_lock: ");
            exit(1);
        }
        printf("write lock success.\n");
        sleep(interval);
        exit(0);
    }
    else if (strcmp(argv[1], "unlink") == 0) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            exit(1);
        }
        /* read lock entire file */
        if (read_lock(fd) < 0) {
            perror("read_lock: ");
            exit(1);
        }
        printf("read lock success.\n");
        int fd2 = open(path, O_WRONLY);
        if (fd2 < 0) {
            perror("open:");
            exit(1);
        }
        /* remove the name while the file is still open and locked */
        unlink(path);
        sleep(interval);
        exit(0);
    } else {
        printf("Unknown action!\n");
    }
    return 0;
}
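NFS file locks: fcntl
NFS3 supports fcntl record locks and file creation with O_EXCL, but it does not support flock file locks. NFS locking is implemented by a separate program, the Network Lock Manager (the lockd daemon); the NFS server itself remains stateless, and all lock state is maintained by NLM. Starting with NFS4, the locking protocol is part of the NFS protocol itself and the lock state is maintained by the NFS server, which means NFS4 is stateful. To simplify the lock design, NFS4 uses lease-based locks.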
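File read/write locks
Two-file read/write lock scheme
The two-file read/write lock scheme uses the target file (datafile) plus a "well-known" temporary write file (datafile.lock), and relies on atomic creation of a new file to guarantee mutually exclusive writes between processes. In essence, atomic file creation simulates the write lock, ensuring that exactly one process can write at a time. Readers read the data directly, relying on the operating system's guarantee that an opened file stays readable until its fd is closed, so a read is not interrupted when the file is removed.
Drawbacks: 1) Partial modification is not supported. 2) One process holding a "read lock" does not guarantee that later processes can also obtain one (the file may have been removed). 3) Leftover temporary (lock) files. If datafile.lock is not removed after a write ends, normally or abnormally, every other writer is blocked. This problem is thorny. atexit() or a top-level wrapper script can solve it partially, but with either one it is hard to be sure that the file being deleted is not a temporary file just created by another writer.
The program flow of the two-file read/write lock scheme is as follows: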
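The original flowchart is not available here, so the following is only one plausible reading of the scheme, sketched under assumptions: the writer builds the complete new content in datafile.lock (created atomically with O_CREAT | O_EXCL) and then publishes it with rename(). The function name replace_datafile and the choice of rename() are assumptions, not the author's confirmed flow.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Replaces the whole content of datafile (hypothetical helper).
 * Returns 0 on success, 1 if another writer holds datafile.lock,
 * -1 on any other error. */
int replace_datafile(const char *datafile, const void *buf, size_t len)
{
    char lockpath[4096];
    int fd;
    int n = snprintf(lockpath, sizeof lockpath, "%s.lock", datafile);

    if (n < 0 || (size_t)n >= sizeof lockpath)
        return -1;

    /* "Write lock": only one process can create the file. */
    fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno == EEXIST ? 1 : -1;

    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        unlink(lockpath);         /* give the lock back on failure */
        return -1;
    }
    close(fd);

    /* Publish the new content and release the lock in one atomic step.
     * Readers that already opened the old datafile keep reading it. */
    if (rename(lockpath, datafile) < 0) {
        unlink(lockpath);
        return -1;
    }
    return 0;
}
```

Readers simply open(datafile) and read. A descriptor opened before the rename keeps seeing the old content, which is the "opened file stays readable" guarantee the scheme depends on. If a writer dies between open() and rename(), datafile.lock is left behind, which is exactly drawback 3).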
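Improved two-file read/write lock scheme
The improved scheme uses the hard-link count to simulate the "read lock": a link count greater than 1 means some process holds a read lock on the target file.
The improved two-file scheme matches real read/write lock semantics: a process can obtain the "write lock" only after all "read locks" have been released. It still cannot solve the leftover lock file problem.
The flow above can be simplified further. Since creating a link is unavoidable anyway, readers can read the link file directly; the atomicity of link() then guarantees that acquiring the "read lock" is atomic, which simplifies the read path.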
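Alternative file-lock-like techniques
open(file, O_CREAT | O_EXCL,...) plus unlink(file)
open(file, O_CREAT | O_EXCL,...) acquires the lock; unlink(file) releases it.
A fairly reliable way to handle a leftover lock file: write the PID into the lock file; when atomic creation fails, read the PID and check whether it is still valid with kill(pid, 0); if it is not, delete the file and retry the atomic creation. (There is a race between the delete and the create, and it cannot be avoided!)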
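The PID-based stale-lock check described above might look like the sketch below. The function names are illustrative, and the sketch assumes every process is on the same host, since kill(pid, 0) says nothing about a PID written by a process on another NFS client.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the lock file names a PID that no longer exists. */
static int lock_is_stale(const char *lockfile)
{
    char buf[32] = {0};
    long pid;
    ssize_t n;
    int fd = open(lockfile, O_RDONLY);

    if (fd < 0)
        return 0;
    n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return 0;             /* owner may not have written its PID yet */
    pid = strtol(buf, NULL, 10);
    if (pid <= 0)
        return 0;
    /* Signal 0 performs only the existence and permission check. */
    return kill((pid_t)pid, 0) < 0 && errno == ESRCH;
}

/* Returns 0 if the lock was acquired, 1 if a live process holds it,
 * -1 on any other error. */
int pidfile_lock(const char *lockfile)
{
    int attempt;

    for (attempt = 0; attempt < 2; attempt++) {
        int fd = open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0644);

        if (fd >= 0) {
            char buf[32];
            int n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
            int ok = (write(fd, buf, (size_t)n) == n);

            close(fd);
            if (!ok) {
                unlink(lockfile);
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST)
            return -1;
        if (!lock_is_stale(lockfile))
            return 1;
        /* Racy by design: another process may have replaced the stale
         * file with a fresh lock between the check and this unlink. */
        unlink(lockfile);
    }
    return 1;
}

/* Releases the lock. */
int pidfile_unlock(const char *lockfile)
{
    return unlink(lockfile);
}
```

The comment on the unlink() call marks the race the text warns about: two processes can both decide the file is stale, and the slower one then deletes the lock the faster one has just created.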
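link(file, lockfile) plus unlink(lockfile)
Usage: link and unlink are both atomic. Each process that needs the lock first creates a temporary file, then links the "well-known" lock file to that temporary file. A successful link acquires the lock; unlink releases it.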
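A minimal sketch of the link-based lock follows. link_lock is an illustrative name, and the temporary file is created next to the lock file with mkstemp(), which is an assumption about where the temporary file lives. The extra st_nlink check covers the case where link() actually succeeded but reported an error, which can happen when a reply is lost on NFS.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the lock was acquired, 1 if it is already held,
 * -1 on any other error. Release the lock with unlink(lockfile). */
int link_lock(const char *lockfile)
{
    char tmp[4096];
    struct stat st;
    int fd, rc, saved_errno, acquired;
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", lockfile);

    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    fd = mkstemp(tmp);            /* unique temporary file */
    if (fd < 0)
        return -1;

    rc = link(tmp, lockfile);     /* atomic: fails if lockfile exists */
    saved_errno = errno;

    /* A link count of 2 on the temporary file proves that the link
     * was created, whatever link() returned. */
    acquired = (rc == 0) || (fstat(fd, &st) == 0 && st.st_nlink == 2);

    close(fd);
    unlink(tmp);                  /* the lock file name keeps the i-node */
    if (acquired)
        return 0;
    return saved_errno == EEXIST ? 1 : -1;
}
```

Because the temporary name is removed immediately, only the well-known lock file remains while the lock is held.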
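open(file, O_CREAT | O_TRUNC | O_WRONLY, 0) plus unlink(file)
Usage: opening an existing file with O_TRUNC | O_WRONLY fails when the process has no write permission on it. Because the lock file is created with mode 0, a second open by any unprivileged process fails for as long as the file exists; unlink(file) releases the lock.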
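A short sketch of this technique, with an illustrative function name. It assumes the caller is not privileged: a process running as root bypasses the permission check, so the second open() would succeed and the lock would not exclude anyone.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the lock was acquired, 1 if it is already held,
 * -1 on any other error. Release the lock with unlink(lockfile). */
int trunc_lock(const char *lockfile)
{
    /* Mode 0: once the file exists, nobody (except root) may open it
     * for writing, so a later open() fails with EACCES. */
    int fd = open(lockfile, O_CREAT | O_TRUNC | O_WRONLY, 0);

    if (fd < 0)
        return errno == EACCES ? 1 : -1;
    close(fd);
    return 0;
}
```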
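None of the alternatives that depend on a "lock file" can solve the leftover lock file problem truly reliably, and in real engineering projects this becomes a thorny issue. Until a reliable solution is found, prefer the file-locking mechanisms the system provides (flock/fcntl) wherever possible.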
Reference books:
Advanced Programming in the UNIX Environment (APUE)
The Linux Programming Interface (TLPI)