Time Management and Optimization in Linux
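For a long time I knew very little about time management on Linux. GFree_wind started several discussions about Linux clocks on Weibo, and next to experts like Godbach I could not get a word in, because I simply did not understand the topic. I recently spent some spare time studying how Linux manages time, and I am sharing my notes here. Corrections are welcome.
In plain terms, time management has two parts, just as our physics teachers kept stressing the difference between a duration and a moment. One is the moment (a point in time): "it is now 20:44:37" is a moment, and so is the time we read off a phone. The other is the duration (an interval): "I work eight hours a day", or "I have to leave in half an hour". In the second case the end point lies in the future, but it is still an interval. Both moments and durations need hardware support. If all you have is a wristwatch whose smallest division is one second, do not expect to time a 100-meter race with it: the hardware is simply too coarse. Linux is no different. It can keep accurate time after boot only because there is suitable hardware underneath it.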
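To make the moment/duration split concrete, here is a minimal user-space sketch. It assumes a POSIX system with clock_gettime; the helper names (timespec_diff_ns, current_moment_sec, measure_duration_ns) are mine and purely illustrative. CLOCK_REALTIME answers "what moment is it", while CLOCK_MONOTONIC is the usual choice for "how long did it take".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper names, for illustration only. */

/* Difference b - a in nanoseconds between two timespec values. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

/* A moment: wall-clock seconds since the UNIX epoch. */
int64_t current_moment_sec(void)
{
    struct timespec now;

    clock_gettime(CLOCK_REALTIME, &now);
    return (int64_t)now.tv_sec;
}

/* A duration: how long work() took, measured on the monotonic clock. */
int64_t measure_duration_ns(void (*work)(void))
{
    struct timespec start, end;

    assert(work != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    work();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return timespec_diff_ns(&start, &end);
}
```

Calling current_moment_sec() from a main() gives the same number as `date +%s`, and measure_duration_ns() takes any function you want to time. The monotonic clock is used for the interval because the wall clock can be stepped by the administrator or by NTP while the measurement is running.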
RTC
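The RTC (real-time clock) differs from the other timing hardware: the RTC reports a moment, while the other hardware clocks report durations. In other words, the RTC can tell us that it is now 21:49:38 on September 12, 2013, whereas hardware such as the TSC, PIT and HPET can only tell us "I have counted XX cycles, and at my frequency that means 10 minutes have passed."
Why can the RTC tell us the current moment even after the machine has been powered off? On x86, the RTC is a CMOS chip on the motherboard, and it keeps its clock running on the motherboard battery while Linux is shut down. Note that under Linux the RTC holds UTC and takes no account of the timezone.
So when Linux boots, it has to consult the RTC to obtain the current moment, even though the precision is low (one second). When and how?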
First, when. During boot, start_kernel makes four time-related calls:
1 init_timers();
2 hrtimers_init();
3 timekeeping_init();
4 time_init();
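Reading the current UTC time from the RTC is done in timekeeping_init (the kernel code quoted in this article is from Linux 3.4.61). The call path is:
    timekeeping_init
        |_____read_persistent_clock                 (arch/x86/kernel/rtc.c)
                |_____x86_platform.get_wallclock()  (hooked up in arch/x86/kernel/x86_init.c)
                        |_____mach_get_cmos_time    (arch/x86/kernel/rtc.c)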
    /************arch/x86/kernel/rtc.c*****************/
    void read_persistent_clock(struct timespec *ts)
    {
        unsigned long retval;

        retval = x86_platform.get_wallclock();
        ts->tv_sec = retval;
        ts->tv_nsec = 0;
    }
    /*****************arch/x86/kernel/x86_init.c ****************/
    struct x86_platform_ops x86_platform = {
        .calibrate_tsc             = native_calibrate_tsc,
        .wallclock_init            = wallclock_init_noop,
        .get_wallclock             = mach_get_cmos_time,
        .set_wallclock             = mach_set_rtc_mmss,
        .iommu_shutdown            = iommu_shutdown_noop,
        .is_untracked_pat_range    = is_ISA_range,
        .nmi_init                  = default_nmi_init,
        .get_nmi_reason            = default_get_nmi_reason,
        .i8042_detect              = default_i8042_detect,
        .save_sched_clock_state    = tsc_save_sched_clock_state,
        .restore_sched_clock_state = tsc_restore_sched_clock_state,
    };
    unsigned long mach_get_cmos_time(void)
    {
        unsigned int status, year, mon, day, hour, min, sec, century = 0;
        unsigned long flags;

        spin_lock_irqsave(&rtc_lock, flags);

        /*
         * If UIP is clear, then we have >= 244 microseconds before
         * RTC registers will be updated. Spec sheet says that this
         * is the reliable way to read RTC - registers. If UIP is set
         * then the register access might be invalid.
         */
        while ((CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP))
            cpu_relax();

        sec = CMOS_READ(RTC_SECONDS);
        min = CMOS_READ(RTC_MINUTES);
        hour = CMOS_READ(RTC_HOURS);
        day = CMOS_READ(RTC_DAY_OF_MONTH);
        mon = CMOS_READ(RTC_MONTH);
        year = CMOS_READ(RTC_YEAR);

    #ifdef CONFIG_ACPI
        if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
            acpi_gbl_FADT.century)
            century = CMOS_READ(acpi_gbl_FADT.century);
    #endif

        status = CMOS_READ(RTC_CONTROL);
        WARN_ON_ONCE(RTC_ALWAYS_BCD && (status & RTC_DM_BINARY));

        spin_unlock_irqrestore(&rtc_lock, flags);

        if (RTC_ALWAYS_BCD || !(status & RTC_DM_BINARY)) {
            sec = bcd2bin(sec);
            min = bcd2bin(min);
            hour = bcd2bin(hour);
            day = bcd2bin(day);
            mon = bcd2bin(mon);
            year = bcd2bin(year);
        }

        if (century) {
            century = bcd2bin(century);
            year += century * 100;
            printk(KERN_INFO "Extended CMOS year: %d\n", century * 100);
        } else
            year += CMOS_YEARS_OFFS;

        return mktime(year, mon, day, hour, min, sec);
    }
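The bcd2bin calls above are there because the CMOS RTC usually stores each field as packed BCD: one decimal digit per nibble, so 0x59 means 59. A minimal sketch of that conversion follows. The function names are my own and hypothetical, not the kernel's, and the asserts only guard this toy version.

```c
#include <assert.h>

/* Hypothetical helper names, for illustration only. */

/* Convert one packed-BCD byte (two decimal digits) to a plain binary value. */
unsigned int bcd_to_bin(unsigned char bcd)
{
    assert((bcd & 0x0f) <= 9 && (bcd >> 4) <= 9);   /* both nibbles must be digits */
    return (bcd & 0x0f) + (bcd >> 4) * 10;
}

/* Inverse conversion, valid for values 0..99. */
unsigned char bin_to_bcd(unsigned int val)
{
    assert(val <= 99);
    return (unsigned char)(((val / 10) << 4) | (val % 10));
}

/* Combine a BCD century register and a BCD year register into a full year. */
unsigned int cmos_full_year(unsigned char century_bcd, unsigned char year_bcd)
{
    return bcd_to_bin(century_bcd) * 100 + bcd_to_bin(year_bcd);
}
```

For example, a century register of 0x20 and a year register of 0x13 combine to 2013, which mirrors the `year += century * 100` step in mach_get_cmos_time.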
mktime packs the year, month, day, hour, minute and second into the number of seconds elapsed since the UNIX epoch, 1970-01-01 00:00:00 UTC. On Linux this value can be read as follows:
    root@manu:/sys/class/rtc/rtc0# date +%s ;cat /sys/class/rtc/rtc0/since_epoch
    1379081060
    1379081060
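The conversion that mktime performs can be sketched in user space. The version below follows the arithmetic of the kernel's mktime as I understand it (start the year in March so the leap day falls at the end), but it is my own rewrite with a hypothetical name, epoch_seconds, and it ignores timezones and leap seconds.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only.
 * Seconds since 1970-01-01 00:00:00 for a Gregorian date and time.
 * mon is 1..12, day is 1..31.
 */
long long epoch_seconds(int year, int mon, int day, int hour, int min, int sec)
{
    long long days;

    assert(mon >= 1 && mon <= 12);

    /* Shift the calendar so the year starts in March: 1..12 -> 11,12,1..10.
     * February then comes last, so its leap day needs no special case. */
    mon -= 2;
    if (mon <= 0) {
        mon += 12;
        year -= 1;
    }

    days = year / 4 - year / 100 + year / 400   /* leap days so far */
         + 367 * mon / 12 + day                 /* days inside the shifted year */
         + (long long)year * 365 - 719499;      /* offset so 1970-01-01 is day 0 */

    return ((days * 24 + hour) * 60 + min) * 60 + sec;
}
```

Feeding it 2013-09-13 14:04:20 gives 1379081060, the same value that since_epoch reported above.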
The RTC can also be read from user space with the RTC_RD_TIME ioctl on /dev/rtc:
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>       /* close() */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/rtc.h>

    int main(int argc, char *argv[])
    {
        int retval, fd;
        struct rtc_time rtc_tm;

        fd = open("/dev/rtc", O_RDONLY);
        if (fd == -1) {
            perror("error open /dev/rtc");
            return -1;
        }

        retval = ioctl(fd, RTC_RD_TIME, &rtc_tm);
        if (retval == -1) {
            perror("error exec RTC_RD_TIME ioctl");
            close(fd);
            return -2;
        }

        /* tm_year counts from 1900 and tm_mon from 0, as in struct tm */
        printf("RTC time is %d-%d-%d %d:%d:%d \n",
               rtc_tm.tm_year + 1900, rtc_tm.tm_mon + 1, rtc_tm.tm_mday,
               rtc_tm.tm_hour, rtc_tm.tm_min, rtc_tm.tm_sec);
        close(fd);
        return 0;
    }
    root@manu:~/code/c/self/rtc# ./rtc_test
    RTC time is 2013-9-14 15:46:2
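As mentioned, the RTC is the odd one out among the time-related hardware: its job differs from that of the other devices. The others merely generate timer interrupts at a fixed frequency and so help the OS keep time. As I said earlier, you would not time a 100-meter race with a wristwatch, because its precision is too low. Hardware is the same: some devices are precise and some are not. Linux abstracts these devices as clocksources, and among all the hardware clocks it picks the most precise one as the clocksource currently in use.
How can we list all available clocksources and the one currently in use?
    manu@manu:/$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    tsc hpet acpi_pm
    manu@manu:/$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    tsc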
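The same two sysfs files can be read from a program. This is a small sketch, assuming the sysfs paths shown above exist on the machine; read_first_line and print_clocksources are hypothetical helper names of mine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_DIR "/sys/devices/system/clocksource/clocksource0/"

/* Hypothetical helper names, for illustration only. */

/*
 * Read the first line of a text file into buf and strip the trailing
 * newline. Returns 0 on success, -1 if the file cannot be read.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the clocksource in use and the available ones, if sysfs exposes them. */
void print_clocksources(void)
{
    char line[256];

    if (read_first_line(CLOCKSOURCE_DIR "current_clocksource", line, sizeof line) == 0)
        printf("current:   %s\n", line);
    if (read_first_line(CLOCKSOURCE_DIR "available_clocksource", line, sizeof line) == 0)
        printf("available: %s\n", line);
}
```

Calling print_clocksources() from main() prints nothing on a system without these files, which is why the reader returns an error code instead of aborting.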
PIT
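PIT stands for Programmable Interval Timer. It is an early and rather weak piece of hardware, found as the 8253/8254 chips. Readers interested in the low-level details can read drivers/clocksource/i8253.c. Its input clock runs at roughly 1.19 MHz:
    #define PIT_TICK_RATE 1193182ul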
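To get a feel for what 1193182 Hz means in practice, the sketch below works out the counter reload value needed for a given tick rate and the length of one PIT input tick. The rounding formula matches my reading of how the kernel derives its latch value, but the function names and the sample tick rates are assumptions for illustration.

```c
#include <assert.h>

#define PIT_TICK_RATE 1193182UL   /* PIT input clock in Hz, same value as the define above */

/* Hypothetical helper names, for illustration only. */

/* Counter value to program so the PIT fires about hz times per second (rounded). */
unsigned long pit_latch(unsigned long hz)
{
    assert(hz > 0);
    return (PIT_TICK_RATE + hz / 2) / hz;
}

/* Length of one PIT input tick in nanoseconds, rounded down. */
unsigned long pit_tick_ns(void)
{
    return 1000000000UL / PIT_TICK_RATE;
}
```

For a 1000 Hz tick the reload value is 1193, and one input tick lasts about 838 ns. That is the best resolution this device can offer, which is why it sits at the bottom of the pile.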
HPET
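The PIT has low precision, and the HPET (High Precision Event Timer) was designed to replace it as a source of high-precision timer interrupts, running at 10 MHz or more. It was developed jointly by Microsoft and Intel. An HPET consists of one up-counting counter driven at a fixed frequency and 3 to 32 independent timers. Each timer contains a comparator and a register that holds a value marking when the interrupt should fire. Each comparator compares the counter value with its register value and raises an interrupt when the two are equal.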
The HPET clocksource is detected and registered in the last of the four initialization calls mentioned above, time_init:
    time_init
        |_____x86_late_time_init
                |_____x86_init.timers.timer_init    (arch/x86/kernel/x86_init.c)
                |       |_____hpet_time_init
                |               |_____hpet_enable
                |               |       |_____hpet_clocksource_register
                |               |_____setup_default_timer_irq
                |_____tsc_init
                        |_____x86_platform.calibrate_tsc    (x86_init.c)
                                |_____native_calibrate_tsc
                                        |_____quick_pit_calibrate
    static struct irqaction irq0 = {
        .handler = timer_interrupt,
        .flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL | IRQF_TIMER,
        .name = "timer"
    };

    void __init setup_default_timer_irq(void)
    {
        setup_irq(0, &irq0);
    }

    /* Default timer init function */
    void __init hpet_time_init(void)
    {
        if (!hpet_enable())
            setup_pit_timer();
        setup_default_timer_irq();
    }

    static __init void x86_late_time_init(void)
    {
        x86_init.timers.timer_init();
        tsc_init(); //TSC part
    }

    /*
     * Initialize TSC and delay the periodic timer init to
     * late x86_late_time_init() so ioremap works.
     */
    void __init time_init(void)
    {
        late_time_init = x86_late_time_init;
    }
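Note that time_init does no real work itself: it only stores x86_late_time_init in the late_time_init function pointer, and the boot code calls through that pointer later, once facilities such as ioremap are usable. The pattern can be sketched in plain C as below. All names are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */

/* Hook that early code may set; it is run once, later in boot. */
static void (*late_init_hook)(void) = NULL;

static bool timers_ready = false;

/* The real work, which needs facilities that are not up yet during early init. */
static void late_timer_setup(void)
{
    assert(!timers_ready);   /* expected to run only once */
    timers_ready = true;
}

/* Early init: only records what has to happen later. */
void early_time_init(void)
{
    late_init_hook = late_timer_setup;
}

/* Called by the boot sequence once the prerequisites are available. */
void run_late_init(void)
{
    if (late_init_hook != NULL)
        late_init_hook();
}

bool timers_are_ready(void)
{
    return timers_ready;
}
```

After early_time_init() nothing has happened yet; the timers only become ready when run_late_init() is called. Deferring through a pointer keeps the early code free of ordering constraints.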
The dmesg output on my Linux laptop shows:
    [ 0.664201] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
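Where does a figure like 14.318180 MHz come from? As far as I know, the HPET reports the period of its main counter in femtoseconds, and the frequency is derived from that. A hedged sketch of the arithmetic, where hpet_freq_hz is a hypothetical name and the period 69841279 fs is an assumed example value chosen to match the dmesg line above:

```c
#include <assert.h>
#include <stdint.h>

#define FSEC_PER_SEC 1000000000000000ULL   /* 10^15 femtoseconds per second */

/* Hypothetical helper, for illustration only. */

/* Counter frequency in Hz for a tick period given in femtoseconds, rounded. */
uint64_t hpet_freq_hz(uint32_t period_fs)
{
    assert(period_fs != 0);
    return (FSEC_PER_SEC + period_fs / 2) / period_fs;
}
```

A period of 100000000 fs (100 ns) corresponds to exactly 10 MHz, the lower bound mentioned earlier, and a period of about 69.84 ns corresponds to roughly 14.318 MHz.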
This brings us to clocksource. Linux abstracts the real clocks and manages these hardware time sources with the clocksource data structure.
    /**
     * struct clocksource - hardware abstraction for a free running counter
     *      Provides mostly state-free accessors to the underlying hardware.
     *      This is the structure used for system time.
     *
     * @name:           ptr to clocksource name
     * @list:           list head for registration
     * @rating:         rating value for selection (higher is better)
     *                  To avoid rating inflation the following
     *                  list should give you a guide as to how
     *                  to assign your clocksource a rating
     *                  1-99: Unfit for real use
     *                          Only available for bootup and testing purposes.
     *                  100-199: Base level usability.
     *                          Functional for real use, but not desired.
     *                  200-299: Good.
     *                          A correct and usable clocksource.
     *                  300-399: Desired.
     *                          A reasonably fast and accurate clocksource.
     *                  400-499: Perfect
     *                          The ideal clocksource. A must-use where
     *                          available.
     * @read:           returns a cycle value, passes clocksource as argument
     * @enable:         optional function to enable the clocksource
     * @disable:        optional function to disable the clocksource
     * @mask:           bitmask for two's complement
     *                  subtraction of non 64 bit counters
     * @mult:           cycle to nanosecond multiplier
     * @shift:          cycle to nanosecond divisor (power of two)
     * @max_idle_ns:    max idle time permitted by the clocksource (nsecs)
     * @maxadj:         maximum adjustment value to mult (~11%)
     * @flags:          flags describing special properties
     * @archdata:       arch-specific data
     * @suspend:        suspend function for the clocksource, if necessary
     * @resume:         resume function for the clocksource, if necessary
     * @cycle_last:     most recent cycle counter value seen by ::read()
     */
    struct clocksource {
        /*
         * Hotpath data, fits in a single cache line when the
         * clocksource itself is cacheline aligned.
         */
        cycle_t (*read)(struct clocksource *cs);
        cycle_t cycle_last;
        cycle_t mask;
        u32 mult;
        u32 shift;
        u64 max_idle_ns;
        u32 maxadj;
    #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
        struct arch_clocksource_data archdata;
    #endif

        const char *name;
        struct list_head list;
        int rating;
        int (*enable)(struct clocksource *cs);
        void (*disable)(struct clocksource *cs);
        unsigned long flags;
        void (*suspend)(struct clocksource *cs);
        void (*resume)(struct clocksource *cs);

        /* private: */
    #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
        /* Watchdog related data, used by the framework */
        struct list_head wd_list;
        cycle_t cs_last;
        cycle_t wd_last;
    #endif
    } ____cacheline_aligned;
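- 1-99: not suitable as a real clocksource; only for bootup or testing.
- 100-199: basically usable; can serve as a real clocksource, but not recommended.
- 200-299: good precision; usable as a real clocksource.
- 300-399: very good; a precise clocksource.
- 400-499: the ideal clocksource; must be chosen whenever it is available.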
    static struct clocksource clocksource_hpet = {
        .name = "hpet",
        .rating = 250,
        .read = read_hpet,
        .mask = HPET_MASK,
        .flags = CLOCK_SOURCE_IS_CONTINUOUS,
        .resume = hpet_resume_counter,
    #ifdef CONFIG_X86_64
        .archdata = { .vclock_mode = VCLOCK_HPET },
    #endif
    };
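The rating field is what decides which clocksource wins. The sketch below shows the idea with a plain array and a linear scan. It is a simplification of my own, not the kernel's selection code, and the struct and function names are hypothetical. The ratings used in the example (hpet 250, acpi_pm 200, tsc 300) are the ones that appear in the structures quoted in this article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct clocksource. */
struct toy_clocksource {
    const char *name;
    int rating;          /* higher is better */
};

/* Return the entry with the highest rating, or NULL for an empty list. */
const struct toy_clocksource *
pick_best_clocksource(const struct toy_clocksource *list, size_t n)
{
    const struct toy_clocksource *best = NULL;
    size_t i;

    assert(list != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (best == NULL || list[i].rating > best->rating)
            best = &list[i];
    }
    return best;
}
```

Given acpi_pm (200), hpet (250) and tsc (300), the function returns tsc, which matches the current_clocksource output shown earlier. If the tsc rating is dropped to 0, hpet wins instead.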
ACPI_PM
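This is the so-called ACPI Power Management Timer. I do not know it well either. If you are interested, look up Peng Dong (彭东) on CU (ChinaUnix): he writes his own OS and presumably wrestles with this kind of hardware all the time. Once we get down to the hardware driver layer, my skills are below the subsistence line.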
    #define PMTMR_TICKS_PER_SEC 3579545

    static struct clocksource clocksource_acpi_pm = {
        .name = "acpi_pm",
        .rating = 200,
        .read = acpi_pm_read,
        .mask = (cycle_t)ACPI_PM_MASK,
        .mult = 0, /*to be calculated*/
        .shift = 22,
        .flags = CLOCK_SOURCE_IS_CONTINUOUS,
    };

    fs_initcall(init_acpi_pm_clocksource);
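The mult, shift and mask fields deserve a worked example. The idea is to turn cycles into nanoseconds with one multiply and one shift, ns = (cycles * mult) >> shift, and to use mask so that subtracting two readings of a narrow counter still works across a wrap. The sketch below uses hypothetical function names and assumes a 24-bit mask for the acpi_pm counter; it is an illustration of the arithmetic, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper names, for illustration only. */

/* Choose mult so that (cycles * mult) >> shift is approximately nanoseconds. */
uint32_t hz_to_mult(uint32_t hz, uint32_t shift)
{
    uint64_t tmp;

    assert(hz != 0 && shift < 32);
    tmp = (NSEC_PER_SEC << shift) + hz / 2;   /* + hz/2 rounds to nearest */
    tmp /= hz;
    assert(tmp <= UINT32_MAX);                /* mult is a 32-bit field */
    return (uint32_t)tmp;
}

/* Cycles elapsed between two reads of a counter that wraps at mask + 1. */
uint64_t cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/* Convert a cycle count to nanoseconds with one multiply and one shift. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

With hz = 3579545 and shift = 22, converting 3579545 cycles yields one second, give or take a nanosecond of rounding. With a 24-bit mask, a counter that went from 0xFFFFF0 to 0x000010 is correctly seen as having advanced 0x20 cycles. Avoiding a division on every time read is the whole point of the mult/shift pair.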
TSC
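TSC stands for Time Stamp Counter. To execute instructions, the CPU needs a clock signal generated by an external oscillator and fed in through the CLK pin. x86 provides a TSC register whose value is incremented by one on every clock signal. For example, with a 1 GHz CPU the TSC register increases by 1G every second, that is, once per nanosecond. x86 also provides the rdtsc instruction to read this value, so the TSC can serve as a clock device too. The TSC offers far finer resolution than the RTC, down to the nanosecond level. This is impressive: its frequency is on the same level as the CPU frequency, far beyond small fry like the HPET and PIT. Here is the TSC frequency on my laptop: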
    manu@manu:~/code/c/classical/linux-3.4.61$ dmesg |grep Detected
    [ 0.004000] Detected 2127.727 MHz processor.
    manu@manu:~$ cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 37
    model name : Intel(R) Core(TM) i3 CPU M 330 @ 2.13GHz
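Reading the TSC from user space takes a single instruction. The sketch below assumes GCC or Clang on x86, where the __rdtsc() intrinsic from x86intrin.h is available; on other CPUs the reader simply returns 0. The function names are hypothetical, and the frequency in kHz has to be supplied by the caller (2127727 for the laptop above).

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0
#endif

/* Hypothetical helper names, for illustration only. */

/* Read the time stamp counter; returns 0 on CPUs without rdtsc. */
uint64_t read_tsc_cycles(void)
{
#if HAVE_RDTSC
    return __rdtsc();
#else
    return 0;
#endif
}

/*
 * Convert a TSC cycle count to nanoseconds, given the TSC frequency in kHz.
 * cycles / (khz * 1000) seconds equals cycles * 10^6 / khz nanoseconds.
 * The multiplication can overflow for very large counts, so this is only
 * meant for short intervals.
 */
uint64_t tsc_cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
    assert(tsc_khz != 0);
    return cycles * 1000000ULL / tsc_khz;
}
```

Taking two read_tsc_cycles() values around a piece of code and passing the difference to tsc_cycles_to_ns() gives a rough elapsed time. At 2127.727 MHz, 2127727 cycles are exactly one millisecond.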
    static struct clocksource clocksource_tsc = {
        .name = "tsc",
        .rating = 300,
        .read = read_tsc,
        .resume = resume_tsc,
        .mask = CLOCKSOURCE_MASK(64),
        .flags = CLOCK_SOURCE_IS_CONTINUOUS |
                 CLOCK_SOURCE_MUST_VERIFY,
    #ifdef CONFIG_X86_64
        .archdata = { .vclock_mode = VCLOCK_TSC },
    #endif
    };
The TSC is calibrated in tsc_init, on the same time_init path shown in the HPET section:
    time_init
        |_____x86_late_time_init
                |_____tsc_init
                        |_____x86_platform.calibrate_tsc    (x86_init.c)
                                |_____native_calibrate_tsc
                                        |_____quick_pit_calibrate
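The TSC does not announce its own frequency, so it has to be measured against a clock whose frequency is known, such as the PIT. The principle can be sketched as below. This is only the arithmetic behind the idea, with a hypothetical function name; the kernel's real calibration code is considerably more careful about measurement error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only. */

/*
 * Estimate the frequency (in kHz) of a fast counter from how far it advanced
 * while a reference clock of known frequency advanced by ref_ticks.
 */
uint64_t calibrate_khz(uint64_t fast_delta, uint64_t ref_ticks, uint64_t ref_hz)
{
    assert(ref_ticks != 0);
    /* fast_hz = fast_delta / (ref_ticks / ref_hz); divide by 1000 for kHz */
    return fast_delta * ref_hz / ref_ticks / 1000;
}
```

For example, if the TSC advances by 2,000,000,000 cycles while the PIT (1193182 Hz) counts 1193182 ticks, that is one second, so the estimate is 2000000 kHz.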
    static int __init init_tsc_clocksource(void)
    {
        if (!cpu_has_tsc || tsc_disabled > 0 || !tsc_khz)
            return 0;

        if (tsc_clocksource_reliable)
            clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
        /* lower the rating if we already know its unstable: */
        if (check_tsc_unstable()) {
            clocksource_tsc.rating = 0;
            clocksource_tsc.flags &= ~CLOCK_SOURCE_IS_CONTINUOUS;
        }

        /*
         * Trust the results of the earlier calibration on systems
         * exporting a reliable TSC.
         */
        if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
            clocksource_register_khz(&clocksource_tsc, tsc_khz);
            return 0;
        }

        schedule_delayed_work(&tsc_irqwork, 0);
        return 0;
    }
    /*
     * We use device_initcall here, to ensure we run after the hpet
     * is fully initialized, which may occur at fs_initcall time.
     */
    device_initcall(init_tsc_clocksource);