【跟我一起读 linux 源码 01】boot
计算机启动流程在我的上一个学习计划《自制操作系统》系列中,已经从完全不知道,过渡到了现在的了如指掌了,虽然有些夸张,但整个大体流程已经像过电影一样在我脑海里了,所以在看 linux 源码的这个 boot 部分时,几乎是看到的地方即使自己写不出,也知道它究竟在做什么,以及下一步可能要做什么,真的特别庆幸之前从零开始的折腾。计算机最初的启动原理,可以参考《硬核讲解计算机启动流程》
下好 linux 源码,我们总是想找到 main 函数开始看,但其实 main 函数之前,有三个由汇编语言写的代码才是最先被执行的,分别是 bootsect.s,setup.s,head.s。之后,才执行由 main 函数开始的用 C 语言编写的操作系统内核程序。
大致流程是这样的:
- 第一步,由 BIOS 加载 bootsect 到 0x7C00,然后由 bootsect 自己将自己复制到 0x90000
- 第二步,加载 setup 到 0x90200,然后 setup 里做一些准备,进入保护模式
- 第三步,先将 head.s 汇编成目标代码,将用 C 语言编写的内核程序编译成目标代码,然后链接成 system 模块。head 里做了些进入 main 方法前的准备,主要是重建 IDT,GDT,以及建立分页机制
三步走完后,就进入了 main 方法,此时的内存布局是这样的,同时也体现了 boot 的这三个汇编代码做了什么事
以下是三个汇编代码的源码,此时我们已经读完了 linux 源码的一个包
1 ; 2 ; SYS_SIZE is the number of clicks (16 bytes) to be loaded. 3 ; 0x3000 is 0x30000 bytes = 196kB, more than enough for current 4 ; versions of linux 5 ; 6 SYSSIZE = 0x3000 7 ; 8 ; bootsect.s (C) 1991 Linus Torvalds 9 ; 10 ; bootsect.s is loaded at 0x7c00 by the bios-startup routines, and moves 11 ; iself out of the way to address 0x90000, and jumps there. 12 ; 13 ; It then loads 'setup' directly after itself (0x90200), and the system 14 ; at 0x10000, using BIOS interrupts. 15 ; 16 ; NOTE; currently system is at most 8*65536 bytes long. This should be no 17 ; problem, even in the future. I want to keep it simple. This 512 kB 18 ; kernel size should be enough, especially as this doesn't contain the 19 ; buffer cache as in minix 20 ; 21 ; The loader has been made as simple as possible, and continuos 22 ; read errors will result in a unbreakable loop. Reboot by hand. It 23 ; loads pretty fast by getting whole sectors at a time whenever possible. 24 25 .globl begtext, begdata, begbss, endtext, enddata, endbss 26 .text 27 begtext: 28 .data 29 begdata: 30 .bss 31 begbss: 32 .text 33 34 SETUPLEN = 4 ; nr of setup-sectors 35 BOOTSEG = 0x07c0 ; original address of boot-sector 36 INITSEG = 0x9000 ; we move boot here - out of the way 37 SETUPSEG = 0x9020 ; setup starts here 38 SYSSEG = 0x1000 ; system loaded at 0x10000 (65536). 39 ENDSEG = SYSSEG + SYSSIZE ; where to stop loading 40 41 ; ROOT_DEV: 0x000 - same type of floppy as boot. 42 ; 0x301 - first partition on first drive etc 43 ROOT_DEV = 0x306 44 45 entry start 46 start: 47 mov ax,#BOOTSEG 48 mov ds,ax 49 mov ax,#INITSEG 50 mov es,ax 51 mov cx,#256 52 sub si,si 53 sub di,di 54 rep 55 movw 56 jmpi go,INITSEG 57 go: mov ax,cs 58 mov ds,ax 59 mov es,ax 60 ; put stack at 0x9ff00. 61 mov ss,ax 62 mov sp,#0xFF00 ; arbitrary value >>512 63 64 ; load the setup-sectors directly after the bootblock. 65 ; Note that 'es' is already set up. 66 67 load_setup: 68 mov dx,#0x0000 ; drive 0, head 0 69 mov cx,#0x0002 ; sector 2, track 0 70 mov bx,#0x0200 ; address = 512, in INITSEG 71 mov ax,#0x0200+SETUPLEN ; service 2, nr of sectors 72 int 0x13 ; read it 73 jnc ok_load_setup ; ok - continue 74 mov dx,#0x0000 75 mov ax,#0x0000 ; reset the diskette 76 int 0x13 77 j load_setup 78 79 ok_load_setup: 80 81 ; Get disk drive parameters, specifically nr of sectors/track 82 83 mov dl,#0x00 84 mov ax,#0x0800 ; AH=8 is get drive parameters 85 int 0x13 86 mov ch,#0x00 87 seg cs 88 mov sectors,cx 89 mov ax,#INITSEG 90 mov es,ax 91 92 ; Print some inane message 93 94 mov ah,#0x03 ; read cursor pos 95 xor bh,bh 96 int 0x10 97 98 mov cx,#24 99 mov bx,#0x0007 ; page 0, attribute 7 (normal) 100 mov bp,#msg1 101 mov ax,#0x1301 ; write string, move cursor 102 int 0x10 103 104 ; ok, we've written the message, now 105 ; we want to load the system (at 0x10000) 106 107 mov ax,#SYSSEG 108 mov es,ax ; segment of 0x010000 109 call read_it 110 call kill_motor 111 112 ; After that we check which root-device to use. If the device is 113 ; defined (!= 0), nothing is done and the given device is used. 114 ; Otherwise, either /dev/PS0 (2,28) or /dev/at0 (2,8), depending 115 ; on the number of sectors that the BIOS reports currently. 116 117 seg cs 118 mov ax,root_dev 119 cmp ax,#0 120 jne root_defined 121 seg cs 122 mov bx,sectors 123 mov ax,#0x0208 ; /dev/ps0 - 1.2Mb 124 cmp bx,#15 125 je root_defined 126 mov ax,#0x021c ; /dev/PS0 - 1.44Mb 127 cmp bx,#18 128 je root_defined 129 undef_root: 130 jmp undef_root 131 root_defined: 132 seg cs 133 mov root_dev,ax 134 135 ; after that (everyting loaded), we jump to 136 ; the setup-routine loaded directly after 137 ; the bootblock: 138 139 jmpi 0,SETUPSEG 140 141 ; This routine loads the system at address 0x10000, making sure 142 ; no 64kB boundaries are crossed. We try to load it as fast as 143 ; possible, loading whole tracks whenever we can. 144 ; 145 ; in: es - starting address segment (normally 0x1000) 146 ; 147 sread: .word 1+SETUPLEN ; sectors read of current track 148 head: .word 0 ; current head 149 track: .word 0 ; current track 150 151 read_it: 152 mov ax,es 153 test ax,#0x0fff 154 die: jne die ; es must be at 64kB boundary 155 xor bx,bx ; bx is starting address within segment 156 rp_read: 157 mov ax,es 158 cmp ax,#ENDSEG ; have we loaded all yet? 159 jb ok1_read 160 ret 161 ok1_read: 162 seg cs 163 mov ax,sectors 164 sub ax,sread 165 mov cx,ax 166 shl cx,#9 167 add cx,bx 168 jnc ok2_read 169 je ok2_read 170 xor ax,ax 171 sub ax,bx 172 shr ax,#9 173 ok2_read: 174 call read_track 175 mov cx,ax 176 add ax,sread 177 seg cs 178 cmp ax,sectors 179 jne ok3_read 180 mov ax,#1 181 sub ax,head 182 jne ok4_read 183 inc track 184 ok4_read: 185 mov head,ax 186 xor ax,ax 187 ok3_read: 188 mov sread,ax 189 shl cx,#9 190 add bx,cx 191 jnc rp_read 192 mov ax,es 193 add ax,#0x1000 194 mov es,ax 195 xor bx,bx 196 jmp rp_read 197 198 read_track: 199 push ax 200 push bx 201 push cx 202 push dx 203 mov dx,track 204 mov cx,sread 205 inc cx 206 mov ch,dl 207 mov dx,head 208 mov dh,dl 209 mov dl,#0 210 and dx,#0x0100 211 mov ah,#2 212 int 0x13 213 jc bad_rt 214 pop dx 215 pop cx 216 pop bx 217 pop ax 218 ret 219 bad_rt: mov ax,#0 220 mov dx,#0 221 int 0x13 222 pop dx 223 pop cx 224 pop bx 225 pop ax 226 jmp read_track 227 228 /* 229 * This procedure turns off the floppy drive motor, so 230 * that we enter the kernel in a known state, and 231 * don't have to worry about it later. 232 */ 233 kill_motor: 234 push dx 235 mov dx,#0x3f2 236 mov al,#0 237 outb 238 pop dx 239 ret 240 241 sectors: 242 .word 0 243 244 msg1: 245 .byte 13,10 246 .ascii "Loading system ..." 247 .byte 13,10,13,10 248 249 .org 508 250 root_dev: 251 .word ROOT_DEV 252 boot_flag: 253 .word 0xAA55 254 255 .text 256 endtext: 257 .data 258 enddata: 259 .bss 260 endbss:
bootsect.s
1 ; 2 ; setup.s (C) 1991 Linus Torvalds 3 ; 4 ; setup.s is responsible for getting the system data from the BIOS, 5 ; and putting them into the appropriate places in system memory. 6 ; both setup.s and system has been loaded by the bootblock. 7 ; 8 ; This code asks the bios for memory/disk/other parameters, and 9 ; puts them in a "safe" place: 0x90000-0x901FF, ie where the 10 ; boot-block used to be. It is then up to the protected mode 11 ; system to read them from there before the area is overwritten 12 ; for buffer-blocks. 13 ; 14 15 ; NOTE; These had better be the same as in bootsect.s; 16 17 INITSEG = 0x9000 ; we move boot here - out of the way 18 SYSSEG = 0x1000 ; system loaded at 0x10000 (65536). 19 SETUPSEG = 0x9020 ; this is the current segment 20 21 .globl begtext, begdata, begbss, endtext, enddata, endbss 22 .text 23 begtext: 24 .data 25 begdata: 26 .bss 27 begbss: 28 .text 29 30 entry start 31 start: 32 33 ; ok, the read went well so we get current cursor position and save it for 34 ; posterity. 35 36 mov ax,#INITSEG ; this is done in bootsect already, but... 37 mov ds,ax 38 mov ah,#0x03 ; read cursor pos 39 xor bh,bh 40 int 0x10 ; save it in known place, con_init fetches 41 mov [0],dx ; it from 0x90000. 42 43 ; Get memory size (extended mem, kB) 44 45 mov ah,#0x88 46 int 0x15 47 mov [2],ax 48 49 ; Get video-card data: 50 51 mov ah,#0x0f 52 int 0x10 53 mov [4],bx ; bh = display page 54 mov [6],ax ; al = video mode, ah = window width 55 56 ; check for EGA/VGA and some config parameters 57 58 mov ah,#0x12 59 mov bl,#0x10 60 int 0x10 61 mov [8],ax 62 mov [10],bx 63 mov [12],cx 64 65 ; Get hd0 data 66 67 mov ax,#0x0000 68 mov ds,ax 69 lds si,[4*0x41] 70 mov ax,#INITSEG 71 mov es,ax 72 mov di,#0x0080 73 mov cx,#0x10 74 rep 75 movsb 76 77 ; Get hd1 data 78 79 mov ax,#0x0000 80 mov ds,ax 81 lds si,[4*0x46] 82 mov ax,#INITSEG 83 mov es,ax 84 mov di,#0x0090 85 mov cx,#0x10 86 rep 87 movsb 88 89 ; Check that there IS a hd1 :-) 90 91 mov ax,#0x01500 92 mov dl,#0x81 93 int 0x13 94 jc no_disk1 95 cmp ah,#3 96 je is_disk1 97 no_disk1: 98 mov ax,#INITSEG 99 mov es,ax 100 mov di,#0x0090 101 mov cx,#0x10 102 mov ax,#0x00 103 rep 104 stosb 105 is_disk1: 106 107 ; now we want to move to protected mode ... 108 109 cli ; no interrupts allowed ; 110 111 ; first we move the system to it's rightful place 112 113 mov ax,#0x0000 114 cld ; 'direction'=0, movs moves forward 115 do_move: 116 mov es,ax ; destination segment 117 add ax,#0x1000 118 cmp ax,#0x9000 119 jz end_move 120 mov ds,ax ; source segment 121 sub di,di 122 sub si,si 123 mov cx,#0x8000 124 rep 125 movsw 126 jmp do_move 127 128 ; then we load the segment descriptors 129 130 end_move: 131 mov ax,#SETUPSEG ; right, forgot this at first. didn't work :-) 132 mov ds,ax 133 lidt idt_48 ; load idt with 0,0 134 lgdt gdt_48 ; load gdt with whatever appropriate 135 136 ; that was painless, now we enable A20 137 138 call empty_8042 139 mov al,#0xD1 ; command write 140 out #0x64,al 141 call empty_8042 142 mov al,#0xDF ; A20 on 143 out #0x60,al 144 call empty_8042 145 146 ; well, that went ok, I hope. Now we have to reprogram the interrupts :-( 147 ; we put them right after the intel-reserved hardware interrupts, at 148 ; int 0x20-0x2F. There they won't mess up anything. Sadly IBM really 149 ; messed this up with the original PC, and they haven't been able to 150 ; rectify it afterwards. Thus the bios puts interrupts at 0x08-0x0f, 151 ; which is used for the internal hardware interrupts as well. We just 152 ; have to reprogram the 8259's, and it isn't fun. 153 154 mov al,#0x11 ; initialization sequence 155 out #0x20,al ; send it to 8259A-1 156 .word 0x00eb,0x00eb ; jmp $+2, jmp $+2 157 out #0xA0,al ; and to 8259A-2 158 .word 0x00eb,0x00eb 159 mov al,#0x20 ; start of hardware int's (0x20) 160 out #0x21,al 161 .word 0x00eb,0x00eb 162 mov al,#0x28 ; start of hardware int's 2 (0x28) 163 out #0xA1,al 164 .word 0x00eb,0x00eb 165 mov al,#0x04 ; 8259-1 is master 166 out #0x21,al 167 .word 0x00eb,0x00eb 168 mov al,#0x02 ; 8259-2 is slave 169 out #0xA1,al 170 .word 0x00eb,0x00eb 171 mov al,#0x01 ; 8086 mode for both 172 out #0x21,al 173 .word 0x00eb,0x00eb 174 out #0xA1,al 175 .word 0x00eb,0x00eb 176 mov al,#0xFF ; mask off all interrupts for now 177 out #0x21,al 178 .word 0x00eb,0x00eb 179 out #0xA1,al 180 181 ; well, that certainly wasn't fun :-(. Hopefully it works, and we don't 182 ; need no steenking BIOS anyway (except for the initial loading :-). 183 ; The BIOS-routine wants lots of unnecessary data, and it's less 184 ; "interesting" anyway. This is how REAL programmers do it. 185 ; 186 ; Well, now's the time to actually move into protected mode. To make 187 ; things as simple as possible, we do no register set-up or anything, 188 ; we let the gnu-compiled 32-bit programs do that. We just jump to 189 ; absolute address 0x00000, in 32-bit protected mode. 190 191 mov ax,#0x0001 ; protected mode (PE) bit 192 lmsw ax ; This is it; 193 jmpi 0,8 ; jmp offset 0 of segment 8 (cs) 194 195 ; This routine checks that the keyboard command queue is empty 196 ; No timeout is used - if this hangs there is something wrong with 197 ; the machine, and we probably couldn't proceed anyway. 198 empty_8042: 199 .word 0x00eb,0x00eb 200 in al,#0x64 ; 8042 status port 201 test al,#2 ; is input buffer full? 202 jnz empty_8042 ; yes - loop 203 ret 204 205 gdt: 206 .word 0,0,0,0 ; dummy 207 208 .word 0x07FF ; 8Mb - limit=2047 (2048*4096=8Mb) 209 .word 0x0000 ; base address=0 210 .word 0x9A00 ; code read/exec 211 .word 0x00C0 ; granularity=4096, 386 212 213 .word 0x07FF ; 8Mb - limit=2047 (2048*4096=8Mb) 214 .word 0x0000 ; base address=0 215 .word 0x9200 ; data read/write 216 .word 0x00C0 ; granularity=4096, 386 217 218 idt_48: 219 .word 0 ; idt limit=0 220 .word 0,0 ; idt base=0L 221 222 gdt_48: 223 .word 0x800 ; gdt limit=2048, 256 GDT entries 224 .word 512+gdt,0x9 ; gdt base = 0X9xxxx 225 226 .text 227 endtext: 228 .data 229 enddata: 230 .bss 231 endbss:
setup.s
1 /* 2 * linux/boot/head.s 3 * 4 * (C) 1991 Linus Torvalds 5 */ 6 7 /* 8 * head.s contains the 32-bit startup code. 9 * 10 * NOTE!!! Startup happens at absolute address 0x00000000, which is also where 11 * the page directory will exist. The startup code will be overwritten by 12 * the page directory. 13 */ 14 .text 15 .globl _idt,_gdt,_pg_dir,_tmp_floppy_area 16 _pg_dir: 17 startup_32: 18 movl $0x10,%eax 19 mov %ax,%ds 20 mov %ax,%es 21 mov %ax,%fs 22 mov %ax,%gs 23 lss _stack_start,%esp 24 call setup_idt 25 call setup_gdt 26 movl $0x10,%eax # reload all the segment registers 27 mov %ax,%ds # after changing gdt. CS was already 28 mov %ax,%es # reloaded in 'setup_gdt' 29 mov %ax,%fs 30 mov %ax,%gs 31 lss _stack_start,%esp 32 xorl %eax,%eax 33 1: incl %eax # check that A20 really IS enabled 34 movl %eax,0x000000 # loop forever if it isn't 35 cmpl %eax,0x100000 36 je 1b 37 /* 38 * NOTE! 486 should set bit 16, to check for write-protect in supervisor 39 * mode. Then it would be unnecessary with the "verify_area()"-calls. 40 * 486 users probably want to set the NE (#5) bit also, so as to use 41 * int 16 for math errors. 42 */ 43 movl %cr0,%eax # check math chip 44 andl $0x80000011,%eax # Save PG,PE,ET 45 /* "orl $0x10020,%eax" here for 486 might be good */ 46 orl $2,%eax # set MP 47 movl %eax,%cr0 48 call check_x87 49 jmp after_page_tables 50 51 /* 52 * We depend on ET to be correct. This checks for 287/387. 53 */ 54 check_x87: 55 fninit 56 fstsw %ax 57 cmpb $0,%al 58 je 1f /* no coprocessor: have to set bits */ 59 movl %cr0,%eax 60 xorl $6,%eax /* reset MP, set EM */ 61 movl %eax,%cr0 62 ret 63 .align 2 64 1: .byte 0xDB,0xE4 /* fsetpm for 287, ignored by 387 */ 65 ret 66 67 /* 68 * setup_idt 69 * 70 * sets up a idt with 256 entries pointing to 71 * ignore_int, interrupt gates. It then loads 72 * idt. Everything that wants to install itself 73 * in the idt-table may do so themselves. Interrupts 74 * are enabled elsewhere, when we can be relatively 75 * sure everything is ok. This routine will be over- 76 * written by the page tables. 77 */ 78 setup_idt: 79 lea ignore_int,%edx 80 movl $0x00080000,%eax 81 movw %dx,%ax /* selector = 0x0008 = cs */ 82 movw $0x8E00,%dx /* interrupt gate - dpl=0, present */ 83 84 lea _idt,%edi 85 mov $256,%ecx 86 rp_sidt: 87 movl %eax,(%edi) 88 movl %edx,4(%edi) 89 addl $8,%edi 90 dec %ecx 91 jne rp_sidt 92 lidt idt_descr 93 ret 94 95 /* 96 * setup_gdt 97 * 98 * This routines sets up a new gdt and loads it. 99 * Only two entries are currently built, the same 100 * ones that were built in init.s. The routine 101 * is VERY complicated at two whole lines, so this 102 * rather long comment is certainly needed :-). 103 * This routine will beoverwritten by the page tables. 104 */ 105 setup_gdt: 106 lgdt gdt_descr 107 ret 108 109 /* 110 * I put the kernel page tables right after the page directory, 111 * using 4 of them to span 16 Mb of physical memory. People with 112 * more than 16MB will have to expand this. 113 */ 114 .org 0x1000 115 pg0: 116 117 .org 0x2000 118 pg1: 119 120 .org 0x3000 121 pg2: 122 123 .org 0x4000 124 pg3: 125 126 .org 0x5000 127 /* 128 * tmp_floppy_area is used by the floppy-driver when DMA cannot 129 * reach to a buffer-block. It needs to be aligned, so that it isn't 130 * on a 64kB border. 131 */ 132 _tmp_floppy_area: 133 .fill 1024,1,0 134 135 after_page_tables: 136 pushl $0 # These are the parameters to main :-) 137 pushl $0 138 pushl $0 139 pushl $L6 # return address for main, if it decides to. 140 pushl $_main 141 jmp setup_paging 142 L6: 143 jmp L6 # main should never return here, but 144 # just in case, we know what happens. 145 146 /* This is the default interrupt "handler" :-) */ 147 int_msg: 148 .asciz "Unknown interrupt\n\r" 149 .align 2 150 ignore_int: 151 pushl %eax 152 pushl %ecx 153 pushl %edx 154 push %ds 155 push %es 156 push %fs 157 movl $0x10,%eax 158 mov %ax,%ds 159 mov %ax,%es 160 mov %ax,%fs 161 pushl $int_msg 162 call _printk 163 popl %eax 164 pop %fs 165 pop %es 166 pop %ds 167 popl %edx 168 popl %ecx 169 popl %eax 170 iret 171 172 173 /* 174 * Setup_paging 175 * 176 * This routine sets up paging by setting the page bit 177 * in cr0. The page tables are set up, identity-mapping 178 * the first 16MB. The pager assumes that no illegal 179 * addresses are produced (ie >4Mb on a 4Mb machine). 180 * 181 * NOTE! Although all physical memory should be identity 182 * mapped by this routine, only the kernel page functions 183 * use the >1Mb addresses directly. All "normal" functions 184 * use just the lower 1Mb, or the local data space, which 185 * will be mapped to some other place - mm keeps track of 186 * that. 187 * 188 * For those with more memory than 16 Mb - tough luck. I've 189 * not got it, why should you :-) The source is here. Change 190 * it. (Seriously - it shouldn't be too difficult. Mostly 191 * change some constants etc. I left it at 16Mb, as my machine 192 * even cannot be extended past that (ok, but it was cheap :-) 193 * I've tried to show which constants to change by having 194 * some kind of marker at them (search for "16Mb"), but I 195 * won't guarantee that's all :-( ) 196 */ 197 .align 2 198 setup_paging: 199 movl $1024*5,%ecx /* 5 pages - pg_dir+4 page tables */ 200 xorl %eax,%eax 201 xorl %edi,%edi /* pg_dir is at 0x000 */ 202 cld;rep;stosl 203 movl $pg0+7,_pg_dir /* set present bit/user r/w */ 204 movl $pg1+7,_pg_dir+4 /* --------- " " --------- */ 205 movl $pg2+7,_pg_dir+8 /* --------- " " --------- */ 206 movl $pg3+7,_pg_dir+12 /* --------- " " --------- */ 207 movl $pg3+4092,%edi 208 movl $0xfff007,%eax /* 16Mb - 4096 + 7 (r/w user,p) */ 209 std 210 1: stosl /* fill pages backwards - more efficient :-) */ 211 subl $0x1000,%eax 212 jge 1b 213 xorl %eax,%eax /* pg_dir is at 0x0000 */ 214 movl %eax,%cr3 /* cr3 - page directory start */ 215 movl %cr0,%eax 216 orl $0x80000000,%eax 217 movl %eax,%cr0 /* set paging (PG) bit */ 218 ret /* this also flushes prefetch-queue */ 219 220 .align 2 221 .word 0 222 idt_descr: 223 .word 256*8-1 # idt contains 256 entries 224 .long _idt 225 .align 2 226 .word 0 227 gdt_descr: 228 .word 256*8-1 # so does gdt (not that that's any 229 .long _gdt # magic number, but it works for me :^) 230 231 .align 3 232 _idt: .fill 256,8,0 # idt is uninitialized 233 234 _gdt: .quad 0x0000000000000000 /* NULL descriptor */ 235 .quad 0x00c09a0000000fff /* 16Mb */ 236 .quad 0x00c0920000000fff /* 16Mb */ 237 .quad 0x0000000000000000 /* TEMPORARY - don't use */ 238 .fill 252,8,0 /* space for LDT's and TSS's etc */
head.s
由于此部分在上一个系列中《自制操作系统》讲得非常非常详细,而且这个只是进入操作系统内核前做的一些苦力,所以用一章的篇幅迅速带过,我们下章再见