博文

目前显示的是 三月, 2012的博文

Operating System most prestigious conference

The  Symposium on Operating Systems Principles  ( SOSP ) Operating Systems Design and Implementation  (OSDI)

Mesa & Hoare Monitor

Monitor 相当于一个封闭的盒子,Monitor对里面的东西进行监控。 Java的Monitor,类似于Mesa。以前只知道Java里面同不用synchronized,但是其实每一个Object可以被当作Monitor,控制里面所有synchronized 方法,variable。 当有一个process 进入synchronized 函数后,其它的process,都会被block在外。直到,这个process运行完,或者wait,主动退出。buffer 是解释Monitor最好的例子,当一个process,get或者put时,条件不符合,然后主动wait (process 进入一个monitor waiting queue),直到有一个process调用了notify,这时有一个在 waiting queue 里面的waiting process,会醒来。进入while loop再检查一遍条件是否符合,如果符合就继续。 之所以要进入while loop再检查一遍是因为有可能调用的是notifyAll,许多process一起醒来,最先醒来的已经改变了Condition Variable(条件),当另一个process 检查的时候条件已经不符合了。 Mesa Monitor,一个process notify 以后继续执行,其它所有的waiting process都醒来,检查条件,但是只有一个会真正的执行。为了避免starvation(有的waiting process)永远抢不到,时间等的最久的可能会最先执行(增加了scheduling的概念)。Mesa Monitor 优点是,反应速度快,notifying process 不用等,继续执行 Hoare Monitor,的特点是一个process notify以后,条件肯定满足,所以不需要while loop,醒来的process直接执行下一步。 另外Mesa Monitor 有一个invariant,当有process 在monitor里面是为false,其他时候都是true。在wait前和wait return 以后invariant can be assumed,必须是hold,其它时候不能确定。也就是说当要 修改monitor状态的时候invariant 必须hold ,比如修改waiting...

MicroKernel & Exokernel 操作系统未来可能的发展

图片
MicroKernel  可以叫做微内核,有人认为传统的monolithic kernel 提供的东西太多,我们可以给内核减负。 最典型的例子就是L4 micro kernel (图中的kernel)只提供最简单的服务,比如IPC (inter process communication), scheduling, address space  剩下的都交给Server 去处理,Server 类似于传统的monolithic kernel 比如Linux, 但是需要改动,L4 把它改为L4Linux. 源代码修改,此外他们还修改了Windows XP (windows 并非完全的不开源,在授权的情况下可以看到源代码). 修改的目的,很简单:以前Server talks to hardware directly, 现在多加了一层 Kernel (L4),它处理所有的hardware interrupt, 然后作为message 传递给server, server 再传递给software. 比如浏览器process 需要一个network packet。 收到packet 以后,L4 作出interrupt,传给Server(比如L4Linux),Server 再与L4,通过IPC 请求packet内容,L4会map到L4Linux的address space,Server 再传给Software(e.g. Firefox).  再比如page fault(由于内存容量有限,有些内容被暂时swap 到硬盘,application 想用的时候发现不在了,就产生page fault) 产生,首先处理interrupt的是Kernel, Kernel sends IPC to L4Linux, L4Linux从自己manage的physical memory 里面分一个page出来,给application,并更新自己的shadow page table(不是真正的page table) .L4Linux 产生一个返回值告诉L4, 已经完成。L4 updates its true hardware page table. Micro Kernel 从设计之初的想法应该是高效的,反映速度快,因为很多东西都由u...

The Unix Time Sharing System

图片
这些系统最初都源于 Bell Lab 的Unix 在"The UNIX Time Sharing System" 有详细描述 Unix 的优势: -从1969年第一个Unix到1974年改进后的 UNIX, 到现在Linux, MacOS, 它最大的优势就是能 在很便宜的机器上运行 ("UNIX can run on hardware costing as little as $40,000",在当时对于计算机应该非常小的数目) -第二重要的特点就是它的易使用性,shell command 简单,易学 特点: 文件系统(File System): -inode (inode 指定文件的block 在硬盘上的具体位置,每个文件夹可以包含每个子文件的初始inode,注意 inode 不是文件名, inode 的数量是有限的,虽然很大,但是还是会用完的. 当寻找一个文件时"/a/b/c.txt",首先在"/"根目录寻找a,找到后,在"/a/"寻找b,最后寻找c, 这个过程中都是先看文件名,再看inode) -Access control List (Unix 主要使用ACL, 原因是,每个文件都有一个ACL, 当一个用户想要访问这个文件的ACL,如果检查后允许,这个用户就可以做指定的动作。当删除或者添加一个文件的时候,只需要添加,和删除这个文件的ACL,不需要其它操作) -Capabilities (UNIX也有Capabilities, 这个东西好比一个钥匙,可以被传递,通常情况下系统保留一张大的“表”,上面规定哪个用户可以对哪个文件干什么,但是当频繁的添加删除文件时,这个表需要频繁的更新,效率非常低。所以 UNIX不用它。只有一种情况例外,当调用system call open() 的时候 e.g (fd = open(path),'r')),系统会产生一个capability 他就是file discriptor, 这样当每次读写文件的时候就不用check ACL. 

CSE 221 review

THE, Layers Pro: -easy to debug -easy to verify -modularity abstraction Cons: -may nto be efficient (database may not use filesystem (FS cache),it talks to disk directly),you go the kernel,you introduce overhead -require careful design -ned to avoid circular dependencies The first time to design system, it's easy to use layer design, may be the second time you don't need them. -Synchronization Shared Memory Nucleus -small nucleus for supporting multiple simultaneous OS(user level process) implementation -central abstraction is the concurrent process -synchronization via message passing how is is different from Virtual Machine? -VM wants to virtual hardware, different purpose. Build different flavor of OS (batch, multi-media, real-time). Cons and Pros of message passing -clear, hard to mess up -less efficient Hydra: -Capability(reference to an object, attached to a write, e.x: file discriptor has capability with itself, from the user-point) b...

The Nucleus of a Multiprogramming System

Goal: Design a system in which mode of operatin could be changed.(This means it will make a general system to support anything) Process: The system only has two layers: Process & Nucleus ( software extension of hardware structure ) -Internal process: Execution of programs in a given storage area -external process: input/output of a given document -process communication (information transfer): message buffering  Each  process has message buffers and message queue.  We know receiver process, but semaphore doen't know who will use that. The disadvantage is also obvious, additional buffers will be used and managed -Scheduling: round-robin -Initially all resources are owned by the basic operating System S Message Passing & Semaphore are two major approaches for IPC. In Semaphore System ,all the resources are EXPOSED , processes have the knowledge of existing resource, but when they try to access them, they may be blocked. In Message Passing Syste...

Bigtable: A Distributed Storage System for Structured Data

Definition: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size Tablet: Large table broken into tablet at boundary Background: Many Google applications such as web indexing, google earth use BigTable to store data. 这些服务无论从数据大小,or latency requirement 都有很大不同 characteristics: -Simple data model: dynamic control over data layout and format -Data is indexed using row and column names - Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk . -Based on GFS to store data and log files Components: -One master server  -library linked into every client  (client communicate directly with tablet for read&write, 和GFS 原理一样) - Dynamically  added tablet server (Used to store table,each tablet server store ten to a thousand tables)

Mac OS Evolution

图片
THE EVOLUTION SHOULD BE LIKE THIS

Log Structured File System

Background: The constraint for disk performance is disk access time. However this does not improve much because of mechanical motion. Characteristics Write all information to disk sequentially in a log -Fast for crash recovery -increase write performance Challenge Always need large extend of free space to write new data Solution Run a segment cleaner process

SoftUpdate review

Background: Metadata: (directories, inode, free block maps),就是关于文件的信息,存在哪里,inode 在哪里,一个文件分为许多block, 每一个 block 都放在哪儿 Synchronous write: 局限性大,速度完全由disk speed 决定,非常慢。 Issue: System crash, such as power or system failure may result in file system inconsistence 内存中的信息可能会全部丢失。有可能一个文件在文件夹创建了,但是给它分配inode, 或者inode, 分配了,文件夹没有文件。前者更为严重。 Also, the size of metadata is small, they have strong temporal and spacial locality. Solution: Employ caching to avoid disk access and hide disk access latencies. Combine multiple update in a single write

Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm

图片
Background Major Attacks: buffer overflow attack, and other attacks such as: SQL injection/Attack (enter SQL statement which could be executed in the name field); Network HoneyPot(蜜罐): HoneyPot 是一种诱惑hacker 攻击的服务器,黑客误以为honeypot 里面有自己想要的数据。世界第一个少年黑客曾经攻击,SanDiego Supercomputer center honeyPot, 结果被捕。 HoneyPot 分为两种 low-interaction:只是模拟port, 不运行任何程序,容易扩展到很大规模 high-interaction:运行程序,成本高,每一个IP address ,都需要一个physical host. HoneyMonkey: Emulate human-being to enter malicious website Issue/Motivation: Increase scale of honeypot, while remain high fidelity CPU,memory 利用效率低,通常只有1%利用率; most address don't receive traffic most of the time; most traffic that is received causes not interesting behavior; Don't have much modification Balance high scalability(only emulate simple network) and high performance (full physical machine) Challenge: -Honeypot detection: malware can detect it is a honeypot -Resource exhaustion: under high load, difficult to maintain accurate illusion...

File Cache Size

操作系统通常会保留一块区域作为,系统 文件缓存 文件缓存再内存里,程序可以快速访问。 比如电影:可以迅速的回放,快进。 许多application 由于自身需要,会自己留出单独的缓存,比如IE 浏览器。 文件缓存和workload (系统负载)相关,刚启动的时候文件缓存,会设置的比较大。随着系统运行,内存里的东西越来越多,没有足够的空间留给文件缓存。 Ubuntu里面查看的方法很简单 free -m 其中的"Cached"就是 file cache size (文件缓存大小) 也可以自己写程序测量,方法非常简单 找一些文件大小分别为,100MB,200MB....1GB...2GB 在C里面读取文件两次, 第二次读取的时候,会发现进入某一个数值以后,读取时间会突然增加。 这个只就是file cache size 门限制

Virtual Memory Management in VAX/VMS Operating System

This is a paper review. Goal: Provide a single env for applications such as: time-sharing, real-time and batch. Also it will allow operation on different processors with different physical memory. Real-time and batch applications have the requirement of time. While time-sharing application has specific requirement for resource. Implementation: Each process has 32-bit virtual address space (21 bit for page address, 9 bits for offset that's totally 1Gbit), 当时最大的系统physical memory only has 8MB. The virtual address space is divided into pages of size 512 byte. It also has a 32-bit page table to keep the page characteristics of pages such as: protection and modified. Each region, system, program and control has their own page table. Pager & Swapper Pager is used to decide which page should reside in memory, it swaps pages in and out. Swapper swaps all the pages of a user program in and out.

Fast File System Review

今天读了  A Fast File System for UNIX  1984年 这篇文章可以说是File System 里面最经典的文章之一。现在Unix File System 也是FFS的延伸 作者发现了Old Unix System 的许多问题。做出了改进。 首先 , 以前的FS,由于硬件的限制,把block size 设计的很小,只有512KB 对于大文件的读取overhead 很大。 以读取一个2MB 的文件为例,需要四次读取。 FFS的block size 增加到2MB , 大大减小了overhead, 2MB的文件只需要一次disk access. 但是大的block size 也带来了问题,如果整个系统的文件都很小,比如只有512KB,那么会浪费空间。 解决方法是把block 分为几个fragments, 如果一个文件只占用了block的一部分,其它文件还可以继续使用。不浪费空间。 其次 , 作者观察到以前的FS,随着时间的推移,disk access time 会变得越来越大。 原因是一开始系统有一个list of free space, 随着使用越来越多,删除,添加,更新文件。这个free list 变成了一个random list. 一个文件可能被放在相隔很远的cylinder, 或者sector 里面,读取一个block 之后,disk head move 到下一个block 需要花费很长时间。 Old FS also 把inode 信息和data 分离开 FFS的解决方案是: 给出Global 和 local layout policy, 对文件或者文件夹位置的安排进行优化,尽量缩短 seek 的时间。但是也有缺点:当硬盘使用率非常高的时候,这个优化的时间会变得非常高,因此效率会大幅下降。

创世纪:诺亚方舟(学习)

今天晚上的查经,我们读了创世纪6:11章 上帝在洪水退去后,和诺亚立约,以后再也不会又洪水了毁灭世界。所以一切关于末世论,又洪水毁灭的都是谣言。 同时诺亚的儿子“含”,看到了父亲喝醉酒后裸露下体,就去告诉自己的兄弟,“闪”“雅弗”。诺亚之后诅咒含的后代,要当闪奴隶的奴隶。 虽然诺亚是一个品德高尚的人(曾经walk with God, 之前只有被上帝接走的以诺),但是醉酒是不应该的。之后还有类似的事情发生在亚伯拉罕的身上,预表主耶稣的到来,不清楚是怎么回事。 闪是黄种人的祖先,雅弗是白人祖先,含是黑人的祖先。 但我认为并不代表,黑人是要被奴役。

Google File System Review

Application driven design Issue: Current distributed file system cannot satisfy the need of Google workload. Solution & Characteristics -very large scale -inexpensive components -workload dominated by read -support multiple concurrent append (for map reduce) -large chunk 64MB, chunk replicated (-anti-crash, balance workload) -directories are an illusion, /a/b/foo take the whole string, the string does not matter, to like hashtable to find the file, this only useful for small amounts of files, large amounts of files require large hashtable which may not be able to fit into memory so Unix FS does not use it. -Architecture -master (all the requests start from master); -chunk server(using Unix FS, every it just creates a file); clients -Atomic Record Append Advantages, -Reduce client's need to interact with server (master) -Reduce network overhead, many operations on the same chunk -Reduce the size of metadata stored on master (it could be stored memory for faster...

俄罗斯抗议普京当选

图片
巾帼不让须眉

两会开始啦

中国正式公布了2012年军费预算,6700亿人民币,第一次超过了1000亿美元 李肇星对ITV 记者说: 人类一思考,上帝就发笑”。 怎么好多西方记者年年就特别盯着中国的军费问题 从他的话里面能看出许多态度 http://live.people.com.cn/note.php?id=918120302090607_ctdzb_001

Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism Review

At first this paper explains the advantage of multi-threading, and the importance of thread management. Thread 的概念很重要,一个process, 如果要reponse to user input ,同时又要运行后台程序就必须要thread,否则只有等一件事情做完了,才能做下一个。这就是所谓的sequential executation, 这样效率低,同时也阻碍了现在流行multi-core 同时一个process 里面的thread 最重要的特征就是能share 许多资源,比如memory, CPU, I/O, 方便communication Thread 分为: Kernel thread: processor 可以分配资源 ,尤其对于multi-processor,容易把thread 分配在不同的CPU, 缺点是heavy overhead. 另外a process can register a thread as kernel thread. "Kernel thread is like virtual processor" "Kernel thread is not always running, it only get opportunity to run at event such as timer interrupt" "Kernel level threads have to be checked by kernel, in case of mess up the kernel" It's one scheduling unit for kernel scheduler. User-level thread: easy management, light overhead, good average performance without heavy system call such as I/O. Also it has fast creation time. These threads are cooperative, kernel level...

最近感觉有些迷茫

上午11:00下课,回到家,能做的事情很多但是什么都觉得没有意义。 又是一个周四,早上和学弟聊实习,才恍然觉得时间过的真快,我要毕业了,想起去年的这个时候,感觉和现在的状态也差不了多少。只是当时什么都想做,没有时间。 现在什么都懒得做了,觉得没有意思。 每天刷网站,微博,twitter,看着这个信息爆炸时代的一堆让人眼迷的咨询,自己却陷入迷茫。这个时代很难享受到厚积薄发的快乐,好像现在什么事情都讲究效率。最短的时间内就要看到效率,不然就是浪费时间。 心乱了,不能平静,看到别人得到了什么感觉不错,就也想要。到头来好像不是自己追逐的东西。走的路也变成了模仿别人。生活如果是单纯的模仿是不是就没有意义了?