Xie Yubo's Blog: 12/01/2006

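Friday, December 29, 2006

The Lottery Ticket Problem

A colleague posed an interesting puzzle, one that is not only interesting but also fairly hard. It goes like this: in a certain lottery, each ticket lets you pick 10 distinct numbers from 1 to 100. At the drawing, 10 numbers are also drawn. If none of the drawn numbers matches any number on your ticket, you win. What is the minimum number of tickets you must buy to guarantee a win?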

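I came up with a scheme that guarantees a win with 22 tickets in total. First, buy 10 tickets whose numbers are 1-10, 11-20, 21-30, ..., 91-100. These 10 tickets partition the numbers 1-100 into 10 blocks. By the pigeonhole principle, if none of these tickets wins, each block must contain exactly one drawn number. Now buy 10 more tickets: the first has the numbers 1-4 and 95-100, and the other nine are 5-14, 15-24, 25-34, ..., 85-94. If the drawn number in the first block is less than 5, then the drawn number in every later block must be less than x5 (where x is 1, 2, 3, ..., 9, and likewise below); otherwise one of these second 10 tickets contains none of the 10 drawn numbers and wins. So buy one more ticket that picks from each block a number greater than 5 (for example 6, 16, ..., 96), and it is sure to win in this case. If the drawn number in the first block is greater than or equal to 5, then the drawn number in every later block must also be greater than or equal to x5, so one more ticket that picks from each block a number less than 5 (for example 1, 11, ..., 91) is sure to win. With all 22 tickets in hand, whatever numbers are drawn, at least one ticket wins.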

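Later my colleague gave a scheme that needs even fewer tickets: only 14. As in my scheme, first buy the 10 tickets 1-10, 11-20, 21-30, ..., 91-100, which partition the 100 numbers into disjoint blocks. By the pigeonhole principle, if none of these tickets wins, the 10 drawn numbers fall in these 10 blocks with exactly one number in each block. Now consider the drawn number in the first block: there are only two possibilities, either 1-5 or 6-10. Likewise, for the drawn number in another block, here the block 21-30, there are only two possibilities, either 21-25 or 26-30. So buy 4 more tickets with the numbers (1-5, 21-25), (1-5, 26-30), (6-10, 21-25), and (6-10, 26-30). Whatever these two drawn numbers are, one of the 4 tickets shares no number with the draw. For example, if the drawn numbers lie in 1-5 and 26-30, then the ticket (6-10, 21-25) wins.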
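
The 14-ticket scheme can be checked mechanically. The sketch below is my own illustration, not the colleague's method, and the function names `build_tickets` and `min_hitting_set_size` are made up for it. A draw beats every ticket only if it contains at least one number from each ticket, so the code finds the smallest set of numbers that touches all 14 tickets. If that size exceeds 10, no draw of 10 numbers can beat them all. It checks only that 14 tickets are enough, not that 14 is the minimum.

```python
from itertools import combinations


def build_tickets():
    """The 14 tickets: ten blocks of ten, plus four half-block combinations."""
    blocks = [frozenset(range(10 * i + 1, 10 * i + 11)) for i in range(10)]
    halves_a = (frozenset(range(1, 6)), frozenset(range(6, 11)))     # 1-5, 6-10
    halves_b = (frozenset(range(21, 26)), frozenset(range(26, 31)))  # 21-25, 26-30
    cross = [a | b for a in halves_a for b in halves_b]
    return blocks + cross


def min_hitting_set_size(tickets, numbers=range(1, 101)):
    """Smallest count of numbers that touches every ticket.

    Numbers lying on exactly the same tickets are interchangeable, so the
    search runs over distinct membership patterns instead of raw numbers.
    """
    patterns = {
        frozenset(i for i, t in enumerate(tickets) if n in t) for n in numbers
    }
    patterns.discard(frozenset())  # numbers on no ticket never help
    every_ticket = frozenset(range(len(tickets)))
    for size in range(1, len(patterns) + 1):
        for combo in combinations(patterns, size):
            if frozenset().union(*combo) == every_ticket:
                return size
    return None  # unreachable when every ticket is non-empty


tickets = build_tickets()
needed = min_hitting_set_size(tickets)
print(len(tickets), needed)  # prints: 14 11
```

Since a draw has only 10 numbers and 11 are needed to touch every ticket, at least one of the 14 tickets always wins.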

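Now for an argument that the scheme above uses the fewest tickets. By the pigeonhole principle, 10 or fewer tickets cannot guarantee a win. Also, to keep the number of tickets small, the numbers chosen on the tickets should overlap as little as possible, so these 10 tickets can partition the 100 numbers into 10 mutually disjoint blocks of 10 numbers each. Whatever those numbers are, they can always be relabeled in order as 1-10, 11-20, ..., 91-100. Hence an optimal solution must begin with the first step of the scheme above (the argument takes this step for granted rather than proving it). Given that, the scheme considers every possibility for where in its block a drawn number falls (for example, the number drawn in the first block is either less than 5 or at least 5, and the number drawn in the second block is either less than 15 or at least 15), so the scheme above is an optimal solution. QED.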

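Thursday, December 28, 2006

Operating Systems: Design and Implementation (3rd Edition) (Translation)

Chapter 1: Introduction (pdf, 975KB) (2006-12-28, v0.3)
Chapter 2, Section 2.1: Introduction to Processes (pdf, 526KB) (2006-12-28, v0.1)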

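Thursday, December 21, 2006

Life Is Like a Dream

I phoned home, and my grandfather told me my parents were out: the father of my aunt's husband had passed away, and my parents had gone to see the family. I was at a loss for words, and in that instant a feeling that life is like a dream flashed before me. I had met the old man twice. He looked very healthy, a hale and hearty old man, yet all of a sudden we are parted between two worlds and I will never see him again.

My grandfather and grandmother are both getting old. Every time before I call home I feel a twinge of nervousness, afraid above all of hearing from the other end of the line some news that none of us wants to hear. I dare not imagine how I would be if that ever happened. I only hope I never have to face it.

May we all be blessed with long life, and share the beauty of the moon though a thousand miles apart~~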

Wednesday, December 20, 2006

Information Retrieval Resources (Repost)

FYI from http://net.pku.edu.cn/~webg/IR-Guide.txt
==========================================

A Guide to Information Retrieval
Organized by Hongfei Yan
Last updated on April 19, 2006

---------------------
Contents
Books
+ Finding Out About: Search Engine Technology from a cognitive
Perspective (Belew, R.K., 2000)
http://www-cse.ucsd.edu/~rik/foa/
+ Foundations of Statistical Natural Language Processing
(C. Manning and H. Schutze, 1999)
+ Information Retrieval, 2nd edition (C.J. van Rijsbergen, 1979)
(full text)
http://www.dcs.gla.ac.uk/Keith/Preface.html
+ Information Retrieval: A Survey (Ed Greengrass, 2000)
http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
+ Information Retrieval: Data Structures & Algorithms
(Frakes, W. and Baeza-Yates, R., 1992)
http://www.dcc.uchile.cl/~rbaeza/iradsbook/irbook.html
+ Information Retrieval Interaction (Ingwersen, P., Taylor Graham, 1992)
http://www.db.dk/pi/iri/
+ Managing Gigabytes: Compressing and Indexing Documents and Images,
2nd edition (Ian H. Witten, Alistair Moffat, and Timothy Bell, 1999)
+ Mining the Web: Discovering Knowledge from Hypertext Data
(Soumen Chakrabarti, 2003)
+ Modeling the Internet and the Web:
Probabilistic Methods and Algorithms
(Pierre Baldi, Paolo Frasconi and Padhraic Smyth, 2003)
+ Modern Information Retrieval
(Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2000)
+ Readings in Information Retrieval.
(Sparck-Jones, K. and Willett, P., 1997)
+ Search Engine: Principle, Technology and Systems
搜索引擎-原理、技术与系统
(Xiaoming Li et al., 2005), (full text)
http://sewm.pku.edu.cn/book/dlbook.html
+ The Geometry of Information Retrieval
(C.J. van Rijsbergen, 2004)
http://ir.dcs.gla.ac.uk/GeometryOfIR/
+ The Turn: Integration of Information Seeking and Retrieval in Context
(Ingwersen, P., and Jarvelin, K., 2005)
+ TREC: Experiment and Evaluation in Information Retrieval
(Voorhees, E.M., and Harman, D.K., 2005)
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10667

Conferences and Workshops
+ CIKM: Conference on Information and Knowledge Management
http://www.csee.umbc.edu/cikm/
+ SIGIR: Special Interest Group on Information Retrieval
http://www.sigir.org/
+ World Wide Web
http://www.iw3c2.org/
+ SEWM: Symposium of Search Engine and WebMining
全国搜索引擎和网上信息挖掘学术研讨会
http://net.pku.edu.cn/~sewm/

Courses
+ CMU Information Retrieval
http://nyc.lti.cs.cmu.edu/classes/11-741/ (Spring 2006)
Instructors: Jamie Callan and Yiming Yang
+ Cornell University The Structure of Information Networks (Spring 2006)
http://www.cs.cornell.edu/courses/cs685/2006sp/
Instructor: Jon Kleinberg
+ Peking University Web Based Information Architectures (Fall 2005)
http://net.pku.edu.cn/~wbia/
Instructors: Xiaoming Li, Jimin Wang and Bo Peng
+ Stanford Univ. Text Information Retrieval and Web Mining (Autumn 2005)
http://www.stanford.edu/class/cs276/
Instructors: Christopher Manning and Prabhakar Raghavan
+ UIUC Introduction to Text Information Systems (Spring 2006)
http://sifaka.cs.uiuc.edu/course/498cxz06s/
Instructor: ChengXiang Zhai
+ UMass Univ. Information retrieval course (Spring 2005)
http://ciir.cs.umass.edu/cmpsci646/
Instructor: James Allan
+ Washington Univ. Search Engines course
http://courses.washington.edu/lis544/

Evaluation Resources
+ CLEF: Cross-Language Evaluation Forum
http://clef.iei.pi.cnr.it/
+ CWIRF: Chinese Web Information Retrieval Forum
http://www.cwirf.org/
+ DUC: Document Understanding Conferences
http://duc.nist.gov/
+ INEX: INitiative for the Evaluation of XML Retrieval
http://inex.is.informatik.uni-duisburg.de/
+ NTCIR: NII-NACSIS Test Collection for IR Systems
http://research.nii.ac.jp/ntcir/
+ TREC: Text REtrieval Conference
http://trec.nist.gov/

Journals
+ Briefings in Bioinformatics (full text)
http://bib.oxfordjournals.org/archive/
+ Computational Linguistics, The MIT Press
http://mitpress.mit.edu/catalog/item/default.asp?ttype=4&tid=10
+ Data & Knowledge Engineering (DKE), Elsevier
http://www.elsevier.com/wps/find/journaldescription.cws_home/505608/description?navopenmenu=-2
+ D-Lib Magazine
http://www.dlib.org/
+ Information Processing Letters, Elsevier
http://www.elsevier.com/locate/issn/00200190
+ Information Processing and Management (IP&M), Elsevier
http://www.elsevier.com/locate/infoproman
+ Information Retrieval, Springer
http://www.springer.com/sgw/cda/frontpage/0,11855,3-0-70-35744790-detailsPage%253Djournal%257Cdescription%257Cdescription,00.html
+ Information Research
http://informationr.net/ir
+ International Journal on Digital Libraries, Springer
http://link.springer.de/link/service/journals/00799/index.htm
+ International Journal of Cooperative Information Systems (IJCIS),
World Scientific
http://ejournals.wspc.com.sg/ijcis/ijcis.shtml
+ International Journal on Document Analysis and Recognition, Springer
http://link.springer.de/link/service/journals/10032/index.htm
+ International Journal of Intelligent Systems, Wiley
http://www3.interscience.wiley.com/cgi-bin/jhome/36062
+ International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), World Scientific
http://ejournals.wspc.com.sg/ijufks/ijufks.shtml
+ Journal of the American Society for Information Science and Technology (JASIST), Wiley
http://www3.interscience.wiley.com/cgi-bin/jhome/76501873
+ Journal of Documentation (JDoc). Emerald
http://www.emeraldinsight.com/0022-0418.htm
+ Journal of Intelligent Information Systems (JIIS), Springer
http://www.wkap.nl/journalhome.htm/0925-9902
+ Knowledge and Information Systems (KAIS), Springer
http://link.springer.de/link/service/journals/10115/index.htm
+ Natural Language Engineering, Cambridge University Press
http://www.cambridge.org/journals/journal_catalogue.asp?mnemonic=NLE
+ Transactions On Information Systems (TOIS), ACM
http://www.acm.org/tois/
+ Transactions on Knowledge and Data Engineering (TKDE), IEEE
http://www.computer.org/tkde/

List Archives
+ SIG-IRList, http://www.sigir.org/sigirlist/index.html

Organizations and Special Interest Groups
+ Cambridge NLIP, http://www.cl.cam.ac.uk/Research/NL/
+ CMU LTI, http://www.lti.cs.cmu.edu/
+ DEC laboratories in Palo Alto, Calif.
+ Glasgow Information Retrieval Group, http://www.dcs.gla.ac.uk/ir/
+ Google Labs, http://labs.google.com/
+ Massachusetts CIIR, http://ciir.cs.umass.edu/
+ MSR Asia, Web Search & Data Mining Group http://research.microsoft.com/wsm/
+ Stanford InfoLab, http://infolab.stanford.edu/
+ UIUC Information Retrieval Group, http://sifaka.cs.uiuc.edu/ir/
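+ PKU Tianwang Group, http://sewm.pku.edu.cn/
+ Institute of Computational Linguistics, Peking University,
http://icl.pku.edu.cn/
+ Fudan University IR and NLP Group,
http://www.cs.fudan.edu.cn/mcwil/irnlp/
+ HIT Information Retrieval Group, http://ir.hit.edu.cn/
#+ State Key Laboratory of Intelligent Technology and Systems,
# Tsinghua University (URL could not be reached)
# http://www.csai.tsinghua.edu.cn/
+ CAS Large-Scale Content Computing Group, http://159.226.40.18/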

Researchers
+ ChengXiang Zhai, developing Lemur
http://www-faculty.cs.uiuc.edu/~czhai/
+ Gerard Salton
http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Salton.html
+ Karen Sparck Jones, developing IDF
http://www.cl.cam.ac.uk/users/ksj/
+ Keith van Rijsbergen
http://www.dcs.gla.ac.uk/~keith/
+ Jamie Callan,
http://www.cs.cmu.edu/~callan/
+ Jon Kleinberg, developing HITS
http://www.cs.cornell.edu/home/kleinber/
+ Li Xiaoming, developing Tianwang & Infomall
+ Nick Craswell, developing Terabyte Track
http://research.microsoft.com/~nickcr
+ Susan Dumais, developing LSI
http://research.microsoft.com/~sdumais/
+ Yiming Yang, developing text categorization
http://www.cs.cmu.edu/~yiming/
+ Stephen Robertson,
http://research.microsoft.com/users/robertson/
+ Tefko Saracevic
http://www.scils.rutgers.edu/~tefko/
+ W. Bruce Croft
http://ciir.cs.umass.edu/personnel/croft.html

Research-related Resources
+ http://www-faculty.cs.uiuc.edu/~czhai/research.html

Software
+ Apache Lucene: a full-featured text search engine library
http://lucene.apache.org/java/docs/index.html
+ Gate: a general architecture for text engineering
http://gate.ac.uk/
+ Lemur: A full-text search engine
http://www.lemurproject.org/
+ MG: A full-text search engine
http://www.math.utah.edu/pub/mg/
+ Porter Stemmer: English stemming algorithm
http://www.tartarus.org/martin/PorterStemmer/
+ Nutch: an open source web search engine
http://sourceforge.net/projects/nutch/
+ TSE: A Tiny Search Engine
http://sewm.pku.edu.cn/src/TSE/

---------------------
References:
[1] Information Retrieval Resources, http://www.sigir.org/resources.html
[2] http://ir.dcs.gla.ac.uk/resources.html
[3] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[4] Diekemar, Information Retrieval Links, Jan. 28, 1999.
http://web.syr.edu/~diekemar/ir.html
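[5] Chen Hongbiao, Studying Information Retrieval Online, Nov. 1999.
http://159.226.40.18/freshman/resources/网上研习信息检索.doc
[6] Data Mining Research Institute (数据挖掘研究院), http://www.dmresearch.net/
[7] Speech and Natural Language Online (语音自然语言在线), http://www.snlpinfo.com/index.php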
[8] PKU SEWM Group, http://sewm.pku.edu.cn/
[9] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[10] http://icl.pku.edu.cn/member/lisujian/maincontent.htm
[11] http://www.cs.fudan.edu.cn/mcwil/irnlp/link.htm
[12] Robert Krovetz, A Guide to the Literature of Information Retrieval,
http://159.226.40.18/freshman/resources/guide-to-ir-lit.ps
[13] ACM Digital Library,
http://portal.acm.org/portal.cfm
http://acm.lib.tsinghua.edu.cn/acm/
[14] http://www.sigir.org/proceedings/Proc-Browse.html
[15] SIGIR,
http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES278&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[16] WWW, International World Wide Web Conference
http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES968&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[17] China Digital Journal Community, http://wanfang.calis.edu.cn/wf/szhqk/index.html

---------------------

More details are listed as follows
====================
CIIR
(The Center for Intelligent Information Retrieval,
University of Massachusetts, USA)
http://ciir.cs.umass.edu/

The Center for Intelligent Information Retrieval, a National Science
Foundation-created S/IUCRC Center, is one of the leading information retrieval
research labs in the world. The CIIR develops tools that provide effective
and efficient access to large, heterogeneous, distributed, text and
multimedia databases.

CIIR accomplishments include significant research advances in the areas of
distributed information retrieval, information filtering, topic detection,
multimedia indexing and retrieval, document image processing, terabyte
collections, data mining, summarization, resource discovery, interfaces
and visualization, and cross-lingual information retrieval.

The Center for Intelligent Information Retrieval continues to support the
emerging information infrastructure, both through research and technology
transfer. The goal of the CIIR is to develop tools that provide effective
and efficient access to large, heterogeneous, distributed, text and
multimedia databases.

====================
Glasgow Information Retrieval Group
http://www.dcs.gla.ac.uk/ir/
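The information retrieval research group at the University of Glasgow, UK,
led by Keith van Rijsbergen. The group values theory and practice equally and
aims to build efficient, novel, and successful multimedia information
retrieval systems that serve end users.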

The Information Retrieval Group led by Professor Keith van Rijsbergen has a
vigorous programme of research, based on both theory and experiment, aimed at
giving end-users novel, effective, and efficient access to the world of
multi-media information. The group, part of the Department of Computing Science,
University of Glasgow, has a strong research history in a wide area of
information retrieval research from theoretical modelling of the retrieval
process to advanced system building and to the user-oriented evaluation of
information retrieval systems. The group's interests also include many areas
of Web information retrieval such as link analysis, summarisation and the
development of novel interaction techniques (e.g., ostension, implicit feedback
and graphical visualisation). Our research preserves a strong emphasis on
the evaluation of interactive IR systems, and the group maintains strong links
with researchers in Human-Computer Interaction and Psychology.

------
Keith van Rijsbergen, http://www.dcs.gla.ac.uk/~keith/
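University of Glasgow, UK. A leading representative of the logical-inference
school of probabilistic IR, and author of the classic IR textbook
Information Retrieval, which focuses on probabilistic approaches to
information retrieval.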

=====================
Cambridge NLIP Group
(Natural Language and Information Processing Group)
http://www.cl.cam.ac.uk/Research/NL/

Research in NLIP has been done in the Computer Laboratory for nearly fifty years.
The earliest work, by Roger Needham and Karen Sparck Jones, was on automatic
thesaurus construction, in the context of document retrieval and machine translation.
Subsequent research by Karen Sparck Jones during the 1960s and 70s focused on
statistical approaches to retrieval and included innovative work on term
weighting. From the later 1970s research in language processing developed,
with work on syntax, semantics and discourse processing,

------
Karen Sparck Jones, http://www.cl.cam.ac.uk/users/ksj/
Karen Sparck Jones has been one of the most influential figures in Computing
since the 1950’s. Her work on Information Retrieval and Natural Language Processing
has never been so central as it is today, with its implications for
search engine technology, the semantic web and even bioinformatics.

In 1972, Karen Sparck Jones published in the Journal of Documentation the paper
which defined the term weighting scheme now known as inverse document frequency (IDF).

Karen Sparck Jones is emeritus Professor of Computers and Information at the
Computer Laboratory, University of Cambridge. She has worked in automatic
language and information processing research since the late fifties,
and has many publications including several books, most recently `Evaluating
Natural Language Processing Systems' with Julia Galliers, and `Readings in
Information Retrieval', edited with Peter Willett.

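Winner of the 1988 Salton Award. Another founder of the modern probabilistic
IR model. She has made notable contributions in NLP, IR, and other fields, and
has done a great deal of organizational work. She currently works at the
Computer Laboratory, University of Cambridge, UK.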

====================
LTI
CMU (Carnegie Mellon University) Language Technologies Institute,
http://www.lti.cs.cmu.edu/

The Language Technologies Institute (LTI) of the School of Computer Science at
Carnegie Mellon University conducts research and provides graduate education
in all aspects of language technology and information management. The LTI was
established in 1996, as an expansion of the Center for Machine Translation
(CMT).

The Center for Machine Translation (CMT) was a research branch of the School
of Computer Science devoted to basic and applied research in all aspects of
natural language processing, with a primary focus on machine translation,
speech processing, and information retrieval. Containing a unique mix of
academic and industrial researchers specializing in various aspects of
computer science, artificial intelligence, computational linguistics and
theoretical linguistics, the CMT provided a rich and diverse environment for
collaboration among faculty, staff, visiting scholars, and qualified students.

------
Lemur Toolkit
Lemur is a collection of search engine algorithms and information retrieval
applications used for IR research, development and education. Lemur provides a
rich query language that supports search against simple texts, structured
(XML) texts, and texts annotated with part-of-speech, named-entity, and other
annotations used in NLP and text-mining applications. Lemur's search engines
comfortably support collections ranging from a few gigabytes to a few
terabytes of text. The software is distributed under open-source license, and
is used widely in the IR research community.

====================
Stanford InfoLab
http://infolab.stanford.edu/

The Stanford WebBase Project
http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/

The Stanford WebBase project is investigating various issues in crawling,
storage, indexing, and querying of large collections of Web pages. The project
builds on the previous Google activity that was part of the DLI1 initiative.
The DLI2 WebBase project aims to build the necessary infrastructure to
facilitate the development and testing of new algorithms for clustering,
searching, mining, and classification of Web content.
====================
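PKU Tianwang Group, http://sewm.pku.edu.cn/

The Network Laboratory at Peking University has been doing research and system
development on search engines since 1997. It has deep technical expertise, and
its overall strength and academic influence have consistently led the field in
China. The "Tianwang" search engine we developed is the most influential
campus-born search engine in the country and has been running continuously
since October 1997. Tianwang has notable strengths in incremental search, fast
retrieval, and massive-scale information storage. Its continued development
has trained successive cohorts of students with hands-on experience in
processing massive volumes of web text, who are widely welcomed by Chinese and
foreign IT companies.
Since 2001, building on its search engine technology, the group has been
collecting and archiving the history of Chinese Internet information, forming
the "China Internet Information Museum" (中国互联网信息博物馆). It currently
holds 2 billion Chinese web pages that appeared at various times and is the
largest historical web page archive and replay system in China. We have also
experimented with interdisciplinary research built on top of it.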

====================
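CAS Large-Scale Content Computing Group
http://159.226.40.18/

The information retrieval team focuses on text retrieval research. It has
taken part in TREC many times and achieved very good results. The Tianluo
retrieval system developed by the team is widely used in many important
national information departments. Its main current research directions include
web information acquisition and web information retrieval.
The information analysis team focuses on the analysis and mining of
large-scale, multi-source, heterogeneous information, mainly text
classification and clustering, information filtering, personalized services,
natural language question answering, and shallow natural language processing.
The team has built a series of experimental platforms for text information
processing, which can currently be tried through the "成果演示" (Demos) link
on the home page. Also worth mentioning is the team's open-source initiative,
in which the high-performance word segmentation system ICTCLAS has been widely
recognized and used by researchers.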

====================
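Fudan University IR and NLP Group,
http://www.cs.fudan.edu.cn/mcwil/irnlp/

Large-scale text processing studies techniques and methods for processing
natural language (especially Chinese). It covers two areas. The first is
foundational work, mainly basic theory and algorithms, including automatic
word segmentation, unknown word recognition, part-of-speech and concept
tagging, syntactic analysis, and semantic analysis, as well as collecting and
curating corpora. The second is applied technology for Chinese information
processing, including automatic indexing, text retrieval, text summarization,
text classification, and text filtering, and especially the application of
these techniques in networked environments. This second part is the research
focus of the text direction.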

====================
HIT-IRLab, http://ir.hit.edu.cn/

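The HIT Information Retrieval Lab (HIT-IRLab) was founded in March 2001. Its
research areas include text retrieval, question answering, automatic
summarization, text mining, and language analysis. The lab treats language
analysis as its basic research and text filtering as its applied research. It
treats information extraction as the extension of language analysis from
sentence understanding to discourse understanding, and sentence retrieval as
an intelligent, precise retrieval technology supported by language analysis
and discourse understanding.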

====================
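SIGIR (ACM Special Interest Group on Information Retrieval),
TREC (Text REtrieval Conference, held annually)
MUC (Message Understanding Conference, held annually)
TIPSTER (the IR practice base of the U.S. Defense Advanced Research Projects
Agency)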

====================
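Institute of Computational Linguistics, Peking University
http://icl.pku.edu.cn/

The Institute of Computational Linguistics at Peking University was founded in
1986. It is devoted to research in three areas: computational linguistics
theory, foundational resources for language information processing, and
applied technology.
Centered on computational linguistics and natural language processing, its
work covers three main directions. The first is research on and construction
of foundational resources: computational lexicography and machine
dictionaries, comprehensive language knowledge bases, corpus linguistics and
corpus processing techniques, and terminology, automatic term extraction, and
term standardization. The second is basic theory and NLP models and methods:
foundations of computational linguistics, core NLP techniques, modern Chinese
grammar, lexical/syntactic/semantic analysis of Chinese, statistical NLP
models, and information-theoretic methods for language processing. The third
is applied technology: methods, techniques, and system implementation for
machine translation, information retrieval and extraction, evaluation methods
and techniques for natural language processing systems, controlled Chinese and
its writing-assistance systems, and computer-aided study of classical Chinese
poetry.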

====================
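#State Key Laboratory of Intelligent Technology and Systems,
#Tsinghua University (URL could not be reached)
#http://www.csai.tsinghua.edu.cn/

The State Key Laboratory of Intelligent Technology and Systems is hosted by
Tsinghua University. It opened to outside researchers in February 1990. It
mainly conducts basic and applied basic research on the fundamental principles
and methods of artificial intelligence, including intelligent information
processing, machine learning, intelligent control, and neural network theory.
It also studies AI-related application and system integration technologies,
mainly intelligent robots and the processing of sound, graphics, images, text,
and language.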

================
Susan Dumais,
http://research.microsoft.com/~sdumais/

I am interested in algorithms and interfaces for improved information
retrieval, as well as general issues in human-computer interaction. I
joined Microsoft Research in July 1997. I work on a wide variety of
information access and management issues, including: personal information
management, web search, question answering, information retrieval, text
categorization, collaborative filtering, interfaces for improved search and
navigation, and user/task modeling.

Prior to coming to Microsoft, I worked on a statistical method for
concept-based retrieval known as Latent Semantic Indexing. You can find
pointers to this work on the Bellcore (now Telcordia) LSI page.

===============
UIUC Information Retrieval Group
http://sifaka.cs.uiuc.edu/ir/

The Information Retrieval (IR) group is part of the Database and Information
Systems (DAIS) Lab of the Computer Science Department at University of
Illinois at Urbana-Champaign. We work on a wide spectrum of problems in the
general area of text information management, including retrieval,
organization, filtering, and mining of textual information, aiming at
developing advanced text information management techniques and systems that
help people make better use of text information.

------
ChengXiang Zhai,
http://www-faculty.cs.uiuc.edu/~czhai/

Research Interests: Information Retrieval, Text Mining, Natural Language
Processing, Bioinformatics

ChengXiang Zhai, of the University of Illinois at Urbana-Champaign, is
recognized for his work on user-centered, adaptive intelligent information
access. His techniques are expected to improve search-engine performance,
support better
information organization and enable understanding of large volumes of
information. Zhai's work in information retrieval is expected to enhance
curricula and provide new educational tools for the growing information
technology workforce.

===============
Stephen Robertson,
http://research.microsoft.com/users/robertson/

Stephen Robertson joined Microsoft Research Cambridge in April 1998.

In 1998, he was awarded the Tony Kent STRIX award by the Institute of
Information Scientists. In 2000, he was awarded the Salton Award by ACM SIGIR.
He is a Fellow of Girton College, Cambridge.

At Microsoft, he runs a group called Information Retrieval and Analysis, which
is concerned with core search processes such as term weighting, document
scoring and ranking algorithms, and combination of evidence from different
sources. These are studied theoretically through the use of formal models,
mainly statistical, and statistical methods including machine learning
methods, and experimentally, through activities such as the Text Retrieval
Conference (TREC) and with internally generated evaluation sets. The group
(with its Keenbow evaluation environment) has had some excellent results at
TREC. The group works closely with product groups to transfer ideas and
techniques.

His main research interests are in the design and evaluation of retrieval
systems. He is the author, jointly with Karen Sparck Jones, of a probabilistic
theory of information retrieval, which has been moderately influential. A
further development of that model, with Stephen Walker, led to the term
weighting and document ranking function known as Okapi BM25, which is used in
many experimental text retrieval systems.

Prior to joining Microsoft, he was at City University London, where he retains
a part-time position as Professor of Information Systems in the Department of
Information Science (homepage). He was Head of Department for eight years,
during which time it achieved the highest possible rating in two successive
research assessment exercises. He also started the Centre for Interactive
Systems Research, the main research vehicle of which is the Okapi text
retrieval system, which has also done well at TREC.

Before joining City, he was a research fellow at University College London,
where he took his PhD in the School of Library Archive and Information
Studies. Before that he was in the research department at Aslib. He has an MSc
in Information Science from City and a first degree in mathematics from
Cambridge.

===================
Nick Craswell
http://research.microsoft.com/~nickcr

I am an associate researcher at Microsoft Research Cambridge, in the
Information Retrieval and Analysis Group.

Research Overview

I am interested in Web search evaluation, mostly on enterprise-scale webs but
also the World Wide Web. I built the VLC, VLC2, WT2g and .GOV test
collections, which have been made available to research groups around the
world. David Hawking and I coordinated the TREC Web Track experiments. I am
currently involved in the TREC Terabyte Track and Enterprise Track. Some
publications: Book chapter preprint (pdf), IR'01 (citeseer) and CSIRO'01
(pdf).

I also work on effective Web search, which means making use of information in
pages, link structure and URL structure to generate more useful Web search
results. Some papers: SIGIR'05 (pdf), SIGIR'01 (pdf), TOIS'03 (pdf) (copying
is by permission of ACM, Inc.) and ADCS'03 (pdf).

My PhD was in distributed information retrieval (thesis pdf) which means
building a system on top of multiple engines/databases that already exist. My
recent work in the area has considered whether (or when) DIR is really
practical. Some papers: ADC'99 (ps), DL'00 (pdf), ADC'03 (pdf) and ADC'04
(pdf).

===============
Web Search & Data Mining Group of MSR Asia
http://research.microsoft.com/wsm/

The goal of the Web Search & Data Mining Group of MSR Asia is to drive the
next generation of Web search by leveraging data mining, machine learning, and
knowledge discovery techniques for information analysis, organization,
retrieval, and visualization. In addition, in contrast with current Web search
methods, which essentially do document-level ranking and retrieval, the Web
Search & Data Mining Group has created search at the object level to bring
increased knowledge and intelligence to users.

A Glimpse at Several Core Innovations:

Large-scale Experimental Web Search Platform

The Web Search & Data Mining Group is creating a large scale search platform
to efficiently store, parse, index and search billions of Web pages and other
types of documents. The search platform is flexible enough to allow for
testing of various state-of-the-art search techniques that have been created
at the lab using new technologies.

Structuralizing the Web

The biggest challenge facing both users and search engines over the next
several decades is the continued unstructured growth of the Internet. As such,
search functions that can effectively and efficiently dig out
machine-understandable information and knowledge layers from unorganized and
unstructured Web data will be the key to supporting relevant search results.
To meet this challenge, the group is exploring technologies, namely Web
information extraction, deep Web mining, and Web structure mining that can
automatically classify structures and extract objects from the Web. The
information and knowledge gathered using these new techniques greatly improves
the performance of current Web search and even facilitates the creation of
more sophisticated next generation search technologies.

Vertical Search

Today's conventional search engines can be described as page-level search
engines whose main function is to rank web pages according to their relevance
to a given query. Driving the future of the search industry are functions that
delve deeper into vertical domains to provide knowledge and intelligence to
query results. At MSR Asia, the Web Search & Data Mining Group is addressing
the greatest challenges faced by vertical search including large scale web
classification, object-level information extraction, object identification and
integration, and object relationship mining and ranking. The results of these
efforts are leading to more advanced search engines that deliver intelligence
and insight to search results.

Mobile Search

The explosive growth of new computing devices such as handheld computers,
Windows Mobile-based PocketPCs, and SmartPhones is driving demand for greater
and more efficient information access. These devices, which leverage the power
of the Web and allow greater access to information than ever before, are still
not capable of performing at the level of a desktop PC. At MSR Asia, the Web
Search & Data Mining Group is inventing new technologies to improve the mobile
search and browsing experience and deliver the capabilities of a PC to users
of these new devices. Project initiatives include developing innovative
presentation schemes and user interfaces to facilitate search and browsing
tasks on mobile devices and developing context aware search technologies to
address the special information needs of mobile users.

Multimedia Search

The Web Search & Data Mining Group is conducting research into new
technologies that index multimedia content such as images, videos, and audio.
Through content analysis and advanced visualization techniques, the group is
transforming today's conventional text based search engines to include
multimedia content thus delivering more intelligent search results to users.
For example, the group recently developed a new multimedia news reader which
mines large archival news databases presenting text, map information, images,
and background music within a unique user interface providing readers with a
more efficient news search engine and a more enjoyable reading experience.

------
Wei-Ying Ma
http://research.microsoft.com/users/wyma/

Senior Researcher, Research Manager, Microsoft Research Asia

Dr. Wei-Ying Ma received the B.S. degree in electrical engineering from the
National Tsing Hua University in Taiwan in 1990, and the M.S. and Ph.D.
degrees in electrical and computer engineering from the University of
California at Santa Barbara in 1994 and 1997, respectively. From 1994 to 1997
he was engaged in the Alexandria Digital Library (ADL) project in UCSB while
completing his Ph.D. He developed a web-based image retrieval system called
Netra which has been frequently cited by other researchers and is regarded as
one of the most representative image retrieval systems. From 1997 to 2001, he
was with HP Labs where he worked in the field of multimedia adaptation and
distributed media services infrastructure. He joined Microsoft Research Asia
in 2001. Since then, he has been leading a research group to conduct research
in the areas of information retrieval, web search, data mining, mobile
browsing, and multimedia management. He currently serves as an Editor for the
ACM/Springer Multimedia Systems Journal and Associate Editor for ACM
Transactions on Information Systems (TOIS). He has served on the organizing and
program committees of many international conferences including ACM Multimedia,
ACM SIGIR, ACM CIKM, WWW, ICME, CVPR, SPIE Multimedia Storage and Archiving
Systems, SPIE Multimedia Communication and Networking, etc. He is also the
general co-chair of International Multimedia Modeling (MMM) Conference 2005
and International Conference on Image and Video Retrieval (CIVR) 2005. He has
published 5 book chapters and over 100 international journal and conference
papers.

====================
Google Labs
http://labs.google.com/

Google Labs is a playground for Google engineers and adventurous Google users.
Google staffers with wild and crazy ideas post their prototypes on Google Labs
and solicit feedback on how the technology could be used or improved. None of
these experiments are guaranteed to make it onto Google.com, as this is really
the first phase in the development process. Google users with a desire to jump
over the cutting edge are invited to check out any or all of the posted
prototypes and send their comments directly to the Googlers who developed
them. Please, remember to wear your safety goggles while using this site.

Labs.google.com, Google's technology playground.
Google labs showcases a few of our favorite ideas that aren't quite ready for
prime time. Your feedback can help us improve them. Please play with these
prototypes and send your comments directly to the Googlers who developed them.

Want to learn more about Google technology? Here are some papers.
http://labs.google.com/papers/index.html

Passionate about these topics? You should work at Google.
algorithms, artificial intelligence, compiler optimization,
computer architecture, computer graphics,
data compression, data mining, file system design,
genetic algorithms, information retrieval,
machine learning, natural language processing, operating systems,
profiling, robotics,
text processing, user interface design,
web information retrieval, and more!

http://www.google.com/press/podium.html
Google Press Center: The Google Podium
Here you'll find a selection of public presentations made by Google
executives. From time to time, we will continue to add transcripts, audio or
video clips and links to presentations hosted elsewhere.

====================
Jon Kleinberg
http://www.cs.cornell.edu/home/kleinber/

Professor of Computer Science, Cornell University

My research is concerned with algorithms that exploit the combinatorial
structure of networks and information. My recent work has included
* link analysis and modeling of the World Wide Web and related information networks;
* discrete optimization and network algorithms; and
* algorithmic approaches to clustering, indexing, and data mining.
====================

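Tuesday, December 19, 2006

Google Launches a Domain Registration Service

https://www.google.com/a/

For US$10 a year, a registered domain can be tied to your Gmail, Google Talk, Google Calendar, and Google Page Creator, and Google says it offers full domain management. For now the service appears to be run in partnership with www.godaddy.com, one of the world's best-known domain registrars.

Ten dollars a year is on the expensive side, but with these services included it is still good value.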

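Sunday, December 17, 2006

An Amusing "Calculation Error" in Google AdSense

11.41 + 0.16 = 11.56? 1 + 6 = 6? Clearly this is a matter of rounding to two decimal places. Both 11.41 and 0.16 must be values that have already been rounded up, so simply adding 11.41 and 0.16 gives 11.57, but that counts the rounding-up twice. Computed from the raw data, the rounding-up happens only once, which gives 11.56.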
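The effect is easy to reproduce. Below is a minimal sketch; the raw amounts 11.405 and 0.155 are my own assumption, chosen only because they round to the displayed 11.41 and 0.16 (the real AdSense figures are not known). Amounts are stored as integer thousandths of a dollar so the arithmetic is exact.

```cpp
#include <cstdint>

// Hypothetical raw values, in thousandths of a dollar. They are assumptions
// picked to reproduce the displayed figures, not real AdSense data.
const std::int64_t kRawA = 11405;  // 11.405
const std::int64_t kRawB = 155;    //  0.155

// Round a non-negative amount in thousandths to whole cents, half up.
std::int64_t RoundToCents(std::int64_t thousandths) {
    return (thousandths + 5) / 10;
}

// What a reader does: add the two displayed (already rounded) items.
std::int64_t SumOfRounded() {
    return RoundToCents(kRawA) + RoundToCents(kRawB);
}

// What the total line does: add the raw values, then round once.
std::int64_t RoundedSum() {
    return RoundToCents(kRawA + kRawB);
}
```

With these assumed inputs, SumOfRounded() yields 1157 cents (11.57) while RoundedSum() yields 1156 cents (11.56), which is exactly the one-cent gap seen on the page.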

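Still, Google's display can easily mislead users into thinking that the total is the sum of the itemized earnings listed above it. For a company as large as Google, a lapse in a detail like this should not happen.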

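Would Having Children This Way Raise the Ratio of Boys to Girls?

Also at lunch a few days ago, we were talking about China's skewed sex ratio, and a colleague posed an interesting probability problem: suppose a boy and a girl are equally likely, and every family wants one and only one boy (that is, a family keeps having children until a boy is born). What will the final ratio of boys to girls be?

My colleague offered a neat line of reasoning. In the first round, half of the families have a boy and the other half have a girl, so the ratio is 1:1. In the second round, the families that had a boy stop, while the half that had a girl try again; once more half have a boy and half have a girl, so the ratio is still 1:1. In the third round, the families that had a girl again in the second round continue, and again half have a boy and half have a girl, and so on. So the numbers of boys and girls always stay equal.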
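The round-by-round argument can be checked with a small count. This is only a sketch under an idealized assumption: the number of families is a power of two and every round splits exactly in half, which stands in for "probability 1/2" without any randomness. The names Births and CountBirths are mine.

```cpp
#include <cstdint>

struct Births {
    std::int64_t boys;
    std::int64_t girls;
};

// Follows the argument round by round. `families` is assumed to be a power
// of two so that every split is exact; the loop stops when a single family
// is left, whose outcome is not decided by exact halving.
Births CountBirths(std::int64_t families) {
    Births total = {0, 0};
    while (families >= 2) {
        std::int64_t half = families / 2;
        total.boys += half;   // these families have their boy and stop
        total.girls += half;  // these families had a girl and try again
        families = half;      // only the second group enters the next round
    }
    return total;
}
```

For 2^20 families, both counts come out to 2^20 - 1: the stopping rule changes how many children each family has, but not the ratio.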

Friday, December 15, 2006

If a function has more than one exit, we must guard against reentrancy.

Today I found a bug in my code. It was caused by a function being reentered. The function's structure looks like this:


void foo()
{
   while(true){
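      // give the message loop a chance to run; a handler may call foo() again
      // (PeekMessage returns a nonzero BOOL, so test it directly, not against true)
      if(PeekMessage(...)){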
          ...
          TranslateMessage(...);
          DispatchMessage(...);
          ...
      }

      // do other things
      ...
   }
}

In the code above, foo() gives the message loop a chance to handle messages, so control can leave the function at that point as well. A message handler may call foo() again; if it does, foo() is reentered, which means the function is called again before the first call has finished. So if a function has more than one exit, we must guard against reentrancy. See the updated code:


void foo()
{
   static bool bFinished = true;   // false while a call to foo() is in progress

   if(!bFinished)
       return;

   bFinished = false;

   while(true){
      // give the message loop a chance to run; a nested foo() now returns at once
      if(PeekMessage(...)){
          ...
          TranslateMessage(...);
          DispatchMessage(...);
          ...
      }

      // do other things
      ...
   }

   bFinished = true;
}
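One weakness of a hand-maintained flag is that every return path has to remember to reset it. A possible variation, sketched below, wraps the flag in a small RAII guard so it is cleared on every exit. ReentryGuard, FakeMessageHandler, and g_bodyRuns are hypothetical names of mine; the fake handler simply stands in for a DispatchMessage(...) call whose handler calls foo() again, and the sketch assumes a single thread, as with a window message loop.

```cpp
// Counts how many times the body of foo() actually ran (for illustration).
int g_bodyRuns = 0;

// RAII guard: marks the function busy on entry and clears the mark on
// every exit path, including early returns.
class ReentryGuard {
public:
    explicit ReentryGuard(bool& busy) : busy_(busy), entered_(!busy) {
        if (entered_) busy_ = true;
    }
    ~ReentryGuard() {
        if (entered_) busy_ = false;  // only the outermost call resets the flag
    }
    bool entered() const { return entered_; }

private:
    ReentryGuard(const ReentryGuard&);             // not copyable
    ReentryGuard& operator=(const ReentryGuard&);  // not assignable
    bool& busy_;
    bool entered_;
};

void foo();

// Hypothetical stand-in for a message handler that calls foo() again.
void FakeMessageHandler() { foo(); }

void foo() {
    static bool busy = false;
    ReentryGuard guard(busy);
    if (!guard.entered()) return;  // already running: refuse to reenter

    ++g_bodyRuns;
    FakeMessageHandler();          // stands in for DispatchMessage(...)
}
```

Calling foo() once runs the body once: the nested call made through the handler returns immediately, and the flag is cleared when the outer call ends, so a later call works normally.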

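Is This Dice Game Fair?

At lunch a couple of days ago, a colleague posed a very interesting probability problem. There is a gambling game in which the house brings out two dice and the player throws them freely. If the player throws two sixes at once, the house loses; if 18 throws in a row go by without two sixes, the player loses. Is the game fair?

At first glance, the probability of throwing two sixes at once is 1/6 * 1/6 = 1/36, so over 18 throws the total probability is 1/36 * 18 = 1/2, and the game looks fair.

Now let us try a different line of thought. Above we considered the probability that the player wins; now consider the probability that the house wins, since the probability that the player wins = 1 - the probability that the house wins. For the house to win, all 18 throws must fail to produce two sixes. The probability that a single throw fails is (1 - 1/36), so the probability that all 18 fail is (1 - 1/36) ^ 18. Expanding this with the Maclaurin formula gives 1 - 1/36 * 18 + O() = 1/2 + O() (where the remainder is a positive higher-order term), so the house is the more likely winner.

So how do we explain the mistake in the first calculation? Where exactly did it go wrong? The first calculation hides a subtle error. Using 1/36 * 18 assumes that every throw contributes a probability of 1/36, but that is not so. Think about why a second throw happens at all: because the first throw did not produce two sixes. In other words, if the second throw is to count with probability 1/36, then the first throw's probability of two sixes must be 0 (the first throw cannot have produced two sixes, or there would be no second throw). Yet when we multiplied 1/36 by 18, we treated the first throw as still having probability 1/36 while the second throw also had probability 1/36, and that overcounts.

So if we want to compute the probability that the player wins, it should be done like this: the probability that the player wins = the probability of two sixes on the first throw + the probability of no two sixes on the first throw and two sixes on the second + the probability of no two sixes on the first two throws and two sixes on the third + ... = 1/36 + 35/36 * 1/36 + 35/36 * 35/36 * 1/36 + ... (18 terms in all)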
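Both routes can be evaluated numerically as a cross-check. The sketch below (function names are mine) computes the house's chance directly as (35/36)^18 and the player's chance by summing the series term by term; the two should add up to 1.

```cpp
#include <cmath>

// Probability that the house wins: no double six in `throws` throws.
double HouseWins(int throws) {
    return std::pow(35.0 / 36.0, throws);
}

// Probability that the player wins, summed term by term:
// 1/36 + (35/36)(1/36) + (35/36)^2 (1/36) + ...
double PlayerWins(int throws) {
    double sum = 0.0;
    double noSixSoFar = 1.0;  // probability that all earlier throws failed
    for (int i = 0; i < throws; ++i) {
        sum += noSixSoFar * (1.0 / 36.0);  // first double six on throw i + 1
        noSixSoFar *= 35.0 / 36.0;
    }
    return sum;
}
```

For 18 throws the house wins with probability of roughly 0.60 and the player with roughly 0.40, so the game is noticeably tilted toward the house rather than being an even bet.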

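Clearly, working from the house's probability of winning is much simpler. If we approach a problem from both sides and compare the two results, we are bound to uncover quite a few hidden issues.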

Thursday, December 14, 2006

Another way to implement a singleton class in C++

A singleton class is very useful when you need only one instance of a class. The other day I attended a training session, and the speaker introduced an implementation of a singleton class. See the code:

class A{
    public:
        static A* GetInstance(){
             if( !_self ){
                 _self = new A();
             }
             return _self;
        }

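        static void Release(){
            delete _self;   // deleting a null pointer is a no-op
            _self = 0;      // reset so a later GetInstance() does not return a dangling pointer
        }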
        ...

    private:
        static A* _self;
        A(){};
     ...
};

A* A::_self = 0;

It's clever code! But I don't think it's perfect. It requires the user to manage the memory: the user must decide when to release the singleton object. Consider this scenario. We use a singleton object to represent the whole application, so its lifetime must last from the moment the application starts running until it exits. When does the user get the chance to release the object? The code also returns a pointer to the singleton object, which I don't think is a good idea. Suppose the user holds more than one pointer to the singleton, deletes the object through one of them, and forgets about the others. When the user later uses one of the remaining pointers, the application will crash.

I think we can implement the singleton class another way. I don't like to say too much :), so let's look at the code:

class A{
    public:
        static A& GetInstance(){
             static A _a;
             
             return _a;
        }
        ...

    private:
        A(){};
     ...
};
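For completeness, here is a compilable sketch of the same function-local-static idea with copying disabled, so a caller cannot accidentally duplicate the instance through the returned reference. The counter member is a hypothetical piece of state, added only to show that every caller sees the same object. One caveat worth knowing: thread-safe initialization of a local static is guaranteed only from C++11 onward; under earlier standards it depends on the compiler.

```cpp
class A {
public:
    // The function-local static is constructed on first use and destroyed
    // automatically at program exit, so no Release() call is needed.
    static A& GetInstance() {
        static A instance;
        return instance;
    }

    int counter;  // hypothetical state, only to show the instance is shared

private:
    A() : counter(0) {}
    A(const A&);             // declared but not defined: copying is disallowed
    A& operator=(const A&);  // likewise for assignment
};
```

Typical use is `A::GetInstance().counter = 7;` anywhere in the program; a later `A::GetInstance().counter` reads back the same value because both calls return a reference to one object.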

Sunday, December 10, 2006

What a Life...

Morning: 15 dumplings
Noon: 20 wontons
Evening: 20 dumplings (projected)

God, come and save me~~~

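Why Can't the Foreign Monk Chant the Sutras?

Xie Wen, president of Yahoo China, has left. Luo Chuan, general manager of MSN China, has left. Gong Li, general manager of Microsoft China's Internet technology division, has left. Zhou Shaoning, president of Google China, has left... The end of 2006 seems to have become the saddest season for the heads of multinational IT companies. Yahoo, Microsoft, Google: these names resound throughout the global IT industry, yet in China these giants have, almost without exception, fallen into a strange cycle of poor performance, with presidents and executives swapped in and out like figures on a revolving lantern, seemingly without anyone ever finding the root of the problem. Recalling the "famous" and emphatic pronouncement by Sohu president Zhang Chaoyang that Google will not succeed in China, people really do seem to feel that "the foreign monk cannot chant the sutras." Why can't the foreign monk chant?

There is in fact one very important reason: the foreign monks have not yet realized that, once in China, they should chant Chinese sutras. They always assume it is enough to translate the sutras of their own country into Chinese and chant those in China, and that is completely wrong. The huge cultural differences that East and West have accumulated over thousands of years cannot possibly be bridged by a mere decade or two of reform and opening up. So although the sutras have been translated into Chinese, even quite fluently, what they describe is life abroad, things entirely unfamiliar to China's so-called "grassroots", and the very foundation of the Chinese Internet is the tens of millions of people in that grassroots class. The foreign monks do not know what the grassroots want from the sutras, so naturally they do not realize that the translated sutras they brought from abroad cannot meet grassroots needs, and they do not know that in China they should find a Chinese sutra to chant. Having lost the foundation of the Chinese Internet, they still hope for a share of it. Isn't that a joke?

The Chinese branches of many foreign companies merely do some localization. But the so-called "localization" they do is not a matter of truly settling down to study the accumulated local culture and the needs of local users, and then releasing genuinely localized products that fit those users. On the contrary, most of the time this localization is only translation work: turn the English into Chinese and declare the "localization" complete. Such products naturally cannot meet the needs of Chinese users. How can a product that does not meet its users' needs not fail?

The Chinese branches rarely have decision-making power in business matters; any major business plan needs the approval and consent of headquarters. And the foreigners at headquarters who hold that power, because of differences in culture, ideology, and understanding of the law, or out of a wish to keep a uniform company image worldwide, often find the plans proposed by the Chinese branch unacceptable. This conflict cannot be resolved well, and so the Chinese branches, lacking business decision-making power, run into all kinds of operating problems in China. That is what has made China a Waterloo for multinationals that hold every advantage in capital, management, talent, reputation, and technology. EachNet was trounced by Taobao, MSN Messenger and ICQ were wiped out by QQ, Google Search lost more than half its territory to Baidu... examples like these give a glimpse of the whole picture.

Of course, losing in the domestic market does not mean that entering China was a failure. On the contrary, taken as a whole, entering China has earned these multinationals plenty of money. More and more of them are moving the center of their development work to China while strengthening their design teams at home. China's gross university enrollment rate is the highest in the world, the quality of its labor force is rising steadily, the labor benefits mandated by the state are very cheap, and prices in China are low. All this makes Chinese labor quite inexpensive, with the best cost-performance ratio in the world. The total cost of employing one person in Europe or America can employ ten people of the same level in China. This is exactly what has turned China into the world's factory: having products developed by employees in China and sold in Europe and America is currently the main business of the Chinese branches of many multinational IT companies.

Today quite a few Chinese IT companies have also stepped abroad, and the strange cycle that multinationals have run into in China is worth every Chinese company's attention. Do not copy the experience of domestic success wholesale to another country. Instead, give the branch in that country enough business autonomy and study the local culture and user needs carefully; only then is success possible.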

Resources About Writing an OS

The operating system (OS) is the foundation of computer science, or at least of computer software. Many people want to know how to write their own OS (especially people from China), and I am very interested in the subject too. Here are some articles I wrote and a few materials I collected and found very useful. If any of them helps somebody, I'll be happy :).

Hardware

Software

Microsoft Virtual PC 2004 (free)
...

Useful Links

OSRC: The Operating System Resource Center
An excellent website with many resources on how to write an OS.
PureC Forum
This is also a good website with many resources, but unlike OSRC, not every link is valuable. The site is in Chinese.

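Monday, December 04, 2006

My First Securities Investment Ends in Failure

In today's capital-driven society, I figured it would be good to have some investing experience :D. So last week I opened a stock account, deposited 500 yuan, and bought 100 shares of ICBC (Industrial and Commercial Bank of China). This morning I came in and saw the price was up 0.10 yuan. I worked out that I could make about 10 yuan, was overjoyed, and sold right away. Afterwards I found I had actually lost a little over 2 yuan. Thinking it over, I realized I had forgotten to account for the transaction fees. How unfair~~

It seems you do have to pay some tuition to learn investing, heh. Still, with my very first investment ending in a loss, it looks like I really am not cut out for this line of work~~ ;)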

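In Memory: The PureC Forum E-Magazine

In Memory

Dedicated to the memory of the PureC Forum E-Magazine

…… ……

While tidying up documents on my computer, I happened to dig out the original manuscript of Issue 5 of the PureC Forum E-Magazine, the issue that died before it was born, and a long-forgotten feeling welled up again. I could not help wanting to write something in memory of a project into which I once poured so much effort and which gave me so much joy. Yet I started several times and crossed it all out several times, truly not knowing where to begin or what to say. Perhaps only the editors of those days can feel in their hearts what this is like.

The PureC Forum E-Magazine was planned starting in September 2004. Issue 1 came out in October 2004 and Issue 2 in November 2004. It then became a bimonthly published in cooperation with CSDN: Issue 3 appeared in January 2005 and Issue 4 in March 2005. Issue 5, due in May 2005, was cancelled for various reasons, and with that the magazine ceased publication entirely. Unexpectedly, a year and a half has slipped by since it all began.

The magazine went through a very clear rise and fall. Through Issues 1 and 2 it had good momentum, Issue 3 came close to a peak, and Issues 4 and 5 then slid down from that peak. There were many reasons: the editors had limited time and energy, and no new blood came in to replenish them; too few people in China devote themselves to low-level technical research, so submissions were hard to come by; and after the cooperation with CSDN began, we failed to learn from each other's strengths and in the end even lost some of the magazine's own character. These were the main reasons the magazine finally ceased publication.

Even so, this magazine that once existed bore witness to that period of our lives. Thinking back now, every little detail is still vivid. Nothing in this world lasts forever, and an ending is also a new beginning.

This time I have combined the four officially published issues of the PureC Forum E-Magazine with Issue 5, which died before it could be formally released, into a single bound volume, dedicated to our editors and to the dear readers who cared about us, so that together we can remember those years now gone.

These few rambling lines are my tribute to the past. Let us raise a glass to the past and to those years.

Xie Yubo (iamxiaohan)

January 11, 2006, Chengdu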

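Issue 5 Contents
============================================================================
[Foreword]
Only with Innocence Is There Hope (Liu Ting), i

[C and C++]
C Develops Endless Future (Sun Zhigang), 1
Boost Source Notes: Boost::multi_array (Xie Xuan), 5

[Compiler Principles]
Design and Implementation of CScanGenner, an Automatic Lexical Analyzer Generator (Xie Yubo), 17
A Flex-Based C/C++ Code Highlighting Tool (Tinyfool), 28

[Virus Research]
Reading the Classic Sample EICAR (SwordLea), 32

[Technical References]
The AS86 Man Page (trans. hold), 34

[Code Analysis]
Analysis of a Piece of Code (Xie Yubo), 39

[Editorial Office News]
Submission Guidelines (Editorial Office), I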
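Complete Index (issue-page)
============================================================================
[Foreword]
Opening (Sun Zhigang), 1-1
The Programmer in My Mind (Zhao Zhigang), 2-1
Challenges and Opportunities of the Information Age (Che Wanxiang), 3-1
Computer Science as I See It (Wang Hongzhi), 4-1
Only with Innocence Is There Hope (Liu Ting), 5-i
[Computer Architecture]
The Experts on Future Directions in Architecture Research (Part 1) (trans. Wang Kaifeng), 2-4
The Experts on Future Directions in Architecture Research (Part 2) (trans. Wang Kaifeng), 4-1
[Compiler Principles]
Sharpen Your Tools Before You Start: An Introduction to lex and yacc (Gao Liqi), 1-4
Linkers and Loaders (trans. Liu Yanbo), 1-67
Implementing a Command-Line Calculator (Gao Liqi), 3-68
Design and Implementation of CScanGenner, an Automatic Lexical Analyzer Generator (Xie Yubo), 5-17
A Flex-Based C/C++ Code Highlighting Tool (Tinyfool), 5-28
[Algorithm Theory]
ACM/ICPC Problem Analysis (Xiong Shuguang), 1-13
Solutions to the Preliminary Contest of the 29th ACM International Collegiate Programming Contest, Asia Region, Beijing Site (Part 1) (HIT ACM/ICPC Organization), 2-94
Problem A, Finding Nemo: Solution Report (Song Xinying), 2-95
Problem B, Searching the Web: Solution Report (Liu Yu), 2-97
Problem C, Argus: Solution Report (Xiao Ying), 2-98
Problem H, The Separator in Grid: Solution Report (Liu Ziyang), 2-99
Automatically Solving Sokoban (hellwolf), 3-91
Preprocessing Algorithms for a License Plate Recognition System (Liu Pengxiang), 4-59
[Virus Research]
The Impact of WinXP SP2 on Viruses and Encryption Techniques (Killer), 1-20
Reading the Classic Sample EICAR (SwordLea), 5-32
[C and C++]
Dissecting C and the CPU Floating-Point Mechanism on the Intel IA32 Architecture (Xie Yubo), 1-26
A Brief Analysis of C Function Argument Passing and the Handling of Variadic Functions (Xie Yubo), 2-75
Memory Sharing and Copy-On-Write in the Standard C++ Class std::string (Chen Hao), 3-75
Compiler Security Checks in Depth (trans. Yang Xinkai), 3-82
A Correction to "A Brief Analysis of C Function Argument Passing and the Handling of Variadic Functions" (Xie Yubo), 3-89
C Develops Endless Future (Sun Zhigang), 5-1
Boost Source Notes: Boost::multi_array (Xie Xuan), 5-5
[Network Security]
Principles and Applications of SOCK_RAW on Linux (Xiao Ying), 1-61
[Operating Systems]
Exploring Operating System Booting (Version 0.02) (Xie Yubo), 1-41
The Linux Kernel (English-Chinese parallel text) (Part 1) (Bi Xin et al.), 2-7
The Linux Kernel (English-Chinese parallel text) (Part 2) (Bi Xin et al.), 2-18
The Linux Kernel (English-Chinese parallel text) (Part 3) (Bi Xin et al.), 2-27
Operating System Concepts (6th Edition) (translation) (Part 1) (Lü Jianpeng), 2-39
Programming the 8259A Chip and Interrupt Handling in Protected Mode (Part 1) (Version 0.02) (Xie Yubo), 2-57
Programming the 8259A Chip and Interrupt Handling in Protected Mode (Part 2) (Version 0.02) (Xie Yubo), 2-65
Operating System Concepts, Chapter 2: Computer-System Structures (Lü Jianpeng), 3-2
The Linux Kernel, Chapter 3: Memory Management (Bi Xin et al.), 3-23
Operating System Concepts, Chapter 3: Operating-System Structures (Lü Jianpeng), 4-4
Implementing the Floppy Driver, DMA, and File System in Pyos (Part 1) (Xie Yubo), 4-31
[Database Principles]
Design of the MyBase Physical Storage Structure (Zhao Kai), 4-67
[System Design]
Designing a Very Simple 16-Bit CPU (Huang Hai), 4-72
The Architecture of Hello China (Garry), 4-92
[Code Analysis]
Analysis of a Piece of Code (Xie Yubo), 5-39
[Technical References]
Debugging Programs with GDB (Chen Hao), 3-47
Write Makefiles with Me (Chen Hao), 4-118
The AS86 Man Page (trans. hold), 5-34
[Forum Viewpoints]
A Brand-New Operating System Concept (PureC Forum members), 4-168
[Features]
An Introduction to the ACM/ICPC Algorithm Contest (Xiong Shuguang), 1-10
When You Have Time, Read More Books (Wang Kaifeng), 1-58
[Research Surveys]
An Introduction to Modern Information Retrieval Technology (Che Wanxiang), 1-82
[Editorial Office News]
Submission Guidelines (Editorial Office), 1-85
Readers' Club (Editorial Office), 1-87
Summary of This Issue (Editorial Office), 1-88
Notes on Revisions in the SP1 Edition (Editorial Office), 1-89
Decision to Make the Magazine Bimonthly (Editorial Office), 2-101
Submission Guidelines (Editorial Office), 2-102
Readers' Club (with latest errata) (Editorial Office), 2-104
Summary of This Issue (Editorial Office), 2-105
Notes on Revisions in the SP1 Edition (Editorial Office), 2-106
Submission Guidelines (Editorial Office), 3-98
Submission Guidelines (Editorial Office), 4-170
Errata (Editorial Office), 4-172
Submission Guidelines (Editorial Office), 5-I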

Downloads

Issue 1 (pdf) (source code)
Issue 2 (pdf) (source code)
Issue 3 (pdf) (source code)
Issue 4 (pdf) (source code)
Issue 5 (pdf) (source code)

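Sunday, December 03, 2006

Friday, December 01, 2006

Finally Finished Watching 《创世纪》 (At the Threshold of an Era)

I had heard of 《创世纪》 long ago, and when Dangdang had a sale last week I finally bought it and watched it through. My feeling is that although the series is reasonably well made, it has quite a few serious flaws, and it still falls well short of TVB's classic dramas on similar themes, such as 《义不容情》 (Looking Back in Anger).

First, the biggest flaw is that the Hong Kong judicial system as portrayed is extremely chaotic. The guiding principle of British law may be that it is better to let the guilty go free than to wrong the innocent, but it is surely never as muddled as in the series. Take the final scene in which Zhang Zili forces Ye Rongtian to hand over Gao Shuigen in exchange for evidence. Gao ends up dead, yet the police barely question Zhang or Ye at all. Gao's body lay in the trunk of Ye's car, and Ye had fired a shot at him, so was there really no trace of Gao in the trunk, no blood? After a shooting, does nobody need to trace where the gun came from and where it went? When Ye hands the recovered evidence to the court, he does not even have to explain to the judge where it came from or whether it was obtained legally; the judge rules simply on the grounds that "experts have confirmed the evidence is authentic," which is muddled in the extreme. Earlier on, Zhang Zili knocks Xu Wenbiao's brother off a rooftop with a punch, and the police officer concludes that it was suicide. Were there no marks of a blow on the body? Xu Wenbiao's sister is strangled by Zhang Zili; could an autopsy really fail to show that she was strangled? Earlier still, a tabloid reporter is put up by Xu Wenbiao to set fire to a newspaper office, leaving three dead and seven injured. After so serious an incident, nobody examines the reporter's belongings, nobody is investigated or held responsible, and the cause of the fire is never looked into; the victims are simply said to have died of "misfortune." In the end Ye Rongtian confesses at the police station and tells the whole truth, yet the police still do not investigate further and hastily charge him with a different offense. So many muddled decisions seriously undermine the realism of the series.

Second, the characters are not especially striking. The personalities of the main characters are not very distinct and leave no particularly deep impression. The plot develops without much suspense and simply revolves around personal grudges, with revenge as the main topic, so its social relevance and thematic ambition are clearly not high. The emotional entanglements among the main characters are also handled rather simply. In particular, the way Tian Ning, Zhang Zili, and Ma Ziqiang change their feelings toward Gao Meina and Ye Rongtian amounts to abrupt U-turns, far inferior to the twists and turns among the leads of 《义不容情》 that leave viewers sighing endlessly.

Still, the series runs to more than 120 episodes, and over such a length the plot has its ups and downs, ties earlier and later events together, and stays fairly tight, with little filler; watched from start to finish it still flows smoothly. I would not call it one of TVB's classics, but it is a decent series all the same.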
