Wikipedia:Search engine test
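On Wikipedia, the "Google test" covers any use of Google or another search engine. Certain kinds of information can be gathered accurately this way. It should be stressed, however, that no search engine gives a definitive answer; a search is only a rough first heuristic or rule of thumb.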

Tips

Google Web search is not the whole of Google search. When running a Google test, try searching Groups (USENET) as well. This yields a very different sample: for the most part, conversations in English conducted by people who are not deliberately trying to sell products or reach a mass audience. Other things being equal, a "Groups" search will typically return very roughly 1/5 as many hits as a "Web" search. Because Groups and Web searches have very different "systemic biases," hit numbers are not comparable. Nevertheless, Groups searches are particularly helpful in identifying entities whose Web presence may have been artificially inflated by promotional techniques; it is suspicious if a phrase gets, say, 100,000 Web hits but only 20 Groups hits.

USENET postings are date-stamped and have been archived for over twenty years, making them more useful than Web searches as a record of recent history. Using a Groups "advanced search", it is possible to restrict a search by date, which can help in identifying how recently a term came into widespread use.

Google News searches can assess whether something is currently newsworthy. One characteristic of Google News is that whereas it is easy and inexpensive to create websites or post to USENET, it is harder to convince a Google News source to run a story. Thus Google News, in comparison to Web or Groups, is less susceptible to manipulation by self-promoters. Note that Google News indexes many "news" sources that reflect specific points of view, and many news sources that are only of local interest.

Depending on the subject, advanced search functions may be useful. For example, adding "site:gov" or "site:edu" will restrict your search to U.S. government sites or U.S. college and university sites.

Other tools that may be useful for research include Google Scholar, which searches academic literature, and Google Print, which searches the contents of books.
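The Web-versus-Groups comparison described above can be sketched as a small check. This is only an illustration: the "very roughly 1/5" ratio comes from the text, while the function name `looks_inflated` and the `tolerance` factor are hypothetical choices, not an established rule.

```python
def looks_inflated(web_hits: int, groups_hits: int,
                   expected_ratio: float = 0.2,
                   tolerance: float = 100.0) -> bool:
    """Flag a term whose Groups presence is far below the rough expectation.

    expected_ratio is the "very roughly 1/5" figure from the text.
    tolerance is a hypothetical factor: the term is flagged only when the
    observed ratio is more than `tolerance` times smaller than expected.
    """
    if web_hits <= 0:
        # Nothing on the Web at all, so there is nothing to call inflated.
        return False
    observed_ratio = groups_hits / web_hits
    return observed_ratio < expected_ratio / tolerance


# The example from the text: 100,000 Web hits but only 20 Groups hits.
print(looks_inflated(100_000, 20))
# A term with a Groups presence close to the rough 1/5 expectation.
print(looks_inflated(100_000, 15_000))
```

Because the two kinds of search have different biases, a flag from such a check is at most a prompt for closer inspection, not a verdict.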
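
Alexa test

Although Wikipedia is not a web directory, it does carry articles about websites that meet its inclusion criteria. If you are interested in writing a Wikipedia article about a particular website, check on Alexa (http://www.alexa.com) whether the site is important enough. Most people agree that Wikipedia should cover sites in the top 100, and perhaps the top 1,000. For sites that are not even in the top 100,000, the general view is that it would be very hard to verify the accuracy of an article, so they should not be included. There is little consensus about the gray area in between.

For some sites in the top 1,000 (such as microsoft.com), it may be appropriate to redirect the site's title to a related article, such as Microsoft. (This is still somewhat disputed.)

Note also that Alexa rankings are themselves contested, for several reasons. For example, the Alexa software works only for users of the Microsoft Windows operating system and Microsoft Internet Explorer, so sites devoted to topics such as the Apple Macintosh may not receive a ranking that accurately reflects their traffic. Conversely, some webmasters install the Alexa toolbar solely to raise their site's ranking and then visit their own site. The Alexa toolbar user base is very small, so frequent visits by a single user can noticeably affect the overall result.

See elsewhere for more information about web comics.

Bias in Google

When using Google to test importance or existence, keep in mind the possibility of bias: the tool favors contemporary topics of interest to people in developed countries with Internet access, so the tester must exercise judgment. For example, a contemporary American pop group might need several thousand Google hits before most Wikipedians consider it worth including, whereas an equally important group from a country with little Internet access would need far fewer. A great musician of the 14th century might not be found on Google at all.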
Q. What is the minimum number of matches you should see if a term is not made up? (3? 27? 81?) A. Possibly hundreds! It depends on the following factors:
Further judgment: the Google test checks popular usage, not correctness. For example, a search for the incorrect Charles Windsor gives 10 times more results than the correct Charles Mountbatten-Windsor. Also, some topics may not be on the Web because of low Internet use in certain areas and cultures of the world.

Reliability of the Google test
Given that the results of a Google test are interpreted subjectively, its implementation is not always consistent. This reflects the fact that the test is applied case by case. In some cases, articles have been kept with Google hit counts as low as 15, and some claim that this undermines the validity of the Google test in its entirety. In fact, this says more about the rather uneven and subjective nature of the Wikipedia:Votes for deletion process than about the usefulness of the Google test. The Google test has always been, and very likely always will remain, an imperfect tool used to produce a general gauge of notability. It is not and should never be considered definitive. Major factors that may depress a Google hit count include subjects from countries where the Internet is not prevalent and topics of a historical nature that have not yet been well documented on the Internet. In other cases, it is a matter of speculation why one subject merits inclusion with a hit count below 100 while other such articles are frequently deleted.

Also note that the number of hits that Google reports is (sometimes or perhaps always; the details are secret) an estimate, not an exact figure. The number of hits reported by Google has little meaning until one navigates to the last page of the results, since it is only then that Google applies all criteria to a query (such as eliminating duplicates and applying spam control). Often the hit count is cut by a factor of 10 (or much more) after doing this. Jumping to the end of the results (or as far as is practical) also reveals whether the hit count is actually related to the intended meaning of the search term. Queries are further improved by setting the results per page to the maximum value (which reduces duplicate results) and by excluding any domain of a biased party. For instance, "JoesRockBand.com" should be excluded when searching for references to "Joe's Rock Band".
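A refined query of this kind, excluding a biased party's own domain and, where needed, a term such as "wikipedia", could be assembled as in the sketch below. The helper name `build_query` and its parameters are hypothetical; the sketch assumes the commonly used search syntax in which a quoted phrase is matched exactly, `-site:` excludes a domain, and a leading minus sign excludes a term.

```python
def build_query(phrase, exclude_sites=(), exclude_terms=()):
    """Return a search string for an exact phrase with exclusions applied.

    Assumes the search engine accepts "..." for exact phrases,
    -site:example.com to drop a domain, and -term to drop a word.
    """
    parts = [f'"{phrase}"']
    parts += [f"-site:{site}" for site in exclude_sites]
    parts += [f"-{term}" for term in exclude_terms]
    return " ".join(parts)


query = build_query(
    "Joe's Rock Band",
    exclude_sites=["JoesRockBand.com"],  # the band's own site is a biased source
    exclude_terms=["wikipedia"],         # skip mirrors of the article itself
)
print(query)  # "Joe's Rock Band" -site:JoesRockBand.com -wikipedia
```

The resulting string is meant to be pasted into the search box; the remaining hits still have to be sampled by hand to confirm that they are relevant.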
For longer-lasting articles, it may be necessary to exclude the term "wikipedia" itself, to avoid counting all the mirrors and language versions of a Wikipedia article. In fact, the VfD discussion itself, once archived and indexed by Google, may add to the Google hit count seen the next time the item is discussed. Finally, some human labor has to be involved: a manageable sample of the sites found must be opened individually to verify the relevance of the hit count.

Limitations of search engines
Much, probably most, of the publicly available web pages in existence are not indexed. Each search engine captures a different percentage of the total. Nobody can tell exactly what portion is captured. The estimated size of the World Wide Web is at least 2 billion pages, but a much deeper (and larger) Web, estimated at over 500 billion pages, exists within databases whose contents the search engines do not index. These "dynamic" pages are generated by a Web server when a user requests them and as such cannot be indexed by conventional search engines. The United States Patent and Trademark Office website is an example; although a search engine can find its main page, one can only search its database of individual patents by entering queries into the site itself.

Foreign languages and non-Latin scripts
Claims for the non-notability of a topic are occasionally made based on few Google hits, where a considerably larger number of hits would have resulted from searching in the correct script or for various transcriptions. An Arabic name, for instance, needs to be searched for in the original script, which is easily done with Google, provided one knows what to search for; but one also has to take into account that English, French and German webpages, for example, will likely transcribe the name using different conventions. In addition, different forms of a name used in the original language must be searched for. A Russian personal name has to be searched for both with and without the patronymic, and any search for names and other words in strongly inflected languages should take into account that arriving at the total number of hits may require searching for forms with varying case endings or other grammatical variations not obvious to someone who does not know the language. Doing a search like this requires a certain linguistic competence which not every individual Wikipedian possesses, but the Wikipedia community as a whole includes many bilingual and multilingual people. It is important for nominators and voters on VfD at least to be aware of their own limitations and not to state conclusively that there is a small number of Google hits for, say, a Serbian poet without pointing out the limited validity of a preliminary search using only one particular transcribed form of the name.
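The need to search for several forms of a name can be illustrated with a short sketch that enumerates quoted search phrases with and without the patronymic, across different transcriptions. The function name `name_variants` is hypothetical, and the spellings in the example are supplied by hand; the sketch does not transliterate or inflect anything itself, so someone who knows the language still has to provide the forms.

```python
from itertools import product


def name_variants(given_names, patronymics, surnames):
    """Return quoted search phrases, each with and without a patronymic."""
    variants = []
    for given, surname in product(given_names, surnames):
        # Form without the patronymic.
        variants.append(f'"{given} {surname}"')
        # Forms including each supplied patronymic spelling.
        for patronymic in patronymics:
            variants.append(f'"{given} {patronymic} {surname}"')
    return variants


# Hand-supplied Latin-script transcriptions of one Russian name.
phrases = name_variants(
    given_names=["Fyodor", "Fedor"],
    patronymics=["Mikhailovich"],
    surnames=["Dostoevsky", "Dostoyevsky"],
)
# The original-script form should be searched as well.
phrases.append('"Фёдор Михайлович Достоевский"')

for phrase in phrases:
    print(phrase)
```

Hit counts for the individual phrases overlap, so they indicate which forms are in use and should not simply be added together.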


