Direct multidisplay for web document repositories Gu, Zhong
dc.contributor.department Electrical and Computer Engineering 2020-11-22T06:48:23.000 2021-02-26T09:05:54Z 2021-02-26T09:05:54Z Mon Jan 01 00:00:00 UTC 2001 2001-01-01
dc.description.abstract As the popularity of the Internet grows, the amount of information on the Internet is increasing as well. Search engines are important tools that help people retrieve information of interest from the huge number of documents available. However, currently used search engines return long lists of URLs. Users have to click each URL to download the actual document, check its content, and click the back button to try other URLs if the answer is not found in the current document. This process is both labor intensive and time consuming. Multibrowser, a program that addresses this problem, is presented in this thesis. Multibrowser combines the advantages of multidisplay and direct display to present a more efficient user-computer interface. First, the system downloads the actual documents according to the list of URLs returned by a standard search engine and saves the documents on the local disk. Second, the system converts the documents into n-gram vectors and clusters them into three groups according to those vectors. Then each document is assigned a color according to its position in relation to the cluster centroids. Also, each paragraph is linked with other paragraphs that have similar content. Last, the documents are presented to the users using Multidisplay, where each document has a corresponding color bar. Users can look through the content of several documents at the same time and see the similarity among them from the color bars; users can then retrieve the paragraphs most similar to a given paragraph by clicking its "find similar" link. In addition, the investigation of hash tables for our text processing shows that the hash table size has to be chosen very carefully to avoid an undesirably large collision probability. Some good values are suggested based on our experiments.
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 22243
dc.identifier.contextkey 20252395
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath rtd/21244
dc.language.iso en
dc.source.bitstream archive/|||Fri Jan 14 22:35:54 UTC 2022
dc.subject.keywords Electrical and computer engineering
dc.subject.keywords Computer engineering
dc.title Direct multidisplay for web document repositories
dc.type article
dc.type.genre thesis
dspace.entity.type Publication
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff Computer Engineering thesis Master of Science
Original bundle
1.44 MB
Adobe Portable Document Format
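
The pipeline outlined in the abstract (hashed n-gram vectors, sensitivity of collisions to hash table size, and a color derived from a document's position relative to three cluster centroids) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: character trigrams, a simple polynomial hash, and an inverse-distance mapping of the three centroid distances onto the red, green, and blue channels. The function names are hypothetical.

```python
import math


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the normalized text."""
    cleaned = " ".join(text.lower().split())
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def stable_hash(gram):
    """Deterministic polynomial hash (an assumed choice, not the thesis's hash).

    Python's built-in hash() is salted per process, so it is avoided here.
    """
    h = 0
    for ch in gram:
        h = h * 31 + ord(ch)
    return h


def ngram_vector(text, table_size, n=3):
    """Count n-grams into a fixed-size vector indexed by hash bucket."""
    vec = [0] * table_size
    for gram in char_ngrams(text, n):
        vec[stable_hash(gram) % table_size] += 1
    return vec


def count_collisions(text, table_size, n=3):
    """Count distinct n-grams that land in an already occupied bucket."""
    distinct = set(char_ngrams(text, n))
    buckets = {stable_hash(g) % table_size for g in distinct}
    return len(distinct) - len(buckets)


def rgb_from_centroids(vec, centroids):
    """Map closeness to each of three centroids onto R, G, B (assumed scheme)."""
    dists = [math.dist(vec, c) for c in centroids]
    weights = [1.0 / (d + 1e-9) for d in dists]  # closer centroid, larger weight
    total = sum(weights)
    return tuple(round(255 * w / total) for w in weights)
```

Calling `count_collisions` on the same text with several candidate table sizes gives a rough feel for why the size matters: a table that is too small forces many distinct n-grams into the same bucket, which blurs the resulting vectors before clustering.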