Direct multidisplay for web document repositories

dc.contributor.author Gu, Zhong
dc.contributor.department Department of Electrical and Computer Engineering
dc.date 2020-11-22T06:48:23.000
dc.date.accessioned 2021-02-26T09:05:54Z
dc.date.available 2021-02-26T09:05:54Z
dc.date.copyright Mon Jan 01 00:00:00 UTC 2001
dc.date.issued 2001-01-01
dc.description.abstract <p>As the Internet grows in popularity, the amount of information on it grows as well. Search engines are important tools for helping people retrieve information of interest from this huge collection of documents. However, current search engines return long lists of URLs. Users must click each URL to download the actual document and check its content, then click the back button to try other URLs if the answer is not in the current one. This process is both labor-intensive and time-consuming. This thesis presents Multibrowser, a program that addresses this problem. Multibrowser combines the advantages of multidisplay and direct display to provide a more efficient user-computer interface. First, the system downloads the actual documents from the list of URLs returned by a standard search engine and saves them to the local disk. Second, it converts the documents into n-gram vectors and clusters them into three groups based on those vectors. Each document is then assigned a color according to its position relative to the cluster centroids, and each paragraph is linked to other paragraphs with similar content. Finally, the documents are presented to the user in Multidisplay, where each document has a corresponding color bar. Users can look through the content of several documents at the same time and judge their similarity from the color bars; they can also retrieve the paragraphs most similar to a given paragraph by clicking its "find similar" link. In addition, an investigation of hash tables for our text processing shows that the hash table size must be chosen carefully to avoid an undesirably large collision probability. Several good table sizes are suggested based on our experiments.</p>
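The abstract mentions n-gram vectors, a "find similar" link between paragraphs, and hash-table collision probability. The following is a minimal, hypothetical sketch of how those pieces can fit together; it is not the thesis's implementation. Assumptions: character trigrams, a polynomial string hash folded into a table of size `TABLE_SIZE = 1009` (an arbitrary prime chosen for illustration, not one of the thesis's suggested values), cosine similarity as the similarity measure, and an idealized uniform-hashing model for the collision estimate. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical table size (a prime); NOT a value taken from the thesis.
TABLE_SIZE = 1009


def string_hash(s: str) -> int:
    """Deterministic polynomial hash (Python's built-in str hash is salted per process)."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF  # keep it within 32 bits
    return h


def ngram_vector(text: str, n: int = 3, size: int = TABLE_SIZE) -> Counter:
    """Count character n-grams, folding each n-gram into one of `size` hash buckets."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    vec = Counter()
    for i in range(len(text) - n + 1):
        vec[string_hash(text[i:i + n]) % size] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bucket-count vectors."""
    dot = sum(v * b[k] for k, v in a.items())  # missing keys in a Counter read as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(paragraphs: list, query_index: int, k: int = 3) -> list:
    """Indices of the k paragraphs most similar to paragraphs[query_index] ("find similar")."""
    vecs = [ngram_vector(p) for p in paragraphs]
    q = vecs[query_index]
    scored = [(cosine(q, v), i) for i, v in enumerate(vecs) if i != query_index]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def collision_probability(num_keys: int, size: int) -> float:
    """Chance that at least two of num_keys keys share a bucket, assuming uniform hashing."""
    p_no_collision = 1.0
    for i in range(num_keys):
        p_no_collision *= (size - i) / size
    return 1.0 - p_no_collision


if __name__ == "__main__":
    paras = [
        "the cat sat on the mat",
        "a cat sat on a mat",
        "stock prices fell sharply today",
    ]
    print(most_similar(paras, 0, k=1))
    print(round(collision_probability(50, TABLE_SIZE), 3))
```

The uniform-hashing estimate grows quickly with the number of distinct n-grams relative to the table size (the birthday effect), which is one reason table size matters; real text and a real hash function can behave worse than this idealized model, so measured collision rates, like those the thesis reports, are the better guide.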
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/rtd/21244/
dc.identifier.articleid 22243
dc.identifier.contextkey 20252395
dc.identifier.doi https://doi.org/10.31274/rtd-20201118-208
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath rtd/21244
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/98611
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/rtd/21244/Gu_ISU_2001_G8.pdf|||Fri Jan 14 22:35:54 UTC 2022
dc.subject.keywords Electrical and computer engineering
dc.subject.keywords Computer engineering
dc.title Direct multidisplay for web document repositories
dc.type thesis en_US
dc.type.genre thesis en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff
thesis.degree.discipline Computer Engineering
thesis.degree.level thesis
thesis.degree.name Master of Science
File (original bundle)
Name: Gu_ISU_2001_G8.pdf
Size: 1.44 MB
Format: Adobe Portable Document Format