NDLI: A Multi-Threaded Semantic Focused Crawler

Content Provider	SpringerLink
Author	Bedi, Punam; Thukral, Anjali; Banati, Hema; Behl, Abhishek; Mendiratta, Varun
Copyright Year	2012
Abstract	The Web comprises voluminous, rich learning content. The ever-growing volume of learning resources, however, leads to the problem of information overload. The large number of irrelevant results returned by search engines based on keyword matching further aggravates the problem. In such a scenario, a learner needs semantically matched learning resources as search results. Keeping in view the volume of content and the significance of semantic knowledge, our paper proposes a multi-threaded semantic focused crawler (SFC) specially designed and implemented to crawl the WWW for educational learning content. The proposed SFC uses a domain ontology to expand a topic term and a set of seed URLs to initiate the crawl. The results obtained from multiple iterations of the crawl on various topics are shown and compared with those obtained by executing an open source crawler on a similar dataset. The results are evaluated using Semantic Similarity, a vector space model based metric, and the harvest ratio.
Starting Page	1233
Ending Page	1242
Page Count	10
File Format	PDF
ISSN	1000-9000
Journal	Journal of Computer Science and Technology
Volume Number	27
Issue Number	6
e-ISSN	1860-4749
Language	English
Publisher	Springer US
Publisher Date	2012-11-15
Publisher Place	Boston
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	eLearning; semantic focused crawler; semantically expanded term; ontology; Artificial Intelligence (incl. Robotics); Data Structures, Cryptology and Information Theory; Computer Science; Information Systems Applications (incl. Internet); Software Engineering; Theory of Computation
Content Type	Text
Resource Type	Article
Subject	Theoretical Computer Science; Computational Theory and Mathematics; Computer Science Applications; Software; Hardware and Architecture
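
The abstract names two evaluation measures, a vector space model based semantic similarity and the harvest ratio, but this record does not give their formulas. The sketch below illustrates one common reading of each: cosine similarity over raw term-frequency vectors, and harvest ratio as the fraction of crawled pages judged relevant. The function names, the tokenizer, and the 0.1 relevance threshold are assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text and count word occurrences (raw term frequency)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def harvest_ratio(pages, topic_terms, threshold=0.1):
    """Fraction of crawled pages whose similarity to the topic meets the threshold.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    if not pages:
        return 0.0
    topic = term_vector(" ".join(topic_terms))
    relevant = sum(
        1 for page in pages
        if cosine_similarity(term_vector(page), topic) >= threshold
    )
    return relevant / len(pages)


if __name__ == "__main__":
    # Hypothetical expanded topic terms and page texts, for illustration only.
    topic_terms = ["sorting", "algorithm", "quicksort"]
    pages = [
        "sorting algorithm tutorial quicksort",
        "cheap flights and hotels",
    ]
    print(harvest_ratio(pages, topic_terms))
```

In a crawler of the kind the abstract describes, the topic terms would presumably come from the ontology-based expansion of the topic term rather than from a hand-written list as here.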

Similar Documents

Personalized Semantic Based Blog Retrieval

Ontology-based semantic cache in AOKB

Discovering High-Quality Threaded Discussions in Online Forums

An Ontology-Based Approach for Semantic Conflict Resolution in Database Integration

GPP-Based Soft Base Station Designing and Optimization

A Semantic Searching Scheme in Heterogeneous Unstructured P2P Networks

Query Intent Disambiguation of Keyword-Based Semantic Entity Search in Dataspaces

Social Network-Aware Interfaces as Facilitators of Innovation

Introduction to the Six Leading Editors

A Multi-Threaded Semantic Focused Crawler