|Author||Chaudhuri, Anirban Ray ♦ Singh, Debnath ♦ Nasipuri, Mita ♦ Basu, Dipak Kumar|
|Source||Inflibnet's Institutional Repository|
|Subject Domain (in DDC)||Computer science, information & general works ♦ Data processing & computer science ♦ Library & information sciences|
|Subject Keyword||Indian Scripts ♦ Desktop Publishing ♦ Page Layout Analysis ♦ Optical Character Recognition ♦ Document Reconstruction ♦ Encoding Standard ♦ Indian Language|
|Abstract||The transformation of a scanned paper document into an editable form suitable for further processing, such as desktop publishing or archiving in a digital library, is a complex process. It requires solutions to several problems: document analysis, in which a Page Layout Analyzer (PLA) acquires knowledge of the document layout, followed by document recognition, which mainly comprises text recognition by Optical Character Recognition (OCR). Beyond these two, another important problem is document reconstruction: transforming the content into an electronically editable format while keeping the original layout intact. Core OCR modules exist for several Indian scripts, but no such document reconstruction system is available for them. The document reconstruction system reported in this paper is the first of its kind for Indian scripts, and it addresses document reconstruction for Bengali document images. The system uses both the document layout extracted by a PLA in a graphical user interface (GUI) and the results of the text recognition steps performed by OCR to transform paper documents into Rich Text Format.|
|Education Level||UG and PG|
|Learning Resource Type||Article|
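The reconstruction step described in the abstract (layout from a PLA plus text from OCR, written out as Rich Text Format) can be illustrated with a minimal sketch. This is not the authors' implementation: the `TextBlock` structure, the function names, the 300 dpi default, and the font entry are all assumptions made for illustration. The sketch relies only on standard RTF features: `\uN?` Unicode escapes, which can carry Bengali characters, and positioned paragraphs (`\posx`, `\posy`, `\absw`, measured in twips).

```python
from dataclasses import dataclass

TWIPS_PER_INCH = 1440  # RTF measures positions in twips (1/1440 inch)


@dataclass
class TextBlock:
    """Hypothetical record: position from layout analysis, text from OCR."""
    x_px: int
    y_px: int
    width_px: int
    text: str


def px_to_twips(px: int, dpi: int) -> int:
    """Convert scanner pixels to twips for a given scan resolution."""
    return round(px * TWIPS_PER_INCH / dpi)


def rtf_escape(text: str) -> str:
    """Escape text for RTF, writing non-ASCII characters as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\line ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal, one per UTF-16 code unit.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")  # "?" is the fallback character
    return "".join(out)


def blocks_to_rtf(blocks, dpi: int = 300) -> str:
    """Emit one absolutely positioned paragraph per text block."""
    # Assumed font table entry; a real system would name a Bengali font.
    parts = ["{\\rtf1\\ansi\\uc1\\deff0{\\fonttbl{\\f0\\fnil Arial;}}"]
    for b in blocks:
        x = px_to_twips(b.x_px, dpi)
        y = px_to_twips(b.y_px, dpi)
        w = px_to_twips(b.width_px, dpi)
        parts.append(
            f"\\pard\\phpg\\pvpg\\posx{x}\\posy{y}\\absw{w}"
            f"\\f0 {rtf_escape(b.text)}\\par"
        )
    parts.append("}")
    return "\n".join(parts)
```

Calling `blocks_to_rtf([TextBlock(300, 600, 1200, "বাংলা")])` would yield an RTF string in which the block is placed one inch from the left page edge and two inches from the top, assuming a 300 dpi scan. Non-text regions such as images and tables are outside the scope of this sketch.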