Tuesday, October 4, 2016

Full-text indexing with DSpace

Media filters in DSpace extract text from uploaded files so that it can be used for full-text searching. Media filters for HTML, PDF, Word, and PowerPoint files are enabled by default in DSpace.

Run the media filter from the command line
The media filter can be run from the command line. It generates the text index used for searching.

sudo su
/dspace/bin/dspace filter-media
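
When the command is run unattended, it can help to check that the launcher exists and to keep the output for later review. The following is only a minimal sketch: the function name run_filter_media, the DSPACE_BIN and LOG_FILE variables, and the log location are assumptions made for illustration and are not part of DSpace.

```shell
#!/bin/sh
# Hypothetical wrapper around the filter-media command shown above.
# DSPACE_BIN and LOG_FILE are assumed names; adjust both to your installation.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"
LOG_FILE="${LOG_FILE:-/tmp/filter-media.log}"

run_filter_media() {
    # Stop early if the DSpace launcher is missing or not executable.
    if [ ! -x "$DSPACE_BIN" ]; then
        echo "DSpace launcher not found or not executable: $DSPACE_BIN" >&2
        return 1
    fi
    # Append standard output and standard error of the run to the log file.
    "$DSPACE_BIN" filter-media >> "$LOG_FILE" 2>&1
}
```

Calling run_filter_media after these definitions starts the media filter and returns the exit status of the command, so a failed run can be detected by a calling script.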

Run the media filter as a cron job
Cron is a time-based job scheduler on Linux. The media filter command can be added as a cron job so that it runs automatically, for example daily or weekly.

Open a terminal (Applications > Accessories > Terminal) and run the following command:

crontab -e

Add the following line, adjusting the path to match your DSpace installation directory.

@daily /home/dspace/bin/dspace filter-media

Save and close the file.

With @daily, the media filter runs at midnight every day, which is convenient for servers that run 24x7. You can change the time at which the cron job runs by editing the entry. See the examples of cron jobs here.
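
As a rough illustration of changing the schedule, the entries below are alternatives to the @daily line. The times are arbitrary examples, the path is the one used in the entry above, and only one of these lines would normally be kept in the crontab. The @daily and @weekly shortcuts are assumed to be supported by the installed cron, which is the case for the cron implementations commonly shipped with Linux distributions.

```crontab
# Fields: minute hour day-of-month month day-of-week command

# Same as @daily: every day at 00:00
0 0 * * * /home/dspace/bin/dspace filter-media

# Every day at 02:30
30 2 * * * /home/dspace/bin/dspace filter-media

# Once a week, on Sunday at 00:00
@weekly /home/dspace/bin/dspace filter-media
```

The job runs as the user whose crontab is edited, so that user needs permission to run the dspace command.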

