Regain search engine on Glassfish app server
“Regain is a Java search engine based on Jakarta Lucene. It provides indexing and searching files for plenty of formats (currently HTML, XML, Excel, Powerpoint, Word, PDF and RTF). A TagLibrary eases integrating search results in your JSP based web page.” You may download the production version from http://regain.sourceforge.net/ or the beta version from http://www.assembla.com/spaces/regain2/documents
Our NAS division intranet web site needed a search feature like Google search. After some research, I came across the Regain search engine. It is pretty simple to install on a server (a Solaris box in our case). Regain does not search databases directly, but if a page has a link that pulls its content from a database, Regain will crawl the link and store that page's metadata in the index. Hibernate Search or Lucene can do full-scale database search; I have not used them yet, but that’s what I hear.
Here are a few steps to install Regain on Glassfish (first unzip regain_v1.5.0-preview-80717-1556_server.zip):
- Create a directory under ../SUNWappserver/domains/domain1
- Name it regain
- Copy the crawler directory from c:\regain\runtime (after unzipping) and paste it into domain1/regain/
- Under the crawler dir, you will find CrawlerConfiguration.xml
- Modify the xml file with your domain name etc.
- Create a directory and call it searchindex under crawler
- Now the fun part: building the index
- Change directory (cd) to the domain1/regain/crawler dir and run the following command
- java -jar regain-crawler.jar or java -jar regain-crawler.jar --help (more options)
- Copy the conf dir, its subdirectories and the xml file (all three -> conf\regain\SearchConfiguration.xml) from the downloaded zip file (found under regain\runtime\search) and paste them directly under domain1/applications
- Modify the SearchConfiguration.xml file, mainly lines 74, 80 and 83
- Deploy the regain.war file via app server’s beautiful admin gui.
- Modify the web.xml (domain1\applications\j2ee-modules\regain\WEB-INF). You have to tell the regain webapp where to look for the search configuration file.
<!-- The location of the configuration file -->
<context-param>
  <param-name>searchConfigFile</param-name>
  <param-value>../conf/regain/SearchConfiguration.xml</param-value>
</context-param>
The Regain webapp’s web.xml points to SearchConfiguration.xml, and SearchConfiguration.xml knows where to find the searchindex dir to bring up the query results (three steps).
- Open up a browser and go to http://yourdomainname/contextName, e.g. http://axous2.abc.aaa.info/search
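The copy-and-create steps above can be sketched as a small shell function. This is only a sketch under assumptions: setup_regain is a made-up helper name, the first argument is your Glassfish domain directory, and the second is the unzipped regain directory (the one that contains runtime/). It does not edit the XML files, build the index, or deploy the war; those steps are still done by hand.

```shell
#!/bin/sh
# Hypothetical helper: lays out the directories described in the steps above.
#   $1 = Glassfish domain dir (e.g. .../SUNWappserver/domains/domain1)
#   $2 = unzipped regain dir that contains runtime/
setup_regain() {
    domain_dir=$1
    regain_dir=$2

    # Fail early if the unzipped distribution does not look as expected.
    [ -d "$regain_dir/runtime/crawler" ] || { echo "no crawler dir in $regain_dir/runtime" >&2; return 1; }
    [ -d "$regain_dir/runtime/search/conf" ] || { echo "no conf dir in $regain_dir/runtime/search" >&2; return 1; }

    # domain1/regain, with the crawler directory copied into it
    mkdir -p "$domain_dir/regain" || return 1
    cp -R "$regain_dir/runtime/crawler" "$domain_dir/regain/" || return 1

    # Empty searchindex directory for the crawler to fill
    mkdir -p "$domain_dir/regain/crawler/searchindex" || return 1

    # conf/regain/SearchConfiguration.xml goes directly under applications
    mkdir -p "$domain_dir/applications" || return 1
    cp -R "$regain_dir/runtime/search/conf" "$domain_dir/applications/" || return 1
}
```

It could be called as setup_regain /opt/glassfish/domains/domain1 /tmp/regain, where /tmp/regain is just a placeholder for wherever you unzipped the download. After that, edit CrawlerConfiguration.xml and SearchConfiguration.xml and run the crawler from domain1/regain/crawler as described.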
Happy searching!
Home page: http://regain.sourceforge.net/
Open Source Full Text Search Engines Written In Java: http://www.manageability.org/blog/stuff/full-text-lucene-jxta-search-engine-java-xml
24 Comments
Starting at step 4, things are not too clear. In fact, I can’t find the log file and I’m getting this error:
Error message: Writing results failed
Could you give the complete paths for step 4 and an example for step 5? Finally, by pointing domains/domain1/applications/j2ee-modules/regain/WEB-INF/web.xml to ../conf/regain/SearchConfiguration.xml, is that to say that it will look in domains/domain1/applications/j2ee-modules/regain/conf/regain for the SearchConfiguration.xml file?
Full paths would help in figuring this out if possible.
I’m on a Solaris box, so my full path starts with /opt/glassfish/domains …
From the downloaded zip file, in the directory regain\runtime\search, you will find a conf dir. Copy that conf dir and paste it into /opt/glassfish/domains/domain1/applications
Inside your conf dir, you should have a regain dir, and inside the regain dir you should have SearchConfiguration.xml
Here is what the code snippet from line 71 onward looks like:
<!-- The search index 'main' -->
<index name="main" default="true">
  <!-- The directory where the index is located -->
  <dir>/app01/SUNWappserver/domains/domain1/regain/crawler/searchindex</dir>
</index>
<!-- The search index 'example' -->
<index name="example">
  <!-- The directory where the index is located -->
  <dir>/app01/SUNWappserver/domains/domain1/regain/crawler/searchindex</dir>
  <rewriteRules>
    <rule prefix="/web/docs/" replacement="http://ato.abc.aaa.info"/>
  </rewriteRules>
</index>
You have to explicitly define the searchindex directory location.
Hope this helps. We are loving it here at OKC. Regain ROCKS! We were going to pay for a Google Mini, and Regain saved our soul (SOS).
That worked out well. All I had to do was move the conf folder 2 directories up.
Now I’m having to redo the index though. I used the file:// index and that works ok, but I end up with links to documents like this:
http://my.website.com/search/file/%24/%24export/home/user/folder/my+documents/document.pdf?index=main
instead of http://my.website.com/user/folder/my+documents/document.pdf?index=main
I switched to indexing with http instead of file, but then I have a problem: it doesn’t follow the links within the jsp pages.
Did you have any problems like that?
No, I do not. It crawls both .jsp and .html pages.
Review your configuration file and make sure that nowhere does it say not to crawl .jsp
Thanks
Hobi
Would you mind posting your CrawlerConfiguration.xml too? I’ve asked about my problem on the Regain forum and have had no answer in two weeks…
Thanks for any help.
CrawlerConfiguration.xml
Hope this helps
Thanks
Hobi
What config do you have in the CrawlerConfiguration.xml file?
Argh… in the startlist start section… the brackets were processed as HTML…
So, is that truly all you have in your CrawlerConfiguration.xml: no startlist and no start parse=true index=true with an http url?
If I use a file:// url it searches and indexes, but the links in the search results are wrong. If I use http:// it doesn’t index at all; that’s why I was curious to see what the whole startlist section of your CrawlerConfiguration.xml looked like.
Hi
I’m developing a search engine using Regain. I got stuck at your 6th point, i.e.
Deploy the regain.war file via app server’s beautiful admin gui.
Where do I get this .war file,
and how do I run it?
Please, if possible, mail me the screenshots. This is very urgent.
Thanks for understanding.
When you download the regain zip file, it comes with a regain.war file. It should be under regain1.5\regain\runtime\search\webapps\regain.war
You can use the Glassfish admin tool to deploy the war file just like any other webapp.
http://blogs.sun.com/Ludo/resource/dsccdeploy.png
Hope this helps.
Hobi
Thanks for your reply.
I have deployed the regain.war file and set searchConfigFile to ../conf/regain/SearchConfiguration.xml in web.xml (domain1\applications\j2ee-modules\regain\WEB-INF).
I am getting this error:
Error message: Writing results failed
I can see all the crawling jobs when I run “java -jar regain-crawler.jar”,
but the search still does not return any results.
Here are the links for my application:
I am indexing my local file folder.
crawling config: http://img186.imageshack.us/img186/9914/crawlingconfig.jpg
configuring searchindex: http://img365.imageshack.us/img365/8254/config.jpg
Deploying: http://img119.imageshack.us/img119/5633/deploy.jpg
I don’t know how to post it correctly, so I uploaded it.
Can you sort out my problem?
Thanks for your reply.
When I deployed the war file, I got
“Error message: Writing results failed”
even though I provided the correct index paths.
Please check the directory permissions and the blacklist.
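As a rough sketch of the permission part of that check, assuming a POSIX shell: check_index_dir is a made-up helper name, and the idea is to run it as the same user the app server runs as, against the directory named in the dir element of SearchConfiguration.xml. Passing this check does not prove the index is valid; it only rules out a missing, unreadable or empty directory.

```shell
#!/bin/sh
# Hypothetical helper: basic sanity check of a search index directory.
#   $1 = the searchindex directory from SearchConfiguration.xml
check_index_dir() {
    dir=$1

    # The directory must exist at all.
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }

    # The current user must be able to list and enter it.
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "not readable: $dir"
        return 1
    fi

    # If the crawler never wrote anything, the directory is still empty.
    [ -n "$(ls -A "$dir")" ] || { echo "empty: $dir"; return 1; }

    echo "ok: $dir"
}
```

For example: check_index_dir /app01/SUNWappserver/domains/domain1/regain/crawler/searchindex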
Dear HobiOne,
I am thankful to you for prompt replies.
I have checked the directory permissions and the blacklist. They look fine, I think.
Can you check comment #13 (my screenshots)? Let me know if anything is wrong. I also get crawling info when I run java -jar regain-crawler.jar
If possible, can you give me your mail ID? I will send my entire code.
Or else send a sample application to my mail ID (abbhooshan@gmail.com).
Thanks
Here is my CrawlerConfiguration.xml
http://pastebin.com/ff94fbd0
and SearchConfiguration.xml
http://pastebin.com/f2245783f
Hope these help.
Hobi
Configuration for the regain crawler:
http://pastebin.com/f6cb28c4c
SearchConfiguration.xml
http://pastebin.com/f13ef38c0
web.xml
http://pastebin.com/f71b50ad6
Can you please check my code and let me know if anything is wrong?
By the way,
thanks for sending your code.
Hi Hobi
At last I have found one wonderful blog. Can you post how to connect to a database using Regain?
I want to index a MySQL database.
cheers
CrawlerConfiguration.xml:
http://pastebin.com/f6cb28c4c
SearchConfiguration.xml
http://pastebin.com/f13ef38c0
web.xml:
http://pastebin.com/f71b50ad6
Can you please check my code and correct me?
By the way, thanks for posting your code.
Bye
I am indexing the local drive.
SearchConfiguration.xml
http://pastebin.com/m2feabe9c
CrawlerConfiguration.xml
http://pastebin.com/d109f4f0e
web.xml
http://pastebin.com/d64d5895
I am still getting the same error.
Hi
Thanks for your help. At last I can index my sites.
How do I start indexing at a regular interval in the server version?
I mean automatic re-indexing after one day, after one week, and so on.
I have seen that the desktop version has this.
Where do I configure it for the server edition?
thanks
Can’t find the regain2 bits; the link in your article returns a 404.
Hi, can you tell me how to configure the maximum number of results displayed in search?
It shows 10 results by default. Can I customize that?