1. Install the JDK and Tomcat
sudo apt-get install tomcat7 openjdk-7-jdk openjdk-7-jre
2. Download the latest Solr release. Take care not to download the source package.
solr-4.9.0.tgz
sudo mv solr-4.9.0.tgz /mnt
cd /mnt && sudo tar -xvf solr-4.9.0.tgz
sudo cp solr-4.9.0/dist/solr-4.9.0.war /var/lib/tomcat7/webapps/solr.war
sudo cp -r solr-4.9.0/example/solr /mnt/
cd /mnt/solr
sudo mkdir data
sudo chmod a+w data
sudo vim collection1/conf/solrconfig.xml
Change <dataDir>${solr.data.dir:}</dataDir> to <dataDir>${solr.data.dir:/mnt/solr/data}</dataDir>
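If you would rather not open an editor, the same change can be scripted. The following is only a sketch: set_data_dir is a hypothetical helper name, and it assumes GNU sed (the default on Ubuntu) and a <dataDir> element written exactly as in the stock example config.

```shell
# Hypothetical helper: set the default value of solr.data.dir in a solrconfig.xml.
# Usage: set_data_dir <solrconfig.xml> <data directory>
set_data_dir() {
    config_file=$1
    data_dir=$2
    # Keep a backup next to the original before editing in place.
    cp "$config_file" "$config_file.bak"
    # Replace whatever default follows "solr.data.dir:" with the new path.
    # "|" is the sed delimiter, so the path may contain "/" but not "|" or "&".
    sed -i 's|<dataDir>${solr.data.dir:[^}]*}</dataDir>|<dataDir>${solr.data.dir:'"$data_dir"'}</dataDir>|' "$config_file"
}

# Example with the paths used in this post (run as root, since /mnt/solr is root-owned):
# set_data_dir /mnt/solr/collection1/conf/solrconfig.xml /mnt/solr/data
```

The backup file makes it easy to diff the result against the original if the pattern does not match your config.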
sudo cp -r /mnt/solr-4.9.0/example/lib/ext/* /usr/share/tomcat7/lib/
sudo cp -r /mnt/solr-4.9.0/example/resources/log4j.properties /usr/share/tomcat7/lib/
sudo vim /etc/tomcat7/Catalina/localhost/solr.xml
Add the following content:
<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="/var/lib/tomcat7/webapps/solr.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/mnt/solr" override="true" />
</Context>
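The context file can also be generated from a small script, which avoids typos in the XML declaration. This is a sketch only: write_solr_context is a hypothetical helper, and the example call assumes the Solr home is /mnt/solr, the directory that holds collection1.

```shell
# Hypothetical helper: write a Tomcat context descriptor for Solr.
# Usage: write_solr_context <output file> <path to war> <solr home>
write_solr_context() {
    # The unquoted heredoc lets the shell substitute $2 and $3 into the XML.
    cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="$2" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$3" override="true" />
</Context>
EOF
}

# Example: generate the file in /tmp, review it, then copy it into place.
# write_solr_context /tmp/solr.xml /var/lib/tomcat7/webapps/solr.war /mnt/solr
# sudo cp /tmp/solr.xml /etc/tomcat7/Catalina/localhost/solr.xml
```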
sudo service tomcat7 restart
After a successful restart, a solr folder appears under /var/lib/tomcat7/webapps.
Edit /var/lib/tomcat7/webapps/solr/WEB-INF/web.xml and change
<!--
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/put/your/solr/home/here</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
-->
to
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/mnt/solr</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
sudo service tomcat7 restart
Open 127.0.0.1:8080/solr in a browser; the Solr admin page should load.
Adding a Chinese tokenizer
1. Go to https://github.com/chenlb/mmseg4j-solr and follow the link in its download section to get mmseg4j-solr-2.2.0-with-mmseg4j-core.zip
2. sudo mv mmseg4j-solr-2.2.0-with-mmseg4j-core.zip /mnt/
3. cd /mnt && sudo unzip mmseg4j-solr-2.2.0-with-mmseg4j-core.zip
4. sudo mv mmseg4j-*.jar /var/lib/tomcat7/webapps/solr/WEB-INF/lib/
5. Following README.md, add the content below to the types node of /mnt/solr/collection1/conf/schema.xml (near the end of the file, before </schema>)
<fieldType name="textComplex" class="solr.TextField" >
<analyzer>
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicdata-path="dic"/>
</analyzer>
</fieldType>
<fieldType name="textMaxWord" class="solr.TextField" >
<analyzer>
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicdata-path="dic"/>
</analyzer>
</fieldType>
<fieldType name="textSimple" class="solr.TextField" >
<analyzer>
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicdata-path="dic"/>
</analyzer>
</fieldType>
6. Then create the dictionary directory: sudo mkdir /mnt/solr/dic
7. sudo service tomcat7 restart
Open 127.0.0.1:8080/solr in a browser again to confirm that Solr still loads.
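The schema.xml edit from step 5 can be scripted as well. The following is a rough sketch, not part of mmseg4j itself: insert_before_schema_end is a hypothetical helper that copies a snippet file (for example, the three fieldType definitions saved as /tmp/mmseg4j-types.xml, an assumed path) in front of the first line containing </schema>.

```shell
# Hypothetical helper: insert the contents of a snippet file before </schema>.
# Usage: insert_before_schema_end <schema.xml> <snippet file>
insert_before_schema_end() {
    schema_file=$1
    snippet_file=$2
    tmp_file=$(mktemp)
    awk -v snippet="$snippet_file" '
        # On the first closing tag, emit the snippet lines before the tag itself.
        /<\/schema>/ && !done {
            while ((getline line < snippet) > 0) print line
            close(snippet)
            done = 1
        }
        { print }
    ' "$schema_file" > "$tmp_file" && cat "$tmp_file" > "$schema_file"
    rm -f "$tmp_file"
}

# Example (run as root; back up schema.xml first):
# insert_before_schema_end /mnt/solr/collection1/conf/schema.xml /tmp/mmseg4j-types.xml
```

Writing the result back with cat, rather than moving the temp file, keeps the owner and permissions of the original schema.xml.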
8. Upload data with curl
The current directory contains a file named software.doc. Create an index for it:
curl " :8080/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@software.doc"
curl " :8080/solr/update/extract?literal.id=/mnt/WorkStation/software.doc&commit=true" -F "myfile=@/mnt/WorkStation/software.doc"
Next, take the file /mnt/WorkStation/test.txt, whose content is
春晓春眠不觉晓,处处闻啼鸟。夜来风雨声,花落知多少。
Create an index for it:
curl " :8080/solr/update/extract?literal.id=/mnt/WorkStation/test.txt&commit=true" -F "myfile=@/mnt/WorkStation/test.txt"
Then run a query in the browser to check the Chinese tokenization results.
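When several files need indexing, the curl calls from step 8 can be wrapped in a couple of small functions. This is a sketch under assumptions: build_extract_url and index_file are hypothetical names, and the base URL http://127.0.0.1:8080/solr is assumed from the address opened in the browser earlier. Document ids containing spaces or & would need URL-encoding, which this sketch does not do.

```shell
# Hypothetical helper: build the /update/extract URL for one document id.
# Usage: build_extract_url <solr base url> <document id>
build_extract_url() {
    printf '%s/update/extract?literal.id=%s&commit=true\n' "$1" "$2"
}

# Hypothetical helper: upload one file, using its path as the document id.
# --fail makes curl return a non-zero status on HTTP errors such as 500.
# Usage: index_file <solr base url> <file>
index_file() {
    curl --fail "$(build_extract_url "$1" "$2")" -F "myfile=@$2"
}

# Example (assumed base URL):
# index_file http://127.0.0.1:8080/solr /mnt/WorkStation/test.txt
```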
Handling warnings
SolrResourceLoader Can't find (or read) directory to add to classloader: ../../../contrib/extraction/lib
This warning appears mainly because the corresponding files cannot be found. To fix it,
copy the required libraries over:
sudo cp -r /mnt/solr-4.9.0/contrib /mnt/solr/
sudo cp -r /mnt/solr-4.9.0/dist /mnt/solr/
Then edit /mnt/solr/collection1/conf/solrconfig.xml and change the paths below to match your setup (absolute paths also work). The previous step copied the directories into /mnt/solr, so the paths become the following:
<!--
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-velocity-\d.*\.jar" />
-->
<lib dir="../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-velocity-\d.*\.jar" />
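The same path rewrite can be done in one pass with sed. A minimal sketch, assuming GNU sed and that every affected directive starts with <lib dir="../../../ as in the stock config; relocate_lib_dirs is a hypothetical helper name. Unlike the manual edit shown above, it rewrites the lines in place and does not keep the old ones as a comment, so it saves a backup copy instead.

```shell
# Hypothetical helper: point <lib> directives at directories next to the core
# ("../") instead of three levels up ("../../../").
# Usage: relocate_lib_dirs <solrconfig.xml>
relocate_lib_dirs() {
    # Keep the original for comparison or rollback.
    cp "$1" "$1.bak"
    sed -i 's|<lib dir="\.\./\.\./\.\./|<lib dir="../|' "$1"
}

# Example (run as root):
# relocate_lib_dirs /mnt/solr/collection1/conf/solrconfig.xml
```

Relative dir values are resolved against the core's instance directory, so ../contrib here refers to /mnt/solr/contrib.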
Handling errors
Indexing a PDF or another file type may fail with the error below. The main cause is again that the corresponding jar files cannot be found.
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="error"><str name="msg">lazy loading error</str><str name="trace">org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:257)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.extraction.ExtractingRequestHandler'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:490)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:421)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:540)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:613)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:248)
... 20 more
Caused by: java.lang.ClassNotFoundException: solr.extraction.ExtractingRequestHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:474)
... 24 more
</str><int name="code">500</int></lst>
</response>
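In a trace this long, the useful part is the last "Caused by" line. The sketch below pulls the missing class names out of a saved response or log file. The name missing_classes is hypothetical, and /var/log/tomcat7/catalina.out in the example is an assumed log location.

```shell
# Hypothetical helper: list the classes reported by ClassNotFoundException.
# Reads the named files, or standard input when no file is given.
# Usage: missing_classes [file...]
missing_classes() {
    # Capture the class name after the exception, stopping at whitespace or "<".
    sed -n 's/.*ClassNotFoundException: \([^[:space:]<]*\).*/\1/p' "$@" | sort -u
}

# Example (assumed log path):
# missing_classes /var/log/tomcat7/catalina.out
```

For the trace above this reports solr.extraction.ExtractingRequestHandler, which points at the extraction (Solr Cell) jars.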
The fix is to copy the corresponding jar files into the lib directory of the deployed Solr webapp:
sudo cp -r /mnt/solr-4.9.0/contrib/extraction/lib/* /var/lib/tomcat7/webapps/solr/WEB-INF/lib/
Restart tomcat7 and the error should be gone.