I need to extract text from a few document types (primarily .doc, .docx, .pdf, and .txt) from email attachments. The application is running on Google App Engine. Apache Tika does exactly what I need it to, but I'm running into a SecurityException when it tries to create temporary files on GAE. I know GAE does not support writing to the local filesystem.
Is there a way to force Tika to use memcache or some other storage besides temporary files? Are there any other document parsers which might handle this without temporary files?

Some of the libraries that Tika uses will only work with a File, while others are happy with an InputStream. Could you be hitting that?
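
If the File-versus-InputStream difference is the issue, one thing worth trying is to keep the attachment entirely in memory and hand the parser a stream rather than a file. Here is a minimal sketch of the buffering part using only the JDK. The class and method names (`InMemoryAttachment`, `readFully`, `bufferInMemory`) are made up for illustration and are not part of Tika. It also assumes the attachments are small enough to fit on the heap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not a Tika class: keeps an attachment on the heap
// so that nothing in this step touches the filesystem.
public final class InMemoryAttachment {
    private InMemoryAttachment() {}

    /** Reads the whole stream into a byte array held in memory. */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns a mark/reset-capable stream backed only by memory. */
    public static InputStream bufferInMemory(InputStream in) throws IOException {
        return new ByteArrayInputStream(readFully(in));
    }
}
```

The resulting stream can then be passed to Tika's stream-based entry point, `Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)`, for example through `AutoDetectParser`. Whether a particular format's parser still tries to spool to a temporary file internally depends on the parser and the Tika version, so this may only help for some of the document types.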