Searching within files
Thanks for your help!

Jordanlev wrote a how-to on something similar to what you need:
http://www.concrete5.org/documentation/how-tos/developers/how-to-in...
You may also find this one useful. It does the same thing for products, so you could probably use a similar technique for files:
http://www.concrete5.org/documentation/how-tos/developers/modify-si...
These add-ons may also cover some of what you're looking for (they search files, but not file contents):
http://www.concrete5.org/marketplace/addons/image-file-search/...
http://www.concrete5.org/marketplace/addons/document_library/...
In the meantime, I've written my own solution.
After a file is uploaded, the text content of the PDF is read and written to the database. The search then looks for matches in these fields and lists them separately in the results (with a direct download link to each file). The search can also be limited to specific file sets.
If you'd like the details, please let me know.
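To illustrate the search side of the approach described above, here is a minimal sketch. It assumes each uploaded file's extracted text has already been stored together with its file ID, name, download URL and file set IDs; a plain PHP array stands in for the database table so the example runs on its own. The function name `searchFileText`, the row keys and the example URLs are hypothetical and are not part of concrete5's API.

```php
<?php
/**
 * Hypothetical search over stored file text.
 * $index stands in for a database table: one row per uploaded file.
 * $fileSetIds (optional) restricts results to files in any of the given sets.
 */
function searchFileText(array $index, string $query, ?array $fileSetIds = null): array
{
    $hits = [];
    foreach ($index as $row) {
        // Optional file set filter: skip files that share no set with the filter.
        if ($fileSetIds !== null && !array_intersect($row['fileSets'], $fileSetIds)) {
            continue;
        }
        // Case-insensitive substring match, roughly like SQL LIKE '%query%'.
        if (stripos($row['text'], $query) !== false) {
            $hits[] = [
                'fID'      => $row['fID'],
                'name'     => $row['name'],
                'download' => $row['downloadUrl'], // direct download link for the result list
            ];
        }
    }
    return $hits;
}
```

In a real add-on the same filter would more likely be a SQL `WHERE ... LIKE ?` clause joined against the file set table, which avoids loading every row into PHP.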
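```php
private function pdf2string($sourcefile)
{
    // Read the whole PDF into memory as raw bytes.
    $fp = fopen($sourcefile, 'rb');
    $content = fread($fp, filesize($sourcefile));
    fclose($fp);

    // Page content lives in streams delimited by these keywords.
    $searchstart = 'stream';
    $searchend = 'endstream';
    $pdfText = '';
    $pos = 0;
    $pos2 = 0;
    $startpos = 0;

    while ($pos !== false && $pos2 !== false) {
        $pos = strpos($content, $searchstart, $startpos);
        $pos2 = strpos($content, $searchend, $startpos + 1);
        if ($pos !== false && $pos2 !== false) {
            // Compare byte values with ord(): a one-character string never equals an int such as 0x0d.
            if (ord($content[$pos]) == 0x0d && ord($content[$pos + 1]) == 0x0a) {
                // ... (rest of the function not shown)
```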
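For context on what a stream-scanning function like `pdf2string` does, here is a self-contained sketch of the same idea. It is not the poster's code: it assumes content streams are FlateDecode-compressed (zlib data, which `gzuncompress` can inflate) and only collects literal strings shown with the `Tj` operator. Real PDFs also use `TJ` arrays, font encodings, escape sequences and object streams, so treat it as an illustration only; the function names are hypothetical.

```php
<?php
/**
 * Hypothetical sketch: split raw PDF bytes into stream bodies.
 * A robust parser would use each stream's /Length entry instead of keyword scanning.
 */
function extractPdfStreams(string $content): array
{
    $streams = [];
    $offset = 0;
    while (($start = strpos($content, 'stream', $offset)) !== false) {
        $end = strpos($content, 'endstream', $start);
        if ($end === false) {
            break;
        }
        // Skip the "stream" keyword and the CRLF or LF that follows it.
        $dataStart = $start + strlen('stream');
        if (substr($content, $dataStart, 2) === "\r\n") {
            $dataStart += 2;
        } elseif (substr($content, $dataStart, 1) === "\n") {
            $dataStart += 1;
        }
        $body = substr($content, $dataStart, $end - $dataStart);
        // Drop the single end-of-line marker that precedes "endstream".
        if (substr($body, -2) === "\r\n") {
            $body = substr($body, 0, -2);
        } elseif (substr($body, -1) === "\n" || substr($body, -1) === "\r") {
            $body = substr($body, 0, -1);
        }
        $streams[] = $body;
        $offset = $end + strlen('endstream');
    }
    return $streams;
}

/** Inflate each stream and join the strings drawn with "(...) Tj". */
function extractPdfText(string $content): string
{
    $pieces = [];
    foreach (extractPdfStreams($content) as $stream) {
        // Skip streams that are not zlib-compressed (images, other filters, etc.).
        $decoded = @gzuncompress($stream);
        if ($decoded === false) {
            continue;
        }
        if (preg_match_all('/\((.*?)\)\s*Tj/s', $decoded, $matches)) {
            $pieces = array_merge($pieces, $matches[1]);
        }
    }
    return implode(' ', $pieces);
}
```

This needs PHP's zlib extension, which most builds include.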
I managed to get around it by combining two server-side tools with PHP's shell_exec function. For PDFs I'm using the pdftotext utility from the xpdf package, and for Word files I'm using a headless install of OpenOffice together with the unoconv command-line utility. Both can write their output to stdout, so it's easy to get the parsed text back into PHP. This is on a Linux (CentOS) server, so I'm not sure how cross-platform this approach is, but it works well for me.
I'll probably write the steps up as a how-to to share with the community.
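A minimal sketch of the shell_exec approach, assuming `pdftotext` and `unoconv` are installed and on the PATH. `pdftotext` sends its text to stdout when the output file name is `-`, and `unoconv --stdout` prints the converted document instead of writing a file. The function names are hypothetical; building the command in a separate function keeps it testable on machines without the tools.

```php
<?php
// Hypothetical helper: choose an extraction command based on the file extension.
function buildExtractCommand(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    // Quote the path so spaces or shell metacharacters in uploaded file names are safe.
    $file = escapeshellarg($path);
    switch ($ext) {
        case 'pdf':
            // pdftotext writes to stdout when the output file is "-".
            return "pdftotext -enc UTF-8 $file -";
        case 'doc':
        case 'docx':
            // unoconv drives a headless OpenOffice/LibreOffice instance.
            return "unoconv -f txt --stdout $file";
        default:
            return null; // Unsupported type: nothing to extract.
    }
}

// Hypothetical helper: run the command and return the text, or null on failure.
function extractText(string $path): ?string
{
    $cmd = buildExtractCommand($path);
    if ($cmd === null) {
        return null;
    }
    // shell_exec returns null when there is no output (or false on PHP 8 if the pipe fails).
    $output = shell_exec($cmd . ' 2>/dev/null');
    return is_string($output) ? $output : null;
}
```

The returned string can then be stored in the database on upload, as in the solution described earlier in the thread. `escapeshellarg()` matters here because the file names come from user uploads.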