Hi all,
I’m trying to read the contents of an MS Word file using fread or file_get_contents.
Both work fine.
However, my problem is explained in the attached file.
I want to ignore non-English characters because they are always converted into strange characters.
This is my code:
function parseWord($userDoc) {
    $fileHandle = fopen($userDoc, "r");
    $line = mb_convert_encoding(@fread($fileHandle, filesize($userDoc)), "UTF-8");
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $thisline) {
        $pos = strpos($thisline, chr(0x00));
        if (($pos !== FALSE) || (strlen($thisline) == 0)) {
        } else {
            $outtext .= $thisline . " ";
        }
    }
    $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r @\/\_\(\)]/", "", $outtext);
    return $outtext;
}
Can someone help?
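
To make the goal concrete, here is a minimal sketch of the "ignore non-English characters" step taken in isolation. It is not a fix for reading the Word format itself, and the function name keepAsciiText is hypothetical, made up only for this illustration. It assumes that "non-English" can be approximated as "anything outside printable ASCII", so accented letters are dropped along with the strange characters.

```php
<?php
// Hypothetical helper (not part of the code above): keeps printable ASCII
// (0x20-0x7E) plus tab, carriage return and newline, and removes every other byte.
// The pattern has no /u modifier, so it works byte by byte and also removes
// each byte of a multi-byte UTF-8 character (all of those bytes are >= 0x80).
function keepAsciiText(string $text): string
{
    $result = preg_replace('/[^\x20-\x7E\t\r\n]/', '', $text);

    // preg_replace returns null on error; fall back to an empty string.
    return $result ?? '';
}

// Example: the two bytes of each "é" (0xC3 0xA9) are stripped.
echo keepAsciiText("Caf\xC3\xA9 menu 123"), "\n";
```

Because the filter works on raw bytes, it does not depend on the input being valid UTF-8, which may matter when the text comes straight out of a binary file.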