Weird question mark/diamond symbols in html source code?

eddiegi686

Re: Weird question mark/diamond symbols in html source code?

Post by eddiegi686 » 2015/10/03 18:35:54

If you're mainly uploading files created on a Windows system, I imagine that will work okay for you.

Quoting the Apache documentation here:
AddDefaultCharset should only be used when all of the text resources to which it applies are known to be in that character encoding and it is too inconvenient to label their charset individually. One such example is to add the charset parameter to resources containing generated content, such as legacy CGI scripts, that might be vulnerable to cross-site scripting attacks due to user-provided data being included in the output. Note, however, that a better solution is to just fix (or delete) those scripts, since setting a default charset does not protect users that have enabled the "auto-detect character encoding" feature on their browser.
I would suggest declaring the character set you are using with the correct meta tag in the HTML itself, rather than setting it globally in the web server.

This link explains how to do this: http://www.w3schools.com/html/html_charset.asp
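
To find out which pages still lack a charset declaration, something like the following might help. It is only a rough sketch: has_charset_meta is a name I made up, and the pattern just looks for "charset=" inside a meta tag, so it will miss unusual markup.

```shell
# Hypothetical helper: succeeds if the file contains a <meta ... charset=...> declaration.
# Matches both <meta charset="utf-8"> and the older http-equiv Content-Type form.
has_charset_meta() {
    grep -Eqi '<meta[^>]*charset=' "$1"
}

# Report every .html file in the current directory that lacks a declaration.
for f in *.html; do
    [ -e "$f" ] || continue          # skip the literal pattern when nothing matches
    has_charset_meta "$f" || echo "no charset declared: $f"
done
```

Run it from the directory that holds the pages; files it prints are the ones to fix first.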

I believe the normal charset used on Linux systems is either ISO-8859-1 or UTF-8. On Windows systems it is probably ANSI (Windows-1252) or UTF-8.

According to that page:
If a browser detects ISO-8859-1 in a web page, it defaults to ANSI, because ANSI is identical to ISO-8859-1 except that ANSI has 32 extra characters.
Ultimately it probably won't change the web page you're displaying if you already specify the charset to use in HTML.
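
To see what those extra characters mean in practice, you can feed the same byte to iconv under both names. This is a small sketch, assuming glibc's iconv plus od and tr are installed; utf8_hex_of_byte is a helper name I invented for the demonstration.

```shell
# Hypothetical helper: print the UTF-8 bytes (as hex) that iconv produces
# for a single input byte.
# Usage: utf8_hex_of_byte OCTAL_BYTE SOURCE_ENCODING
utf8_hex_of_byte() {
    printf "\\$1" | iconv -f "$2" -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

utf8_hex_of_byte 223 WINDOWS-1252; echo   # byte 0x93 read as Windows-1252
utf8_hex_of_byte 223 ISO-8859-1; echo     # the same byte read as ISO-8859-1
```

Read as Windows-1252, byte 0x93 is a left curly quote and comes out as e2809c. Read as ISO-8859-1, it is an invisible control character and comes out as c293, which is why a wrong label garbles the page.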

I agree with TrevorH, though: using special ANSI characters in your HTML should be avoided. It might be possible to use the awk or sed commands to update all your web pages at once from the command line. I would suggest doing that rather than defaulting the server to ANSI mode.
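
As a rough idea of what the sed approach could look like, here is a sketch that swaps the most common Windows-1252 "smart" punctuation bytes for HTML entities. It assumes GNU sed (the \xHH escapes are a GNU extension) and that the files really are Windows-1252; fix_smart_punct is a made-up name, and the list of bytes is only a starting point.

```shell
# Hypothetical helper: rewrite Windows-1252 "smart" punctuation bytes as HTML entities.
# LC_ALL=C makes sed treat the input as raw bytes instead of multibyte characters.
# In the replacement, \& is a literal ampersand.
fix_smart_punct() {
    LC_ALL=C sed \
        -e 's/\x91/\&lsquo;/g' \
        -e 's/\x92/\&rsquo;/g' \
        -e 's/\x93/\&ldquo;/g' \
        -e 's/\x94/\&rdquo;/g' \
        -e 's/\x96/\&ndash;/g' \
        -e 's/\x97/\&mdash;/g' \
        "$@"
}
```

With no arguments it filters standard input. To edit files in place while keeping backups, "fix_smart_punct -i.bak *.html" should work, since the extra arguments are handed straight to sed. Try it on a copy first.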

SeijiSensei

Re: Weird question mark/diamond symbols in html source code?

Post by SeijiSensei » 2015/10/07 23:38:43

The HTML code you displayed looks suspiciously like it was written in Microsoft Word or some other Microsoft product. Styles that begin with "mso-" come from Microsoft Office. The page was probably composed in Word on a machine using Windows-1252. My guess is those characters are tabs or some similar type of control character that did not translate to the UTF-8 character set. If you have any control over the authoring of these pages, you should consider using a platform-agnostic editor on a machine with UTF-8 as its character set. I wouldn't be surprised if those pages were authored on an XP machine. Current Windows uses UTF-8 now, I believe.

You can use the iconv utility to convert from one character set to another. For instance, to convert a page from Windows-1252 to UTF-8 you would run the command:


iconv -f WINDOWS-1252 -t UTF-8 -o output.html input.html
To convert all the files in a directory with .html extensions, cd to the directory, then run:


mkdir utf8
for f in *.html
do
    iconv -f WINDOWS-1252 -t UTF-8 -o "utf8/$f" "$f"
done
The converted files will reside in the utf8 subdirectory.
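
One caveat with a blanket conversion: a file that is already UTF-8 would get converted a second time and come out garbled. A possible variation, sketched below, checks each file first. The names is_utf8 and convert_html_dir are my own, and the check relies on iconv exiting with an error when the input is not valid UTF-8.

```shell
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8
# (plain ASCII files also pass, since ASCII is a subset of UTF-8).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: write a UTF-8 copy of every .html file in DIR to DIR/utf8.
# Files that are already valid UTF-8 are copied unchanged; the rest are
# assumed to be Windows-1252 and converted.
convert_html_dir() {
    dir=$1
    mkdir -p "$dir/utf8"
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        name=$(basename "$f")
        if is_utf8 "$f"; then
            cp "$f" "$dir/utf8/$name"
        else
            iconv -f WINDOWS-1252 -t UTF-8 "$f" > "$dir/utf8/$name"
        fi
    done
}
```

Calling "convert_html_dir ." from the directory with the pages gives the same utf8 subdirectory as the loop above, without touching files that were already fine.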

To see a list of the encodings iconv supports, use the command "iconv -l".
