Weird question mark/diamond symbols in html source code?

sgtspike
Posts: 11
Joined: 2015/10/02 05:01:16

Post by sgtspike » 2015/10/02 05:11:40

I'm setting up a CentOS 7 virtual server in a Server 2008 R2 install via Hyper-V to hopefully self-host a few of my websites. I've got the httpd service running, vsftpd finally working, and I'm just using a local user account for ftp transfers (port 21 is disabled at my router firewall).

Anyway, I've used CoreFTP to connect to both my current webhost and this new CentOS 7 install. I transferred files for a very small site directly between them as a test. It works, except that a question-mark-inside-a-black-diamond symbol now appears wherever various forms of punctuation were used. Apostrophes, quotes, etc. were all changed to one or more of this symbol. It's in the source of the html, and if I look at the html file in vi, it is shown there too.

Any ideas why this could be happening?

And on a side note, how do I force CoreFTP (or any other FTP client for that matter) to copy the contents of a directory when I do a copy/paste from one server to this CentOS server? When I tried, it only made the new directory and I had to copy the files within the directory afterward. I haven't had this problem before with any webhost, just with this new CentOS install.
Attachment: Capture.PNG (12.88 KiB)

TrevorH
Site Admin
Posts: 33220
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Weird question mark/diamond symbols in html source code?

Post by TrevorH » 2015/10/02 11:10:20

I don't think it "inserted" them, they are characters that were present that your current codepage has no symbol for so it displays that to show that it's undisplayable. Most likely they are UTF-8 characters. I suspect that you should be using some form of HTML special character symbol not directly inputting such characters - like this list for example http://www.w3schools.com/html/html_symbols.asp
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke
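
A minimal sketch of how the diamond symbol can arise, assuming (this is not confirmed anywhere in the thread) that the pages were saved as Windows-1252 and are being read as UTF-8. The sample string is made up for illustration.

```python
# A right single quote (U+2019) as Windows-1252 stores it: one byte, 0x92.
cp1252_bytes = "it\u2019s".encode("cp1252")

# 0x92 is not a valid UTF-8 start byte, so errors="replace" substitutes
# U+FFFD, which most fonts draw as a question mark inside a black diamond.
misread = cp1252_bytes.decode("utf-8", errors="replace")

# The same bytes read with the encoding they were written in.
correct = cp1252_bytes.decode("cp1252")

print(repr(cp1252_bytes))
print(repr(misread))
print(repr(correct))
```

In this sketch the bytes themselves never change; only the encoding used to interpret them does.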

Post by sgtspike » 2015/10/02 18:19:47

TrevorH wrote: I don't think it "inserted" them, they are characters that were present that your current codepage has no symbol for so it displays that to show that it's undisplayable. Most likely they are UTF-8 characters. I suspect that you should be using some form of HTML special character symbol not directly inputting such characters - like this list for example http://www.w3schools.com/html/html_symbols.asp
I'm confused though. If that's the case, why does it work fine on my current webhost? The html files seem to actually change as soon as they are transferred to the new CentOS server. I don't think it is a rendering issue - it seems the source code itself has been changed, as evidenced by opening up one of the HTML files in vi and seeing the same funny characters.

For comparison, attached is a snip of the source code of the same page on my current webhost.
Attachment: Capture.PNG (6.77 KiB)

Post by TrevorH » 2015/10/02 19:18:40

Simple enough to check: grab a checksumming program like md5sum and run it on your files in both places. If the checksum is the same then so are the files.
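
The check described above can be sketched in Python for anyone who wants the same tool on both Windows and CentOS. The function name `md5_of_file` is hypothetical; the file is opened in binary mode so that the hash covers the raw bytes with no newline or charset translation.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file's raw bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:  # binary mode: bytes are read untouched
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python md5check.py file1 [file2 ...]
    for name in sys.argv[1:]:
        print(md5_of_file(name), name)
```

Running it on the same file on each machine and comparing the two digests shows whether the transfer altered any bytes.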

Post by sgtspike » 2015/10/02 19:59:49

TrevorH wrote: Simple enough to check: grab a checksumming program like md5sum and run it on your files in both places. If the checksum is the same then so are the files.
Good call. Using Hashtab on Windows and md5sum on CentOS, the hashes are indeed different. The file is transforming itself when it is uploaded, somehow!

EDIT: Oh, this is interesting. If I transfer the file (with the question mark characters) from CentOS BACK to Windows, those things disappear. So it must be a problem that CentOS has with the Unicode character set I'm using, or something like that? Without changing every file on every one of my websites, how can I fix this so that those characters display properly from my CentOS install?

Post by TrevorH » 2015/10/02 21:52:08

Use the correct html entity for the underlying unicode.

Post by sgtspike » 2015/10/02 22:18:51

TrevorH wrote: Use the correct html entity for the underlying unicode.
I appreciate the help, though I'm not sure what you mean. Is "html entity" the same thing as charset? How do I know which charset to use? Do you have any instructions for changing charsets for Apache in CentOS 7? Instructions for 7 seem hard to come by online... everything seems to be for 6 or earlier, and lots of syntax seems to have changed since then.

EDIT: I also apologize for being a complete noob about this. Others would probably know what you mean and how to fix it. I'm doing a bunch of reading online but so much of it doesn't apply to this version of centos. I'm trying here! I appreciate your patience.

Post by TrevorH » 2015/10/02 22:57:27

If that is really UTF-8 in the text then it's not really valid to serve that up via a web server. You should use the HTML codes for the characters you want to send as per the page I linked to in my original reply. So a copyright symbol can be written directly in UTF-8 as © but to serve it up in a web page you should really use &amp;copy; instead.
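
As a rough illustration of the entity approach, the substitution could be scripted rather than done by hand. This is only a sketch: `to_entities` is a hypothetical helper, it works on text that has already been decoded correctly, and it relies on Python's standard `html.entities.codepoint2name` table, falling back to a numeric reference for characters with no named entity.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace each non-ASCII character with an HTML entity; leave ASCII as-is."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            parts.append(ch)
        elif code in codepoint2name:
            # Named entity, e.g. 169 -> &copy;
            parts.append("&" + codepoint2name[code] + ";")
        else:
            # No named entity in the table: use a decimal character reference.
            parts.append("&#" + str(code) + ";")
    return "".join(parts)


print(to_entities("\u00a9 2015 \u2013 it\u2019s \u201cquoted\u201d"))
# prints: &copy; 2015 &ndash; it&rsquo;s &ldquo;quoted&rdquo;
```

The output contains only ASCII, so it displays the same way whichever charset the server declares.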

Post by sgtspike » 2015/10/02 23:10:36

TrevorH wrote: If that is really UTF-8 in the text then it's not really valid to serve that up via a web server. You should use the HTML codes for the characters you want to send as per the page I linked to in my original reply. So a copyright symbol can be written directly in UTF-8 as © but to serve it up in a web page you should really use &amp;copy; instead.
Thanks. Ok, that makes sense, but I don't want to do that. I would potentially have many hours of work changing text on various pages. The pages work fine on my current webserver. How can I change this new CentOS install to serve these pages correctly? Or are you saying that it would be impossible? And if that is the case... why does it work on my current webserver (which is also linux-based)?

Post by sgtspike » 2015/10/03 04:19:51

I have edited /etc/httpd/conf/httpd.conf with the following:

AddDefaultCharset windows-1252
This has actually fixed the problem! I am not sure what consequences might come with making that change - will I run into problems in the future?
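
On the question of consequences: `AddDefaultCharset` adds the named charset to text/html and text/plain responses that do not already declare one, so any page on the server that is actually saved as UTF-8 would then be labelled as windows-1252. A rough way to check which pages fall into which group is sketched below. It assumes every file is either UTF-8 or Windows-1252, and `guess_encoding` is a hypothetical helper, not part of Apache or CentOS.

```python
def guess_encoding(raw):
    """Return "utf-8" if the bytes decode cleanly as UTF-8, else "cp1252".

    Pure-ASCII input is reported as "utf-8", since ASCII bytes mean the
    same thing in both encodings.
    """
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


if __name__ == "__main__":
    import sys

    # Usage: python guess.py page1.html [page2.html ...]
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            print(guess_encoding(handle.read()), name)
```

If every page reports the same encoding, a single server-wide default matching it is less likely to cause surprises.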
