How to get new file magic file?

kdejaeger
Posts: 1
Joined: 2022/12/20 17:22:30

How to get new file magic file?

Post by kdejaeger » 2022/12/20 17:28:19

Hello, I have CentOS 7.9.2009 with version 5.11 of the 'file' command. Because this version is very old, file --mime-type does not recognise docx files correctly. It reports the MIME type as 'application/msword', but it should be 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'.

I tried updating the /etc/magic file and putting one from another openSUSE installation under $HOME/magic, but I get a lot of errors.

How can I get up-to-date file/magic handling on this distribution?

jlehtone
Posts: 4530
Joined: 2007/12/11 08:17:33
Location: Finland

Re: How to get new file magic file?

Post by jlehtone » 2022/12/21 09:08:48

The standard answer is that one either uses the packages that the distro provides or uses a different distro.

While Red Hat does backport features to Enterprise Linux (https://access.redhat.com/solutions/57665), RHEL 7 (and hence CentOS 7) was released in 2014, no longer receives feature updates, and will die "soon" (June 2024).

Note that man file mentions $HOME/.magic, not $HOME/magic.

The section in EL9's version of magic that seems to be about msooxml:


#------------------------------------------------------------------------------
# $File: msooxml,v 1.13 2019/11/27 13:12:55 christos Exp $
# msooxml:  file(1) magic for Microsoft Office XML
# From: Ralf Brown <ralf.brown@gmail.com>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
#   archive.  The first member file is normally "[Content_Types].xml".
#   but some libreoffice generated files put this later. Perhaps skip
#   the "[Content_Types].xml" test?
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
#   file of ePub or OpenDocument, we'll have to scan for a filename
#   which can distinguish between the three types

0		name		msooxml
>0		string		word/		Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
>0		string		ppt/		Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
>0		string		xl/		Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>0		string		visio/		Microsoft Visio 2013+
!:mime application/vnd.ms-visio.drawing.main+xml

# start by checking for ZIP local file header signature
0		string		PK\003\004
!:strength +10
# make sure the first file is correct
>0x1E		use		msooxml
>0x1E		regex		\\[Content_Types\\]\\.xml|_rels/\\.rels|docProps
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
>>(18.l+49)	search/6000	PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>&26		search/6000	PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have.  Correct the mimetype with the registered ones:
# https://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>&26		use		msooxml	
>>>>&26		default		x
# OpenOffice/Libreoffice orders ZIP entry differently, so check the 4th file
>>>>>&26	search/6000	PK\003\004
>>>>>>&26	use		msooxml	
>>>>>>&26	default		x		Microsoft OOXML
>>>>>&26	default		x		Microsoft OOXML
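
If replacing the magic database on CentOS 7 proves impractical, the idea behind the rules above (look inside the ZIP for a member name starting with word/, ppt/, xl/ or visio/) can be approximated outside of file. The following is only a minimal sketch in Python using the standard zipfile module. The names guess_ooxml_mime and make_sample_archive are hypothetical helpers made up for this illustration, and the check is looser than the magic rules: it does not verify the position of "[Content_Types].xml" and simply returns the first matching prefix.

```python
import io
import zipfile

# Prefix-to-MIME table copied from the "msooxml" magic entries above.
OOXML_MIME_BY_PREFIX = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "visio/": "application/vnd.ms-visio.drawing.main+xml",
}


def guess_ooxml_mime(source):
    """Return the OOXML MIME type for a path or file object, or None.

    Hypothetical helper: scans the archive member names in order and
    returns the MIME type for the first name with a known prefix.
    """
    if not zipfile.is_zipfile(source):
        return None
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            for prefix, mime in OOXML_MIME_BY_PREFIX.items():
                if name.startswith(prefix):
                    return mime
    return None


def make_sample_archive(member_names):
    """Build an in-memory ZIP with empty members, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, "")
    buffer.seek(0)
    return buffer
```

Calling guess_ooxml_mime with the path of a real document (for example a hypothetical "report.docx") should return the wordprocessingml MIME type, and None for files that are not ZIP archives or contain none of the known prefixes.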

