Metadata
To assemble a collection of articles, it is necessary to know their
bibliographic details, or metadata in the language of information scientists.
Metadata Effort
The effort needed to provide this extra information can seem like
a particular burden when faced with a backlog of publication information.
Metadata Effort for New Users
This is exactly what a new user of an eprint archive faces, so the users
who feel most daunted about using an unfamiliar piece of software
are also the ones facing this particular burden.
Some institutions may choose to provide mediation support for users in this task.
Metadata vs Web Search
When you make your files available on the Web, you don't
need to fill out a Google form describing their contents. Instead, Google
automatically discovers the existence of each file and indexes
all the words in the document.
This makes the process of putting new information on the Web much easier,
but it means that the search for a document is quite naive.
Locating References
By contrast, a search for an academic paper is
usually performed against its metadata.
With Google, it is impossible to tell whether a word is
- part of the name of the author of the article,
- part of the name of the author of one of the articles in the bibliography,
- or just a random part of the text.
The string "1994" could be the year that
the article was published or a page number.
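The difference between the two kinds of search can be sketched in a few lines of Python. This is only an illustration: the records, field names and helper functions below are hypothetical, not part of EPrints or any search engine.

```python
# Hypothetical records: field names and values are made up for illustration.
records = [
    {
        "authors": ["Smith, J."],
        "title": "Indexing scholarly archives",
        "year": 2001,
        "pages": "1990-1994",
        "full_text": "Indexing scholarly archives. J. Smith. 2001. pp. 1990-1994.",
    },
    {
        "authors": ["Jones, A."],
        "title": "Self-archiving in practice",
        "year": 1994,
        "pages": "12-30",
        "full_text": "Self-archiving in practice. A. Jones. 1994. pp. 12-30.",
    },
]


def full_text_search(records, term):
    """Naive search: match the term anywhere in the text."""
    return [r["title"] for r in records if term in r["full_text"]]


def fielded_search(records, field, value):
    """Metadata search: match the value against one named field only."""
    return [r["title"] for r in records if r[field] == value]


# "1994" appears in both documents, once as a year and once as a page number.
print(full_text_search(records, "1994"))

# Searching the year field returns only the paper published in 1994.
print(fielded_search(records, "year", 1994))
```

The full-text search cannot tell the page range of the first paper from the publication year of the second; the fielded search can, but only because someone recorded the year as metadata.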
Purpose of Metadata
The purpose of accurate metadata is to make searches accurate,
so that you can be confident that the article you are given is the article
you are looking for, not just one that sounds like it.
EPrints tries to minimise the information you are required to enter
(and different sites will have different requirements for different purposes),
but the following pages describe some of the
information that is required:
Author names
Author names are perhaps the most important piece of metadata
about an article because the surname of the first author is one of the
most significant distinguishing pieces of information about
a paper.
As a self-archiver, it is likely that the name is either yours
or that of one of your colleagues.
Although it sounds patronising to emphasise
it, please make sure you know how to spell this name!
Author Name Problems
In particular, please be consistent with
- initials: how many names do you have? Which initials do you record on your papers? Make sure you use the same initials in the paper and the metadata.
- prefixes: are you known as "de Souza" or "deSouza"?
- diacritics: do you use diacritical marks or an ASCII-sanitized spelling?
It is important to think about these things, especially if you have
delegated the responsibility of entering your papers to a secretary or student.
A Rose By Any Other Name
The consequence of carelessness is that searches against your name (or against
what people think is your name) will not return all your papers.
In other words, you will look less successful than you are.
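A small Python sketch shows why inconsistent spellings split one author into several. The name variants and the `name_key` helper are hypothetical, and nothing here describes how EPrints itself matches names; an archive may well compare names exactly as typed.

```python
import unicodedata


def name_key(name):
    """Build a rough comparison key: drop diacritics, case, spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_marks.casefold() if c.isalnum())


# Three ways the same (made-up) author might be entered.
variants = ["de Souza, José", "deSouza, Jose", "De Souza, J."]

# Compared exactly as typed, these are three different authors.
print(len(set(variants)))

# A normalised key merges the prefix and diacritic variants...
print(name_key(variants[0]) == name_key(variants[1]))

# ...but cannot recover a forename that was reduced to an initial.
print(name_key(variants[0]) == name_key(variants[2]))
```

Even a forgiving comparison only repairs some of the damage, which is why entering the name consistently in the first place is the safer course.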
Paper Title
If possible, cut and paste the title of the paper directly from its
contents.
There may be some problems with the formatting
(for example, sub- and superscripts in a chemistry article,
or italic formatting for a mathematical expression).
It is important to realise
that the metadata is a database record:
its purpose is searching, not printing.
The best course of action is to provide text that reflects the
meaning of the title, without trying to duplicate its appearance.
However, it may be common in some communities to use explicit markup
(e.g. physicists would naturally use TeX markup for mathematics;
it is unlikely that philosophers would use explicit RTF codes).
Publication status
This one piece of information marks the difference between articles
that have successfully been through the peer-review process and those
that have not (or not yet).
Please ensure that this information is updated
once a paper has been accepted!
Year, issue number, page number
These three numbers are all significant in distinguishing
citations of similar-sounding papers.
The drawback is that
they are not known until well after the paper has been accepted for publication,
and hence a long time after the eprint has been deposited.
Three vital statistics
Please make sure that you return to the eprint record and add this information
when it becomes available.
If your institution uses the eprint archive to
drive its publication auditing or to automatically produce staff CVs, then
this information will be crucial
(and will save you the effort of having to provide it in other contexts).
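As a rough illustration of what returning to a record involves, the Python sketch below flags published records whose year, issue or page numbers are still blank. The `EprintRecord` class, its field names and the status values are assumptions made for this example, not the EPrints data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EprintRecord:
    """Hypothetical record; real archives define their own fields."""
    title: str
    status: str = "submitted"  # assumed values: "submitted", "accepted", "published"
    year: Optional[int] = None
    issue: Optional[str] = None
    pages: Optional[str] = None


def missing_details(record):
    """List the citation details still to be added to a published record."""
    if record.status != "published":
        return []  # the details are not expected to be known yet
    return [f for f in ("year", "issue", "pages") if getattr(record, f) is None]


deposited_early = EprintRecord("An example paper", status="published", year=2003)
print(missing_details(deposited_early))
```

A check of this kind could help an author or an archive administrator find the records that were deposited early and never completed.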