Monthly Archives: April 2011

Update: COMPUTER–AIDED EXPLORATION OF LITERARY STYLE AND MACHINE TRANSLATIONS

Computer-Aided-Criticism Software  and Machine-Translation Software—Problems and Potential

Summary

Understanding any text in all its subtlety is a prerequisite for translating from one language to another (and exceedingly desirable in literature appreciation). Like human translators, machine-translation software should have this capacity. Yet computer systems have proven to be very poorly suited to a refined analysis of the overwhelming complexity of language. State-of-the-art machine-translation software purporting to do just that still leaves much to be desired, as one can easily verify by having a text translated by any of the many machine-translation tools available on the internet. Since text interpretation is the common denominator, machine-translation software is similar, if not identical, to the software used in computer-aided criticism.

My arguments highlight lesser-known problems encountered in computer-aided criticism and may serve to foster a better understanding of present-day machine-translation capabilities and of their undeniably huge potential in the future, be that in 50 years or more. Machine-translation software and related analytical software are still in their infancy and, just like children, they deserve our indulgence. They are bound to become better and better over time. There are different approaches to the machine processing of written human language in translation software. I favour Google’s method because – forgive my oversimplification – it tries to replicate what the brain does by using as many human translations as possible as sample translations for its database, together with other methods (algorithmic mathematical languages). In the long run, this combination of methods is likely to yield better-quality translations, which will then be indistinguishable from human translations. By that time, this improved software will also be able to decode and translate connotational meaning.

For the time being, it is beyond the analytic capability or interpretational power of machines, be they computer-aided-criticism software, machine-translation programs, or grammar-correction and summarizing software, to distinguish consistently between fine shades of meaning, let alone connotational meaning. However, machine translations will become better in time, if not human-like. AI experts think that low-level, algorithmic mathematical languages will be succeeded by high-level modular, intelligent languages which, in turn, will finally be replaced by heuristic machines. These would be capable of learning from mistakes, through trial and error, and of operating like human beings.

Background information

Although I wrote the main body of the text almost 30 years ago, it is surprisingly up to date because comparatively little progress has been made in this field. In January 1983, with only a modicum of theoretical background on computers and linguistics, I planned to write this essay as a fervent riposte to the editor of the American journal Psychology Today, which had published a two-part series called “The Imagination Extenders” in November and December 1982. I never sent the letter and apparently no one else did, probably because computer software was only just beginning to emerge and therefore hardly anyone was capable of spotting the weak points in Mr Bennett’s arguments. Here is an excerpt from the original article from Psychology Today:

Computer-Aided Exploration of Literary Style – A tool to a better understanding of literature?
In the two articles, the question is posed whether computers will be able to extend our imagination the way telescopes and microscopes extend our vision. Philosopher Daniel C. Bennett of Tufts University (U.S.) says they will in two ways: 1) by extending the range of our senses and 2) by enlarging our stock of concepts. He speculates that there are hundreds of telling patterns in the number system suitable for computer analysis and suggests that computers be used to study, among other things, literary style. He says that, being a rather clanking and statistical affair, analysis of word choice and style is a delight mainly to pedants and he wonders whether the subtle, subliminal effects of rhythm and other stylistic devices – often quite beneath the conscious threshold of their authors – can perhaps be magnified and rendered visible or audible with the help of a computer. The features the computer would heighten could be abstract patterns, biases of connotations and evocations or intangible meaning – not matter.
Incredible as it may sound, Mr Bennett’s bold claims went unchallenged. Not one single letter to the editor was subsequently published. I wish to emphasize that it is not my “mission”, but my objective, to contribute to the discussion about machine intelligence from a practitioner’s point of view, with years of experience in analysing texts – in the traditional manner.

Under Close Scrutiny – Computer-Aided Exploratory Analysis of Literature

Literature appreciation by just reading for pleasure is one way of gaining meaning from a piece of art – formal literature analysis is another. Thanks to the works of Jung and Freud, as well as novel approaches to language from the new fields of neurolinguistics and psycholinguistics, literature analysis can be highly rewarding, especially if combined with the notion of contemplation. An understanding of the interaction of the many literary devices and techniques is the more academic way of finding out what a writer says and how he says it. Therefore, style, which is the object of my exploration, can be an important clue to understanding “meaning” in a piece of literature. What is style then? Style is the outward reflection of the intrinsic sum total of everything a writer is at the moment of writing. Style flows from an author’s character in its broadest sense and from his life experience.

Not only does a writer express ideas of which he is aware but he also reveals subconscious ideas and conflicts. Very often, he has no knowledge as to why he chooses a particular word over another – a word that may arrive at the threshold of his consciousness like a shooting star from a vast cosmos of subconscious beliefs, suppressed desires, cherished ideals, primordial instincts, mechanized scripts and from the plane of archetypal symbols before it is clad in reason and logic.

How would a computer know in what way style contributes to meaning?

Style is as individual as a fingerprint. No two styles are ever the same and very often, the same word or outward shell, the same sound pattern has an entirely different meaning when used by another writer or in a different context. How then, would it be possible for a computer to analyse style? Even if it should be possible to programme a computer to enable it to recognize hundreds of literary devices and to make generalizations from particular examples, how would it recognize or process the fingerprint of an entirely different writer who uses language in a new and original way?

How can a computer extract meaning from a writer’s three-dimensional web of associative meaning created by the power of one single word, if the computer knows only the husk or dictionary definition of a word but not its contextual essence, its personal and private elements, its fugitive associations and flashing connotations lived and experienced by the author?
The silent speech of metaphorical language, the body language of language (my own term, but I may be wrong there) found in imagery and symbolism, cannot be expressed in digital numbers or in any other form in the number system, since there is an infinite number of possibilities of combining words and creating meaning in ever new groupings and juxtapositions. This problem is further aggravated by the fact that words do not mean the same to different people. Since no two contexts or situations in which words are learnt and used are ever the same, no two meanings, or in this case, interpretations, can ever be the same. One could argue that a computer would alleviate this problem in that it could be an impartial judge as to what meaning a particular word should have in all cases at all times. Yet this solution would be unacceptable, as language would be manipulated, unnatural and bland. Apart from this dumbing down of words, this would smack too much of George Orwell’s “Newspeak”. However, one does not need to invoke fiction in order to fully understand the impact such action would have. (The “bias and sensitivity guidelines” used by pressure groups in the US educational system afford a glimpse of what may be in store. Added in 2006.)

Mr Bennett speculates that subconscious notions expressed through the medium of style may be made visible or audible. Would this not signify that a writer’s most secret thoughts, sometimes even unknown to himself, could be projected onto a monitor? Moreover, could this rendering of a colour-coded graphic display representing conscious and unconscious thought-patterns or associative configurations be interpreted and fully understood? Would we need another expert telling the literature “expert” what the particular graphic display unveils? Who would decode the meaning encoded in the colour-graph? Who would interpret the computer’s interpretations of the, for instance, “delicate effects of sound”? The meaning to be unearthed from the colour-graphic display on the monitor would be as enigmatic and complex as literature is to many people.

In order to establish personality profiles, psychologists attempt to read a person’s subconscious mind by analysing his speech patterns and his choice of words, i.e. his preferences. But it will never be possible to penetrate a person’s subconscious mind and read the pictures, the language in which the subconscious mind “thinks” or communicates. Mental, invisible images, the evocations of the conscious, semi-conscious and unconscious mind, cannot be recorded. Abstract ideas, mental pictures produced by evocations and connotations flowing from the composite elements of style, or even from a single word, are not subject to the laws of mathematics and cannot be caged in the number system.

Literature analysis is allegedly a delight mainly to pedants, says Daniel Bennett. Does this over-generalization not contain a number of dangerous and narrowing assumptions and suggestions? Could it not be misconstrued to mean that a more profound analysis or appreciation is tedious, done only by pedants and that anyone in his right mind should never attempt to appreciate and enjoy literature by taking a closer look at it than usual – and that an “expert” analysis should be left to the computer? Are such bold claims not preparatory to reshaping and simplifying human cognitive and intuitive abilities, leading us into a yes-or-no-response-Brave New World?

There is more to appreciating literature than counting words. It is an essential characteristic of the appreciation process that, through the reading experience itself, by meeting with an author’s ideas, content and substance gain a quality they would otherwise not possess. This is because a reader brings in his own thoughts, his experience and his feelings. Marlon Brando, who started his career by playing Shakespeare on Broadway, said in an interview that unless the reader gave something to it, he would not take anything from a book or poem. One could not fully understand what a writer was writing about unless one had some corresponding depth oneself, some breadth of assimilation. Computer-aided analysis of literary style may completely leave out the reactions of the reader. The responsibility would be shifted to the “expert” computer, which would do all the thinking and linking. Will human and humane feelings in literature analysis be entirely discarded and computer-encoded responses become the controlled measure of all literary works?

How would a computer “communicate” with a piece of art? Admittedly, it could be fed with a few individual images and then be programmed to boil them down into generalisations which it would apply whenever it encountered a digital approximation of meaning pre-programmed or assigned to a particular word or combination of stylistic devices. If a more sophisticated programme should ever exist, it might even be able to match two or three literary devices from among the thousands of possible combinations and relate them to a particular phrase, sentence, or paragraph. But how would the computer attribute sense to what it finds out? In its binary interaction with a piece of literature, the computer would compare its rigid, predetermined and static programme with the real world of natural experience and communication processes inherent in a piece of literature. How would a computer know, for instance, that a horse and its metaphorical or symbolic meaning in one piece of art might not be the same in another? In what way would a computer know why a particular word has been chosen over another and why certain words have been placed side by side to create a certain effect which may be lost if one word is exchanged for another? Not only must a great number of dictionary definitions and some of the most common examples of collocation be fed into a computer but it must also be enabled to distinguish non-compatible synonyms. Moreover, the computer must be able to “intuit” an author’s conceptual understanding of his private and personal usage of any word, even if his meaning varies only in very subtle degrees from common usage. Would it be sufficient to feed dictionary definitions into a computer, which are only a short abstraction of real-life usage? Would it suffice to give the computer knowledge about a sandbox-life which the computer could never relate to experiences of its own?

One has to concede that word choice is an intrinsic part of style. Still, how does the mere counting of words cast light on meaning? A key word may be used deliberately or, sometimes, without the author’s awareness. It may be used only once and gain significance from its context; it may be used soberly or passionately, prudently and meticulously, with missionary fervour or calculatingly, while another, quite insignificant word may appear frequently. This is because sometimes, even in this rich English language, there is a lack of other words that express the author’s intention adequately. How does one programme the computer to know that quantity or frequency does not equal quality or essence?
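
To make the point concrete, here is a minimal sketch of what “counting words” amounts to in practice. It is written in Python purely for illustration; the sample sentence and the function name `word_frequencies` are made up for this example and are not taken from any real criticism software. The tally readily shows which words are frequent, but a word that is used once and carries the weight of the passage is, by count alone, indistinguishable from any other word used once.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lower-cased word tokens using a deliberately naive tokenizer."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Invented sample text, used only to illustrate the argument.
sample = (
    "The sea was calm and the sky was grey and the gulls were quiet. "
    "Then the bell tolled, and the men on the quay fell silent."
)

freq = word_frequencies(sample)

# The three most frequent tokens are all function words.
most_common = freq.most_common(3)

# Words that occur exactly once; the count cannot tell them apart.
hapaxes = sorted(word for word, n in freq.items() if n == 1)

print(most_common)
print(hapaxes)
```

In this invented sample, “the” tops the list, while “tolled”, arguably the pivotal word, sits among fifteen words that each occur exactly once. Nothing in the numbers singles it out.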

Technically, all stylistic devices of sound could be made visible on a monitor. Yet would the computer “compute” the “right and only” meaning for them? Could it relate meaning created by specific sound patterns distributed over several pages to the main theme, to the pivotal points of that particular piece of art, or could it judge them to belong to secondary ideas only? More importantly, would the computer be able to assimilate many other literary devices, such as symbolism, irony, puns, hyperbole, metonymy, oxymoron and, above all, conceptual metaphors, which may all run parallel to meaningful sound patterns, into a coherent context? Could the computer make intelligent distinctions between several possible interpretations that belong to the realm of surface meaning? Beyond that, could it delve into deeper regions and soar into higher spheres by way of evocations and connotations created by sound patterns and other literary devices sandwiched onto sound? Would the computer be able to synthesize meaning from several layers of literary devices, from among the hundreds of composite elements of style, from the multi-level flow of sound and imagery?

How can the most common of hundreds of literary devices other than those describing sound – for instance, metaphors, similes, oxymorons and symbols – be made visible on a computer’s output device, since most stylistic devices do not function through sound? Literary devices, or rather stylistic devices (since most authors could not care less what they are called), are the authors’ medium for expressing their ideas. They are the musical instruments with which authors make images audible and the palettes with which they transpose sound into images. Their artistic reflections, observations, contemplations or speculations, after having gone through the alembic of their inner worlds, become a unified piece of art. Very often, the symbolic language used by authors renders their work seemingly unintelligible, requiring technical, perhaps arcane, and sometimes abstruse knowledge on the part of the reader. Even the most dedicated readers of literature may not be able to understand a work of art in its entirety and some may not be able to see beyond its storyline. At the most, they may notice the pleasant effects that can be created by sound. How can a computer programmer, whose forte is probably not the appreciation of literature, programme a computer “to see” beyond sound, to read between the lines, or to recognize a sustained sound pattern and describe its effects?
Above all, how does one programme the integrating principle which distinguishes between nonsense and sensible ideas, and which may, through flashes of insight or intuition, arrive at new ideas? That integrating principle has not been found yet – that all-important assimilation and joining-device that is capable of attributing sense to an infinite number and variety of external stimuli and to the internal, invisible, silent and yet ever-changing world of thought-configurations.

COMPUTER-AIDED LITERATURE STUDIES REVISITED IN 2011

Still Begging the Question after 28 Years

After almost 28 years, there is still no progress in the field of computer-aided textual content analysis. Computer systems have proven to be very poorly suited to a refined analysis of the overwhelming complexity of language. Conclusions drawn by people working in this area are equivalent to crystal-ball gazing. In the absence of any results, the future role of computer-aided criticism is often still invoked. Up until now, computer-supported analysis of texts has not yielded any important or new results which cannot be obtained by close reading. Therefore, computerized textual research has not had a significant influence on research in humanistic disciplines. Many explanations as to why there are no useful applications with regard to the subject matter sound like feeble attempts to justify the use of computers in this field at all costs. Catchphrases used to this end are “a shift in perspective needed”, “asking new questions deviating from traditional notions of reading texts”, “the need for new interpretive strategies” and “a modified reader response”. They all refer to hitherto unknown structures, not readily apparent, which, it is hoped, contain vital elements of literary effects. If they cannot be recognized by humans, are they important at all? This would be tantamount to assuming that authors create a “subconscious” pattern, sometimes over several hundreds of pages. What are we actually missing?

Stylistics and reader response seem to be treated as two different approaches and both methods are deemed “problematic” when it comes to assessing the literary effect measured in the text itself or in its supposed impact on the reader. The author’s intent expressed in his communication with the reader and meaning are deemed more difficult to quantify than matters of phonetic patterns and grammatical structures.

Authorship studies are one area where computers can be used effectively even though very little analysis is performed by the computer itself. Very small textual particles and word clusters selected and indexed by humans are run through computers to establish an “authorial fingerprint”. Thus, complex patterns of deviations from a writer’s normal rates of word frequency are measured. There are other patterns which can be used in such authorial tests like compound words and relative clauses.
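
As a rough illustration of the idea, the sketch below builds such a “fingerprint” as the rate per 1,000 words of a few marker words and compares two profiles. It is a heavily simplified sketch in Python, not the procedure any particular authorship study uses: the marker list, the function names `profile` and `distance`, and the choice of a mean absolute difference are all assumptions made for this example. Note that the markers are chosen by a human, as described above; the computer only counts.

```python
import re
from collections import Counter

# Hypothetical marker words, hand-picked by a human analyst for this example.
MARKERS = ("the", "and", "of", "to", "but", "which")


def profile(text, markers=MARKERS):
    """Return the rate per 1,000 tokens of each marker word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [1000 * counts[m] / total for m in markers]


def distance(p, q):
    """Mean absolute difference between two profiles (0.0 means identical)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


# Invented snippets standing in for much longer samples by two writers.
p = profile("the cat and the dog")
q = profile("to be or not to be")

print(p)
print(distance(p, q))
```

Real studies work with far longer texts and many more markers; with snippets this short the numbers mean nothing, and even with long texts the result is a measure of habit, not of meaning.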

Interestingly, professional translators hardly ever use machines to do their translations. I know from experience that it is harder to edit a machine-rendered translation than to translate the text again from scratch. However, translators sometimes use computer-aided translation (CAT) tools such as Trados or Wordfast, which store a memory of all the translations a translator has ever done with the tool. With such a CAT tool, one can determine the number of words to be matched against this memory and hence suggested as one goes through the translation. As to the quality of such translations, very little research has been done in this field, but I surmise that the resulting style may sound “bitty” if texts other than technical ones are translated.
It would be interesting to see how computer-aided-criticism software and machine-translation software cope with the hundreds of “Local Englishes” and, this being one of the main subjects of this weblog, with “German English” or “Local English, German Version” in particular. Would this be too taxing a task, as it often is for human translators when they need to waste a great deal of time guessing the meaning of idiosyncratic word coinages and very private grammar “novelties”? It must be a formidable task to write software that can handle the frequently faulty and incomprehensible English one finds in these hundreds of Local English varieties.
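
The principle of such a translation memory can be sketched in a few lines. The Python example below is only an illustration under stated assumptions: it does not reproduce how Trados or Wordfast work internally, the `memory` entries and the function name `suggest` are invented, and the similarity score comes from the standard library’s `difflib.SequenceMatcher` rather than from any CAT tool. A stored translation is suggested only if the new segment resembles a previously translated one closely enough.

```python
from difflib import SequenceMatcher

# Hypothetical memory: source segments mapped to earlier human translations.
memory = {
    "Please switch off the television.": "Bitte schalten Sie den Fernseher aus.",
    "Please check in your coats and bags.": "Bitte geben Sie Mäntel und Taschen ab.",
}


def suggest(segment, memory, threshold=0.75):
    """Return (stored translation or None, best similarity score)."""
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_score >= threshold:
        return best_target, best_score
    return None, best_score


print(suggest("Please switch off the television.", memory))
print(suggest("Please switch off the radio.", memory))
```

A near match hands the translator an old sentence to patch rather than a new one to write, which may be one reason why the resulting style can sound pieced together.
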

More Crystal Ball Gazing?

When I searched the Internet for the latest developments in computer-aided exploratory textual analysis of literature, I was surprised to find that I had not been wide of the mark in my assessment 28 years ago. As to future developments, AI experts think that low-level, algorithmic mathematical languages will be succeeded by high-level modular, intelligent languages which, in turn, will finally be replaced by heuristic machines. These would be capable of learning from mistakes, through trial and error, and of operating like human beings.

Update: Dictionary of Local Englishes – New entries

For more dictionary entries and a more detailed description of the terms “Local Englishes, German Version” and “German English”, please go to the page:

A Dictionary of “Local Englishes, German Version”


Introduction

Creeping in through the back door

Hardly noticed or acquiescently accepted by native speakers of English and non-native speakers alike, the many Local English versions, and in this particular instance “German English”, are often substandard, frequently unnatural or unidiomatic and therefore hard to understand, apart from being in many cases incomprehensible balderdash. All too frequently, words have become ambiguous catch-alls which have been emptied of dictionary meaning so that they might fit any experience the speaker would not take the trouble to define. However, one must admit that the latter is probably true of all languages.

Elevating mutilated, hard-to-understand and difficult English to the ranks of Standard Englishes

Local English coinages aggravate this situation. New words that sound and look English but which aren’t English are being invented with reckless incompetence and flaunted as indelible evidence of the true standard of German English in all media.

Some may call it a mongrel language, a bastardized, distorted and degenerated version of native-speaker Standard English while others may claim that it is mankind’s next step on the evolutionary ladder. The latter group of people will find this a welcome and useful guide to “enriching” the English language. The uncritical proponents of the doctrine “anything goes” may even think that Local Englishes are the definitive aid to increasing the word power and language proficiency of learners and students at all levels.

A university lecturer once wrote in the description of her “Varieties of English” university course: “It is not wrong English. It is just different…” Palliating, extenuating and explaining away the deficiencies of Pidgin-English-like Local Englishes is the easy way of dealing with this problem, thus elevating Pidgin English to the ranks of Standard Englishes. Meanwhile, the majority of native speakers are in all likelihood completely unaware of this downgrading of their tongue.

A sort of barely elevated Pidgin which sounds like incompetent patchwork

I began my humble collection of “different”, yet often hilarious English words about 10 years ago. That was at about the time when you would still find people openly criticizing the sort of English spoken in Germany. Today, you will be hard put to find realistic assessments like the following, made by Klaus Reichert, the president of the “Deutsche Akademie für Sprache und Dichtung” (German Academy for Language and Poetry), in a newspaper interview almost ten years ago: “What we take for English is often only a sort of barely elevated Pidgin which sounds like incompetent patchwork.” Mr Reichert’s noble intention to stem the “foreign infiltration” of the German language has, unfortunately, failed completely for reasons beyond his control.

All entries in this mock-dictionary are fully documented by either downloads, print-outs, screen-shots, newspaper clippings or copies of original fliers, brochures, etc.

New entries:

reservated

If you happen to see a taxi cruising any of the streets in Germany with a notice up in one of the windows saying “Reservated”, it will be no use hailing this particular taxi. Yet many a social-worker-type teacher of English may be inclined to say that it is “close enough” to the intended “Reserved”. According to the native-speaker radio presenter reporting this item of news, it was not correct English, but good enough for a German. Come to think of it, the expression “good enough” is sometimes seen on websites regarding the performance of a particular piece of software, meaning that the software is not as good as it should or could be but good enough for a certain segment of the market.

“brain-up”

The “brain-up” initiative was launched in Germany in 2004 in search of Germany’s top universities. This overzealous drive to establish new standards of excellence for Germany’s elitist universities has led to this award-winning Denglish coinage. To all those who expected a power-booster pill wrapped in blue sugar-coating to help people sustain the peak of cerebral passion, it must have been vastly disappointing.

“Get out”

Announcements on Hannover’s trams are now made in impeccable English. But this was not always the case. More than 20 years ago, when the then novel system of automated announcements was introduced, it was quite different. On reaching the last stop, passengers were asked to “get out”. Admittedly, you could expect to find a terse expression of this sort in Germany, but this one was over the top. The effect was hilarious, caused quite a stir and was widely reported in local newspapers. When asked about his word choice, the culprit, a German translator, said apologetically that it said so in a dictionary.

“…a few steps and you are in the green”

This word-for-word translation from German could be found on a website describing the location of a hotel. It was situated adjacent to a city park on one side and a large city forest on the other. It only proves that such monstrous examples of German English are no longer confined to teachers’ lounges and translators’ offices but can be shared globally. In the meantime, the hotel in question has acquired a website that was completely revised by a native speaker.

“…and follow the restrictions of the HACCP”

What a nasty nuisance these limitations are! I found this bit on a website when translating another. This example confirms the axiom that non-native-speaker English all too often lacks the subtlety necessary to express issues that are of great concern. The HACCP (Hazard Analysis Critical Control Point) is an internationally recognised scheme of food safety standards. Here are some suggestions for a more responsible translation:

“…and abide by the rules and regulations laid down in the HACCP directives.”
or:
“…and we meet the demands laid down in the HACCP regulation.”
or:
“… and our procedures are in compliance with the food and safety requirements of the HACCP regulation.”

„Please check your coats and bags“ (notice in a library)

Do not panic when you read something to this effect in Germany. You are not expected to check your grooming. Neither are you required to have a quick glance to see whether your bag’s body has a lovely sheen and is otherwise spotlessly clean.
In German English, this is used for „check in your coats and bags“.

“Please put out the television”

Again, no need to be alarmed. No need to go looking for a fire blanket or fire extinguisher. In all probability, your host’s television set is not on fire. In German English, it simply means “Switch off the telly.”

More highlights of “Local Englishes”

In case of fire, do your utmost to alarm the hotel porter. (Vienna)

Please leave your values at the front desk. (Paris)

Customers who find the waitresses rude ought to see the manager. (Kenya)

Patrons are requested not to have children in the bar. (Norway)

The lift is being fixed for the next day. During that time we regret that you will be unbearable. (Bucharest, Romania)

You are invited to take advantage of the chambermaid. (Japan)

All the above highlights are taken from:
http://www.caterersearch.com/Home/ (2005)

Please to try the tarts. They are ready for you on the trolley.

This is from a flyer enclosed with the menu in a luxury hotel in Egypt.
(The Book of Mistaikes, Gyles Brandreth, first Futura edition 1982 (UK).
Copyright © Macdonald & Co (Publishers) Ltd and Victorama Ltd.)

Pending the outcome of nationwide discussion:

“She’s got a knuckle in her eye” (from the original song lyrics)

“Knuckle” is the bone of contention here. An extensive internet search with about 10 search engines did not yield one single collocation with “got a knuckle in * eye”. All results found are from the Local English domain “de” and refer to the song in question. It would be easy to dismiss this coinage as local gibberish. There are, however, two problems. The songwriter is American, but the term in question doesn’t seem to be native-speaker English. Apart from that, many Germans had problems with this passage, as the transcripts of the lyrics discussed in internet forums show. Before the official song lyrics had been published, fans had replaced “knuckles” with “luck” and thus changed the meaning: “She’s got luck all in her eye”.

The official version goes like this: “She’s got a knuckle in her eye”. It appears that to those fans who had preferred “luck”, “knuckles” did not make much sense.
If there was some cock-up when the song was recorded, then we will never know. Who would confess to being a bungling bunch of beta-performers? If it was really meant to be “She’s got a knuckle in her eye”, then the Local English in Germany will be enriched by a “meaningful and important” coinage.

And, for good measure, another bone of contention a bit further down in the song lyrics:

“He drops a pause”

An internet search did not yield any results at all on native-speaker domains. Nonetheless, in this case one could argue that the songwriter has used his “artistic” licence. Incidentally, there was a similarly heated discussion on the German domain “de”. The point at issue was, once again, a novelty that arose because devoted fans tried to transcribe the song before the lyrics had officially been published. The alternative to “He drops a pause” was “He drops a puss”, again owing to bad pronunciation on the part of the singer.

For more dictionary entries go to Page:

A Dictionary of “Local Englishes, German Version”

at: https://sanchopansa.wordpress.com/a-dictionary-of-local-english-german-version/