
Wybo is a sociologist and PhD/DPhil student at the Oxford Internet Institute.

He studies online social behaviour, especially how social media affect protest movements such as the Arab Spring and Occupy Wall Street.

Also: MSc in Social Science of the Internet, MA in Digital Humanities (distinction), 3 BAs (firsts/cum laude) in History, Philosophy of Information Science and Information Science


Articles tagged "Linguistics"

Applying Language Technology to Detect Shift Effects

(There is a later paper on this)

This paper discusses an application of a technique to tag a corpus automatically and to detect syntactic differences between two varieties of Finnish Australian English, one spoken by the first generation and the other by the second generation.

The technique utilizes frequency profiles of trigrams of part-of-speech categories as indicators of syntactic distance between the varieties.
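As a rough illustration of what a frequency profile of POS trigrams looks like, here is a minimal Python sketch. It assumes the text has already been tagged and is available as lists of tags per sentence; the function name `pos_trigram_profile`, the tag labels, and the toy sentences are hypothetical and not taken from the paper.

```python
from collections import Counter

def pos_trigram_profile(tagged_sentences):
    """Count every run of three consecutive POS tags within each sentence."""
    profile = Counter()
    for tags in tagged_sentences:
        # Trigrams do not cross sentence boundaries in this sketch (an assumption).
        for i in range(len(tags) - 2):
            profile[tuple(tags[i:i + 3])] += 1
    return profile

# Hypothetical, hand-tagged toy input standing in for one variety.
first_generation = [
    ["PRON", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB"],
]
profile = pos_trigram_profile(first_generation)
for trigram, count in profile.most_common():
    print(trigram, count)
```

Building one such profile per variety gives two count vectors that can then be compared.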

The paper examines potential shift effects in language contact. Results show that some interlanguage features in the first generation can be attributed to Finnish substratum transfer. Other features are ascribable to more universal properties of the language faculty or to “vernacular” primitives.

A Measure of Aggregate Syntactic Distance

(There is a later paper on this)

We compare vectors containing counts of trigrams of part-of-speech (POS) tags in order to obtain an aggregate measure of syntax difference. Since lexical syntactic categories reflect more abstract syntax as well, we argue that this procedure reflects more than just the basic syntactic categories.

We tag the material automatically and analyze the frequency vectors for POS trigrams using a permutation test.
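The general shape of such a permutation test might be sketched as follows in Python. This is an assumption-laden sketch rather than the paper's procedure: the distance statistic (summed absolute differences of relative trigram frequencies), the choice to shuffle whole sentences between groups, and all function names are illustrative choices not given in this summary.

```python
import random
from collections import Counter

def trigram_counts(sentences):
    """Count POS-tag trigrams over a list of tagged sentences."""
    counts = Counter()
    for tags in sentences:
        counts.update(zip(tags, tags[1:], tags[2:]))
    return counts

def distance(group_a, group_b):
    """Assumed statistic: summed absolute difference of relative frequencies."""
    a, b = trigram_counts(group_a), trigram_counts(group_b)
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    return sum(abs(a[t] / total_a - b[t] / total_b) for t in set(a) | set(b))

def permutation_test(group_a, group_b, n_permutations=1000, seed=0):
    """Estimate how often a random regrouping is at least as distant as the real one."""
    rng = random.Random(seed)
    observed = distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if distance(shuffled_a, shuffled_b) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value suggests that the observed difference between the two groups is unlikely to arise from an arbitrary split of the pooled material.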

A test analysis of a 305,000-word corpus containing the English of Finnish emigrants to Australia is promising in that the proposed procedure works well in distinguishing two different groups (adult vs. child emigrants) and also in highlighting syntactic deviations between the two groups.

Filled Pauses as Evidence of L2 Proficiency:

Finnish Australians Speaking English

(There is a later paper on this)

The paper discusses the application of the technique described here to detect the linguistic sources of the syntactic variation between two groups, the ‘Adults’, who had received their school education in Finland, and the ‘Juveniles’, who were educated in Australia.

The main – and perhaps expected – finding was that second language learners pause more often.

Also see this paper for more details.

Automatically Extracting Typical Syntactic Differences from Corpora

We develop an aggregate measure of syntactic difference for automatically finding common syntactic differences between collections of text.

With the use of this measure, it is possible to mine for differences between, for example, the English of learners and natives, or between related dialects.

It enables us to find not only absence or presence, but also under- and overuse of specific constructs and allows for testing hypotheses for statistical significance.

Our earlier publications on it are: a crude version of the method, testing it, applying it, and applying it a second time.
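To make the idea of under- and overuse concrete, the following Python sketch ranks POS trigrams by how much their relative frequency differs between two collections. The ranking criterion, the function names, and the toy counts are assumptions for illustration only; the abstract does not specify how the paper scores individual constructs, and significance testing is left out here.

```python
from collections import Counter

def relative_frequencies(counts):
    """Convert raw trigram counts to proportions of the collection total."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def usage_differences(counts_a, counts_b):
    """Rank trigrams by signed difference in relative frequency (A minus B).

    Positive values mean collection A overuses the trigram relative to B,
    negative values mean it underuses it.
    """
    rel_a = relative_frequencies(counts_a)
    rel_b = relative_frequencies(counts_b)
    diffs = {
        t: rel_a.get(t, 0.0) - rel_b.get(t, 0.0)
        for t in set(rel_a) | set(rel_b)
    }
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy counts, not data from the paper.
learners = Counter({("DET", "NOUN", "VERB"): 2, ("PRON", "VERB", "NOUN"): 6})
natives = Counter({("DET", "NOUN", "VERB"): 6, ("PRON", "VERB", "DET"): 2})
ranked = usage_differences(learners, natives)
for trigram, diff in ranked:
    print(trigram, round(diff, 3))
```

Trigrams that occur in only one collection appear in the ranking as well, so absence and presence fall out as the extreme cases of under- and overuse.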

Detecting Syntactic Contamination in Emigrants:

The English of Finnish Australians

(There is a later paper on this)

The paper discusses the application of the technique described here to detect the linguistic sources of the syntactic variation between two groups, the ‘Adults’, who had received their school education in Finland, and the ‘Juveniles’, who were educated in Australia.

The results show that some features described here as ‘contaminating’ the interlanguage of the Adults can be best attributed to Finnish substratum transfer. Other features in the data may also be ascribed to more ‘universal’ primitives or universal properties of the language faculty.