Tuesday 29 October 2013

MSc Web Science - Week 4

Careers Fair/Tim O'Riordan ©2013/CC-by-3.0
A big event this week was the University IT, Science and Engineering Careers Fair. As well as picking up more free pens than I'll ever need and explaining to reps what Web Science is (variations on 'making the web better for future generations'), I had a very interesting chat with someone on the Sagentia stall. The stall featured a data collection device for use in the oil exploration industry that handled 15TB of data per 24 hours using the SEG-D format. These devices are regularly collected from rigs by helicopter and are delivered to an oil industry organisation. The data is analysed for sale to the oil industry, and is stored in underground silos.

The collection process is called Permanent Reservoir Monitoring (see also ORC site), is directly related to exploring Life of Field issues, and is vital in the development of new methods of extracting oil from 'exhausted' fields (only 30% of oil is currently extracted - gaining an additional 1% is extremely profitable). This may be useful for my Independent Interdisciplinary Review project, which is looking at approaches to the open sharing of data by industry.

Quantitative Research Methods

We explored hypothesis testing in more detail this week. I'm still trying to get to grips with the link between theory and the practical use of SPSS software. My main insight this week: the null hypothesis states that there is no difference (for example, between the sample mean and the hypothesised population mean), and the alternative hypothesis states that a difference exists.

There are 4 basic steps to testing a hypothesis:
1. Specify the hypotheses and the level of significance ('different' = two-sided; 'more' or 'less' = one-sided)
2. Select a random sample (mean, standard deviation, sample size)
3. Calculate the test statistic using the random sample
4. Make a decision, based on the significance level and comparison with the z-value and/or p-value.

If the p-value is less than the significance level (alpha), reject the null hypothesis.
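
To make the steps concrete for myself, here's a minimal sketch in Python (my own illustration, with made-up sample data - not course material; strictly, for small samples like this you'd use the t-distribution rather than the normal):

    import math
    import statistics

    def z_test(sample, mu0, alpha=0.05, two_sided=True):
        # Step 1: hypotheses - H0: mean == mu0; H1: mean != mu0 (two-sided)
        n = len(sample)
        mean = statistics.mean(sample)           # Step 2: sample statistics
        sd = statistics.stdev(sample)
        z = (mean - mu0) / (sd / math.sqrt(n))   # Step 3: test statistic
        # Step 4: p-value from the standard normal CDF
        cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
        p = (2 if two_sided else 1) * (1 - cdf)
        return z, p, p < alpha                   # True means reject H0

    sample = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.3, 5.0, 5.5, 4.7]
    z, p, reject = z_test(sample, mu0=5.0)
    print(f"z = {z:.3f}, p = {p:.3f}, reject H0: {reject}")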

I also checked out the open source program PSPP to see if I could use it to replace the licensed SPSS program we're currently using. PSPP is pretty good, and its results match SPSS's (albeit reported to two decimal places rather than three), but PSPP does not do graphing (yet) - so I'll stick with SPSS for the time being.

Computational Thinking


This week we looked at programming languages from the early days (including the PDP-11 among many others) and were put into groups for the assessed teaching and public presentation projects.

Independent Interdisciplinary Review

I'm continuing to read up on anthropology for the assessed project and have been particularly interested in theory related to reciprocity and self-regulated systems (cybernetics).

Hypertext and Web Text for Masters

We had a packed programme of study this week, exploring the historical antecedents of, and different approaches to, hypertext. This included Paul Otlet's development of the Mundaneum. His masterwork, Traité de Documentation (Otlet, 1934), is not yet available in English, but translations of some of his work have been published. There was also a brief overview of the work of Wilhelm Ostwald, who developed the concept of linking literature to small units of recorded knowledge ('monos') that could be arranged and linked with other units, was instrumental in establishing the Die Brücke institute in Munich (a place to find all knowledge), and devised a paper size system (A4 etc.).

Also under consideration were American contributions to hypertext, including Vannevar Bush (human thought works on links between concepts), Doug Engelbart's first computer mouse (developed at the Augmentation Research Center and demonstrated at the 'Mother of All Demos' in 1968) and Ted Nelson's Project Xanadu (and his Dream Machines).

We also looked at hypertext systems: HES/FRESS (1967), ZOG (1975), Knowledge Management System (KMS, 1983), Hyperties (1983), Intermedia (1985), NoteCards (1985) and HyperCard (bundled with the Apple Mac, 1987).

Useful links:
Akscyn's Law
Jeff Conklin, 1987. 'Hypertext: An introduction and survey'. IEEE Computer, 20(9), pp.17-41. Available at: http://www.ics.uci.edu/~andre/informatics223s2007/conklin.pdf
Cal Lee, 1999. 'Where Have all the Gophers Gone? Why the Web beat Gopher in the Battle for Protocol Mind Share'. University of Michigan, School of Information. Available at: http://www.ils.unc.edu/callee/gopherpaper.htm

Types of hypertext systems include:
  • Macro Literary Systems - large online libraries
  • Problem Exploration Tools - problem solving, early authoring and outlining, ‘mind mapping on steroids’
  • Structured Browsing Systems - single machine front-end
  • General Hypertext Technology - platforms that allow experimentation

Foundations of Web Science

For the rest of the semester we will be reading (in great detail), talking and writing about the social construction of science.

Monday 21 October 2013

MSc Web Science - Week 3

Kitten Baby Steps/RSPCA WOAW ©2006/CC BY-NC 2.0
"Catch up, cats and kittens. Don't get left behind..." (Monkberry Moon Delight, Paul McCartney, 1971)

This line from a song has been going through my head all week - an annoying 'earworm' but also an important note to myself to get the balance of home, study and work right. So this week's post is brief and to the point - no prevaricating around the bush.

Quantitative Research Methods
This week I found out about hypothesis testing and t-testing (for small samples). I dislike PowerPoint-driven lectures, but in this module the tutors use them to build a narrative about using different approaches to analysing datasets, and (for me) it works reasonably well.

Computational Thinking
This week we moved into the self-study phase and I spent much of the time working out the deadlines and requirements for the assessed work. This is not easy to do, as module information is presented in different ways in different locations, and it isn't always clear whether it's up to date.

Independent Interdisciplinary Review
I will be looking at corporate policy on open data via the disciplines of Anthropology and Economics - for more details see my blog post.

Hypertext and Web Text for Masters
More talk and PowerPoint. I am very much looking forward to working either individually or in groups on this topic.

Foundations of Web Science
We had a very brief group-centred debate on "Do Artifacts Have Politics?". I took notes and acted as rapporteur for our group, and uploaded our outcomes to the class wiki (we're 'Team Alpha').
I've also been thinking about pop-culture representations of the technological determinism versus social determinism debate, sparked by watching the 2009 film Cloudy with a Chance of Meatballs during the week. The film neatly rehearses some of the arguments and presents some well-drawn popular stereotypes involved in technological development, as well as playing out myths about the unforeseen consequences of technology. Another film on the topic, The Man in the White Suit, also comes to mind, so this could be a useful theme to follow.

Digital Literacy Student Champions
I produced an online resource and ran 3 x 30-minute WordPress workshop sessions with third years taking the 'Arab World' Curriculum Innovation Programme module.

Sunday 13 October 2013

MSc Web Science: Week 2

Firing up the 'Pi's/Tim O'Riordan 2013/CC BY 2.0 UK

The highlight of the week was definitely the Raspberry Pi session; an hour-long workshop that was added to the end of classes on Thursday. The Raspberry Pi is a tiny, cheap computer that has been developed to provide a budget platform for practising programming. Because it's cheap (about £30), it can be used to run dedicated processes, like controlling a pyrotechnic display or carrying out passive surveillance.

The workshop provided an excellent adjunct to the introduction to writing Python script earlier in the day, as the task involved writing a few lines of code that enabled a Raspberry Pi module to interact with an external object (a jelly baby) via two wire probes. Essentially, the jelly baby completed a circuit which triggered the playing of an audio file of someone singing. Charming, and slightly weird. 
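
For my own notes, here's roughly what that code looks like - a sketch only, assuming the RPi.GPIO library on the Pi, with the pin number and audio filename made up:

    import os
    import time
    import RPi.GPIO as GPIO

    PROBE_PIN = 3  # hypothetical: one probe on GPIO3, the other on ground

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(PROBE_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

    try:
        while True:
            if GPIO.input(PROBE_PIN) == 0:   # squashing the jelly baby closes the circuit
                os.system("aplay song.wav")  # hypothetical audio file
            time.sleep(0.1)
    finally:
        GPIO.cleanup()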

The team behind the Erica the Rhino project were also on hand to provide some inspiration for projects that we have been asked to undertake later in the year. At the moment I'm trying to come up with a project that involves recording video in public places, but which doesn't compromise privacy.


Quantitative Research Methods (QRM) 

This week I had my first experience of using SPSS software to explore datasets, and in class we moved on to two of the basic concepts covered by this module: Confidence Intervals (CI) and the Central Limit Theorem (CLT). A CI is the range of values within which the true value for a population is expected to lie (with a stated degree of confidence, e.g. 95%). Because the mean varies depending on each sample that's taken from a given population, we need to construct a range of values to provide confidence. This is done by taking sample means from the population many times (e.g. 1000). The CLT states that, regardless of the distribution of the variable in the population, the means of these repeated samples will be normally distributed.
There is no truly objective way of defining confidence, but using this method we can show that the true value lies between two values with a stated level of confidence. Confidence depends on the sample size - essentially, the bigger the sample, the better the confidence. However, there are diminishing returns beyond sample sizes of around 1400 - at least I think so.
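
To convince myself of the CLT I tried a quick simulation in Python (my own sketch, not course material): draw repeated samples from a deliberately skewed population and look at the distribution of their means.

    import random
    import statistics

    random.seed(42)
    population = [random.expovariate(1.0) for _ in range(100_000)]  # skewed

    sample_means = [
        statistics.mean(random.sample(population, 50))  # n = 50 per sample
        for _ in range(1000)                            # 1000 repeated samples
    ]

    mean = statistics.mean(sample_means)
    sd = statistics.stdev(sample_means)
    # If the means are roughly normal, about 95% should lie within 1.96 sd
    inside = sum(abs(m - mean) < 1.96 * sd for m in sample_means) / len(sample_means)
    print(f"mean of means = {mean:.3f}, sd = {sd:.3f}, within 1.96 sd: {inside:.1%}")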

Computational Thinking 

In the lab we undertook some basic programming using the Python GUI (IDLE). Python is a very popular, informal, flexible, dynamic language used by Google, and others, to control their internal systems. It's also a useful stepping stone to Java programming.
The programming exercise involved developing, in stages, a 'Hangman' program.
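
For my own reference, a bare-bones version of the idea (my reconstruction, not the class solution) looks something like this:

    import random

    words = ["python", "hypertext", "raspberry"]
    secret = random.choice(words)
    guessed = set()
    lives = 6

    while lives > 0:
        display = "".join(c if c in guessed else "_" for c in secret)
        print(display, f"(lives: {lives})")
        if display == secret:
            print("You win!")
            break
        guess = input("Guess a letter: ").strip().lower()
        if len(guess) != 1 or not guess.isalpha() or guess in guessed:
            print("Try a single new letter.")
            continue
        guessed.add(guess)
        if guess not in secret:
            lives -= 1
    else:
        print(f"You lose! The word was '{secret}'.")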

Independent Interdisciplinary Review

A further discussion of what is required for this module included a recent MSc Web Science graduate taking us through his experience of writing for it. His IIR explored changes in how Intellectual Property is understood on the Web via the disciplines of Economics and Law.
The task for this week was to choose a topic and disciplines, and add a blog post to the COMP6044 blog site providing an outline, justification and bibliography. I have decided to look at corporate data sharing through the lenses of Anthropology and Economics.

Hypertext and Web Text for Masters 

This week we were introduced to HyperText Markup Language (HTML), Extensible Markup Language (XML), Cascading Stylesheets (CSS) and Extensible Stylesheet Language Transformations (XSLT) - all of which originate from Standard Generalized Markup Language (SGML). XML adds flexibility beyond HTML: documents contain elements (arranged in a hierarchy), attributes (which label elements), entities (which contain document fragments), and document type definitions (DTDs).
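
A tiny Python sketch (my own, using the standard library and a made-up document) shows the element/attribute hierarchy in action:

    import xml.etree.ElementTree as ET

    doc = """
    <library>
      <book id="b1">
        <title>Traite de Documentation</title>
        <author>Paul Otlet</author>
      </book>
    </library>
    """

    root = ET.fromstring(doc)              # root element: <library>
    for book in root.findall("book"):      # child elements in the hierarchy
        print(book.get("id"),              # an attribute labelling the element
              book.findtext("title"),      # nested element content
              book.findtext("author"))
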
We also touched on the latest version of HTML, HTML5, which includes new, more appropriate tags and recognises new structures that are useful for search engines and usability. Browser adoption of HTML5 is patchy, but it is gaining ground.
Stylesheet languages (e.g. CSS and XSL) 'separate concerns', allowing authors to concentrate on content while layout and design are defined elsewhere. CSS can attach 'missing' semantics, complexity and processing instructions to XML. CSS decorates, but does not build - XSLT does both.
From past exam papers it looks like there will be a question that requires a reasonably thorough understanding of XML.

Foundations of Web Science 

This week we discussed the development of the Web from a social shaping perspective (as opposed to a technological determinist viewpoint). We were presented with a list of technological and social developments - from the discovery of electricity and wireless telegraphy, through the Cold War, to the development of the social web - and asked to discuss them, and how we shape our technology, in small groups. Areas that have been overlooked so far include the failure of Soviet attempts at networking, the importance of Federal funding of the National Science Foundation to the development of the early Web, and the enduring fascination with celebrity which drives much of the social web. However, and possibly significantly, an early attempt to classify all the world's knowledge, the Mundaneum, was mentioned.
We also continued to explore our personal use of the Web. Most of the class use mobile devices for traversing the Web, and start browsing early in the day.
We were asked to examine our individual Web use and communicate our understanding of it via a diagram - and add this to the class wiki. Many in our class produced interesting 'infographics' (some using the infogr.am online infographic tool) which classified their Web use by activity (e.g. 'work', 'leisure', 'diy'). I found that my attempts to classify my own activity beyond family communication or personal interests seemed arbitrary and unhelpful. Does adding complexity to this area help? Probably not. Can my simplified Dial-e framework (stimulate, analyse, investigate, create) be used to categorise Web activity? I think so - but I need to explore this further.

Digital Literacy Student Champions

I met with Lisa Bernasek and arranged to run short 'WordPress 101' sessions during three of her "The Arab World (in and) Beyond the Headlines" classes. The purpose is to get her 60 students publishing within days and with confidence. My aim is to run these sessions so that all participants will have started a draft of their first post by the end, and will have gained a clear appreciation of what they can do as authors within WordPress.

Friday 4 October 2013

MSc Web Science: Week 1

First day of Web Science/Tim O'Riordan 2013/CC BY 2.0 UK
Some notes on my first week on the MSc Web Science course at Southampton.

Quantitative Research Methods (QRM) 

The tutor, Nikolaos Tzavidis, posted the notes and slides for the first two lectures on Blackboard and emailed the class 4 days ahead to let us know. I like Nikos!
As a dyed-in-the-wool qualitative researcher I am a little suspicious of quantitative methods, but this module is very much geared towards the absolute beginner - and I am being won over.
In the first class we were introduced to mean, median and mode; categorical (nominal and ordinal) and continuous variables; and the concepts of whole-population and sample-based research (I may have got some of the terminology wrong there).
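
As a quick memory aid, Python's standard library computes all three averages (made-up data):

    import statistics

    data = [2, 3, 3, 5, 7, 9, 10]
    print("mean:", statistics.mean(data))      # arithmetic average
    print("median:", statistics.median(data))  # middle value
    print("mode:", statistics.mode(data))      # most frequent value
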
The good news is that, although we are doing some maths at the start (exploring Normal Distribution Curves), we don't have to memorise it all. The bad news is that we have to learn a new program - SPSS - which will help us find the answer to everything.
There are some problems in getting hold of the readings for this module. The textbook on SPSS (Discovering Statistics using SPSS) is reference-only and can't leave the library. We've been told to read chapters 1 to 4, but I haven't had time to spend in the library. I can photocopy one chapter to take away and have a whopping £26 in photocopy credit - but my card has been locked out of the system! I also need to read chapters 1-7 of Diamond and Jefferies' Beginning Statistics: An Introduction for Social Scientists. Fortunately one of the readings (Quantitative Data Analysis in Education) is available online.
I'm also reading up on some old-school research methods as described in Martin Mayer's 1958 book, Madison Avenue, U.S.A. It gives a very thorough account of the problems of selecting true samples, and of getting truthful responses from interviewees.

Computational Thinking 

This module is run by Les Carr and Hugh Davis, and the early message is that they hope to stimulate our interest by teaching basic computer architecture, and through introducing us to Python programming using Raspberry Pis. The assessed components are two group projects (a presentation and a 6th Form teaching activity) and a blog-style article.
This module, along with the rest of the Electronics and Computer Science (ECS) modules is not supported by Blackboard - but by the ECS's own intranet.
On Friday Hugh took us through 'computers 101' - on which we will not be assessed. He covered lots of useful stuff: transistors, logic gates, bit comparators, 1-bit arithmetic logic units (ALUs) and memory.
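
To fix the ideas in my head I sketched some of them in Python (my own illustration, not course material): logic gates as functions, then a 1-bit full adder and a 1-bit comparator built from them.

    def AND(a, b): return a & b
    def OR(a, b):  return a | b
    def XOR(a, b): return a ^ b
    def NOT(a):    return 1 - a

    def half_adder(a, b):
        return XOR(a, b), AND(a, b)   # (sum, carry)

    def full_adder(a, b, carry_in):
        s1, c1 = half_adder(a, b)
        s2, c2 = half_adder(s1, carry_in)
        return s2, OR(c1, c2)         # (sum, carry out)

    def equal_1bit(a, b):
        return NOT(XOR(a, b))         # comparator: 1 if the bits match

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", full_adder(a, b, 0), equal_1bit(a, b))
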
One of the readings for this module is Brookshear's Computer Science: An Overview, which is available online in PDF format (yay!).
At the moment I'm considering using CompendiumLD as a tool for designing the learning activity.

Independent Interdisciplinary Review

The title is self-explanatory. In this module I am required to study two disciplines that I have no previous experience of but which are relevant to my interests, and produce a 12-page report with accompanying poster that demonstrates my understanding of the primary concepts underlying both disciplines (the ontologies, basic theories and methodologies) and draws them together to tackle a problem. The idea is to use this exercise to "pilot interdisciplinary engagement".
Starting with a description of the question (e.g. "How might corporations be encouraged to open their data?"), I will explain why I have chosen the two disciplines (e.g. Anthropology and Economics), describe each discipline and how they might approach the problem and conclude with a suggestion of how the two approaches could be brought together.
I need to decide what my approach is by week 3, and a weekly blog outlining my study of each discipline is also required.

Hypertext and Web Text for Masters 

This is the biggest class (about 120 students), and it contains some undergrads. Les undertook a straw poll on online usage: only 3 owned up to blogging regularly, and 4 to uploading videos to YouTube.
I learned that:

  • Host addresses (e.g. 86.2.3.1) are better known by their Domain Name System (DNS) names (e.g. www.google.com).
  • Status codes have specific meanings (e.g. 200 = OK) - there's a quick sketch of this below, after the song.
  • Uniform Resource Identifiers (URIs) should be persistent.
  • There are 5 stars of linked data.
  • Web architecture is made up of 3 key parts: identification, interaction, and formats.
  • Les sang us a song about this, to the tune of Mary, Mary, Quite Contrary...

TimBL, TimBL, very nimble,
How does your linked Web grow?
With URLs and HTMLs,
And GET and POSTS all in a row.
Nice.
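
Here's a quick sketch of the DNS and status-code points in Python, using only the standard library (the host is just an example):

    import socket
    import urllib.request

    host = "www.google.com"
    print(host, "resolves to", socket.gethostbyname(host))  # DNS lookup

    # An HTTP GET; the response object carries the status code (200 = OK)
    with urllib.request.urlopen(f"http://{host}") as response:
        print("Status:", response.status)
        print("Content-Type:", response.headers["Content-Type"])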

Foundations of Web Science 

This is about the social impact of the web. In the first two classes we looked at how we use the web, categories of behaviour, and interconnectedness (including an exploration of the World Trend Map).

3 months' personal browsing (produced by the ECS History Visualiser)
In the last class on Friday Les asked us to consider what sites we visit a lot, what we value, and what is significant, and to use the History Visualiser to present our browsing activity over the previous 7 days. Grabbing browsing history in Chrome isn't straightforward and requires a third-party application - Chrome History View - to export a list that's usable in History Visualiser.
Unsurprisingly my visualisation shows a lot of activity on Google (I use Drive and search a lot), Facebook (I have two channels and use it to communicate with family, friends - and my new WS buddies), YouTube (I post a lot of videos), and the University site.
What I value is the ability to find out things very quickly, and to test validity through 'informal triangulation'. For example, on Friday I received a message from a client with a problem DVD who needed a quick response. Using Google search, I was able to find other people who had the same problem, gauge the issue's importance and check - and double-check - the solution, before getting back to my client within 30 minutes. This would not have been possible without the web.
There's quite a large reading list for this module, but I've started with a book from my collection: Ed Krol's ground-breaking 1992 guide, The Whole Internet User's Guide and Catalog. Published on the cusp of the development of the Web as we know it today, it does not mention HTTP or HTML, and refers to the Web as "the newest arrival from the Internet's toolshop" and "probably the most flexible tool for prowling around the Internet".
While much of this is of arcane interest, the chapters on the development, ownership and management of the Internet are fascinating.
The central argument presented in this module is that the Web's success is based firmly on academic freedom and the willingness of the academics involved in the project to freely share their ideas. While this is undoubtedly true, my interest in finding ways to enhance sharing on the Web tells me that this altruistic motivation is not universally shared across academia.
My thesis is that the central motivations for the Web's development stem from a particularly American attitude to the rights of government, a belief in the efficacy of free trade, and a freedom of capital which encouraged venture investors to give early support to current Web mainstays. The prevalent attitude to government in the US is one of "we've paid for it - we own it", which leads to the Federal government sharing data and artefacts that have been created with taxpayers' money. This is not a universally shared attitude - on the photo-sharing site Flickr, compare the US military's sharing of images with the British MOD's, or the Library of Congress' attitude to its collections with the UK National Archives'. The approach to government ownership exemplified by Crown Copyright in the UK does not exist in the US, and the attitude that sees a large amount of Federal government output "go back to the people" was, I believe, vital to the early stages of the Web's development. Although proposed and developed by Tim Berners-Lee, a Brit working at a European science project (CERN), the Web as we know it today could not have existed without this very American liberal mindset.
I admit that I may have overstated my case here, but I'll see if it takes me somewhere useful over the next few weeks.

Digital Literacy Student Champions:

I've registered to run workshops about using online media to support teaching and learning, and have one booking already: a WordPress blogging workshop in two weeks' time.