View From Nowhere: On the Cultural Ideology of Big Data (translation post)

lillian0
Corrections and criticism welcome! I am not a professional translator.
thenewinquiry.com

从无处看世界:大数据的文化意识形态
View From Nowhere:On the cultural ideology of Big Data

“无论什么历史年代里,科学的走向取决于我们如何理解科学” --- Sandra Harding,《谁的科学?谁的知识?》(1991)

[align="left"]“What science becomes in any historical era depends on what we make of it” —Sandra Harding, Whose Science? Whose Knowledge? (1991)

一直以来,对于知识的不安全感和其急切想要掌握终极理论而因自身认识论只能导致对世界认识越发缺乏的破灭感,深深困扰甚至定义了现代性。新知识和新认知方法在出现的同时也带来了新的非知识(nonknowledge),新的不确定因素和谜团。基于推演和可证伪性的科学方法实际上更适合产生问题而不是解决它们。比如说,爱因斯坦关于空间曲率和量子力学下运动的理论既带来了新知识,也让前所未有的新非知识进入我们的想象范围。
Modernity has long been obsessed with, perhaps even defined by, its epistemic insecurity, its grasping toward big truths that ultimately disappoint as our world grows only less knowable. New knowledge and new ways of understanding simultaneously produce new forms of nonknowledge, new uncertainties and mysteries. The scientific method, based in deduction and falsifiability, is better at proliferating questions than it is at answering them. For instance, Einstein’s theories about the curvature of space and motion at the quantum level provide new knowledge and generate new unknowns that previously could not be pondered.
[/align][align="left"] 因为理论对于我们世界观的破坏力和它建立这个世界观的力量一样大,对于产生知识的集体狂热同时创造了和这种狂热程度一般的徒劳感,我们需要在这紧张的气氛里宣泄——哪怕仅仅只是一个瞬间,我们也希望体验那种对于某种事物确切把握的感觉。在现代社会里,大数据的出现满足了大家需要宣泄的心理。
Since every theory destabilizes as much as it solidifies in our view of the world, the collective frenzy to generate knowledge creates at the same time a mounting sense of futility, a tension looking for catharsis — a moment in which we could feel, if only for an instant, that we know something for sure. In contemporary culture, Big Data promises this relief.
[/align][align="left"]
[/align][align="left"] 如名字所示,大数据是关于“大”的理论。很多大数据的支持者声称利用大规模数据库前所未有的海量信息可以揭示全新的真理。而且大数据之“大”也暗示着质的不同:当数据累积到一定数量时,数据变成了大数据,很多新兴的公司和关于大众市场的社会科学书籍将之称为“知识的革命”。因为其不同于一般科学对于信息的简单收集,大数据被吹捧为全新的知识,是社会生活的新启蒙运动。当然这一切都是因为“大”。
As the name suggests, Big Data is about size. Many proponents of Big Data claim that massive databases can reveal a whole new set of truths because of the unprecedented quantity of information they contain. But the big in Big Data is also used to denote a qualitative difference: that aggregating a certain amount of information makes data pass over into Big Data, a “revolution in knowledge,” to use a phrase thrown around by startups and mass-market social-science books (nytimes.com, big-data-book.com). Operating beyond normal science’s simple accumulation of more information, Big Data is touted as a different sort of knowledge altogether, an Enlightenment for social life reckoned at the scale of masses.
[/align][align="left"] 就像其他类似的推理性科学( inferential sciences ),比如演化心理学(evolutionary psychology)和流行神经科学(pop-neuroscience),大数据可以被用于给任何猜想涂上科学的外衣,并给出一些看似权威的数字——大到可以让任何说法都像真的一样。因此,大数据不但在整个工业界非常流行(它的名字就是“预测性分析”),并且在学术界、企业或者政府研究里也有大量的拥趸。大数据也促进了“数据新闻业”(data journalism)的崛起, 比如FiveThirtyEight、Vox和其他越来越多的分析网站(explainer sites)的出现。它还转移了这些行业的重心,这一切不但是因为它宏伟的认识论断言,也要归功于大数据研究充足的资金。像推特(Twitter)最近就公布,它将投资1000万美金在“社交机器”大数据实验室上。
As with the similarly inferential sciences like evolutionary psychology and pop-neuroscience, Big Data can be used to give any chosen hypothesis a veneer of science and the unearned authority of numbers. The data is big enough to entertain any story. Big Data has thus spawned an entire industry (“predictive analytics”) as well as reams of academic, corporate, and governmental research; it has also sparked the rise of “data journalism” like that of FiveThirtyEight, Vox, and the other multiplying explainer sites (america.aljazeera.com). It has shifted the center of gravity in these fields not merely because of its grand epistemological claims but also because it’s well-financed. Twitter, for example, recently announced (blog.twitter.com) that it is putting $10 million into a “social machines” Big Data laboratory.
[/align][align="left"] 用“正确的”方法收集足够的数据就可以提供一个客观的、公正的现实图景,这种理性主义空想其实是一个我们熟悉的老旧概念:实证主义。这种方法是如此理解世界的:只要我们保持价值中立,特别是超然的不带任何立场,我们就可以认识和解释这个社会。这个术语来自于奥古斯·孔德(August Comte)的《实证哲学(Positive Philosophy)》(1830-1842)。在实证主义的意义下,他也重新创造了“社会学”这个词。当西方社会学开始变成一门学科时(这意味着它拥有系所,能提供就业岗位,有很多定期刊物,举办学术会议),Emile Durkheim,这个学科的另一个创建者,相信它将可以起到“社会物理学”的功能,为我们描绘一种“社会事实”(social facts)——就像我们进行物理实验测量一样。从现在看来,这是一个非常自大的观点——这门学科目标是为我们的社会生活提供一个宏观的、普遍的理论;随着社会学越来越致力于经验性的数据收集,这个观点也越来越根深蒂固。
The rationalist fantasy that enough data can be collected with the “right” methodology to provide an objective and disinterested picture of reality is an old and familiar one: positivism. This is the understanding that the social world can be known and explained from a value-neutral, transcendent view from nowhere in particular. The term comes from Positive Philosophy (1830-1842), by Auguste Comte, who also coined the term sociology in this image. As Western sociology began to congeal as a discipline (departments, paid jobs, journals, conferences), Emile Durkheim, another of the field’s founders, believed it could function as a “social physics” capable of outlining “social facts” akin to the measurable facts that could be recorded about the physical properties of objects. It’s an arrogant view, in retrospect, one that aims for a grand, general theory that can explain social life, a view that became increasingly rooted as sociology became focused on empirical data collection.
[/align][align="left"] 一个世纪之后,大部分社会学家重新将这门学科定位于认识社会的复杂性,而不是去探寻一种普世的人类社会解释。随着思想的转变,社会学实证主义也就被抛弃了。但是大数据的到来却复活了这种社会物理学幻想——一种全新的数据驱动技术将用纯粹的算术处理能力去描绘这种“社会事实”。
A century later, that unwieldy aspiration has been largely abandoned by sociologists in favor of reorienting the discipline toward recognizing complexities rather than pursuing universal explanations for human sociality. But the advent of Big Data has resurrected the fantasy of a social physics, promising a new data-driven technique for ratifying social facts with sheer algorithmic processing power.
[/align][align="left"] 因为实证主义许诺的回报太过于诱人了,所以即使其流行度时高时低,但却从未绝迹。这个简单道理幻想的魔力——我们将可以站在超越各种可能将社会撕裂的权力和议程分歧之上看这个世界——实在是太强大,太有“钱途”了。其实如何令人信服地宣传自己构建的社会模型是准确的,和如何成功推销任何东西(从一个政治立场,一个产品,到自己的权威性)是一样的。虽然大数据被包装成一种等价于权力的知识,实际上,却依赖于早已存在的力量将其数据等同化为知识。
Positivism’s intensity has waxed and waned over time, but it never entirely dies out, because its rewards are too seductive. The fantasy of a simple truth that can transcend the divisions that otherwise fragment a society riven by power and competing agendas is too powerful, and too profitable. To be able to assert convincingly that you have modeled the social world accurately is to know how to sell anything from a political position, a product, to one’s own authority. Big Data sells itself as a knowledge that equals power. But in fact, it relies on pre-existing power to equate data with knowledge.
[/align]
[align="center"]***[/align][align="left"] 并非所有的数据科学都是关于大数据的。如同其他研究领域一样,数据科学实践者们的道德高低、意图、谦逊程度,以及对于自身方法论局限性的认识程度是千差万别的。在此批评“大数据”(的所谓客观性、公正性)对于主流文化思想的渗透,并不是说所有的数据研究都是没有价值的。(比如说,新的数据与社会研究所( Data & Society Research Institute)采取新的测量方法于大数据组研究上。这是可取的。)但是数据科学的实证主义倾向——它的客观性传说和政治中立性——比其他研究都更加明显。这些趋势很有可能将数据科学转变成为一种合理化技术工业方法在生产设计和数据收集上的意识形态工具。
Not all data science is Big Data. As with any research field, the practitioners of data science vary widely in ethics, intent, humility, and awareness of the limits of their methodologies. To critique the cultural deployment of Big Data as it filters into the mainstream is not to argue that all data research is worthless. (The new Data & Society Research Institute at datasociety.net, for instance, takes a measured approach to research with large data sets.) But the positivist tendencies of data science, its myths of objectivity and political disinterestedness, loom larger than any study or any set of researchers, and they threaten to transform data science into an ideological tool for legitimizing the tech industry’s approach to product design and data collection.
[/align][align="left"] 我们不能脱离数据科学和大众媒体公司之间强大的纽带关系来理解大数据研究。这是大数据那居高临下的无处视角意识形态(view-from-nowhere ideology)最为清晰的地方;也是算法,数据库,和风险资本相结合的地方。Facebook研究组是现在声名狼藉的情绪操纵研究(这个研究因其过于宽松的伦理标准和智力上的傲慢而广受谴责)的幕后黑手,绝非偶然。(其中一个研究者认为大数据的潜能和显微镜的发明相当。)
Big Data research cannot be understood outside the powerful nexus of data science and social-media companies. It’s where the commanding view-from-nowhere ideology of Big Data is most transparent; it’s where the algorithms, databases, and venture capital all meet. It was no accident that Facebook’s research branch was behind the now infamous emotional manipulation study, which was widely condemned for its lax ethical standards and intellectual hubris (medium.com). (One of the authors of the study said Big Data’s potential was akin to the invention of the microscope; source: nytimes.com.)
[/align][align="left"] 同样浸淫着大数据幻想的还有一本叫做《数据灾难》(Dataclysm)的书。这本书集合了OkCupid主席Christian Rudder早先在博客上发表的对于他的服务器所记录的各种异常数据的观察。Rudder由此宣称“我们将要步入人类沟通研究的重大变革”。他的字里行间里同样充满了Facebook研究组那种傲慢。《数据灾难》的副标题是“我们是谁(当我们认为没有人在注意我们的时候)”。自鸣得意地认为当收集到足够的数据,我们将可以见到超越研究人员甚至是研究对象主观性的不为人知的(丑恶)事实——大数据可以揭示即使是亲身体验的人也不知道的人类社交性和欲望。
Equally steeped in the Big Data way of knowing is Dataclysm, a new book-length expansion of OkCupid president Christian Rudder’s earlier blog-posted observations about the anomalies of his dating service’s data set. “We are on the cusp of momentous change in the study of human communication,” Rudder proclaims, echoing the Facebook researchers’ hubris. Dataclysm’s subtitle sets the same tone: “Who we are (when we think no one is watching).” The smirking implication is that when enough data is gathered behind our backs, we can finally have access to the dirty hidden truth beyond the subjectivity of not only researchers but their subjects as well. Big Data will expose human sociality and desire in ways those experiencing it can’t.
[/align][align="left"] 因为像在OkCupid这种平台上收集数字数据——所有界面被动地记录各种关于用户行为的信息——是自动进行的。按照复杂的先验理论来说,这似乎是不偏不倚的。数字,就像Rudder在书里不断提到的,不会跑掉,就在原处等着大家去使用它们得到自己想要的结论。的确,因为数据数量很大,它们反映了很多“事实”。根据OkCupid上所有关于用户爱情,性和美的数据,Rudder声称他可以“道破现在仍不为人知的空虚与脆弱”。
Because digital data collection on platforms like OkCupid seems to happen almost automatically — the interfaces passively record all sorts of information about users’ behavior — it appears unbiased by messy a priori theories. The numbers, as Rudder states multiple times in the book, are right there for you to conclude what you wish. Indeed, because so many numbers are there, they speak for themselves. With all of OkCupid’s data points on love and sex and beauty, Rudder claims he can “lay bare vanities and vulnerabilities that were perhaps until now just shades of truth.”
[/align][align="left"] 对于Rudder和其他科技公司的新实证主义者来说,大数据总是站在更大数据的阴影之下。他们总是假设因为人们可以在今天收集到比昨天更多的数据,那么明天必然收集到比今天更多的数据。这是一种会将我们推向无限接近于“纯粹”数据形式的扩张:终有一天,我们每天的活动将以数据的形式被记录;由此,我们可以从中得到一种是我们能掌握一切事情因果的方法。在Rudder的书里,他不厌其烦地指出他所拥有的数据的规模,力量和无限潜能,让读者们深深明白这些数据是如何越来越“大”的。这种根深蒂固的实证主义幻想——我们将会在不久的将来完全解释这个宇宙——使得采取侵犯隐私式的数据收集方式变成一种道德权利。
For Rudder and the other neo-positivists conducting research from tech-company campuses, Big Data always stands in the shadow of the bigger data to come. The assumption is that there is more data today and there will necessarily be even more tomorrow, an expansion that will bring us ever closer to the inevitable “pure” data totality: the entirety of our everyday actions captured in data form, lending themselves to the project of a total causal explanation for everything. Over and over again, Rudder points out the size, power, and limitless potential of his data only to impress upon readers how it could be even bigger. This long-held positivist fantasy — the complete account of the universe that is always just around the corner — thereby establishes a moral mandate for ever more intrusive data collection.
[/align][align="left"] 但是为什么Rudder会如此深信他拥有的数据会有探究事实的能力,并且认为他无视现有的研究者伦理准则是正当的,关键还在于他相信通过被动收集得到的数据完全排除了研究者偏见。在Rudder和其他认为可以在没得到对方许可的情况下对其进行人数字化人体实验的新实证主义者看来,轮询(polling)和其他现有的收集大规模数据的方法的问题在于,它们是产生测量误差的来源。任何受到过足够训练的社会科学家都会承认,一个问题如何措词,由谁提问,都会影响整个调查问卷的效果。Rudder相信,利用大数据我们可以将数据收集过程中遇到的种种问题通通解决而得到更加真实的结果。例如,现在只要从Google搜索里收集数据就可以得到想要的结果,再也不需要研究者对研究对象进行任何形式的询问了。Rudder是这么形容的“不需要问题,也不需要开口问,答案自然就有”。
But what’s most fundamental to Rudder’s belief in his data’s truth-telling capability — and his justification for ignoring established research-ethics norms — is his view that data sets built through passive data collection eliminate researcher bias. In Rudder’s view, shared by other neo-positivists that have defended human digital experimentation without consent, the problem with polling and other established methods for large-scale data gathering is that these have well-known sources of measurement error. As any adequately trained social scientist would confirm, how you word a question and who poses it can corrupt what a questionnaire captures. Rudder believes Big Data can get much closer to the truth by removing the researcher from the data-collection process altogether. For instance, with data scraped from Google searches, there is no researcher prodding subjects to reveal what they wanted to know. “There is no ask. You just tell,” Rudder writes.
[/align][align="left"] 这是为什么Rudder相信他不需要提前得到他网站用户的许可,就可以人为地操纵用户的配对比例,又或者是从某些网络互动中移除用户的照片。为了尽可能获得不受“污染的”数据,用户是不能被询问是否同意授权的,因为他们不能知道自己身处在实验室之中。
This is why Rudder believes he doesn’t need to ask for permission before experimenting on his site’s users — to, say, artificially manipulate users’ “match” percentage or systematically remove some users’ photos from interactions. To obtain the most uncontaminated data, users cannot be asked for consent. They cannot know they are in a lab.
[/align][align="left"] 当调查研究领域几乎将重点放在对自身方法局限性的理解和表达时,Rudder却选择忽略它们来应对大数据工作过程中可能(这种可能性是非常大的,甚至大于常规方法)遇到的系统性测量错误。他辩解到“有些时候,计算机运用盲算法(blind algorithm)去观察数据。”然而OkCupid收集数据的方法却让Rudder的说法大打折扣:OkCupid的政策和程序员们对于特定的文化理解决定了如何收集数据。大数据实证主义短视地认为只要是计算机被动收到的数据就是客观的。但是计算机自己是记不住任何东西的,记住的是人。
While the field of survey research has oriented itself almost completely to understanding and articulating the limits of its methods, Rudder copes with Big Data’s potentially even more egregious opportunities for systematic measurement error by ignoring them. “Sometimes,” he argues, “it takes a blind algorithm to really see the data.” Significantly downplayed in this view is how the way OkCupid captures its data points is governed by the political choices and specific cultural understandings of the site’s programmers. Big Data positivism myopically regards the data passively collected by computers to be objective. But computers don’t remember anything on their own.
[/align][align="left"] 这种对计算机如何工作的幼稚观点和人们早期对摄影的观点差不多;当时人们认为这种新技术预示着我们人类视觉将会被我们创造的可以观察到我们自身观察不到的照相机所取代。这其中最出名的例子是Eadweard Muybridge的“飞奔的马”摄影系列展览。但是与此同时,Shawn Michelle Smith在他的《在视线的边缘:摄影与不可见》(At the Edge of Sight: Photography and the Unseen)里解释到,在早期摄影里,摄影师常常将自己对种族、性别和性特定的和不为人知的理解添加进自己的照片里。这所谓的超越人类视觉的视觉实际上不过充满了各种文化上的有色眼镜——而这正是人们宣称通过摄影可以避免的。
This naive perspective on how computers work echoes the early days of photography, when that new technology was sometimes represented as a vision that could go beyond vision, revealing truths previously impossible to capture. The most famous example is Eadweard Muybridge’s series of photographs that showed how a horse really galloped. But at the same time, as Shawn Michelle Smith explains (thenewinquiry.com) in At the Edge of Sight: Photography and the Unseen, early photography often encoded specific and possibly unacknowledged understandings of race, gender, and sexuality as “real.” This vision beyond vision was in fact saturated with the cultural filter that photography was said to overcome.
[/align][align="left"] 其他社交媒体平台也同样充斥着这些东西 :如何设计这些网站,收集什么样的数据,如何收集这些数据,如何整理和储存数据,如何查询数据,为什么这些数据充斥着政治、利益和不安全感。社会科学研究人员从他们学生时期开始就一直受到这样的训练:如何辨认使用什么方法,并采用相应的技巧降低或者至少是表达出结果中存在的偏差。与此同时,Rudder却对这些方法指导新手们(first-year methods instructor)一个惊天的消息,“只要你使用正确的分析方法使手头上数据组的鲁棒性足够大,你根本不需要对数据提出问题,数据就会告诉你任何东西”。
Social-media platforms are similarly saturated. The politics that goes into designing these sites, what data they collect, how it is captured, how the variables are arranged and stored, how the data is queried and why are all full of messy politics, interests, and insecurities. Social-science researchers are trained to recognize this from the very beginning of their academic training and learn techniques to try to mitigate or at least articulate the resulting bias. Meanwhile, Rudder gives every first-year methods instructor heart palpitations by claiming that “there are times when a data set is so robust that if you set up your analysis right, you don’t need to ask it questions — it just tells you everything anyways.”
[/align][align="left"] Evelyn Fox Keller在《反思性别与科学》( Reflections on Gender and Science)书中描述实证主义如何通过将研究人员与数据区分开来实现客观性和中立性。大数据,正如Rudder一直急切主张的,包含了这种区分。这也引向了或许是大数据隐含的意识形态里最危险的后果:研究对文化中种族、性别、性有重大影响的研究者们将会拒绝承认他们是如何将未阐明甚至是无意识的理论,自己特定的社会立场来夹杂进自己研究里。这重蹈了它们之前存在的偏见,并且同时用这些数据是客观性正确的说法隐藏了起来。
Evelyn Fox Keller, in Reflections on Gender and Science, describes how positivism is first enacted by distancing the researcher from the data. Big Data, as Rudder eagerly asserts, embraces this separation. This leads to perhaps the most dangerous consequence of Big Data ideology: that researchers whose work touches on the impact of race, gender, and sexuality in culture refuse to recognize how they invest their own unstated and perhaps unconscious theories, their specific social standpoint, into their entire research process. This replicates their existing bias and simultaneously hides that bias to the degree their findings are regarded as objectively truthful.
[/align][align="left"] 通过将探究真理的能力从研究人员身上转移到不言而喻的数据上,大数据含蓄地鼓励研究人员无视概念性框架,诸如交集性(intersectionality)或者关于社会分类这样的概念可能会不利于而不是加强我们的理解的讨论。并且我们没有理由相信那些掌握着大数据的人们(通常是科技公司里的人员和他们所附属的研究人员)完全不受偏见影响。他们,像其他人一样,有着对这个社会特定的偏好——知道什么样的数据可以解释什么样的现象,也知道数据应该如何被使用去解释。正如Danah Boyd 和Kate Crawford在《大数据的关键问题》(thenewinquiry.com)里指出的“不管数据的规模如何,大数据总会受制于其自身局限性和人的偏见。如果没有正确理解并且总结这些偏见和局限性,我们得到的只能是某种曲解”。(regardless of the size of a data, it is subject to limitation and bias. Without those biases and limitations being understood and outlined, misinterpretation is the result.)[/align]
[align="left"]By moving the truth-telling ability from the researcher to data that supposedly speaks for itself, Big Data implicitly encourages researchers to ignore conceptual frameworks like intersectionality or debates about how social categories can be queered rather than reinforced. And there is no reason to suppose that those with access to Big Data, often tech companies and researchers affiliated with them, are immune to bias. They, like anyone, have specific orientations toward the social world, what sort of data could describe it, and how that data should be used. As Danah Boyd and Kate Crawford point out in “Critical Questions for Big Data” (thenewinquiry.com),[/align]

[align="left"]regardless of the size of a data, it is subject to limitation and bias. Without those biases and limitations being understood and outlined, misinterpretation is the result.[/align]

[align="left"] 这种短视使得Rudder写下这样的东西,“对于性别差异研究最理想的数据来源不是那些表面上用户性别不相干的地方,而是在那些用户性别是男是女无所谓的地方。我选推特(Twitter)做为最理想的试验地。”,完全无视不同性别在推特(Twitter)使用上的差异。纵观《数据灾难》(Dataclysm)全书,尽管Rudder的态度是他的工作完全与自己的数据分开的,他的政策却是一直在干预它们:不但在他自己提及大脑科学和演化心理学的解说里,也体现在他如何挑选测量变量和如何将它们安排在自己的分析上。
This kind of short-sightedness allows Rudder to write things like “The ideal source for analyzing gender difference is instead one where a user’s gender is nominally irrelevant, where it doesn’t matter if the person is a man or a woman. I chose Twitter to be that neutral ground” without pausing to consider how gender deeply informs the use of Twitter. Throughout Dataclysm, despite his posture of being separate from the data he works with, Rudder’s politics are continually intervening, not merely in his explanations, which often refer to brain science and evolutionary psychology, but also in how he chooses to measure variables and put them into his analyses.
[/align][align="left"] 在一个因为种族、阶级、性别和其他重要因素而分化的社会里,知识怎么可能是中立客观的?正当前《连线》杂志主编Chris Anderson在文章里宣告感谢大数据“终结了理论”的时候,Kate Crawford、Kate Miltner和Mary Gray就在开始纠正大家的观点了——大数据本身就是理论!大数据的支持者只是没有意识到而已!
In a society deeply stratified on the lines of race, class, sex, and many other vectors of domination, how can knowledge ever be said to be disinterested and objective? While former Wired editor-in-chief Chris Anderson was describing the supposed “end of theory” thanks to Big Data in a widely heralded article (archive.wired.com), Kate Crawford, Kate Miltner, and Mary Gray were correcting that view (ijoc.org), pointing out simply that “Big Data is theory.” It’s merely one that operates by failing to understand itself as one.
[/align]
[align="center"]*** [/align][align="left"] 实证主义已经出现很长一段时间了,对它的批评从一开始就存在。一些研究方法论者认为Sandra Harding的《谁的科学?谁的知识?》主张一种新的“强有力的”客观性。这种客观性将包括研究者的社会立场在内的因素看做一种特色,而非是一种缺陷;这样就允许了观点多样性的存在,而不是一味地追求那种错误的自认为中立的观点(false view from nowhere)。Patricia Hill Collins在《黑人女性思想》里提到,“偏袒和非普世性是一种需要被倾听的状态”。
Positivism has been with us a long time, as have the critiques of it. Some research methodologists have addressed and incorporated these critiques: Sandra Harding’s Whose Science? Whose Knowledge? argues for a new, “strong” objectivity that sees including a researcher’s social standpoint as a feature instead of a flaw, permitting a diversity of perspectives instead of one false view from nowhere. Patricia Hill Collins, in Black Feminist Thought, argues that “partiality and not universality is the condition of being heard.”
[/align][align="left"] 大数据却采取了另一种方法。非但不承认方法论中的偏袒性,它的辩护者还使用了一些新的伎俩去粉饰传说中的普世客观性。为了逃避对于立场的追问,他们靠牺牲研究人员来吹捧大数据。通过对测量者和研究者专业水平的贬低(Rudder在书中不断提及自己低劣的统计学水平),大数据的支持者狡猾地将权威性的来源转移到大数据身上。如此,探讨真理的能力再也不与分析方法相关,而单纯地取决于接触到数据的数量和质量。
Big Data takes a different approach. Rather than accept partiality, its apologists try a new trick to salvage the myth of universal objectivity. To evade questions of standpoint, they lionize the data at the expense of the researcher. Big Data’s proponents downplay both the role of the measurer in measurement and the researcher’s expertise — Rudder makes constant note of his mediocre statistical skills — to subtly shift the source of authority. The ability to tell the truth becomes no longer a matter of analytical approach and instead one of sheer access to data.
[/align][align="left"] 实证主义幻想有赖于接触数据的机会的不公平性。为什么科学可以如此长久以来将自己标榜为道德的和政治中立的?因为拥有看穿它本质能力的人在人群中的比例分配得太不合理了。随着越来越多人从不同文化观点进行科学实践,先前科学内在的政治偏见就不断被暴露出来。现在越来越多人接受了优质的教育,研究人员也采纳了更加先进的研究方法,实证主义者已经不能再为他们的实证主义幻想编造依据了。
The positivist fiction has always relied on unequal access: science could sell itself as morally and politically disinterested for so long because the requisite skills were so unevenly distributed. As scientific practice is increasingly conducted from different cultural standpoints, the inherited political biases of previous science become more obvious. As access to education and advanced research methodologies became more widespread, they could no longer support the positivist myth.
[/align][align="left"] 然而,大数据的文化意识形态尝试逆转这个形势:将权威性(或多或少地)从大众化研究专业知识转移到只有少数人可以获得的专有的、受到控制的数据上。(Molly Osberg在她为The Verge网络媒体所写的《数据灾难》的书评里指出,Rudder是如何解释他如何通过个人关系从其他技术公司的行政人员身上获取大部分信息的)当数据被称赞它可以自然而然地反映事实,研究人员应该降低他们自己的方法在研究中的重要性的时候,我们应该这么理解:这是一种使接触数据的权限变得更加值钱,更加稀罕的努力。当然,宣传这些数据是如此有价值,如此有权威性的人,通常也是拥有这些数据并且靠贩卖获取数据权限赚钱的人。
The cultural ideology of Big Data attempts to reverse this by shifting authority away from (slightly more) democratized research expertise toward unequal access to proprietary, gated data. (Molly Osberg points out in her review of Dataclysm for the Verge how Rudder explains in the notes how he gathered most of his information through personal interactions with other tech company executives.) When data is said to be so good that it tells its own truths and researchers downplay their own methodological skills, that should be understood as an effort to make access to that data more valuable, more rarefied. And the same people positioning this data as so valuable and authoritative are typically the ones who own it and routinely sell access to it.
[/align][align="left"] 数据科学不一定要成一种精英式的实践。我们应该寻找一种更好理解的并且可以忍受大数据的“小”(因为这强调了我们瞬息万变的社会生活中有很多错综复杂的事物是没有办法反应在数据库的数据里的)的大众化方式处理大数据组。我们不能让实证主义加在大数据上的外饰让我们忽略了它真正有价值的研究潜能。
Data science need not be an elitist practice. We should pursue a popular approach to large data sets that better understands and comes to terms with Big Data’s own smallness, emphasizing how much of the intricacies of fluid social life cannot be held still in a database. We shouldn’t let the positivist veneer on data science cause us to overlook its valuable research potential.
[/align][align="left"]
但是对于大数据来说,想要被用于真正改善我们的社会和这个世界,研究者们仍需要与上文所说的那种使我们过度投资、高估大数据的文化意识形态相斗争。像《数据灾难》(Dataclysm)和其他大公司,或者是商业数据科学里的无处视角(view from nowhere),必须脱下它的伪装,因为那不过是我们所熟悉的一种有缺陷的不公正的立场而已。
But for Big Data to really enhance what we know about the social world, researchers need to fight against the very cultural ideology that, in the short term, overfunds and overvalues it. The view from nowhere that informs books like Dataclysm and much of the corporate and commercialized data science must be unmasked as a view from a very specific and familiar somewhere.
[/align]