Computer Science

One of the key problems in crowdsourcing is quality control. Over the last few years, a large number of methods have been proposed for estimating the quality of workers and the quality of the generated data. A few years back, we released the Get Another Label (GAL) toolkit, which allowed people to run their data through a command-line interface and get back estimates of worker quality and of how well the data have been labeled, and to identify the data points that have high uncertainty and therefore may require additional attention.

The next step for Get Another Label was to get it ready to work in more practical settings. The GAL toolkit assumed that we have all the labels assigned by the workers, process them, and get the results. In reality, though, most tasks run in an incremental mode: the task runs over time, new data arrive, new workers arrive, and the "load-analyze-output" process is not a good fit. We wanted something that gives back estimates of worker quality on the fly and, again on the fly, identifies the data points that need the most attention.

Towards this goal, over the last few months we have been porting the GAL code into a web service called Project Troia. You can load the data as the crowdsourced project runs and get back the results immediately. This allows for very fast estimation of worker quality, and also allows the quick identification of data points that either meet the target quality or require additional labeling effort. Project Troia:
- Supports labeling with any number of discrete categories, not just binary.
- Supports labeling with continuous variables.
- Allows the specification of arbitrary misclassification costs (e.g., "marking spam as legitimate has cost 1, marking legitimate content as spam has cost 5").
- Allows for seamless mixing of gold labels and redundant labels for quality control.
- Estimates the quality of the workers that participate in the task and returns the estimates on the fly.
- Estimates the quality of the data returned by the algorithm and returns the estimate of labeling accuracy on the fly.
- Estimates a quality-sensitive payment for every worker, based on the quality of the work done so far.

If you are interested in the description of the methods implemented in the toolkit, please take a look at the paper "Quality-based Pricing for Crowdsourced Workers". Our experiments indicate that when labeling allocation follows the suggestions of Project Troia, we achieve the target data quality with an almost optimal budget, and workers are fairly compensated for their effort. (For details, see the paper :-)

Special thanks to Tagasauris, oDesk, and Google for providing support for developing the software. Needless to say, the API is free to use, and the source code is available on GitHub. We hope that you will find it useful.
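To make the idea of cost-sensitive worker quality concrete, here is a minimal sketch in Python. It is not Project Troia's algorithm or API: it assumes a simple majority vote as a stand-in for the true labels, and the toy labels, worker names, and function names are all hypothetical. Only the two cost values (1 and 5) come from the example quoted above.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (worker, item, assigned label). Not from the post.
LABELS = [
    ("w1", "a", "spam"), ("w2", "a", "spam"), ("w3", "a", "legit"),
    ("w1", "b", "legit"), ("w2", "b", "legit"), ("w3", "b", "legit"),
    ("w1", "c", "spam"), ("w2", "c", "spam"), ("w3", "c", "spam"),
]

# COSTS[(true label, assigned label)], using the example costs quoted above.
COSTS = {
    ("spam", "spam"): 0,
    ("legit", "legit"): 0,
    ("spam", "legit"): 1,  # marking spam as legitimate
    ("legit", "spam"): 5,  # marking legitimate content as spam
}


def majority_labels(labels):
    """Pick the most frequent label per item (assumed stand-in for the truth)."""
    votes = defaultdict(Counter)
    for _worker, item, label in labels:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_costs(labels, costs):
    """Average misclassification cost per worker, measured against the majority."""
    consensus = majority_labels(labels)
    total = defaultdict(float)
    count = defaultdict(int)
    for worker, item, label in labels:
        total[worker] += costs[(consensus[item], label)]
        count[worker] += 1
    return {worker: total[worker] / count[worker] for worker in total}


if __name__ == "__main__":
    for worker, cost in sorted(worker_costs(LABELS, COSTS).items()):
        print(worker, round(cost, 3))
```

A lower average cost means a better worker under the chosen cost matrix. Because the function recomputes everything from the full label list, an incremental service would presumably update these statistics as labels arrive rather than rescanning; that part is left out of the sketch.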
about 1 hour ago
What is good reviewing?
about 6 hours ago
In the last decade, NAND flash in the form of solid state drives has revolutionized the storage subsystem. With two orders of magnitude less latency than magnetic disks, SSDs have changed how applications perceive storage. Persistent random access memory (PRAM) technologies like phase change memory promise to create a similar revolution for the memory subsystem. PRAM promises to be byte-addressable, non-volatile and at the same time scalable to an order of magnitude better capacity when compared to DRAM. Additionally, it promises to work at a latency close to DRAM's, at least for reads. Researchers in various software systems fields have stepped up efforts to embrace this impending new technology. In this paper, we review the current state of the art and comment on what the future holds. We start by reviewing how storage technologies like filesystems and databases can exploit PRAM's byte-addressability. We then review how virtual memory managers must evolve to exploit the non-volatility of PRAM. We describe these new models of using memory and storage systems via PRAM, and suggest future research directions. We conclude the paper by discussing some of the shortcomings of the new technology and how systems must evolve to tackle them early on. We draw these conclusions from our experience with building systems for flash.
about 13 hours ago
Code.org has just released a new video promoting computer science that may be especially effective for creating broader public awareness among policy leaders and parents. The video, "Code - the new literacy", is shorter than the previous Code.org video and is focused specifically on the importance of computer science knowledge. The video includes new footage from high tech industry leaders such as Bill Gates and Mark Zuckerberg. Code.org founder Hadi Partovi notes that, like the previous release from Code.org, it is intended as an advocacy tool to help raise public consciousness about how critical these skills are for all students. "This short 2-min video is focused on computer science education as a matter of literacy. It is a great tool for engaging administrators and policy makers to pitch the case for teaching CS to all students, especially at an early age." Code.org has stepped up to take a leadership role on state-level advocacy to ensure that all students have access to rigorous computer science in schools. CSTA is part of a community of CS education organizations working with Code.org on this critical initiative.

Chris Stephenson, CSTA Executive Director
1 day ago
For each of these, are they frauds?

The Turk was a chess playing ``computer'' (around 1770) that was later discovered to be cheating--- a human made the moves. As Ken Regan knows well, we now have the opposite problem--- humans who cheat by having a computer make the moves. Note that the Turk still played an excellent game of chess and hid the human element. This IS an achievement--- just not the one people wanted. Fraud? Yes.

I once heard a rumor (NOTE- this may not be true, that's why it's called a rumor) that hybrid cars get good gas mileage NOT because of the battery but because in their effort to get good mileage they rethought other things like the aerodynamics and how the gas powers the car. If I buy a hybrid car that gets 45 miles per gallon but then find out that it gets this NOT because of the battery, but because of really really good engineering--- was I cheated? My sense is NO since I wanted good gas mileage. I may wonder why I need to replace the battery, or even if I need to. Fraud? I'll say NO but it's certainly debatable.

Someone sells a single-purpose quantum computer to factor numbers and it works REALLY WELL but later it is discovered that it didn't use quantum at all(!)---it instead used a new classical algorithm (e.g., an extension of the Number Field Sieve)--- would the buyers consider themselves cheated? If the buyers were people who just want to factor really large numbers then perhaps they wouldn't care. If the device was meant to fool granting agencies or venture capitalists into funding more quantum, then it is fraud. One may wonder why the device-maker didn't just apply for funding in crypto. If the buyer is an academic who then writes an article about how quantum computing is finally practical, when the truth is discovered he may have his credibility (unfairly?) tarnished.

What if someone had a quantum computer that factored really well but was advertised as a really good classical algorithm that used hard number theory? Somehow that seems very funny to me as a scenario so I won't even ponder fraud or not.

I have heard that the current quantum computers that do such miraculous things as factor 15 (darling says `factor 15? I could do that without breaking a sweat') or find R(3) (I always thought it was 6 and now I know!) may not be ``really quantum''. This is problematic since nobody really wants to factor 15 or find R(3)--- that is, there is no analog to the people who want good gas mileage or the people who want to factor large numbers in my two examples above. These devices are JUST for demonstration purposes. If it's not quantum, it's not demonstrating anything. Fraud? Yes, but are they really fooling anyone?
1 day ago
NLP
I feel a bit odd doing my "what I liked at NAACL 2013" as one of the program chairs, but not odd enough to skip what seems to be the most popular type of post :). First, though, since Katrin Kirchhoff (my co-chair) and I never got a chance to formally thank Lucy Vanderwende (the general chair) and give her flowers (or wine or...) let me take this opportunity to say that Lucy was an amazing general chair and that working with her made even the least pleasant parts of PCing fun. So: thanks Lucy -- I can't imagine having someone better to have worked with! And all of the rest of you: if you see Lucy and if you enjoyed NAACL, please thank her!

I also wanted to really thank Matt Post for doing the NAACL app -- lots of people really liked it and I hope we do it in the future. I'm at ICML now and constantly wishing there were an ICML app :).

Okay, with that preface, let's get down to what you came for. Below is my (complete) list of favorite papers from NAACL 2013 (also indexed on Braque) in no particular order:

Relation Extraction with Matrix Factorization and Universal Schemas (N13-1008 by Sebastian Riedel; Limin Yao; Andrew McCallum; Benjamin M. Marlin)
Very cool paper. The idea is to try to jointly infer relations (think OpenIE-style) across text and databases, by writing everything down in a matrix and doing matrix completion. In particular, make the rows of this matrix equal to pairs of entities (Hal,UMD and UMD,DC-area) and the columns relations like "is-professor-at" and "is-located-in." These entity pairs and relations come both from free text and databases like Freebase. Fill in the known entries and then think of it as a recommender system. They get great results with a remarkably straightforward approach. Reminds me a lot of my colleague Lise Getoor's work on multi-relational learning using tensor decompositions.

Combining multiple information types in Bayesian word segmentation (N13-1012 by Gabriel Doyle; Roger Levy)
I guess this qualifies as an "obvious in retrospect" idea -- and please recognize that I see that as a very positive quality! The basic idea is that stress patterns (e.g., trochees versus iambs) are very useful for kids (who apparently can recognize such things at 4 days old!) and are also very useful for word segmentation algorithms.

Learning a Part-of-Speech Tagger from Two Hours of Annotation (N13-1014 by Dan Garrette; Jason Baldridge)
Probably my overall favorite paper of the conference, and the title says everything. Also probably one of the best presentations I saw at the conference -- I can't even begin to guess how long Dan spent on his slides! I loved the question from Salim in the Q/A session, too: "Why did you stop at two hours?" (They have an ACL paper coming up, apparently, that answers this.) You should just read this paper.

Automatic Generation of English Respellings (N13-1072 by Bradley Hauer; Grzegorz Kondrak)
This paper was the recipient of the best student paper award and, I thought, really great. It's basically about how English (in particular) has funny orthography and sometimes it's useful to map spellings to their pro-nun-see-ey-shuns, which most people find more useful than . It's a bit more of a bunch of stuff glued together than I usually go for in papers, but the ideas are solid and it seems to work pretty well -- and I'd never even thought this would be something interesting to look at, but it makes complete sense. Best part of the presentation was when Greg tripped up pronouncing some example words :).

Linguistic Regularities in Continuous Space Word Representations (N13-1090 by Tomas Mikolov; Wen-tau Yih; Geoffrey Zweig)
This is a paper that makes my list because it made me think. The basic idea is that if you do some representation learning thingamajig and then do vector space algebra like repr("King") - repr("man") + repr("woman") you end up with something that's similar to repr("Queen"). It's a really interesting observation, but I'm at a loss for why we would actually expec
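
The vector arithmetic mentioned for the last paper, repr("King") - repr("man") + repr("woman"), can be illustrated with a small Python sketch. The two-dimensional vectors below are hand-built toy values, not learned representations, and the vocabulary and function names are assumptions made only for this illustration. The sketch shows the mechanics of the lookup, not why learned representations behave this way.

```python
import math

# Hand-built toy vectors (axis 0: "royalty", axis 1: "gender").
# These are assumptions for illustration, not learned embeddings.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (1.0, 0.8),
    "apple": (-1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to repr(a) - repr(b) + repr(c).

    The three query words are excluded from the candidates, which is the
    usual convention for this kind of lookup.
    """
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

With these toy values the offset between "man" and "woman" is the same as the offset between "king" and "queen" by construction; the interesting empirical claim in the paper is that learned vectors show a similar regularity.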
1 day ago
NLP
We pushed a new SPM release to production this morning and it's loaded with goodies. Here is a quick run-down of a few interesting ones. The slightly longer version can be found in the SPM Changelog.
- PagerDuty integration. If you are a PagerDuty user, your alerts from SPM can now go to your PagerDuty account where you can handle them along with all your other alerts.
- Ruby & Java libraries for Custom Metrics. We open-sourced sematext-metrics, a Ruby gem for sending Custom Metrics to SPM, as well as sematext-metrics for doing the same from Java.
- Coda Metrics & Ruby Metriks support. We open-sourced sematext-metrics-reporter, a Coda Metrics reporter for sending Custom Metrics to SPM from Java, Scala, Clojure, and other JVM-based apps, and we've done the same for Metriks, the Ruby equivalent of Coda's Metrics library.
- Puppet metrics. We begged James Turnbull to marry Puppet and SPM and write a Puppet report processor that sends each of the metrics generated by a Puppet run to SPM, which he did without us having to buy him drinks... yet.
- Performance. We've done a bit of work in the layer right behind the UI to make the UI a little faster.
- CentOS 5.x support. Apparently a good number of people still use CentOS 5.x, so we've updated the SPM client to work with it. You can grab it from the SPM Client page.

@sematext
1 day ago
New Scientist
Imperial College London (ICL) researchers have developed a system that enables users to update software without any concerns of causing downtime or introducing bugs. The system employs the unused cores in multicore microprocessors to make the update process invisible to the user. Whenever an update is available, the system leaves the old version of the software running on one core, enabling users to continue accessing it, while running the update in parallel on an unused core. The execution of the two programs is synchronized in such a way that only the most reliable, dependable parts of the programs run, limiting the damage from an introduction of new bugs. "You end up with what we call a multi-version application," says ICL's Cristian Cadar. "These run in parallel and their behavior is combined in such a way as to increase overall reliability and security. But it looks and feels exactly the same to users." He notes the system worked well in tests, and can be used for larger systems, apps for smartphones, and server applications.
From "Update Your Software Without Stress or Disruption," New Scientist (06/12/13), Paul Marks
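
The general idea of running an old and a new version side by side and combining their behavior can be sketched in a few lines of Python. This is only a loose illustration under stated assumptions: the article does not describe how the ICL system synchronizes the two versions, so the fallback rule here (prefer the new version, fall back to the old one if the new one fails) and every name in the code are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_multi_version(old_version, new_version, request):
    """Run both versions in parallel and combine their behavior.

    Assumed policy for this sketch: return the new version's result,
    unless the new version raises, in which case use the old version's.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        old_future = pool.submit(old_version, request)
        new_future = pool.submit(new_version, request)
        try:
            return new_future.result()
        except Exception:
            return old_future.result()


def parse_v1(text):
    """Old version: parses plain integers only."""
    return int(text)


def parse_v2(text):
    """Hypothetical update: accepts thousands separators, but the rewrite
    accidentally rejects negative numbers (the newly introduced bug)."""
    digits = text.replace(",", "")
    if not digits.isdigit():
        raise ValueError(f"not a number: {text!r}")
    return int(digits)


if __name__ == "__main__":
    print(run_multi_version(parse_v1, parse_v2, "1,000"))  # new feature works
    print(run_multi_version(parse_v1, parse_v2, "-5"))     # old version covers the bug
```

From the caller's point of view there is a single function, which mirrors the claim that a multi-version application "looks and feels exactly the same to users"; the cost is that every request is executed twice.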
1 day ago
BBC News
Newcastle University professor Sugata Mitra in February won a $1 million award to set up a series of cloud-based schools, and described his vision of the first "school in the cloud" at the recent Technology, Entertainment, and Design (TED) Global conference. "A school in the cloud is basically a school without physical teachers," Mitra says. He plans to establish five cloud schools, with three in India and two in the United Kingdom. The glass classrooms will contain many computers and one large screen, through which moderators will Skype in. Mitra's initiative is based on the hole-in-the-wall computers that he set up in India's slums in 1999. The computers were left for children to explore without any prior instruction, and Mitra says he was amazed at the skills they developed on their own. Also at TEDGlobal, Massachusetts Institute of Technology professor Anant Agarwal discussed how his edX online platform could help bring top-tier university education to students in developing countries. "Education has not changed in 500 years--we still herd children like cats into classrooms at 9 a.m.," says Agarwal, who argues a different approach is needed in the developing world.
From "TEDGlobal: Cloud Schools Offer New Education," BBC News (06/14/13), Jane Wakefield
1 day ago
Science
Computer scientists are contributing to the advancement of the life sciences by supplementing their training with biology basics. "The combination of quantitative abilities and experimental biology skills is very valuable in synthetic biology," notes Timothy Lu with the Massachusetts Institute of Technology's Research Laboratory of Electronics. Meanwhile, Pamela Silver in Harvard Medical School's Department of Systems Biology calls the systems biology discipline a field that "seeks to understand what evolution gave us and how we got to where we are." Silver advised Harvard researcher Avi Robinson-Mosher to take a physiology course at the Marine Biological Laboratory, which gave him grounding in biology and inspired him to apply his simulation talents toward macromolecular interactions. "Systems biology offered a good combination of being able to apply my computational background to actually making things that can do something useful for people," Robinson-Mosher says. Google's Joseph Hellerstein sees the human genome project as the catalyst for the changing relationship between computation and domain science, and Google has assisted academia in tackling six big data challenges under the auspices of its Exacycle project. "Whether it is medicine, for machines through nanotechnology, in agriculture or materials, design problems require simultaneous innovation in computing and science that can only be accomplished by those with the combined skills," Hellerstein observes.
From "Computer Scientists Get Wet," Science (06/14/13), Vijaysree Venkatraman
1 day ago