Computer Science

NLP
It’s my great pleasure to announce to the world (i.e., all 4 subscribed readers of this blog) that Alex B. Fine successfully defended his thesis, entitled “Prediction, Error, and Adaptation During Online Sentence Comprehension”, jointly advised by Jeff Runner and me. Alex is the first HLP lab graduate (who started his graduate studies in the lab), so we gave him a very proper send-off and roasted the heck out of him. Alex will be starting his post-doc at the University of Illinois Psychology Department in June, working with Gary Dell, Sarah Brown-Schmidt, and Duane Watson. Alex’s thesis investigates syntactic expectation adaptation. Work over the last two decades has firmly established that language comprehension is experience-based or, more precisely, expectation-based: comprehenders draw on previous experience in order to robustly and efficiently predict (and thereby understand) the linguistic signal. Yet sociolinguistic and variationist work has documented that speakers and writers differ in their production preferences: the same message might be realized with different phonetic, lexical, and syntactic material. This raises a question not previously acknowledged in the literature on sentence processing: how can it be that comprehension seems to make efficient use of probabilistic beliefs about linguistic distributions if the statistics of these distributions depend on the speaker, register, style, etc.? If the systems underlying language comprehension have evolved to efficiently deal with such (subjective) non-stationarity, we might expect comprehenders to a) store and use environment-specific (e.g., speaker-specific) syntactic expectations based on previous experience, b) readily generalize from this previous experience to novel environments, and c) integrate information about novel environments with previous experience. Alex’s thesis focuses on the third prediction.
In a series of self-paced reading experiments, he presents evidence i) that readers implicitly adapt their syntactic expectations to converge towards the statistics of the current linguistic environment, ii) that these effects cannot be reduced to task-based learning, floor effects, or saturation effects, iii) that the time-course of such syntactic expectation adaptation depends both on the prior statistics of a structure in a comprehender’s life-long experience and on its statistics in the current environment, iv) that syntactic expectation adaptation can be seen as cumulative syntactic priming (for those who prefer to think about phenomena rather than theories), and v) that the strength of syntactic priming in comprehension is sensitive to the prediction error experienced while processing the prime structure. If you’re interested, his work on the role of prediction errors in syntactic priming has recently appeared in Cognitive Science (Fine, A. B. and Jaeger, T. F. 2013. Evidence for implicit learning in syntactic comprehension. Cognitive Science 37(3), 578–591). Another paper, summarizing the first few studies of his thesis, is currently under review. Preliminary reports of other studies can be found in various CogSci proceedings from 2010-2013, e.g.:
Preliminary modeling work:
Fine, A. B., Qian, T., Jaeger, T. F., and Jacobs, R. 2010. Is there syntactic adaptation in language comprehension. Proceedings of ACL: Workshop on Cognitive Modeling and Computational Linguistics, 18-26. Uppsala, Sweden.
Kleinschmidt, D. F., Fine, A. B., and Jaeger, T. F. 2012. A belief-updating model of adaptation and cue combination in syntactic comprehension. The 34th Annual Meeting of the Cognitive Science Society (CogSci12). Sapporo, Japan. July, 2012.
More evidence for the account:
Fine, A. B. and Jaeger, T. F. 2013. Syntactic priming in language comprehension allows linguistic expectations to converge on the statistics of the input. In TBA (eds.)
Proceedings of the 35th Annual Meeting of the Cognitive Scien
27 minutes ago
Why the recent breakthrough is important
Yitang Zhang, of the University of New Hampshire, has apparently proved a finite approximation to the famous Twin Prime Conjecture. This is a result of the first order. After ten days of progressively more detailed news, including today’s article in the New York Times, Zhang’s 56-page preprint has just been released on the Annals of Mathematics journal website. This is a revision of the original submission, which was said in a story in Nature last week to have needed “a few minor tweaks.” Today Ken and I want to explain important aspects of the Twin Prime Conjecture. Recall that the Twin Prime Conjecture states that there are an infinite number of primes $p$ and $q$ that are as close as possible: $q - p = 2$. Well, $2$ and $3$ are closer, but that can only happen once, so the best one can hope for is primes that are within two of each other. Zhang’s beautiful result is that there are an infinite number of primes $p$ and $q$ so that $p < q$ and $q - p$ is bounded by an absolute constant. The constant is large, in the tens of millions, but it is a constant. Perhaps we should call these “Cousin Primes,” since they are not within two; I will leave the naming of them to the number theorists. Whatever you call them, his result is a huge improvement on what was previously proved, and is a singular advance. The proof is long, which is not unexpected. Ken saw the news of the paper’s release earlier today on Terry Tao’s Google+ page about the work, which gives some idea of how the proof goes. There are many links and comments in a post by Peter Woit that also mentions a recently announced proof by Harald Helfgott of the “ternary Goldbach conjecture” that every odd number above 5 is the sum of three primes. So what can we possibly add to the discussion about Zhang’s breakthrough? Nothing on the proof right now. Something, however, on why the Twin Prime Conjecture can be really useful. This is all from a personal point of view, but one that I hope you will enjoy.
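To make the definitions above concrete, here is a small Python sketch. It is not taken from Zhang's paper, and the function names `primes_up_to` and `prime_pairs_within` are just illustrative choices. It sieves the primes up to a small limit and lists consecutive primes whose difference is at most a given bound; with the bound set to 2 and the pair (2, 3) excluded, what remains are the twin primes.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # Cross out every multiple of i, starting at i*i.
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def prime_pairs_within(limit, max_gap):
    """Consecutive primes (p, q) with q <= limit and q - p <= max_gap."""
    primes = primes_up_to(limit)
    return [(p, q) for p, q in zip(primes, primes[1:]) if q - p <= max_gap]


# Twin primes are the pairs at distance exactly 2; (2, 3) is the lone gap of 1.
twin_pairs = [(p, q) for p, q in prime_pairs_within(100, 2) if q - p == 2]
print(twin_pairs)
# [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61), (71, 73)]
```

Raising `max_gap` shows the weaker kind of statement Zhang's theorem is about: pairs of primes within some fixed distance of each other. Of course, no finite computation says anything about whether such pairs occur infinitely often.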
Let’s first take a quick look at what was known before his work, and then discuss what it may be useful for.
Before Zhang
One measure of the density of the primes is that the summation $\sum_{p \le x} 1/p$ over primes $p$ does not converge. That is, the sum tends to infinity as $x$ tends to infinity. The growth is slow, but the sum does diverge. In 1915, Viggo Brun used sieve methods to prove that twin primes were rarer in a precise sense: the summation over twin primes converges. Indeed his result can be improved to show that the number of twin primes less than $x$ is bounded above by $O(x/(\log x)^2)$. Using heuristic arguments, Godfrey Hardy and John Littlewood guessed that not only are there an infinite number of twin primes, but that their density is close to what a “random” model would predict. Let $\pi_2(x)$ be the number of twin primes less than $x$ (recall $\pi(x)$ is used to denote the number of primes less than $x$); then the Hardy-Littlewood Conjecture is that $\pi_2(x) \sim 2 C_2 \, x/(\log x)^2$ for an explicit constant $C_2$. Tao is on record as saying that certain approaches based on sieve theory cannot resolve the Twin Prime conjecture—see this for a short discussion. Mark Lewko, in a comment to a MathOverflow thread on Zhang’s paper, indicates that its mechanism alone cannot reduce the gap under […], and it does not circumvent a more general obstacle to gaps below […]. However, even if Zhang’s new techniques do not overcome such general barriers, at least they push against them with a lot more oomph.
Another Problem
Years ago I worked on a problem that had nothing directly to do with the Twin Prime Conjecture. The question is a fundamental one about the complexity of Boolean functions. It is not classic, has not been open for a very long time, and could be trivial. But like many problems about Boolean functions it turned out to fight back hard, and the best we could do was to make a small dent in the problem. The work was joint wit
about 1 hour ago
Interactive playgrounds are technology-enhanced installations that aim to provide rich game experiences for children playing in them by combining the benefits of traditional playgrounds with those of digital games. These game experiences could be attained by addressing three design considerations: context-awareness, adaptability and personalization. We propose to use social signal processing (SSP), a field of research that encompasses the automatic analysis of social behavior, to enhance current interactive playgrounds to meet these criteria. This paper surveys SSP techniques and how they can be used to automatically sense and interpret children’s social interactions during play, adapt the playground’s game mechanics to induce targeted social behavior, and learn from the sensed behavior to meet players’ expectations and desires. We discuss the challenges and opportunities faced when introducing SSP into the interactive playground.
about 6 hours ago
NLP
“We tried creating personas and it was hard. It took us months and they never got traction. Eventually we abandoned the project.” I’ve heard this dozens of times from design team managers. They all embarked on these big persona projects, often with energy and excitement, only to find that energy dissipate and the project lose its momentum. Personas that don’t help make design decisions are a waste. However, it doesn’t have to be that way. These projects fail because of a perspective problem. The design teams think of making personas as a project in itself. I’ve come to the conclusion that thinking this way will lead to failure. The alternative to having personas be a project is to make them just a step inside of every project. Instead of making them once and trying to use them everywhere, we come up with a low-cost way to insert them into each project as they are needed. We can divide well-done design projects into a discovery phase (where we explore the boundaries of the problem we’re trying to solve), an exploration phase (where we toy with different possible solutions), and a refinement phase (where we choose a direction and fill out the details). (Not everyone does design projects well, but the folks who do end up following these three phases. The ones who don’t, well, they skip one or more of these stages, then regret it later. Or maybe they are unconsciously incompetent.) Part of the discovery phase is gathering information about the users of the design and what they’ll need. We can do that with fancy-ass research or we can do it by just collecting all our thoughts about what we already know. The more specific we can make each question, the easier it is to answer. For example, if we were building the part of a clothing e-commerce site that showed the product previews, we’d want to know how people used previews in their shopping. We can make guesses or ask our peers. Or we can go into the field and study shopping online or in stores.
Now, we can group what we’ve learned about our users into behavioral categories. We might group people who love to match different pieces in one pile, while we group people who prefer to see pre-designed outfits in a different pile. We might group the folks who are matching colors to things they already own in a different pile from people who don’t trust the colors they see and will use the free 90-day return period to ship back products they don’t like. These different groups become the persona clusters. And the things people did in those groups become our scenarios. If we’ve done a good job of collecting our data and knowledge about the users, it should be quick to create these personas around this specific functionality. Less than a day, in fact. And there we have it: detailed personas about using previews. There are probably a ton of design decisions these personas can now help us make. (And where they can’t, well, that points to the need for a little more research.) At the end of the project, when our preview module is out there and being used, well, the personas aren’t that useful anymore. But because we only spent a day on them, we don’t need to “protect our investment.” We just toss them out and create new ones for the next project. There you have it: cheap and easy disposable personas.
about 9 hours ago
Building Bayesian belief networks in the absence of data involves the challenging task of eliciting conditional probabilities from experts to parameterize the model. In this paper, we develop an analytical method for determining the optimal order for eliciting these probabilities. Our method uses prior distributions on network parameters and a novel expected proximity criterion to propose an order that maximizes information gain per unit elicitation time. We present analytical results when priors are uniform Dirichlet; for other priors, we find through experiments that the optimal order is strongly affected by which variables are of primary interest to the analyst. Our results should prove useful to researchers and practitioners involved in belief network model building and elicitation.
about 9 hours ago
Graph-based ranking models have been widely applied in the information retrieval area. In this paper, we focus on a well-known graph-based model: the Ranking on Data Manifold model, or Manifold Ranking (MR). In particular, it has been successfully applied to content-based image retrieval because of its outstanding ability to discover the underlying geometrical structure of the given image database. However, manifold ranking is computationally very expensive, which significantly limits its applicability to large databases, especially when the queries are outside the database (new samples). We propose a novel scalable graph-based ranking model called Efficient Manifold Ranking (EMR), which addresses the shortcomings of MR from two main perspectives: scalable graph construction and efficient ranking computation. Specifically, we build an anchor graph on the database instead of a traditional k-nearest neighbor graph, and design a new form of adjacency matrix that is used to speed up the ranking. An approximate method is adopted for efficient out-of-sample retrieval. Experimental results on some large-scale image databases demonstrate that EMR is a promising method for real-world retrieval applications.
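For readers unfamiliar with the baseline, the following is a minimal pure-Python sketch of classical manifold ranking as it is usually formulated: scores are propagated over a symmetrically normalized affinity graph by iterating f ← αSf + (1 − α)y. It is not the EMR method of the abstract. The fully connected Gaussian graph, the parameter values, and the function name `manifold_ranking` are all assumptions made for illustration.

```python
import math


def manifold_ranking(points, query_index, alpha=0.9, sigma=1.0, iterations=200):
    """Rank points by relevance to points[query_index] (classical MR sketch)."""
    n = len(points)
    # Assumption: Gaussian affinities on a fully connected graph, zero diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    degree = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # y marks the query; f is the ranking score spread along the graph.
    y = [1.0 if i == query_index else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iterations):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Two small clusters; the query is the first point of the first cluster.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0), (6.0, 5.0)]
scores = manifold_ranking(points, query_index=0)
ranking = sorted(range(len(points)), key=lambda i: -scores[i])
print(ranking[:3])  # the query's own cluster comes first
```

Even this toy version builds a dense n × n graph and touches every edge on every iteration, which hints at why the abstract calls MR expensive for large databases.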
about 9 hours ago
When watching Jeopardy with Darling, if I get a question correct that is NOT in my usual store of knowledge (that is NOT Ramsey Theory, NOT Vice Presidents, NOT Satires of Bob Dylan), Darling asks me How did you know that? I usually reply I do not know how I knew that. Recently I DID know and I'll get to that later, but for now the question arises: Do you know how you know what you know? As an undergrad I learned mostly from taking courses. Hence I could say things like I know Group theory from a course I had in Abstract Algebra in the Fall of 1978 (Side Note- I know why I should care about groups from reading the algorithm for graph isom for graphs of bounded degree---in 1988). I learned a few things on my own- I learned that a graph is Eulerian iff every vertex has even degree from a Martin Gardner article. But since most of my knowledge was from courses I knew how I knew what I knew. As a grad student I still took courses but more routes to knowledge emerged. Papers! I could say things like I know the oracle constructions about P vs NP because I read the Baker Gill Solovay paper on October 23, 1981. It helps that Oct 23 is Weird Al's birthday. But even here things get a bit murky- someone TOLD ME about the paper, which led me to read it, but I don't recall who. So one more route to knowledge emerged- people telling you stuff in the hallways. I saw Anil Nerode give a talk on Recursive Mathematics and that day went to the library (ask your grandmother what a library is) and read some articles on it. This was well timed- I knew enough recursion theory and combinatorics to read up on recursive combinatorics. In this case I know exactly how I know what I know. Might be the last time. As a professor I read papers, hear talks, hear things in hallways, and learn stuff. It's getting harder to know how I know things, but to some extent I still could. Until... THE WEB. The Web is the main reason I don't know how I know things.
I sometimes tell Darling I read it on the web which is (a) prob true, and (b) prob not very insightful. So- do you know how you know what you know? On Jeopardy recently the final Jeopardy question was as follows. TOPIC: Island Countries. ANSWER: No longer Western, this one-word nation has moved to the west side of the international Date Line to join Asia and Australia. BILL: What is SAMOA!? Darling wondered how I knew that: DARLING: How did you know that? Is there a Ramsey Theorist in Samoa? BILL: Not that I know of, but that's a good guess as to how I knew that. Actually Lance had a blog post Those Happy Samoans about Samoa going over the international dateline and losing the advantage of having more time to work on their conference submissions. DARLING: Too bad there isn't a Ramsey Theorist there to take advantage of that! Thanks Lance!- In this one case I know how I know what I know!
about 9 hours ago
This work investigates how to automatically parse object trajectories in surveillance videos, aiming to jointly solve three subproblems: i) spatial segmentation, ii) temporal tracking, and iii) object categorization. We present a novel representation, the spatio-temporal graph (ST-Graph), in which: i) graph nodes express the motion primitives, each representing a short sequence of small-size patches over consecutive images; and ii) every two neighboring nodes are linked with either a positive edge or a negative edge to describe their collaborative or exclusive relationship of belonging to the same object trajectory. Phrasing the trajectory parsing as a graph multi-coloring problem, we propose a unified probabilistic formulation to integrate various types of context knowledge as informative priors. An efficient composite cluster sampling algorithm is employed to search for the optimal solution by exploiting both the collaborative and the exclusive relationships between nodes. The proposed framework is evaluated on challenging videos from public datasets, and results show that it can achieve state-of-the-art tracking accuracy.
about 10 hours ago
The forward greedy selection algorithm of Frank & Wolfe [9] has recently been applied with success to coordinate-wise sparse learning problems, characterized by a trade-off between sparsity and accuracy. In this paper, we generalize this method to the setup of pursuing sparse representations over a pre-fixed dictionary. At each iteration, the proposed algorithm first automatically selects an atom from the dictionary and adds it to a working set, and then optimally adjusts the aggregation weights of the atoms in the working set. The rate of convergence of this computational procedure is analyzed. Furthermore, we extend the proposed algorithm to the setup of learning non-negative and convex sparse representation over a dictionary. Applications of the proposed algorithms to sparse precision matrix estimation and low rank subspace segmentation are explored with efficiency and effectiveness validated on benchmark data sets.
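As background, here is a minimal pure-Python sketch of the classical Frank-Wolfe iteration for a least-squares fit over the probability simplex. Each step selects the single atom whose gradient coordinate is smallest and mixes it in with the standard step size 2/(t + 2). This is not the algorithm proposed in the abstract: in particular it does not re-optimize the weights over the working set. The function name `frank_wolfe_simplex` and the toy dictionary are illustrative assumptions.

```python
def frank_wolfe_simplex(atoms, target, iterations=200):
    """Minimize 0.5 * ||sum_k w[k] * atoms[k] - target||^2 over the simplex."""
    n_atoms, dim = len(atoms), len(target)
    weights = [0.0] * n_atoms
    weights[0] = 1.0  # start at an arbitrary vertex of the simplex
    for t in range(iterations):
        # Current approximation and its residual against the target.
        approx = [sum(weights[k] * atoms[k][d] for k in range(n_atoms))
                  for d in range(dim)]
        residual = [approx[d] - target[d] for d in range(dim)]
        # Gradient w.r.t. each weight is <atom_k, residual>.
        grad = [sum(atoms[k][d] * residual[d] for d in range(dim))
                for k in range(n_atoms)]
        # Greedy selection: the simplex vertex minimizing the linearized objective.
        best = min(range(n_atoms), key=grad.__getitem__)
        step = 2.0 / (t + 2.0)
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


# Toy dictionary: the standard basis, so the optimal weights equal the target.
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
weights = frank_wolfe_simplex(atoms, target=(0.7, 0.3, 0.0))
print([round(w, 2) for w in weights])
```

Because every iteration adds at most one atom, the iterate after t steps is a combination of at most t + 1 atoms, which is the sparsity and accuracy trade-off the abstract refers to.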
about 10 hours ago
We study large-scale image classification methods that can incorporate new classes and training images continuously over time at negligible cost. To this end we consider two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers, and introduce a new metric learning approach for the latter. We also introduce an extension of the NCM classifier to allow for richer class representations. Experiments on the ImageNet 2010 challenge dataset, which contains over a million training images of a thousand classes, show that, surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs, which obtain current state-of-the-art performance. Experimentally we study the generalization performance to classes that were not used to learn the metrics. Using a metric learned on 1,000 classes, we show results for the ImageNet-10K dataset, which contains 10,000 classes, and obtain performance that is competitive with the current state-of-the-art, while being orders of magnitude faster. Furthermore, we show how a zero-shot class prior based on the ImageNet hierarchy can improve performance when few training images are available.
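To illustrate why an NCM classifier can absorb new classes at negligible cost, here is a minimal pure-Python sketch. It uses plain squared Euclidean distance rather than the learned metric the abstract describes. The class name `NearestClassMean`, its methods, and the toy features are hypothetical.

```python
class NearestClassMean:
    """Nearest class mean classifier with running per-class sums (sketch)."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension sum of feature vectors
        self.counts = {}  # label -> number of examples seen

    def add_example(self, label, features):
        # Adding an image (or a brand-new class) only updates one running sum.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + x for s, x in zip(self.sums[label], features)]
        self.counts[label] += 1

    def class_mean(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, features):
        # Assumption: squared Euclidean distance; the paper learns a metric instead.
        def squared_distance(label):
            mean = self.class_mean(label)
            return sum((x - m) ** 2 for x, m in zip(features, mean))
        return min(self.sums, key=squared_distance)


ncm = NearestClassMean()
ncm.add_example("cat", [1.0, 1.0])
ncm.add_example("cat", [1.2, 0.8])
ncm.add_example("dog", [4.0, 4.0])
print(ncm.predict([1.1, 1.0]))   # cat

# A new class arrives later: no retraining, just one more mean to store.
ncm.add_example("bird", [8.0, 0.0])
print(ncm.predict([7.5, 0.5]))   # bird
```

In this sketch, adding a class costs one pass over its examples, and prediction cost grows only with the number of classes, not with the number of training images.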
about 10 hours ago