The great achievements of deep learning can be attributed to its tremendous power of feature representation, which comes from the nonlinear activation function and the large number of network nodes. However, deep neural networks suffer from serious issues such as slow convergence, and dropout is an outstanding method for improving a network's generalization ability and test performance. Many explanations have been given for why dropout works so well, among which the equivalence between dropout and data augmentation is a recently proposed and stimulating one. In this article, we discuss the exact conditions under which this equivalence holds. Our main result guarantees that the equivalence relation holds almost surely if the dimension of the input space is equal to or higher than that of the output space. Furthermore, if the commonly used rectified linear unit (ReLU) activation function is replaced by a newly proposed activation function whose values lie in R, then our results can be extended to multilayer neural networks. For comparison, some counterexamples are given for the inequivalent case. Finally, a series of experiments on the MNIST dataset is conducted to illustrate and help understand the theoretical results.
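The dimension condition in the abstract can be illustrated with a small sketch. It is not the article's proof or code. It assumes a single layer of the form relu(W x + b), a plain 0/1 dropout mask without rescaling, and a weight matrix W with full row rank (which holds almost surely for random weights when the input dimension is at least the output dimension). All names (`augmented_input`, `solve_min_norm`, the sizes 3 and 2) are hypothetical choices made for illustration. Under those assumptions, an "augmented" input can be found whose clean forward pass reproduces the dropped-out activations of the original input.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def affine(W, b, x):
    # W is a list of rows; returns W x + b.
    return [sum(w * xj for w, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def solve_min_norm(W, t):
    """Minimum-norm x with W x = t, for a 2 x n matrix W of full row rank."""
    r0, r1 = W
    # Gram matrix G = W W^T (2 x 2).
    g00 = sum(a * a for a in r0)
    g01 = sum(a * c for a, c in zip(r0, r1))
    g11 = sum(c * c for c in r1)
    det = g00 * g11 - g01 * g01
    if abs(det) < 1e-12:
        raise ValueError("W does not have full row rank")
    # y = G^{-1} t, then x = W^T y.
    y0 = (g11 * t[0] - g01 * t[1]) / det
    y1 = (-g01 * t[0] + g00 * t[1]) / det
    return [a * y0 + c * y1 for a, c in zip(r0, r1)]

def augmented_input(W, b, x, mask):
    """Find x_aug with relu(W x_aug + b) == mask * relu(W x + b)."""
    h = relu(affine(W, b, x))
    target = [m * hi for m, hi in zip(mask, h)]  # activations after dropout
    # target is non-negative, so relu(target) == target; solve W x_aug = target - b.
    rhs = [ti - bi for ti, bi in zip(target, b)]
    return solve_min_norm(W, rhs), target

random.seed(0)
# Assumed toy sizes: input dimension 3 >= output dimension 2.
W = [[random.gauss(0, 1) for _ in range(3)] for _ in range(2)]
b = [0.1, -0.2]
x = [1.0, 2.0, -0.5]
mask = [1, 0]  # drop the second unit

x_aug, target = augmented_input(W, b, x, mask)
clean_on_aug = relu(affine(W, b, x_aug))
max_gap = max(abs(a - t) for a, t in zip(clean_on_aug, target))
```

Here `max_gap` measures how far the clean forward pass on `x_aug` is from the dropped-out activations of `x`. When the input dimension is smaller than the output dimension, the linear system is generally overdetermined and no such `x_aug` need exist, which is consistent with the counterexamples mentioned in the abstract.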
This article was published in the following journal.
Name: Neural Networks: the official journal of the International Neural Network Society