Alan D. Olinsky, Kristin Kennedy and Michael Salzillo
Abstract
Forecasting the number of bed days (NBD) needed within a large hospital network is extremely challenging, but it is imperative that management find a predictive model that best estimates this quantity. The estimate is used by operational managers for logistical planning purposes. Furthermore, the finance staff of a hospital requires an expected NBD as input for estimating future expenses. Some hospital reimbursement contracts are on a per diem schedule, and expected NBD is useful in forecasting future revenue.
This chapter examines two ways of estimating the NBD for a large hospital system, building on previous work that compared time regression and an autoregressive integrated moving average (ARIMA) model. The two approaches examine whether the total (combined) NBD for all the data is a better predictor than partitioning the data by type of service. The four partitions are medical, maternity, surgery, and psychology. The partitioned time series can be used to forecast future NBD by type of service, and the partitioned forecasts can also be summed to produce an alternative forecast of the total. The question is whether either approach provides a better fit for forecasting the NBD. The approaches presented in this chapter can be applied to a variety of time series data for business forecasting whenever a large database can be partitioned into smaller segments.
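As a rough illustration of the comparison, the sketch below fits a simple ARIMA model to the combined NBD series and, separately, to each service-line partition, then sums the partition forecasts. The column names and the (1, 1, 1) order are illustrative assumptions, not the chapter's actual specification.

# Hypothetical sketch: single ARIMA on total NBD vs. the sum of
# per-partition ARIMA forecasts. Column names and model order are assumed.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def forecast_arima(series: pd.Series, steps: int = 12) -> pd.Series:
    """Fit a simple ARIMA model and forecast the next `steps` periods."""
    return ARIMA(series, order=(1, 1, 1)).fit().forecast(steps=steps)

def compare_approaches(nbd: pd.DataFrame, steps: int = 12) -> pd.DataFrame:
    """nbd: monthly NBD counts, one column per service line."""
    services = ["medical", "maternity", "surgery", "psychology"]
    # Approach 1: one model on the combined (total) NBD.
    total_fc = forecast_arima(nbd[services].sum(axis=1), steps)
    # Approach 2: one model per partition, then sum the forecasts.
    partition_fc = sum(forecast_arima(nbd[s], steps) for s in services)
    return pd.DataFrame({"combined_model": total_fc,
                         "sum_of_partitions": partition_fc})

Which column forecasts better is then an empirical question, judged against held-out months.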
John T. Quinn, Alan D. Olinsky, Phyllis A. Schumacher and Richard M. Smith
Abstract
Purpose
The Bryant University Mathematics Department has been collecting math placement scores and admissions data for all incoming freshmen for many years. In the past, the authors have used these data mainly for placement in first-year classes and more recently to invite the most mathematically talented students to become mathematics majors. The purpose of this paper is to use the same data source to predict persistence in declared majors for all incoming students.
Design/methodology/approach
In order to categorize the students, the authors use cluster analysis, one of the tools of data mining, to see if students in particular majors share similar strengths based on the available data. The authors follow up this analysis by running a multivariate analysis of variance (MANOVA) to confirm that the means of the clusters are significantly different.
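A minimal sketch of this two-step analysis, assuming hypothetical column names for the placement and admissions variables (sat_math, placement_score, hs_gpa), might look like the following; the actual variables are those in the Bryant dataset.

# Hedged sketch: k-means clustering on assumed placement/admissions fields,
# followed by a MANOVA testing whether the cluster mean vectors differ.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from statsmodels.multivariate.manova import MANOVA

def cluster_and_test(df: pd.DataFrame, k: int = 5):
    features = ["sat_math", "placement_score", "hs_gpa"]  # assumed columns
    X = StandardScaler().fit_transform(df[features])
    df = df.assign(cluster=KMeans(n_clusters=k, n_init=10,
                                  random_state=0).fit_predict(X))
    # MANOVA: do the k cluster centroids differ on the feature means?
    mv = MANOVA.from_formula(
        "sat_math + placement_score + hs_gpa ~ C(cluster)", data=df)
    return df, mv.mv_test()  # Wilks' lambda, Pillai's trace, etc.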
Findings
The cluster analysis resulted in five distinct clusters, which were confirmed by the results of the MANOVA. The authors also determined how many students in each cluster persisted in their chosen majors.
Originality/value
These results will help to improve counseling and proper placement of incoming freshmen. They will also be helpful in long-range planning of upper-level courses. Retention of students in their majors is an important concern for colleges and universities as it relates to planning issues, such as scheduling classes, particularly for upperclassmen. It could also affect departmental requirements, such as the size of the faculty.
Alicia T. Lamere, Son Nguyen, Gao Niu, Alan Olinsky and John Quinn
Abstract
Predicting a patient's length of stay (LOS) in a hospital setting has been widely researched. Accurately predicting an individual's LOS can have a significant impact on a healthcare provider's ability to care for individuals by allowing them to properly prepare and manage resources. A hospital's productivity requires a delicate balance of maintaining enough staffing and resources without being overly equipped or wasteful. This has become even more important in light of the current COVID-19 pandemic, during which emergency departments around the globe have been inundated with patients and are struggling to manage their resources.
In this study, the authors focus on the prediction of LOS at the time of admission in emergency departments at Rhode Island hospitals, using discharge data obtained from the Rhode Island Department of Health for 2012 and 2013. This work also explores the distribution of discharge dispositions in an effort to better characterize the resources patients require upon leaving the emergency department.
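A hedged sketch of an admission-time LOS model is given below. The field names (admission_type, diagnosis_group, payer, age, los_days) are assumptions about typical hospital-discharge data, and gradient boosting stands in for whatever models the study actually evaluates.

# Sketch: predict length of stay from fields available at admission.
# All column names are hypothetical placeholders for the RI data.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import GradientBoostingRegressor

def los_model(df: pd.DataFrame):
    cat = ["admission_type", "diagnosis_group", "payer"]  # assumed fields
    num = ["age"]
    pre = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), cat)],
        remainder="passthrough")  # numeric columns pass through unchanged
    model = make_pipeline(pre, GradientBoostingRegressor(random_state=0))
    return model.fit(df[cat + num], df["los_days"])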
Gao Niu, John Quinn and Alan Olinsky
Abstract
In this chapter, we apply Data Envelopment Analysis (DEA) to data from a group of property and casualty insurance companies for 2018 to 2020. The calculated relative efficiencies are compared with selected traditionally used financial measures. We conclude that DEA and its relative efficiency calculation provide a measure consistent with selected IRIS ratios. The result and method can be used in situations where multiple ratios and change-based financial metrics yield inconsistent conclusions.
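For readers unfamiliar with DEA, the sketch below solves the standard input-oriented CCR model, one linear program per company (decision-making unit, DMU), to obtain a relative efficiency score in (0, 1]. The choice of inputs and outputs (e.g., expenses and surplus versus premiums and investment income) belongs to the modeler; the code only assumes the matrices are given.

# Hedged sketch of input-oriented CCR DEA via one LP per DMU.
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """X: (m inputs x n DMUs), Y: (s outputs x n DMUs).
    Returns the relative efficiency score for each DMU."""
    m, n = X.shape
    s = Y.shape[0]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]              # minimize theta
        A_ub = np.zeros((m + s, n + 1))
        b_ub = np.zeros(m + s)
        A_ub[:m, 0] = -X[:, o]                   # X @ lambda - theta * x_o <= 0
        A_ub[:m, 1:] = X
        A_ub[m:, 1:] = -Y                        # Y @ lambda >= y_o
        b_ub[m:] = -Y[:, o]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] + [(0, None)] * n)
        scores[o] = res.x[0]
    return scores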
Kristin Kennedy, Michael Salzillo, Alan Olinsky and John Quinn
Abstract
Managing a large hospital network can be an extremely challenging task. Management must rely on numerous pieces of information when making business decisions. This chapter focuses on the number of bed days (NBD), a forecast of which can be extremely valuable to operational managers for logistical planning purposes. In addition, the finance staff often requires an expected NBD as input for estimating future expenses. Some hospital reimbursement contracts are on a per diem schedule, and expected NBD is useful in forecasting future revenue.
Two models, time regression and autoregressive integrated moving average (ARIMA), are applied to nine years of monthly counts of the NBD for the Rhode Island Hospital System. These two models are compared to see which gives the better fit for the forecasted NBD. The question of aggregating the data from monthly to quarterly periods is also addressed. The approaches presented in this chapter can be applied to a variety of time series data for business forecasting.
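A minimal sketch of the comparison, under assumed model specifications (a linear trend with monthly seasonal dummies for the regression, and an ARIMA(1, 1, 1)), might proceed as follows; the chapter's fitted orders will differ.

# Hedged sketch: time regression vs. ARIMA on a monthly NBD series,
# scored by out-of-sample RMSE on an assumed 12-month holdout.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.tsa.arima.model import ARIMA

def compare_models(nbd: pd.Series, holdout: int = 12):
    """nbd: monthly NBD counts with a DatetimeIndex."""
    train, test = nbd[:-holdout], nbd[-holdout:]
    # Time regression: linear trend plus month-of-year seasonal dummies.
    df = pd.DataFrame({"y": train.values, "t": np.arange(len(train)),
                       "month": train.index.month})
    ols = smf.ols("y ~ t + C(month)", data=df).fit()
    new = pd.DataFrame({"t": np.arange(len(train), len(nbd)),
                        "month": test.index.month})
    reg_fc = ols.predict(new)
    # ARIMA on the same training window.
    arima_fc = ARIMA(train, order=(1, 1, 1)).fit().forecast(steps=holdout)
    rmse = lambda f: float(np.sqrt(np.mean((test.values - np.asarray(f)) ** 2)))
    return {"regression_rmse": rmse(reg_fc), "arima_rmse": rmse(arima_fc)}

For the monthly-versus-quarterly question, nbd.resample("QS").sum() aggregates the same series to quarterly totals before rerunning the comparison.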
Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop
Abstract
In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbors (KNN), the decision tree, and AdaBoost. Our study shows that all of the classification methods have difficulty with the imbalanced data, with KNN performing the worst, detecting only 27.4% of the minority class. With the help of resampling techniques, however, all of the classification methods improve in overall performance. In particular, the random forest, in combination with random over-sampling, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate).
We then propose a new resampling procedure based on the idea of eliminating "easy" majority observations before under-sampling the majority class. It further improves the balanced accuracy of the random forest to 83.7%, making it the best approach for this imbalanced dataset.
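The following sketch is one plausible reading of the proposed procedure, not the authors' exact algorithm: a preliminary model screens out majority observations it classifies as majority with high confidence (the "easy" ones), and the remaining majority pool is randomly under-sampled to match the minority class. The screening model and threshold are our assumptions; balanced accuracy is computed exactly as defined above.

# Hedged sketch: drop confidently-classified majority points, then under-sample.
import numpy as np
from sklearn.linear_model import LogisticRegression

def trim_then_undersample(X, y, threshold=0.9, rng=np.random.default_rng(0)):
    """X: numpy feature matrix; y: binary labels with 1 = minority class."""
    screen = LogisticRegression(max_iter=1000).fit(X, y)
    p_min = screen.predict_proba(X)[:, 1]       # P(minority) for every row
    maj, mino = np.where(y == 0)[0], np.where(y == 1)[0]
    # "Easy" majority points have P(majority) >= threshold; keep only the rest.
    hard_maj = maj[p_min[maj] > 1 - threshold]
    pool = hard_maj if len(hard_maj) >= len(mino) else maj  # fallback
    keep = rng.choice(pool, size=len(mino), replace=False)
    idx = np.concatenate([keep, mino])
    return X[idx], y[idx]

def balanced_accuracy(y_true, y_pred):
    """Average of the true-positive and true-negative rates."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return (tpr + tnr) / 2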
Harold A. Records and Alan Olinsky
Abstract
Businesses of the late 1990s have a wealth of data and information available that managers use to measure the health of their business and to identify problems and opportunities. Unfortunately, current measures of business activity are static and do not capture the dynamic flows of business transactions as they occur. Warning signs of pending changes are frequently not seen until after the fact, when the financial impact of these changes is reported. We propose that business transactions and performance can and should be measured in a dynamic rather than static manner. Recent advances in computer and communications technology, combined with powerful multimedia software, enable the construction of algorithms and on-screen instruments that can put business transactions and performance into a dynamic, visible form that is readily understood by users.
Son Nguyen, John Quinn and Alan Olinsky
Abstract
We propose an oversampling technique to increase the true positive rate (sensitivity) in classifying imbalanced datasets (i.e., those in which a value of the target variable occurs with small frequency) and hence boost overall performance measures such as balanced accuracy, G-mean, and the area under the receiver operating characteristic (ROC) curve (AUC). The method is based on the idea of applying the Synthetic Minority Oversampling Technique (SMOTE) to only a selective portion of the dataset instead of the entire dataset. We demonstrate its effectiveness with four real and simulated datasets generated from three models.
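A minimal sketch of the idea, with an assumed selection rule (minority points that have at least one majority-class nearest neighbor, in the spirit of borderline variants), is shown below; the paper defines its own criterion for the selected portion.

# Hedged sketch: SMOTE interpolation restricted to a selected minority subset.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def selective_smote(X, y, n_new=200, k=5, rng=np.random.default_rng(0)):
    """X: numpy feature matrix; y: binary labels with 1 = minority class."""
    X_min = X[y == 1]
    # Selection step: keep minority points whose k nearest neighbors (in the
    # full dataset) include at least one majority observation ("borderline").
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X_min)
    border = X_min[(y[idx[:, 1:]] == 0).any(axis=1)]  # column 0 is the point itself
    # Standard SMOTE interpolation, restricted to the selected points
    # (assumes more than k borderline points exist).
    _, nbrs = NearestNeighbors(n_neighbors=k + 1).fit(border).kneighbors(border)
    base = rng.integers(len(border), size=n_new)
    mate = nbrs[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))
    X_syn = border[base] + gap * (border[mate] - border[base])
    return np.vstack([X, X_syn]), np.concatenate([y, np.ones(n_new, dtype=int)])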
Matthew Steeves, Son Nguyen, John Quinn and Alan Olinsky
Abstract
The purpose of this study is to determine which quantitative metrics are most representative of investor sentiment in the US equity markets. Sentiment is the aggregation of consumers', investors', and producers' thoughts and opinions about the future of the financial markets. By analyzing changes in popular economic indicators, financial market statistics, and sentiment reports, we can gain information on investor reactions. Furthermore, we use machine learning techniques to develop predictive models that attempt to forecast whether the stock market will go up or down based on the percent change in these indicators.
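A hedged sketch of such a model follows; the indicator names are hypothetical placeholders, and a random forest stands in for whichever machine learning techniques the study employs. Time-ordered cross-validation is used so that training folds never see future observations.

# Sketch: classify next-month market direction from indicator percent changes.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def direction_model(df: pd.DataFrame):
    """df: monthly rows with indicator levels and an 'sp500' price column."""
    feats = ["consumer_sentiment", "vix", "put_call_ratio"]  # assumed names
    X = df[feats].pct_change()                  # percent change in indicators
    nxt = df["sp500"].pct_change().shift(-1)    # next month's market return
    data = pd.concat([X, nxt.rename("next_ret")], axis=1).dropna()
    y = (data.pop("next_ret") > 0).astype(int)  # 1 = market goes up
    model = RandomForestClassifier(n_estimators=300, random_state=0)
    # Time-ordered folds avoid training on the future (look-ahead bias).
    acc = cross_val_score(model, data, y, cv=TimeSeriesSplit(n_splits=5)).mean()
    return model.fit(data, y), acc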
Son Nguyen, Phyllis Schumacher, Alan Olinsky and John Quinn
Abstract
We study the performance of various predictive models, including decision trees, random forests, neural networks, and linear discriminant analysis, on an imbalanced dataset of home loan applications. In the process, we propose an undersampling algorithm to cope with the issues created by the imbalance of the data. Our technique is shown to work competitively against popular resampling techniques such as random oversampling, undersampling, the synthetic minority oversampling technique (SMOTE), and random oversampling examples (ROSE). We also investigate the relation between the true positive rate, the true negative rate, and the imbalance of the data.
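That final question can be illustrated with a small simulation: generate two-class Gaussian data at several imbalance levels and record how the true positive and true negative rates of a plain classifier respond. The data generator and decision tree below are illustrative assumptions, not the paper's setup.

# Sketch: TPR/TNR of an unadjusted classifier as the minority share shrinks.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def tpr_tnr_vs_imbalance(minority_fracs=(0.5, 0.2, 0.1, 0.05, 0.01),
                         n=20000, rng=np.random.default_rng(0)):
    results = {}
    for frac in minority_fracs:
        n_min = int(n * frac)
        X = np.vstack([rng.normal(0.0, 1, (n - n_min, 2)),   # majority class
                       rng.normal(1.5, 1, (n_min, 2))])      # minority class
        y = np.r_[np.zeros(n - n_min, dtype=int), np.ones(n_min, dtype=int)]
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=0)
        pred = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
        tpr = (pred[y_te == 1] == 1).mean()
        tnr = (pred[y_te == 0] == 0).mean()
        results[frac] = (round(tpr, 3), round(tnr, 3))
    return results  # TPR typically falls as the minority share shrinks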