In this project we analyze apps available for download in the Apple store and on Google Play. We display information such as the number of apps per genre, the number of installs, and the number of user ratings.
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
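from csv import reader
# Load each CSV file as a list of rows. Row 0 of each list is the header.
with open('AppleStore.csv', encoding='utf8') as file1:
    apple=list(reader(file1))
with open('googleplaystore.csv', encoding='utf8') as file2:
    google=list(reader(file2))
# The data in this row is bad. Run this cell only once,
# otherwise a valid row is deleted as well.
del google[10473]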
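A bad row can also be located instead of hard-coded. The sketch below assumes that a malformed row shows up as a row whose number of columns differs from the header's; `find_bad_rows` and the sample data are hypothetical and only illustrate the check.

```python
def find_bad_rows(dataset):
    """Return the indices of rows whose column count differs from the header row."""
    expected = len(dataset[0])
    return [i for i, row in enumerate(dataset) if len(row) != expected]

# Tiny made-up dataset: the header has 3 columns, row 2 is missing one.
sample = [
    ['App', 'Category', 'Rating'],
    ['Notes', 'PRODUCTIVITY', '4.1'],
    ['Broken entry', '1.9'],
    ['Chess', 'GAME', '4.5'],
]
bad_rows = find_bad_rows(sample)
print(bad_rows)
```

Running a check like this before and after the `del` statement would show whether the cell has already been executed.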
Apple Column Descriptions:
"id" : App ID
"track_name": App Name
"size_bytes": Size (in Bytes)
"currency": Currency Type
"price": Price amount
"rating_count_tot": User Rating counts (for all versions)
"rating_count_ver": User Rating counts (for current version)
"user_rating" : Average User Rating value (for all versions)
"user_rating_ver": Average User Rating value (for current version)
"ver" : Latest version code
"cont_rating": Content Rating
"prime_genre": Primary Genre
"sup_devices.num": Number of supporting devices
"ipadSc_urls.num": Number of screenshots showed for display
"lang.num": Number of supported languages
"vpp_lic": Vpp Device Based Licensing Enabled
Google Play Columns:
The Google Play dataset has duplicate entries for some apps.
for row in google:
    if row[0]=='Coloring book moana':
        print(row)
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['Coloring book moana', 'FAMILY', '3.9', '974', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
For each duplicated app, we are going to keep the entry that has the most reviews, since it is likely the most recent one.
#create dictionary to record max reviews for each app
name_maxReviews={}
for row in google[1:]:
    name=row[0]
    if name in name_maxReviews:
        if name_maxReviews[name]<int(row[3]):
            name_maxReviews[name]=int(row[3])
    else:
        name_maxReviews[name]=int(row[3])
#check to make sure dictionary contains max. This app should have 974 reviews.
print(name_maxReviews['Coloring book moana'])
#now create cleaned data array (the header row is not included).
clean_google=[]
added=[]
for row in google[1:]:
    if name_maxReviews[row[0]]==int(row[3]) and row[0] not in added:
        clean_google.append(row)
        added.append(row[0])
974
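The same rule (keep the entry with the most reviews) can also be applied in a single pass by remembering the best row seen so far for each app name. This is a minimal sketch, not the approach used above; `dedupe_keep_max` is a hypothetical helper, the first two sample rows are the first four columns of the duplicate entries shown earlier, and the third row is made up.

```python
def dedupe_keep_max(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        # Replace the stored row only if this one has strictly more reviews.
        if name not in best or int(row[reviews_idx]) > int(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

sample_rows = [
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967'],
    ['Coloring book moana', 'FAMILY', '3.9', '974'],
    ['Example Notes', 'PRODUCTIVITY', '4.2', '120'],  # made-up row
]
deduped = dedupe_keep_max(sample_rows)
for row in deduped:
    print(row)
```

Because a dictionary holds one value per key, ties are resolved in favor of the first row encountered, which matches the behavior of the `added` list above.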
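Now we will remove all apps that are not marketed to an English-speaking audience. We do this by removing apps whose titles contain more than 3 non-ASCII characters (characters with a code point above 127). Allowing up to 3 keeps English titles that include an emoji or a symbol such as ™.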
#function returns False if the title contains more than 3 non-ASCII characters, True otherwise.
def is_English(title):
    non_english=0
    #ASCII characters have code points 0 to 127
    for let in title:
        if ord(let)>127:
            non_english += 1
    if non_english >3:
        return False
    return True
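# now take out non-English apps from datasets
google_english=[]
apple_english=[]
# clean_google has no header row, so we iterate over all of it.
# The Google Play app name is at index 0.
for row in clean_google:
    if is_English(row[0]):
        google_english.append(row)
# apple still has its header row. The Apple app name ("track_name")
# is at index 1; index 0 is the app ID.
for row in apple[1:]:
    if is_English(row[1]):
        apple_english.append(row)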
Let's assume our goal is to develop an app whose profit depends on how many users it has. To minimize risk and overhead cost, we use the following strategy: build an app and add it to Google Play. If the app gets a good response, develop it further. After 6 months, if the app makes a profit, make an iOS version.
So let's analyze which genres are most common in the Apple and Google Play stores.
The code below displays a tally of apps in the Apple store by Genre, and also a tally of apps in the Google Play store by Genre and Category.
def freq_table(dataset,index):
    counts={}
    for row in dataset:
        if row[index] in counts:
            counts[row[index]] += 1
        else:
            counts[row[index]]=1
    return counts
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
def sort_dict_values(dct,reverse=False):
    sortd=[]
    result={}
    for key,val in dct.items():
        sortd.append((val,key))
    sortd.sort(reverse=reverse)
    for item in sortd:
        result[item[1]]=item[0]
    return result
print('Apple Genre =====================')
display_table(apple[1:],11)
# In the Google Play data, index 1 is the category column
# and index 9 is the genres column.
print('Google Play Category===================')
display_table(google[1:],1)
print('Google Play Genre===================')
display_table(google[1:],9)
Apple Genre =====================
Games : 3862
Entertainment : 535
Education : 453
Photo & Video : 349
Utilities : 248
Health & Fitness : 180
Productivity : 178
Social Networking : 167
Lifestyle : 144
Music : 138
Shopping : 122
Sports : 114
Book : 112
Finance : 104
Travel : 81
News : 75
Weather : 72
Reference : 64
Food & Drink : 63
Business : 57
Navigation : 46
Medical : 23
Catalogs : 10
Google Play Category===================
FAMILY : 1972
GAME : 1144
TOOLS : 843
MEDICAL : 463
BUSINESS : 460
PRODUCTIVITY : 424
PERSONALIZATION : 392
COMMUNICATION : 387
SPORTS : 384
LIFESTYLE : 382
FINANCE : 366
HEALTH_AND_FITNESS : 341
PHOTOGRAPHY : 335
SOCIAL : 295
NEWS_AND_MAGAZINES : 283
SHOPPING : 260
TRAVEL_AND_LOCAL : 258
DATING : 234
BOOKS_AND_REFERENCE : 231
VIDEO_PLAYERS : 175
EDUCATION : 156
ENTERTAINMENT : 149
MAPS_AND_NAVIGATION : 137
FOOD_AND_DRINK : 127
HOUSE_AND_HOME : 88
LIBRARIES_AND_DEMO : 85
AUTO_AND_VEHICLES : 85
WEATHER : 82
ART_AND_DESIGN : 65
EVENTS : 64
PARENTING : 60
COMICS : 60
BEAUTY : 53
Google Play Genre===================
Tools : 842
Entertainment : 623
Education : 549
Medical : 463
Business : 460
Productivity : 424
Sports : 398
Personalization : 392
Communication : 387
Lifestyle : 381
Finance : 366
Action : 365
Health & Fitness : 341
Photography : 335
Social : 295
News & Magazines : 283
Shopping : 260
Travel & Local : 257
Dating : 234
Books & Reference : 231
Arcade : 220
Simulation : 200
Casual : 193
Video Players & Editors : 173
Puzzle : 140
Maps & Navigation : 137
Food & Drink : 127
Role Playing : 109
Strategy : 107
Racing : 98
House & Home : 88
Libraries & Demo : 85
Auto & Vehicles : 85
Weather : 82
Adventure : 75
Events : 64
Comics : 59
Art & Design : 58
Beauty : 53
Education;Education : 50
Card : 48
Parenting : 46
Board : 44
Educational;Education : 41
Casino : 39
Trivia : 38
Educational : 37
Casual;Pretend Play : 31
Word : 29
Entertainment;Music & Video : 27
Education;Pretend Play : 23
Music : 22
Casual;Action & Adventure : 21
Racing;Action & Adventure : 20
Puzzle;Brain Games : 19
Educational;Pretend Play : 19
Action;Action & Adventure : 17
Arcade;Action & Adventure : 16
Board;Brain Games : 15
Casual;Brain Games : 13
Adventure;Action & Adventure : 13
Simulation;Action & Adventure : 11
Entertainment;Brain Games : 8
Role Playing;Action & Adventure : 7
Parenting;Education : 7
Education;Creativity : 7
Casual;Creativity : 7
Art & Design;Creativity : 7
Parenting;Music & Video : 6
Educational;Brain Games : 6
Education;Action & Adventure : 6
Role Playing;Pretend Play : 5
Puzzle;Action & Adventure : 5
Educational;Creativity : 5
Education;Music & Video : 5
Education;Brain Games : 5
Sports;Action & Adventure : 4
Simulation;Pretend Play : 4
Educational;Action & Adventure : 4
Video Players & Editors;Music & Video : 3
Simulation;Education : 3
Music;Music & Video : 3
Entertainment;Creativity : 3
Entertainment;Action & Adventure : 3
Casual;Education : 3
Board;Action & Adventure : 3
Video Players & Editors;Creativity : 2
Strategy;Action & Adventure : 2
Puzzle;Creativity : 2
Entertainment;Pretend Play : 2
Casual;Music & Video : 2
Card;Action & Adventure : 2
Books & Reference;Education : 2
Art & Design;Pretend Play : 2
Art & Design;Action & Adventure : 2
Adventure;Education : 2
Trivia;Education : 1
Travel & Local;Action & Adventure : 1
Tools;Education : 1
Strategy;Education : 1
Strategy;Creativity : 1
Role Playing;Education : 1
Role Playing;Brain Games : 1
Racing;Pretend Play : 1
Puzzle;Education : 1
Parenting;Brain Games : 1
Music & Audio;Music & Video : 1
Lifestyle;Pretend Play : 1
Lifestyle;Education : 1
Health & Fitness;Education : 1
Health & Fitness;Action & Adventure : 1
Entertainment;Education : 1
Communication;Creativity : 1
Comics;Creativity : 1
Card;Brain Games : 1
Books & Reference;Creativity : 1
Board;Pretend Play : 1
Arcade;Pretend Play : 1
Adventure;Brain Games : 1
As you can see from the output above, the most common apps in the Apple store are games, followed by entertainment apps. In the Apple store, apps made for entertainment are more common than apps created for practical purposes like shopping or utilities. The Google Play list is more balanced: family apps and games lead, but practical categories such as tools, medical, business and productivity follow closely.
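Raw counts are hard to compare between two stores of different sizes, so it can help to express each genre as a share of its store. The following is a minimal sketch under that assumption; `freq_table_percent` is a hypothetical variant of `freq_table`, and the sample rows are made up.

```python
from collections import Counter

def freq_table_percent(dataset, index):
    """Map each value in column `index` to its share of the rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}

# Made-up rows (name, genre), for illustration only.
sample_apps = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Education'],
    ['App D', 'Music'],
]
shares = freq_table_percent(sample_apps, 1)
for genre, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f'{genre}: {pct:.1f}%')
```

Called as `freq_table_percent(apple[1:], 11)`, the same function would give the share of each Apple genre.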
I cannot make a comfortable recommendation of what kind of app should be built for the Apple store based on this information. More detailed information about the number of downloads or the number of users is needed. The mere existence of many gaming apps does not necessarily mean that a gaming app will attract the most users. For example, Facebook is probably one of the apps with the highest number of users, but it falls under the social networking genre, which is not even in the top five by number of apps.
Next we will determine the average number of users for each genre. The Apple store data doesn't say how many users an app has, but it does give the number of ratings submitted for each app. We will use this as an estimate.
#we do this in two ways. First way uses nested for loop.
#second way stores totals in dictionary.
#genre is at index 11, number of ratings is at index 5
genres_apple=freq_table(apple[1:],11)
for genre in genres_apple:
    total=0
    #for this genre add up the total number of users who
    #rated the app if the app is in the selected genre. Then calculate average.
    for app in apple[1:]:
        this_genre=app[11]
        if this_genre==genre:
            total += int(app[5])
    print(genre+' '+str(total/genres_apple[genre]))
print('________________________________')
totals={}
for app in apple[1:]:
    this_genre=app[11]
    num_ratings=app[5]
    # increment totals for the app's genre by number of
    # user ratings.
    if this_genre in totals:
        totals[this_genre] += int(num_ratings)
    else:
        totals[this_genre]=int(num_ratings)
# genres are ordered by total number of ratings (largest first);
# the value printed for each genre is its average.
totals=sort_dict_values(totals,reverse=True)
for genre in totals:
    print(genre+' '+str(totals[genre]/genres_apple[genre]))
Social Networking 45498.89820359281
Photo & Video 14352.280802292264
Games 13691.996633868463
Music 28842.021739130436
Reference 22410.84375
Health & Fitness 9913.172222222222
Weather 22181.027777777777
Utilities 6863.822580645161
Travel 14129.444444444445
Shopping 18615.32786885246
News 13015.066666666668
Navigation 11853.95652173913
Lifestyle 6161.763888888889
Entertainment 7533.678504672897
Food & Drink 13938.619047619048
Sports 14026.929824561403
Book 5125.4375
Finance 11047.653846153846
Education 2239.2295805739514
Productivity 8051.3258426966295
Business 4788.087719298245
Catalogs 1732.5
Medical 592.7826086956521
________________________________
Games 13691.996633868463
Social Networking 45498.89820359281
Photo & Video 14352.280802292264
Entertainment 7533.678504672897
Music 28842.021739130436
Shopping 18615.32786885246
Health & Fitness 9913.172222222222
Utilities 6863.822580645161
Sports 14026.929824561403
Weather 22181.027777777777
Reference 22410.84375
Productivity 8051.3258426966295
Finance 11047.653846153846
Travel 14129.444444444445
Education 2239.2295805739514
News 13015.066666666668
Lifestyle 6161.763888888889
Food & Drink 13938.619047619048
Book 5125.4375
Navigation 11853.95652173913
Business 4788.087719298245
Catalogs 1732.5
Medical 592.7826086956521
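Based on the average number of ratings per app, I would recommend making a social networking app. This genre attracts the largest number of users by a wide margin (about 45,500 ratings per app), followed by music (about 28,800) and reference (about 22,400). Photo & video apps rank high by total ratings but average only about 14,400 per app.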
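To rank genres by the average itself, the (genre, average) pairs can be sorted directly. This is a minimal sketch that uses a handful of averages copied (and rounded) from the output above; `avg_ratings` and `ranked` are illustrative names.

```python
# Average ratings per app for a few genres, rounded from the output above.
avg_ratings = {
    'Games': 13692.0,
    'Social Networking': 45498.9,
    'Photo & Video': 14352.3,
    'Music': 28842.0,
    'Reference': 22410.8,
}

# Sort the (genre, average) pairs by the average, largest first.
ranked = sorted(avg_ratings.items(), key=lambda item: item[1], reverse=True)
for genre, avg in ranked:
    print(f'{genre}: {avg:,.1f}')
```

An average can be pulled up by a few very large apps, so a genre's top entries are worth inspecting before committing to it.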
# Average number of installs per Google Play category.
# Category is at index 1, installs (e.g. '500,000+') at index 5.
category_counts=freq_table(google[1:],1)
averages={}
for cat in category_counts:
    total=0
    len_category=0
    for app in google[1:]:
        if app[1]==cat:
            # strip the commas and the plus sign before converting to int
            total += int(app[5].replace(',','').replace('+',''))
            len_category += 1
    averages[cat]= total/len_category
averages=sort_dict_values(averages,reverse=True)
for cat in averages:
    print(cat + ' : '+ str(averages[cat])+' average number of installs. ')
COMMUNICATION : 84359886.95348836 average number of installs.
SOCIAL : 47694467.46440678 average number of installs.
VIDEO_PLAYERS : 35554301.25714286 average number of installs.
PRODUCTIVITY : 33434177.75707547 average number of installs.
GAME : 30669601.761363637 average number of installs.
PHOTOGRAPHY : 30114172.10447761 average number of installs.
TRAVEL_AND_LOCAL : 26623593.58914729 average number of installs.
NEWS_AND_MAGAZINES : 26488755.335689045 average number of installs.
ENTERTAINMENT : 19256107.382550336 average number of installs.
TOOLS : 13585731.809015421 average number of installs.
SHOPPING : 12491726.096153846 average number of installs.
BOOKS_AND_REFERENCE : 8318050.112554112 average number of installs.
PERSONALIZATION : 5932384.647959184 average number of installs.
EDUCATION : 5586230.769230769 average number of installs.
MAPS_AND_NAVIGATION : 5286729.124087592 average number of installs.
FAMILY : 5201959.181034483 average number of installs.
WEATHER : 5196347.804878049 average number of installs.
HEALTH_AND_FITNESS : 4642441.3841642225 average number of installs.
SPORTS : 4560350.255208333 average number of installs.
FINANCE : 2395215.120218579 average number of installs.
BUSINESS : 2178075.7934782607 average number of installs.
FOOD_AND_DRINK : 2156683.0787401577 average number of installs.
HOUSE_AND_HOME : 1917187.0568181819 average number of installs.
ART_AND_DESIGN : 1912893.8461538462 average number of installs.
LIFESTYLE : 1407443.8193717278 average number of installs.
DATING : 1129533.3632478632 average number of installs.
COMICS : 934769.1666666666 average number of installs.
LIBRARIES_AND_DEMO : 741128.3529411765 average number of installs.
AUTO_AND_VEHICLES : 625061.305882353 average number of installs.
PARENTING : 525351.8333333334 average number of installs.
BEAUTY : 513151.88679245283 average number of installs.
EVENTS : 249580.640625 average number of installs.
MEDICAL : 115026.86177105832 average number of installs.
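The install counts in the Google Play data are open-ended buckets stored as strings, such as '500,000+', so they have to be cleaned before they can be summed. The sketch below isolates that conversion; `parse_installs` is a hypothetical helper name, and the sample strings only illustrate the format.

```python
def parse_installs(text):
    """Convert an install bucket such as '500,000+' to the integer 500000."""
    return int(text.replace(',', '').replace('+', ''))

samples = ['500,000+', '1,000,000,000+', '0']
parsed = [parse_installs(s) for s in samples]
print(parsed)  # [500000, 1000000000, 0]
```

Because '500,000+' means "at least 500,000", each parsed value is a lower bound, and the averages above are rough estimates rather than exact install counts.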
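It seems that social networking apps are popular in both the Google Play and Apple stores: the SOCIAL category has the second-highest average number of installs on Google Play, and Social Networking has the highest average number of ratings in the Apple store. Thus I would recommend making a social networking app.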